
Spike: dual logging implementations in Controller Manager #1893

Closed
1 of 4 tasks
shaneutt opened this issue Oct 4, 2021 · 13 comments · Fixed by #4688

Comments

@shaneutt
Member

shaneutt commented Oct 4, 2021

Is there an existing issue for this?

  • I have searched the existing issues

Problem Statement

As an interim fix to bridge the gap between our previous use of logrus and the controller manager's requirement to use go-logr, we implemented both, to reduce the number of changes needed for the 2.0 release.

We ended up using logrusr as middleware to support both implementations, but this leaves something to be desired in terms of code maintenance.

Proposed Solution

  • implement clear and documented log levels for go-logr that match our current --log-level controller manager flag
  • remove all logrus implementations in favor of a raw go-logr implementation throughout the codebase

Additional information

It's theoretically reasonable to do the opposite of the proposed solution, as long as we end up with one clear and documented method, establish all of our implementation requirements, and ensure they are met.

Acceptance Criteria

  • as a contributor I have one clear and documented method available to me to emit and manage controller manager logs
@shaneutt shaneutt added priority/low area/maintenance Cleanup, refactoring, and other maintenance improvements that don't change functionality. labels Oct 4, 2021
@mflendrich mflendrich added area/debt onboarding and removed area/maintenance Cleanup, refactoring, and other maintenance improvements that don't change functionality. labels Mar 10, 2022
@jrsmroz jrsmroz self-assigned this Jun 1, 2022
@jrsmroz
Contributor

jrsmroz commented Jun 3, 2022

I need clarification on whether we want to continue using logrus as an implementation of logr or switch to something entirely different.

@shaneutt
Member Author

shaneutt commented Jun 3, 2022

At this point it would seem best that we simply use sigs.k8s.io/controller-runtime/pkg/log, which is a relevant go-logr implementation already in use elsewhere in our ecosystem, and ultimately drop logrus.

Here's the simplest entrypoint for creating these loggers: https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.12.1/pkg/log#FromContext

@jrsmroz
Contributor

jrsmroz commented Jun 6, 2022

I've done some research, and it seems sigs.k8s.io/controller-runtime/pkg/log is actually just a utility. That package delegates logging to another go-logr implementation that must be set by the caller (reference in code).

That means we still need to set up some kind of logger. There is klog in Kubernetes, but it has a different log format than logrus.

logrus:

time="2022-06-06T17:39:35+02:00" level=info msg="can't find port for backend kubernetes service default/foo-svc: no suitable port found" service_name=default.foo-svc.80

klog:

I0606 18:20:09.751630     850 parser.go:311] "can't find port for backend kubernetes service default/foo-svc: no suitable port found" service_name="default.foo-svc.80"

zap:

2022-06-06T22:45:18.793+0200	INFO	parser/parser.go:311	can't find port for backend kubernetes service default/foo-svc: no suitable port found	{"service_name": "default.foo-svc.80"}

We might as well consider other packages, like zap.

The other thing I've noticed is that go-logr does not provide a way to include rich verbosity levels in the log message itself. V levels are used for filtering but, by contract, are not part of the log message, so libraries are free to decide how to handle them. The logrusr middleware that adapts logrus into go-logr not only squashes log levels 0-4 when filtering messages, it actually prints everything at the info level (level=info) or error level (level=error).

I initially planned to replace logrus.FieldLogger with logr.Logger in order to have only one logging interface, but the result is the loss of level information from the log itself.
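
To make the squashing concrete, here is a simplified, self-contained sketch (not logrusr's actual code) of a sink that honors the logr contract for filtering but drops the V level from the rendered output:

```go
package main

import "fmt"

// squashingSink mimics the observed logrusr behavior: the numeric V level
// is consulted only for filtering, and every message that survives the
// filter is rendered at level=info, so the level is lost from the output.
type squashingSink struct{ verbosity int }

// Enabled is the filtering half of the logr contract: V(n) messages pass
// only when n is at or below the configured verbosity.
func (s squashingSink) Enabled(level int) bool { return level <= s.verbosity }

// Info renders the message; note that the level argument is never printed.
func (s squashingSink) Info(level int, msg string) string {
	if !s.Enabled(level) {
		return ""
	}
	return fmt.Sprintf("level=info msg=%q", msg)
}

func main() {
	s := squashingSink{verbosity: 2}
	fmt.Println(s.Info(0, "important"))
	fmt.Println(s.Info(2, "debug-ish"))  // same level=info as V(0)
	fmt.Println(s.Info(4, "too chatty")) // filtered: empty string
}
```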

@shaneutt
Member Author

shaneutt commented Jun 6, 2022

> I've done some research, and it seems sigs.k8s.io/controller-runtime/pkg/log is actually just a utility. That package delegates logging to another go-logr implementation that must be set by the caller (reference in code).

Perhaps I've missed something, but it also provides log.FromContext(ctx), which will use a default implementation of go-logr's basic Logger (and is what's being used by default in the operator controllers at the moment). I'm not necessarily trying to sway you toward this, just offering it for your consideration.

> That means we still need to set up some kind of logger. There is klog in Kubernetes, but it has a different log format than logrus.
>
> logrus:
>
> time="2022-06-06T17:39:35+02:00" level=info msg="can't find port for backend kubernetes service default/foo-svc: no suitable port found" service_name=default.foo-svc.80
>
> klog:
>
> I0606 18:20:09.751630 850 parser.go:311] "can't find port for backend kubernetes service default/foo-svc: no suitable port found" service_name="default.foo-svc.80"
>
> We might as well consider other packages, like zap.
>
> The other thing I've noticed is that go-logr does not provide a way to include rich verbosity levels in the log message itself. V levels are used for filtering but, by contract, are not part of the log message, so libraries are free to decide how to handle them. The logrusr middleware that adapts logrus into go-logr not only squashes log levels 0-4 when filtering messages, it actually prints everything at the info level (level=info) or error level (level=error).
>
> I initially planned to replace logrus.FieldLogger with logr.Logger in order to have only one logging interface, but the result is the loss of level information from the log itself.

I appreciate the explanation of some of the drawbacks of the various options. At this point, are you simply reporting where you are so far, with more investigation to be done, do you have a preferred direction from here, or are you looking for input to move forward?

@jrsmroz
Contributor

jrsmroz commented Jun 7, 2022

I don't think it should be implemented before 3.0.

Numeric V-level-based logging as implemented in go-logr doesn't mesh well with KIC's named levels. For example, go-logr V levels are additive, while named levels are not. But there is more to consider.

Changing logging semantics or format (as a result of using a new library) can be considered a breaking change for log consumers.

I propose we create a document describing the current logging state in KIC, the desired logging state, and possible solutions. Then we review it and implement it for 3.0.
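
For illustration, a minimal sketch of the additivity mismatch; the named-level-to-V mapping below is a hypothetical example for the sake of the demonstration, not KIC's actual mapping:

```go
package main

import "fmt"

// nameToV is a hypothetical mapping from KIC-style named levels to go-logr
// V thresholds; the numbers are an assumption for illustration only.
var nameToV = map[string]int{
	"error": 0,
	"info":  1,
	"debug": 2,
	"trace": 3,
}

// enabled shows the additive property of V levels: a threshold of "debug"
// (V <= 2) also admits everything logged at V(0) and V(1), whereas a
// named-level scheme could select just one level's output.
func enabled(threshold string, v int) bool {
	return v <= nameToV[threshold]
}

func main() {
	for v := 0; v <= 3; v++ {
		fmt.Printf("V(%d) with --log-level=debug: %v\n", v, enabled("debug", v))
	}
}
```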

@shaneutt
Member Author

shaneutt commented Jun 7, 2022

I agree, this is going to get messy if we're not allowed to make some backwards-incompatible changes. Let's hold for v3 👍
Also, I personally think it's OK to hold off on writing up a doc; we can just come back to the context of this issue when we're ready. But if you feel differently, please feel free to say so.

@shaneutt shaneutt added area/maintenance (breaking) Maintenance requiring backwards-incompat changes (used to track the need for a new major release) and removed area/debt labels Jun 7, 2022
@shaneutt shaneutt added this to the Release v3.0.0 milestone Aug 12, 2022
@mflendrich mflendrich changed the title Resolve dual logging implementations in Controller Manager Spike: dual logging implementations in Controller Manager Nov 8, 2022
@mflendrich
Member

This item requires understanding the cost-to-benefit trade-off, so I've repurposed it as a spike.

The desired output of a spike is a recommendation: "yes, this will pay off" vs. "no, too much work for unclear gain, or clear but small gain".

@rainest
Contributor

rainest commented Sep 14, 2023

Idle since July, so I'm stealing it. https://gist.github.com/rainest/4d82f944416903f448d49c52ac4ff5cb is a spike on the status quo and a cost estimate. Major takeaways:

  • The LoC work estimate is hard because it's hard to find logrus lines: Info(, Error(, Errorf(, and the like appear in plenty of other places. Migrating individual calls is simple, just tedious.
  • The noted issue with log level display for logr currently appears to be a limitation of logrusr. It could possibly be fixed by forking and modifying that library, or by switching to some other engine to see whether it handles this better.
  • If we don't switch to logr entirely, we should arguably stop using logrus directly for new code. Continuing to do so will make any future transition more tedious.

I recommend working on the conversion to logr for a day or so to see how far we get; it's probably not that hard once we actually start doing it. We should first see whether we can address the level-printing issue with logrusr modifications or alternate logr backends. If alternate backends fix it, the benefit of going all-logr is greater.

@pmalek
Member

pmalek commented Sep 14, 2023

2 cents from my side:

  • shared-go has some utilities that we might want to look at: https://github.com/Kong/shared-go/tree/main/logr-utils. They don't seem documented or particularly robust, but they are there, so we might want to at least consider using or expanding them for our use case (I'm not advocating for this, just leaving some information on the topic here).

  • I'd suggest considering an approach like the one described in https://blog.verygoodsoftwarenotvirus.ru/posts/interfaces/, where we'd just use interface(s) and could then swap implementations as we please. IMO this would prove much more robust in the long run than what we're dealing with currently (and what this issue is about).

    We could then maintain our own wrappers adapting particular implementations to those interfaces.

  • Last but not least: there's still https://go.dev/blog/slog to consider as well.

Having said that, I do realize these are all different from the proposed (or rather, already-used) logging packages, but while we're at it we might as well do it right and at least think about them.

@rainest
Contributor

rainest commented Sep 15, 2023

logr is an interface that supports multiple backends. Our problem is that not everything uses it, and that we keep adding new things that don't use it and instead use logrus directly. We may as well use it rather than rolling our own, insofar as controller-runtime uses it and there's nothing we know we want that it can't handle.

The zapr wrapper does apparently fix the level problem out of the box. With our current logrusr:

time="2023-09-15T21:21:22Z" level=info msg="reconciling resource" NetV1Ingress="{\"Namespace\":\"default\",\"Name\":\"httpbin\"}" logger=controllers.Ingress.netv1 name=httpbin namespace=default

with the laziest possible zap replacement:

-       logger := logrusr.New(deprecatedLogger)
+       z, _ := zap.NewDevelopment()
+       logger := zapr.NewLogger(z)

we get:

2023-09-15T21:17:26.070Z    DEBUG   controllers.Ingress.netv1   configuration/zz_generated_controllers.go:375   reconciling resource    {"NetV1Ingress": {"name":"httpbin","namespace":"default"}, "namespace": "default", "name": "httpbin"}

from

https://github.com/Kong/kubernetes-ingress-controller/blob/v2.11.1/internal/controllers/configuration/zz_generated_controllers.go#L356

So it at least solves that problem. We have some KIC-specific glue that apparently approximates the shared-go level int constants, but the main task is migrating the exclusively-logrus code anyway.

@rainest
Contributor

rainest commented Sep 16, 2023

From my attempt at actually doing the conversion: starting "somewhere in the parser" was probably a poor choice, since its logger gets shared with basically everything under internal/dataplane. As a brief example, the failures collector conversion looks like https://gist.github.com/rainest/4d82f944416903f448d49c52ac4ff5cb#file-failures-diff

This is a good example since it tests its log output. It does fail on the log level due to the "everything converts to info" problem:

17:45:22-0700 esenin $ go test ./internal/dataplane/failures                      
--- FAIL: TestResourceFailuresCollector (0.00s)
    --- FAIL: TestResourceFailuresCollector/does_not_crash_but_logs_warning_when_no_causing_objects_passed (0.00s)
        failures_test.go:101: 
            	Error Trace:	/home/traines/src/ingress-controller/internal/dataplane/failures/failures_test.go:101
            	Error:      	Not equal: 
            	            	expected: 0x3
            	            	actual  : 0x4
            	Test:       	TestResourceFailuresCollector/does_not_crash_but_logs_warning_when_no_causing_objects_passed
FAIL
FAIL	github.com/kong/kubernetes-ingress-controller/v2/internal/dataplane/failures	0.010s
FAIL

Log testing isn't that common, but it requires using the underlying implementation's test library. logr does have a testing implementation, but it looks like that's for writing to the test log rather than asserting on log content. If we want to convert to something other than logrus (for zap, using https://pkg.go.dev/go.uber.org/zap@v1.26.0/zaptest/observer), we should probably do it in one go.

Our shared-go is currently private, so I don't think we can use it unless we intend to open-source it. Maybe we can? It looks like it mostly adds the ability to handle both logging and tracing spans from the context, and uses zap for the logging portion. I don't think we have any requests for tracing, and it's not obviously useful for KIC's role (we aren't in the middle of a request-handling pipeline), so we could probably live without it.

Istio does provide an Apache-licensed klog shim that shunts klog output to zap, if we want to try to deal with the client-go logs. Between that, the existing fix for the logr log level issue, and zap being actively maintained and fairly ubiquitous, I'd say go with zap under go-logr.

@rainest
Contributor

rainest commented Sep 22, 2023

zap's default encoders don't support anything exactly like the logrus text format with key=value pairs; instead, they tab-delimit field values without printing keys.

logrus text:

time="2023-09-21T23:07:26Z" level=info msg="the ingress class name has been set" logger=setup value=kong

zap text:

2023-09-22T00:38:16.026Z        info    setup   the ingress class name has been set     {"value": "kong"}

There is an inactive PR asking to add this, but it looks like it likely won't make it into the core. I suspect we could pull that in with https://pkg.go.dev/go.uber.org/zap#RegisterEncoder if we wanted. If not, the zap argument is that anyone who wants to actually parse the logs should be using JSON (or some other machine-readable format), which isn't really wrong; the logrus default is perhaps more machine-readable, though I don't know whether it qualifies as a proper format.

logrus JSON:

{"level":"info","logger":"setup","msg":"the ingress class name has been set","time":"2023-09-21T23:15:15Z","value":"kong"}

zap JSON:

{"level":"info","time":"2023-09-22T00:28:13.006Z","logger":"setup","msg":"the ingress class name has been set","value":"kong"}

@rainest
Contributor

rainest commented Sep 22, 2023

Level handling actually seems a bit off in zapr. Regular zap levels start at -1 (debug) and go up through 0 (info), 1 (warn), 2 (error), and so on.

logr's V() raises any value <0 to 0.

zapr's Info() flips the sign of the V level to determine the zap level. Since -1 is the only negative integer that maps to a zap level, V(1) uses debug, but higher verbosity uses info (unknown values default to info), which seems broken. zapr's Error() hard-codes the error level.
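
A simplified sketch of the sign flip described above; `zaprLevel` is a hypothetical function illustrating the observed mapping, not zapr's actual code:

```go
package main

import "fmt"

// Zap's numeric levels, for reference: debug = -1, info = 0, warn = 1, error = 2.
const (
	zapDebug = -1
	zapInfo  = 0
)

// zaprLevel negates the logr V level to pick a zap level; anything that
// doesn't land on a known zap level falls back to info. This is why V(1)
// maps to debug but V(2) and above come out as info.
func zaprLevel(v int) int {
	switch -v {
	case zapDebug:
		return zapDebug
	case zapInfo:
		return zapInfo
	default:
		return zapInfo // unknown values default to info
	}
}

func main() {
	fmt.Println("V(0) ->", zaprLevel(0)) // info (0)
	fmt.Println("V(1) ->", zaprLevel(1)) // debug (-1)
	fmt.Println("V(2) ->", zaprLevel(2)) // back to info (0): the surprising part
}
```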

The negative levels also don't play well with the basic custom check function I had to add for custom output, since an n >= zapLevel check discards most negative values at the most verbose possible setting.

zapr had a request to allow mapping V levels to arbitrary zap levels, but didn't fulfill it. The recommendation there is essentially to drop the native zap level field, since it doesn't really correspond to the expectations of V levels. controller-runtime has no expectations around semantic levels, so any added mapper wouldn't really work there.

We weren't using WARN that often (there were some existing logr calls using it, but those were already getting squashed to INFO), so converting to logging the actual V level instead shouldn't be too much of a break:

$ git grep -rE "Warn\(|Warnf\(" 
internal/dataplane/kong_client.go:                      c.logger.WithError(err).Warn("Skipped pushing configuration to Konnect")
internal/dataplane/kong_client.go:                      c.logger.WithError(err).Warn("Failed pushing configuration to Konnect")
internal/dataplane/kong_client.go:                      logger.Warn("exceeded Kong API timeout, consider increasing --proxy-timeout-seconds")
internal/dataplane/kongstate/route.go:          }).Warn("KongIngress annotation is not allowed on Gateway API objects.")
internal/dataplane/kongstate/service.go:                                        Warn("KongIngress annotation is not allowed on Services " +
internal/store/store.go:                        s.logger.Warnf("listIngressesV1: dropping object of unexpected type: %#v", item)
internal/store/store.go:                        s.logger.Warnf("listIngressClassesV1: dropping object of unexpected type: %#v", item)
internal/store/store.go:                        s.logger.Warnf("listIngressClassParametersV1alpha1: dropping object of unexpected type: %#v", item)

controller-runtime mostly logs at either 0 (info) or 1 (debug), though there is a smattering of V4 and V5 lines.

@mflendrich mflendrich added breaking change and removed area/maintenance (breaking) Maintenance requiring backwards-incompat changes (used to track the need for a new major release) labels Sep 26, 2023