
Consistent Logging #3152

Closed
pydctw opened this issue Feb 4, 2022 · 37 comments · Fixed by #3776
Labels
kind/documentation · priority/backlog · triage/accepted

Comments

@pydctw
Contributor

pydctw commented Feb 4, 2022

I've found that the use of log formats and logging levels is inconsistent across the repo.

A few examples:

  1. Different log levels are used across reconcilers.
s.scope.V(2).Info("Reconciling network for cluster", "cluster-name", s.scope.Name(), "cluster-namespace", s.scope.Namespace())
s.scope.V(2).Info("Reconciling VPC")
s.scope.Info("Reconciling subnets")
s.scope.V(2).Info("Reconciling security groups")
  2. Log formats are different for the main controllers.
I0203 23:10:34.968187    1165 awsmachine_controller.go:414]  "msg"="Reconciling AWSMachine" 

I0203 23:12:40.884859    1165 awscluster_controller.go:202] controller/awscluster "msg"="Reconciling AWSCluster" "cluster"="quick-start-cghvfu" "name"="quick-start-cghvfu" "namespace"="quick-start-d0tfno" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSCluster" 
  3. Logging for AWS resource creation/deletion is inconsistent. The reason for using V(0) for "Deleted association between route table and subnet" while using V(2) for "Deleted security group" is not clear to me.
s.scope.Info("Deleted association between route table and subnet", "route-table-id", *rt.RouteTableId, "subnet-id", *as.SubnetId)
s.scope.V(2).Info("Deleted security group", "security-group-id", sg.ID, "kind", typ)

See #3143.

What do you expect?

  • We should have a guideline/documentation about logging formats and levels to improve consistency (see the sketch after this list for an illustration).
  • Create follow-up issues if there are improvements that can be implemented.
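
For illustration, the kind of rule such a guideline could state, applied to the snippets above (a sketch only; V(2) is a placeholder level, not a decided convention):

    // Hypothetical: the same reconciler messages, all at one agreed verbosity.
    s.scope.V(2).Info("Reconciling network for cluster", "cluster-name", s.scope.Name(), "cluster-namespace", s.scope.Namespace())
    s.scope.V(2).Info("Reconciling VPC")
    s.scope.V(2).Info("Reconciling subnets")
    s.scope.V(2).Info("Reconciling security groups")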

Related Issues

@k8s-ci-robot added the needs-priority and needs-triage labels Feb 4, 2022
@k8s-ci-robot
Contributor

@pydctw: This issue is currently awaiting triage.

If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sedefsavas added the kind/documentation label Feb 4, 2022
@sedefsavas added this to the v1.x milestone Feb 4, 2022
@sedefsavas added the priority/backlog and triage/accepted labels and removed the needs-triage and needs-priority labels Feb 4, 2022
@richardcase
Member

@pydctw - we also have #3101 which is related. I know @Skarlso was looking at standardizing log levels as part of that work. It would be worth you both chatting.

@Skarlso
Contributor

Skarlso commented Feb 7, 2022

Oh nice! I have been working on a POC to replace that stuff. It's a bit difficult to figure out, though, which level was supposed to do what. And it will be a rather large change, I'm afraid.

@pydctw
Contributor Author

pydctw commented Feb 7, 2022

@Skarlso, I would love to hear what you are thinking. Let's sync up.

@Skarlso
Contributor

Skarlso commented Feb 7, 2022

@pydctw Sure, will write up a detailed POC and link a WIP PR tomorrow. :)

@sedefsavas
Contributor

We should also consider adding JSON support during this refactoring, which is being discussed in CAPI:
kubernetes-sigs/cluster-api#5571 (comment)

@Skarlso
Contributor

Skarlso commented Feb 10, 2022

@sedefsavas I would rather do a single thing per PR. This will be a bit of a pain as it is anyway. :D I worked out a clean API for logging, but I have to replace a LOT of log lines. :))))) We can do it in a follow-up PR? :)

@Skarlso
Contributor

Skarlso commented Feb 10, 2022

Hmm, I read the whole thing. I guess I can use a different logger, like zerolog?

@Skarlso
Contributor

Skarlso commented Feb 10, 2022

@pydctw added the WIP here: #3191 with a description and an API I'm suggesting. :)

@sedefsavas
Contributor

Thanks for making progress on this @Skarlso

Since we are doing refactoring, it might be a good time to think about having consistency with cluster-api so that users won't have to grep differently to collect logs from different managers in the same management cluster.

It might be worth discussing this further, including other providers, to see if a common approach could be adopted instead of having custom logic here.
cc @sbueringer given that you started a discussion already during community meeting.

@Skarlso
Contributor

Skarlso commented Feb 10, 2022

For sure, I'm open to suggestions as to which logger to adopt. :) I can use whatever we want as long as it's consistent with everything else, I guess.

@Skarlso
Contributor

Skarlso commented Feb 10, 2022

To be clear, the reason for the dedicated interface is, of course, abstraction. I think it's sane to keep some kind of common interface and use whatever we want in the background. I would like to avoid having to change the logger again in so many places, if that's okay. We can easily change the implementation underneath if we keep a sane common interface.

@sbueringer
Member

sbueringer commented Feb 10, 2022

Hey,
just a short note for now. In core CAPI we are only (or at least in almost all cases) logging via the standard log interface used in controller-runtime (which is logr.Logger).

So we don't really have our own log interface/implementation in core CAPI like CAPA has (as far as I know).

Core CAPI currently uses klog as the underlying implementation of that interface (here we set klogr as the controller-runtime logger).

We are currently thinking about changing that underlying implementation to component-base/logs (which is used in kube-apiserver, kube-scheduler, ...). The effect is that the underlying implementation (behind logr.Logger) would still log via klog, but when the JSON logging format is enabled we would use the JSON logger implementation from component-base/logs. We would also inherit the log flags from component-base/logs.
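
As a rough sketch of that wiring (not actual CAPI/CAPA code; it assumes k8s.io/klog/v2/klogr and controller-runtime's SetLogger):

    // Minimal sketch: route controller-runtime's logr.Logger output through klog.
    package main

    import (
        "k8s.io/klog/v2/klogr"
        ctrl "sigs.k8s.io/controller-runtime"
    )

    func main() {
        ctrl.SetLogger(klogr.New())
        ctrl.Log.Info("logging via klog now")
    }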

I think the best summary from a core CAPI perspective is this comment: kubernetes-sigs/cluster-api#5571 (comment)

To be honest, I don't know why CAPA currently has its own log interface, or what requirements make that (presumably) necessary.

@Skarlso
Contributor

Skarlso commented Feb 10, 2022

@sbueringer Hey Stefan. We didn't have one until now. :) CAPA is also using klog, the context logger, and logr.

However, it's not very user friendly. It's difficult to grep for keywords, and the notion of writing log.V(5).Info is just not very developer friendly. And then you have to remember which log level you used for what output.

I was in the process of making a unified interface which still uses logr in the background, but at least from a logging perspective we'd use things like log.Debug, log.Info, log.Warning, and then just add a grep-able key/value pair to each entry.
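
Roughly the kind of thin wrapper meant here (a sketch only; the method set and the mapping of names to V levels are illustrative, not the PoC's actual API):

    // Hypothetical thin wrapper over logr: named levels instead of V(i),
    // plus a grep-able "level" key/value pair on every entry.
    package log

    import "github.com/go-logr/logr"

    type Logger struct{ log logr.Logger }

    func (l Logger) Info(msg string, kv ...interface{}) {
        l.log.V(0).Info(msg, append(kv, "level", "info")...)
    }

    func (l Logger) Warning(msg string, kv ...interface{}) {
        l.log.V(1).Info(msg, append(kv, "level", "warning")...)
    }

    func (l Logger) Debug(msg string, kv ...interface{}) {
        l.log.V(4).Info(msg, append(kv, "level", "debug")...)
    }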

@sbueringer
Member

sbueringer commented Feb 11, 2022

Ah, got your point.

So it's about having a thin layer on top of go-logr which essentially provides the usual info/debug/... log levels instead of V(i) (in the code, as well as an additional "level": "x" k/v pair). I think I need more time to think this over :). We were kind of happy that we could potentially get rid of our own thin layer on top of go-logr that we've used in the ClusterClass implementation.

cc @fabriziopandini

P.S. I think this is orthogonal to replacing the logger "below" the interface, but I think it's relevant to some other log improvements we're thinking about making in core CAPI.

@fabriziopandini
Member

I don't have strong opinions about logging interfaces, because all of them have both pros and cons.
However, I'm strongly in favor of consistency across the ecosystem, and in CAPI we are using the following documents as a north star

@Skarlso
Contributor

Skarlso commented Feb 11, 2022

Regarding this bit:

  • klog.InfoS() has multiple levels:
    • klog.V(0) - Generally useful for this to ALWAYS be visible to an operator
      • Programmer errors
      • Logging extra info about a panic
      • CLI argument handling
    • klog.V(1) - A reasonable default log level if you don't want verbosity.
      • Information about config (listening on X, watching Y)
      • Errors that repeat frequently that relate to conditions that can be corrected (pod detected as unhealthy)
    • klog.V(2) - Useful steady state information about the service and important log messages that may correlate to significant changes in the system. This is the recommended default log level for most systems.
      • Logging HTTP requests and their exit code
      • System state changing (killing pod)
      • Controller state change events (starting pods)
      • Scheduler log messages
    • klog.V(3) - Extended information about changes
      • More info about system state changes
    • klog.V(4) - Debug level verbosity
      • Logging in particularly thorny parts of code where you may want to come back later and check it
    • klog.V(5) - Trace level verbosity
      • Context to understand the steps leading up to errors and warnings
      • More information for troubleshooting reported issues

I'm struggling to see the benefit of this. :) You trade readability for what, exactly? Switches like --v 4 or --v 3 ask the user to remember at what level a given message is logged, as opposed to something like --v debug or --v trace, which is easier to remember. And in the code, you have to explicitly remember this mapping. Why put this burden on the user AND the developer? I've read Dave's post, and I understand what he means by the proliferation of information in the logs. But in that case there is no point in using these either. :) Also, these log entries are anything but greppable or searchable for information such as names, actions, filtering for errors, or other relevant information. But maybe there is a point in not doing that at all, or not caring at all, which would be more in line with Dave's post.

And I've read the other point about why klog disagrees with Dave, but I don't actually like that direction because it's not very human friendly, I believe.

In any case. I bow my head to conform, but I feel like I had to share my opinion about the matter and why I wrote a wrapper interface in the first place.
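
For concreteness, the convention quoted above maps onto klog's structured calls roughly like this (a minimal sketch; the messages and keys are made up, not CAPA code):

    // Sketch of the quoted verbosity convention using k8s.io/klog/v2.
    package main

    import "k8s.io/klog/v2"

    func main() {
        klog.InfoS("starting controller", "controller", "awscluster")               // V(0): always visible
        klog.V(2).InfoS("reconciled security group", "security-group-id", "sg-123") // steady-state change
        klog.V(4).InfoS("requeueing reconcile", "requeueAfter", "30s")              // debug detail
        klog.V(5).InfoS("AWS API call finished", "operation", "DescribeSubnets")    // trace detail
    }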

@fabriziopandini
Member

@Skarlso I personally appreciate your opinion and am truly thankful for it, and I have no problem saying there are a couple of points I agree with. My intent above was just to provide some context on how we are tackling this problem in CAPI. The final call here is clearly up to the CAPA community, not to me.

@sedefsavas
Contributor

sedefsavas commented Feb 11, 2022

@Skarlso I have some concerns regarding diverging from the cluster-api ecosystem. But I think it'd be worth discussing this during our community meeting on Feb 21st at 9 am PT.

@sedefsavas
Contributor

Thanks all for chiming in and sharing your opinions on this.

@Skarlso
Contributor

Skarlso commented Feb 11, 2022

Cool, and thanks to all! 😊👍

@sbueringer
Member

sbueringer commented Feb 17, 2022

I'm with Fabrizio on that one. I also see some benefit in other log levels, but I prefer to stay consistent with the Kubernetes ecosystem.

We also have to take into account that we are using a lot of upstream libraries like controller-runtime and client-go. They all log with the .V(x) option. I'm not sure if client-go is doing it already, but they at least want to use the logger we pass into calls via ctx (https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3077-contextual-logging). Of course we could pass in a logger which translates the ~ V(0)-V(10) messages to our log levels, but I personally would really prefer not to do that, in favor of conformance.
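
For context, "contextual logging" in that KEP means passing the logger through ctx, roughly like this (a sketch using go-logr helpers, not client-go's exact plumbing):

    // Sketch: attach a logr.Logger to a context and retrieve it downstream,
    // as the contextual logging KEP proposes for client-go and friends.
    package main

    import (
        "context"

        "github.com/go-logr/logr"
        "k8s.io/klog/v2/klogr"
    )

    func main() {
        ctx := logr.NewContext(context.Background(), klogr.New())
        doSomething(ctx)
    }

    func doSomething(ctx context.Context) {
        log := logr.FromContextOrDiscard(ctx)
        log.V(2).Info("doing something", "step", "example")
    }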

But I also agree with Fabrizio, it's a good discussion to have, it's just my opinion and I think it's something that the CAPA community should decide.

@pydctw
Contributor Author

pydctw commented Mar 8, 2022

@Skarlso, we wanted to discuss this issue during yesterday's CAPA office hours, but since you were not there, I'm following up here.

This is my understanding from reading the comments above, but to confirm: are you OK with CAPA following the k8s ecosystem's logging convention of using V(0)-V(10) levels instead of the logging interface from the PoC?

@Skarlso
Contributor

Skarlso commented Mar 10, 2022

Thanks, sorry, I was out on vacation. :) Will take a look.

@sedefsavas
Contributor

A related issue:
#2944

@Skarlso
Contributor

Skarlso commented Mar 13, 2022

@sedefsavas Okay, so... I'm okay with following this convention, but that will result in log entries that aren't very searchable, which is what the initial problem also concluded. So I will have to wrap it in something that sets up searchable entries anyway. :)

@richardcase
Member

And even if we follow the verbosity levels, we still need guidance and constants in the code that help people know which verbosity level they should use for their situation.

@Skarlso
Contributor

Skarlso commented Mar 15, 2022

Absolutely.

@richardcase
Member

@Skarlso - will ping you on slack.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jun 14, 2022
@richardcase
Member

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jun 14, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Sep 12, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Oct 12, 2022
@Skarlso
Contributor

Skarlso commented Oct 12, 2022

/remove-lifecycle rotten

Okay, so now that we conform more with CAPI and use proper logging formats and initialization, I think we can refactor some internal things and have a unified logging interface. Whether that's just a bunch of constants like:

    // Named verbosity levels to pass to logr's V() instead of bare numbers.
    const (
        logLevelInfo    = iota // 0
        logLevelError          // 1
        logLevelWarning        // 2
        logLevelDebug          // 3
        logLevelTrace          // 4
    )

or a complete interface like using Debug instead of V(logLevelDebug) is for us to decide. :)
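
Either way, call sites would look something like one of these (a sketch; nothing here is decided, and the Debug method is hypothetical):

    // Option 1: keep logr's V(), but name the level with a constant.
    log.V(logLevelDebug).Info("Deleted security group", "security-group-id", sg.ID)

    // Option 2: a thin interface that hides V() entirely.
    log.Debug("Deleted security group", "security-group-id", sg.ID)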

@k8s-ci-robot removed the lifecycle/rotten label Oct 12, 2022
@richardcase
Copy link
Member

I like the idea of Debug() instead of V(logLevelDebug) personally.

@Skarlso
Contributor

Skarlso commented Oct 12, 2022

I second that.

@Skarlso
Contributor

Skarlso commented Oct 13, 2022

/assign
