
Need to improve the readability of the log #14

Closed
denkensk opened this issue Feb 18, 2022 · 26 comments · Fixed by #448
Labels
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
  • lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@denkensk
Member

1.6451684909657109e+09	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": "127.0.0.1:8080"}
1.6451684909663508e+09	INFO	setup	starting manager
1.6451684909665146e+09	INFO	Starting server	{"kind": "health probe", "addr": "[::]:8081"}
1.645168490966593e+09	INFO	Starting server	{"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
I0218 07:14:51.066639       1 leaderelection.go:248] attempting to acquire leader lease kueue-system/c1f6bfd2.gke-internal.googlesource.com...
I0218 07:15:07.705977       1 leaderelection.go:258] successfully acquired lease kueue-system/c1f6bfd2.gke-internal.googlesource.com
1.6451685077060497e+09	DEBUG	events	Normal	{"object": {"kind":"ConfigMap","namespace":"kueue-system","name":"c1f6bfd2.gke-internal.googlesource.com","uid":"e70e4b9b-54f4-4782-a904-e57d3001c8e6","apiVersion":"v1","resourceVersion":"264201"}, "reason": "LeaderElection", "message": "kueue-controller-manager-7ff7b759bf-nszmb_05445f7f-a871-4a4c-83c1-af075b850e49 became leader"}
1.6451685077061899e+09	DEBUG	events	Normal	{"object": {"kind":"Lease","namespace":"kueue-system","name":"c1f6bfd2.gke-internal.googlesource.com","uid":"72b48bf0-20e0-42a4-823b-2a6edcb3288a","apiVersion":"coordination.k8s.io/v1","resourceVersion":"264202"}, "reason": "LeaderElection", "message": "kueue-controller-manager-7ff7b759bf-nszmb_05445f7f-a871-4a4c-83c1-af075b850e49 became leader"}
1.6451685077062488e+09	INFO	controller.queue	Starting EventSource	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "Queue", "source": "kind source: *v1alpha1.Queue"}
1.645168507706281e+09	INFO	controller.queue	Starting Controller	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "Queue"}
1.6451685077062566e+09	INFO	controller.queuedworkload	Starting EventSource	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "QueuedWorkload", "source": "kind source: *v1alpha1.QueuedWorkload"}
1.6451685077063015e+09	INFO	controller.queuedworkload	Starting Controller	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "QueuedWorkload"}
1.6451685077062776e+09	INFO	controller.capacity	Starting EventSource	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "Capacity", "source": "kind source: *v1alpha1.Capacity"}
1.6451685077063189e+09	INFO	controller.capacity	Starting Controller	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "Capacity"}
1.6451685077064047e+09	INFO	controller.job	Starting EventSource	{"reconciler group": "batch", "reconciler kind": "Job", "source": "kind source: *v1.Job"}
1.6451685077064307e+09	INFO	controller.job	Starting EventSource	{"reconciler group": "batch", "reconciler kind": "Job", "source": "kind source: *v1alpha1.QueuedWorkload"}
1.6451685077064393e+09	INFO	controller.job	Starting Controller	{"reconciler group": "batch", "reconciler kind": "Job"}
1.6451685078075259e+09	INFO	controller.queuedworkload	Starting workers	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "QueuedWorkload", "worker count": 1}
1.6451685078075113e+09	INFO	controller.capacity	Starting workers	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "Capacity", "worker count": 1}
1.645168507807566e+09	INFO	controller.queue	Starting workers	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "Queue", "worker count": 1}
1.6451685078076618e+09	INFO	controller.job	Starting workers	{"reconciler group": "batch", "reconciler kind": "Job", "worker count": 1}
1.645168507807886e+09	LEVEL(-2)	job-reconciler	Job reconcile event	{"job": {"name":"ingress-nginx-admission-create","namespace":"kube-system"}}
1.645168507808418e+09	LEVEL(-2)	job-reconciler	Job reconcile event	{"job": {"name":"ingress-nginx-admission-patch","namespace":"kube-system"}}
1.6451685078085716e+09	LEVEL(-2)	job-reconciler	Job reconcile event	{"job": {"name":"kube-eventer-init-v1.6-a92aba6-aliyun","namespace":"kube-system"}}
1.6451706903900485e+09	LEVEL(-2)	capacity-reconciler	Capacity create event	{"capacity": {"name":"cluster-total"}}
1.6451706904384277e+09	LEVEL(-2)	queue-reconciler	Queue create event	{"queue": {"name":"main","namespace":"default"}}
1.6451707150770907e+09	LEVEL(-2)	job-reconciler	Job reconcile event	{"job": {"name":"sample-job-jjbq2","namespace":"default"}}
1.6451707150895817e+09	LEVEL(-2)	queued-workload-reconciler	QueuedWorkload create event	{"queuedWorkload": {"name":"sample-job-jjbq2","namespace":"default"}, "queue": "main", "status": "pending"}
1.645170715089716e+09	LEVEL(-2)	scheduler	Workload assumed in the cache	{"queuedWorkload": {"name":"sample-job-jjbq2","namespace":"default"}, "capacity": "cluster-total"}
1.6451707150901928e+09	LEVEL(-2)	job-reconciler	Job reconcile event	{"job": {"name":"sample-job-jjbq2","namespace":"default"}}
1.6451707150984285e+09	LEVEL(-2)	scheduler	Successfully assigned capacity and resource flavors to workload	{"queuedWorkload": {"name":"sample-job-jjbq2","namespace":"default"}, "capacity": "cluster-total"}
1.6451707150985863e+09	LEVEL(-2)	queued-workload-reconciler	QueuedWorkload update event	{"queuedWorkload": {"name":"sample-job-jjbq2","namespace":"default"}, "queue": "main", "capacity": "cluster-total", "status": "assigned", "prevStatus": "pending", "prevCapacity": ""}
1.6451707150986767e+09	LEVEL(-2)	job-reconciler	Job reconcile event	{"job": {"name":"sample-job-jjbq2","namespace":"default"}}

We could choose to switch to klog/v2.
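For comparison, the unreadable epoch timestamps above come from controller-runtime's default zap configuration. A minimal sketch of keeping zap but forcing a human-readable time encoder (the Options fields are from controller-runtime's pkg/log/zap and should be checked against the pinned version; this is illustrative, not Kueue's actual main.go):

package main

import (
	"go.uber.org/zap/zapcore"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	// Keep zap, but replace the epoch timestamps ("1.645e+09") seen above
	// with ISO 8601 ones so the lines become readable.
	opts := zap.Options{
		Development: true,
		TimeEncoder: zapcore.ISO8601TimeEncoder,
	}
	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

	// ... the rest of the manager setup stays unchanged ...
}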

@denkensk
Member Author

/cc @ahg-g @alculquicondor

@denkensk
Member Author

denkensk commented Feb 18, 2022

/kind cleanup

@ahg-g ahg-g added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Feb 18, 2022
@ArangoGutierrez
Contributor

Much agree, klog is so much better.

@ArangoGutierrez
Contributor

/assign

@alculquicondor
Contributor

alculquicondor commented Feb 18, 2022

Let's reach some level of agreement before we jump into implementing :)

If this is just about the format, I believe there are different zap configurations we can use. This is actually my first time using zap, so I'm not really sure what's possible.

While the klog format might be better, klog doesn't currently support contextual logging, which simplifies building structured logs a lot.
But this is changing soon: https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3077-contextual-logging

In the meantime, if we stick with logr, I think it will be easier to switch to klog (v3?) once it supports contextual logging.

@ArangoGutierrez
Contributor

Let's defer this to klog/v3 but keep circling back here. I am a klog fan for sure.

@alculquicondor
Contributor

Is there a better format we can use for zap?

@ArangoGutierrez
Contributor

This is why I don't like logr:

1.6454820377420967e+09  ERROR   job-reconciler  Unsuspending job        {"job": {"name":"sample-job","namespace":"default"}, "error": "Operation cannot be fulfilled on jobs.batch \"sample-job\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227
1.6454820377421455e+09  ERROR   controller.job  Reconciler error        {"reconciler group": "batch", "reconciler kind": "Job", "name": "sample-job", "namespace": "default", "error": "Operation cannot be fulfilled on jobs.batch \"sample-job\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227

I had similar errors/logs on NFD when using logr, and got rid of them by moving to klog/v2.

@alculquicondor
Contributor

Can you try to migrate to https://github.com/kubernetes/klog/tree/main/klogr?
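A minimal sketch of what that migration could look like, assuming klogr (the klog/v2 logr adapter) is handed to controller-runtime via ctrl.SetLogger; the wiring below is illustrative, not Kueue's actual main.go:

package main

import (
	"flag"

	"k8s.io/klog/v2"
	"k8s.io/klog/v2/klogr"
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Expose the usual klog flags (-v, --logtostderr, ...) and hand a
	// klog-backed logr.Logger to controller-runtime instead of zap.
	klog.InitFlags(flag.CommandLine)
	flag.Parse()
	ctrl.SetLogger(klogr.New())

	// ... the rest of the manager setup stays unchanged ...
}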

@alculquicondor
Contributor

I'm fine waiting for v3 or replacing zapper with klogr.

I don't want to lose contextual logging.

@ArangoGutierrez
Contributor

I'm fine waiting for v3 or replacing zapper with klogr.

I don't want to lose contextual logging.

Since we are on hold here
/unassign

@ArangoGutierrez ArangoGutierrez removed their assignment Mar 16, 2022
@alculquicondor
Contributor

You could always give klogr a try

@ahg-g
Contributor

ahg-g commented Apr 5, 2022

/kind cleanup

@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Apr 5, 2022
@ahg-g
Contributor

ahg-g commented Apr 13, 2022

What is the status of this?

@alculquicondor
Contributor

I disabled the dev mode for the binary, so the output is all structured now:

{"level":"Level(-2)","ts":1649793642.0015297,"logger":"controller.clusterqueue","msg":"Reconciling ClusterQueue","reconciler group":"kueue.x-k8s.io","reconciler kind":"ClusterQueue","name":"cluster-total","namespace":"","clusterQueue":{"name":"cluster-total"}}

It looks like klog made progress towards contextual logging: https://github.com/kubernetes/klog/blob/main/contextual.go

Let me ask what its status is.
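A minimal sketch of the API in that file (klog.FromContext / klog.NewContext); the function and key names below are purely illustrative:

package main

import (
	"context"

	"k8s.io/klog/v2"
)

// reconcileWorkload is illustrative, not actual Kueue code.
func reconcileWorkload(ctx context.Context, name string) {
	// Pull the logger carried by the context (or the global fallback),
	// enrich it, and pass it back down so callees log with the same keys.
	log := klog.FromContext(ctx).WithValues("workload", name)
	log.V(2).Info("Reconciling")
	ctx = klog.NewContext(ctx, log)
	_ = ctx // hand ctx to downstream calls in real code
}

func main() {
	reconcileWorkload(context.Background(), "sample-job")
}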

@alculquicondor
Contributor

Documentation is up here:

https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md

Kueue using it will help it mature.

/help

@k8s-ci-robot
Contributor

@alculquicondor:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

Documentation is up here:

https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md

Kueue using it will help it mature.

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Apr 13, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 12, 2022
@alculquicondor
Contributor

/help
to migrate to klog's contextual logging.

@alculquicondor
Contributor

/remove-lifecycle stale
/priority backlog

@k8s-ci-robot k8s-ci-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 12, 2022
@alculquicondor
Contributor

I just realized that Kubernetes uses zap to do JSON output through klog.

klog's contextual logging implements the logr.Logger interface, so it doesn't look like we gain much by migrating to klog, given that controller-runtime already expects a logr.Logger.

However, there are some logs for which we are receiving the wrong call reference. I'll see if there is something we can do: kubernetes-sigs/controller-runtime#1737
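For readers following along: controller-runtime already injects a request-scoped logr.Logger into the reconcile context, so contextual logging is available without switching libraries. A minimal sketch, with the reconciler type purely illustrative:

package example

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
)

// WorkloadReconciler is illustrative, not the actual Kueue reconciler.
type WorkloadReconciler struct{}

func (r *WorkloadReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// controller-runtime stores a request-scoped logr.Logger in ctx, so
	// contextual logging already works here without switching libraries.
	log := ctrl.LoggerFrom(ctx).WithValues("workload", req.NamespacedName)
	log.V(2).Info("Reconciling")
	return ctrl.Result{}, nil
}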

@alculquicondor
Contributor

I managed to fix it, but we'll have to leave this until there's a controller-runtime release: kubernetes-sigs/controller-runtime#1975

@alculquicondor
Contributor

/help cancel

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 9, 2022
@alculquicondor
Contributor

/lifecycle stale

We should be able to upgrade controller-runtime to have the fix

/help

@kannon92
Contributor

I'll take a look at this.
/assign @kannon92
