WIP: scheduler: contextual logging #110833
Conversation
This replaces all log calls through the global klog logger with a logger that is passed in by the caller, either as an explicit parameter or through the context. This makes it possible to produce more informative log output by adding WithName and/or WithValues instrumentation (not done yet!). It also makes "go test" failures more useful because the log output gets associated with the test that produced it.

Usually, context is the more future-proof option. It has happened repeatedly throughout the life of this commit that a function which originally had no context later got changed in the master branch to accept one for reasons unrelated to contextual logging. In several places, passing a context can replace passing a stop channel and then serves two purposes: access to the logger and cancellation. But klog.FromContext has some overhead, so for simple functions a logger is passed directly.
This enables benchmarking with JSON selected as output.
This adds a prefix to each log entry that shows which operation and/or component produced it (WithName) and ensures that relevant additional values, in particular the pod that gets scheduled, are always included (WithValues). This makes log entries easier to understand, in particular when multiple pods get scheduled in parallel. The downside is the constant overhead of these additional calls: they need to do some work whether or not the log entries actually get printed.
@pohly: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: pohly
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@pohly: The following test failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
I want to take this over. Should we discuss the tradeoff between information in log entries versus runtime performance in the issue or here?
The issue might be a better place for summarizing the different options and giving the corresponding performance overhead. When publishing numbers, it is useful to link to some tagged code that was used for testing. Linking to a branch works less well because branches typically get modified to accommodate review feedback.
Beware that a newer klog will be needed to prevent panics when a test leaks goroutines that continue to log through ktesting, see kubernetes/klog#337
/remove-sig api-machinery
@pohly: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@knelasevero is continuing with this work. |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This makes kube-scheduler log output more useful.
Which issue(s) this PR fixes:
Related-to: #91633
Special notes for your reviewer:
The basic work is in the initial commit. The commits at the end start with full instrumentation and then gradually reduce the runtime overhead.
Someone needs to decide what the right tradeoff between information in log entries and runtime performance is.
Does this PR introduce a user-facing change?