Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: scheduler: contextual logging #110833

Closed
wants to merge 5 commits into from

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Jun 28, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

This makes kube-scheduler log output more useful.

Which issue(s) this PR fixes:

Related-to: #91633

Special notes for your reviewer:

The basic work is in the initial commit. The commits at the end start with full instrumentation and then gradually reduce the runtime overhead.

Someone needs to decide what the right tradeoff between information in log entries and runtime performance is.

Does this PR introduce a user-facing change?

kube-scheduler log output is more informative.

This replaces all log calls through the global klog logger with a logger that
gets passed in by the caller, either through an explicit parameter or the
context. This makes it possible to produce more informative log output by
adding WithName and/or WithValues instrumentation (not done yet!). It also
makes "go test" failures more useful because the log output gets associated
with the test which produced it.

Usually, context is the more future-proof option. It has happened repeatedly
throughout the life of this commit that a function that originally had no
context later got changed in the master branch to accept for reasons unrelated
to contextual logging. In several places, passing a context can replace passing
a stop channel and then serves two purposes (access to logger and cancellation).

But klog.FromContext has some overhead, so for simple functions a logger is
passed.
This enables benchmarking with JSON selected as output.
This adds a prefix to each log entry that shows which operation and/or
component produced the log entry (WithName) and ensures that relevant
additional values, in particular the pod that gets scheduled, are always
included (WithValues).

This makes log entries easier to understand, in particular when multiple pods
get scheduled in parallel.

The downside is the constant overhead for these additional calls: they need to
do some work, whether log entries will be printed or not.
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 28, 2022
@k8s-ci-robot
Copy link
Contributor

@pohly: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 28, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
To complete the pull request process, please assign ahg-g, mikedanese after the PR has been reviewed.
You can assign the PR to them by writing /assign @ahg-g @mikedanese in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/cloudprovider area/test kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 28, 2022
@k8s-ci-robot
Copy link
Contributor

@pohly: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-conformance-kind-ga-only-parallel f7ac1e3 link true /test pull-kubernetes-conformance-kind-ga-only-parallel

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@knelasevero
Copy link
Contributor

I want to take this over.

Should we discuss the tradeoff between information in log entries versus runtime performance in the issue or here?

@pohly
Copy link
Contributor Author

pohly commented Jun 28, 2022

The issue might be a better place for summarizing different options and giving the corresponding performance overhead. When publishing numbers, it is useful to link to some tagged code that was used for testing. Linking to a branch will work less well because those typically get modified to accommodate for review feedback.

@pohly
Copy link
Contributor Author

pohly commented Jun 30, 2022

Beware that a newer klog will be needed to prevent panics when a test leaks goroutines that continue to log through ktesting, see kubernetes/klog#337

@leilajal
Copy link
Contributor

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Jun 30, 2022
@k8s-ci-robot
Copy link
Contributor

@pohly: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pohly
Copy link
Contributor Author

pohly commented Aug 3, 2022

@knelasevero is continuing with this work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cloudprovider area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants