WIP: scheduler: contextual logging #110833
Conversation
This replaces all log calls through the global klog logger with a logger that is passed in by the caller, either as an explicit parameter or through the context. This makes it possible to produce more informative log output by adding WithName and/or WithValues instrumentation (not done yet!). It also makes "go test" failures more useful because the log output gets associated with the test that produced it.

Usually, context is the more future-proof option. It has happened repeatedly throughout the life of this commit that a function which originally had no context later got changed in the master branch to accept one for reasons unrelated to contextual logging. In several places, passing a context can replace passing a stop channel and then serves two purposes: access to the logger and cancellation. But klog.FromContext has some overhead, so for simple functions a logger is passed directly.
This enables benchmarking with JSON selected as output.
This adds a prefix to each log entry that shows which operation and/or component produced it (WithName) and ensures that relevant additional values, in particular the pod that gets scheduled, are always included (WithValues). This makes log entries easier to understand, in particular when multiple pods get scheduled in parallel. The downside is the constant overhead of these additional calls: they need to do some work whether or not the log entries actually get printed.
@pohly: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: pohly
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@pohly: The following test failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
I want to take this over. Should we discuss the tradeoff between information in log entries versus runtime performance in the issue or here?
The issue might be a better place for summarizing the different options and giving the corresponding performance overhead. When publishing numbers, it is useful to link to some tagged code that was used for testing. Linking to a branch works less well because branches typically get modified to accommodate review feedback.
Beware that a newer klog will be needed to prevent panics when a test leaks goroutines that continue to log through ktesting, see kubernetes/klog#337
/remove-sig api-machinery
@pohly: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@knelasevero is continuing with this work. |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This makes kube-scheduler log output more useful.
Which issue(s) this PR fixes:
Related-to: #91633
Special notes for your reviewer:
The basic work is in the initial commit. The commits at the end start with full instrumentation and then gradually reduce the runtime overhead.
Someone needs to decide what the right tradeoff between information in log entries and runtime performance is.
Does this PR introduce a user-facing change?