JSON output streams #104873

pohly · 2021-09-09T14:31:21Z

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

Split-routing and buffering increase the scalability of logging. See also kubernetes/enhancements#2912 (comment) and other comments in that PR.

Special notes for your reviewer:

Includes #105480

Does this PR introduce a user-facing change?

JSON log output is configurable and now supports writing info messages to stdout and error messages to stderr. Info messages can be buffered in memory. The default is to write both to stdout without buffering, as before.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components/README.md

New command line flags:

     --log-json-info-buffer-size quantity  [Experimental] In JSON format with split output streams, the info messages can be buffered for a while to increase performance. The default value of zero bytes disables buffering. The size can be specified as number of bytes (512), multiples of 1000 (1K), multiples of 1024 (2Ki), or powers of those (3M, 4G, 5Mi, 6Gi).
     --log-json-split-stream               [Experimental] In JSON format, write error messages to stderr and info messages to stdout. The default is to write a single stream to stdout.
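
As a rough illustration of the size notation in the help text above, here is a minimal sketch of a suffix parser. It is not the `resource.Quantity` parser that the flag actually uses: `parseSize` and the `suffixes` table are hypothetical, only the suffixes named in the help text are handled (plus a lowercase `k`, as an assumption), and overflow is not checked. The real parser has its own suffix rules, which may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// suffixes maps a size suffix to its multiplier. The two-letter binary
// suffixes come first so that "Ki" is tried before "K".
var suffixes = []struct {
	name string
	mult int64
}{
	{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30},
	{"K", 1000}, {"k", 1000}, {"M", 1000 * 1000}, {"G", 1000 * 1000 * 1000},
}

// parseSize is a hypothetical helper, not part of Kubernetes. It converts
// strings such as "512", "1K" or "2Ki" into a number of bytes.
func parseSize(s string) (int64, error) {
	mult := int64(1)
	for _, sfx := range suffixes {
		if strings.HasSuffix(s, sfx.name) {
			s = strings.TrimSuffix(s, sfx.name)
			mult = sfx.mult
			break
		}
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("size must not be negative: %d", n)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"512", "1K", "2Ki", "3M", "4Mi"} {
		n, err := parseSize(in)
		fmt.Println(in, n, err)
	}
}
```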

New options for kubelet config:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
logging:
  format: json
  options:
    json:
      splitStream: true
      infoBufferSize: 4Mi

thockin · 2021-09-09T16:42:36Z

There were concerns about splitting the streams on the KEP - are those resolved?

fedebongio · 2021-09-09T20:13:28Z

/remove-sig api-machinery

thockin

Overall this seems sane, given my limited depth in this subsystem. Nothing majorly obviously wrong that I see.

thockin · 2021-10-07T00:14:48Z

staging/src/k8s.io/component-base/logs/config.go

+	// config file API however always has them.
+	if _, err := registry.LogRegistry.Get("json"); err == nil {
+		fs.BoolVar(&c.Options.JSON.SplitStream, "log-json-split-stream", false, "In JSON format, write error messages to stderr and info messages to stdout. The default is to write a single stream to stdout.")
+		fs.Var(&c.Options.JSON.InfoBufferSize, "log-json-info-buffer-size", "In JSON format with split output streams, the info messages can be buffered for a while to increase performance.")


specify the default behavior, and maybe an example to show quantity-parsing (e.g. examples: 4096, 16Ki, 1Mi)

Also in the release notes it says "int"

"Int" was before I enabled using resource.Quantity. I've updated the description.

the PR release note still says "int"

The release note is "JSON log output is configurable and now supports writing info messages to stdout and error messages to stderr. Info messages can be buffered in memory. The default is to write both to stdout without buffering, as before."

I had "int" in the "Additional documentation" example, but not anymore.

I've also updated the usage text.

thockin · 2021-10-07T17:01:43Z

staging/src/k8s.io/component-base/logs/json/json.go

+		return NewJSONLogger(infoStream, zapcore.Lock(os.Stderr))
+	}
+	out := zapcore.Lock(os.Stdout)
+	return NewJSONLogger(out, out)


Suggestion, optional: pass nil instead of duplicating out - it feels more natural and less perilous over time.

Good idea, done.
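
The suggestion in this thread can be sketched with plain `io.Writer` values. The names `splitLogger` and `newSplitLogger` are hypothetical and the real code works with `zapcore.WriteSyncer`; the sketch only shows the idea that a nil error stream means "reuse the info stream", so callers never have to pass the same writer twice.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// splitLogger is a hypothetical stand-in for a logger with two output streams.
type splitLogger struct {
	info io.Writer
	errs io.Writer
}

// newSplitLogger treats a nil errs as "no separate error stream" and sends
// errors to info instead. Callers must pass a literal nil for this to work;
// a typed nil pointer wrapped in the interface would not compare equal to nil.
func newSplitLogger(info, errs io.Writer) *splitLogger {
	if errs == nil {
		errs = info
	}
	return &splitLogger{info: info, errs: errs}
}

func (l *splitLogger) Info(msg string)  { fmt.Fprintln(l.info, "I", msg) }
func (l *splitLogger) Error(msg string) { fmt.Fprintln(l.errs, "E", msg) }

// singleStream logs one info and one error message without a separate
// error stream and returns everything that was written.
func singleStream() string {
	var out bytes.Buffer
	l := newSplitLogger(&out, nil)
	l.Info("started")
	l.Error("failed")
	return out.String()
}

func main() {
	fmt.Print(singleStream())
}
```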

thockin · 2021-10-07T17:05:10Z

staging/src/k8s.io/component-base/logs/json/json.go

+	other zapcore.WriteSyncer
+}
+
+func (f flushTwoWriters) Write(bs []byte) (int, error) {


Some comments here would really help - why does this call Write on one but not the other?

thockin · 2021-10-07T17:05:19Z

staging/src/k8s.io/component-base/logs/json/json.go

+	return NewJSONLogger(out, out)
+}
+
+type flushTwoWriters struct {


Done, and renamed the struct.

pohly · 2021-10-07T18:45:44Z

> throw a hold on it if you want it not to merge?

I'm undecided. We add new fields here without a clear indication of their stability level and I am wondering whether we should be doing this better somehow.

That they appear in v1alpha1.LoggingConfiguration doesn't help because that is not visible to the user. The discussion in #105448 is about that.

Let's finish the review of this change (coding style, naming) but not merge quite yet, so:

/hold

pohly · 2021-10-07T18:49:06Z

@thockin: do you have an opinion on whether the approach from 4ca3a97 (= embed arbitrary, logging-backend-specific parameters) is better than the one from fae0253 (= hard-code support for JSON in the LoggingConfiguration, regardless of whether that backend is registered)? The commit messages explain both.

You probably looked at the overall change, which is using the approach with hard-coding because that is the more recent commit. I'll squash once I know which approach is preferred.

thockin · 2021-10-07T19:48:31Z

I will almost always take a strongly-typed solution over a map. In this case, I have a hard time making up reasons why a map would be reasonable - we're not going to add a dozen formats, right?

pohly · 2021-10-08T06:08:35Z

> I will almost always take a strongly-typed solution over a map. In this case, I have a hard time making up reasons why a map would be reasonable - we're not going to add a dozen formats, right?

Probably not. It's mostly a conceptual issue: component-base/logs jumps through quite some hoops to be completely independent of the "json" format, and now this PR breaks that abstraction.

But I guess that's okay. In practice, there will only be binaries with "text+json" and those where "json" hasn't been enabled yet, which should become fewer over time. The --log-json-* command line flags only get added when "json" is enabled and those commands which have a config file also have "json" support, so the current PR should be fine.
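
The conditional registration mentioned above (the `--log-json-*` flags only exist when "json" is enabled) can be sketched with the standard `flag` package. This is a simplified stand-in, not the component-base code: the registry is reduced to a plain map, and `jsonOptions`, `addFlags` and `hasSplitFlag` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
)

// jsonOptions is a simplified, strongly-typed stand-in for the JSON
// options discussed in this PR.
type jsonOptions struct {
	SplitStream bool
}

// addFlags registers the JSON flag only if "json" is in the set of
// registered formats (a plain map here, standing in for the registry).
func addFlags(fs *flag.FlagSet, registered map[string]bool, o *jsonOptions) {
	if registered["json"] {
		fs.BoolVar(&o.SplitStream, "log-json-split-stream", false,
			"[Experimental] In JSON format, write error messages to stderr and info messages to stdout.")
	}
}

// hasSplitFlag reports whether the flag exists for a given set of formats.
func hasSplitFlag(registered map[string]bool) bool {
	var o jsonOptions
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	addFlags(fs, registered, &o)
	return fs.Lookup("log-json-split-stream") != nil
}

func main() {
	fmt.Println(hasSplitFlag(map[string]bool{"text": true}))
	fmt.Println(hasSplitFlag(map[string]bool{"text": true, "json": true}))
}
```

The options struct always has the field, while the flag is only visible in binaries that register the format, which matches the trade-off described in the comment.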

pohly · 2021-10-08T06:47:25Z

Let's keep the discussion around versioning of the struct separate, i.e. proceed with merging this PR.

To ensure that there's no confusion about the stability level of the new command line flags and configuration fields, I added an [Experimental] tag. I chose [Experimental] for consistency with how sanitization is presented. [Alpha] would have been more consistent with how it is described elsewhere (the --logging-format help text uses "alpha" and also other Kubernetes APIs). 🤷

I did not insert "experimental" into the flag names, based on my own preference for not breaking users during graduation and the confirmation of that position from neolit123 in #105448 (comment)

I rebased because the registry refactoring was merged earlier in a separate PR, so now this PR became a bit simpler because it only needs to change the LogFormatFactory.Create signature.

/hold cancel

The Quantity type itself cannot be used because the Set method has the wrong signature. Embedding Quantity inside a new QuantityValue type makes it possible to inherit most of the methods while overriding the Set method.
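
The embedding technique described above can be sketched with the standard `flag` package. The `size` type below is only a stand-in for `resource.Quantity`, and its `Set(int64)` method is an assumption chosen to reproduce the signature mismatch; `sizeValue` and `parseArgs` are hypothetical names. The wrapper inherits `String` from the embedded type and shadows `Set` with the signature that `flag.Value` expects.

```go
package main

import (
	"flag"
	"fmt"
	"strconv"
)

// size is a stand-in for resource.Quantity. Its Set method does not have
// the Set(string) error signature that flag.Value requires.
type size struct{ bytes int64 }

func (s *size) Set(n int64)    { s.bytes = n }
func (s *size) String() string { return strconv.FormatInt(s.bytes, 10) }

// sizeValue embeds size, inheriting String, and shadows Set with the
// signature expected by flag.Value.
type sizeValue struct{ size }

func (v *sizeValue) Set(s string) error {
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return err
	}
	v.size.Set(n) // call the embedded type's original Set
	return nil
}

// Compile-time check that the wrapper satisfies flag.Value.
var _ flag.Value = &sizeValue{}

// parseArgs parses a hypothetical -buffer-size flag and returns its value.
func parseArgs(args []string) (int64, error) {
	var v sizeValue
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.Var(&v, "buffer-size", "buffer size in bytes")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return v.bytes, nil
}

func main() {
	fmt.Println(parseArgs([]string{"-buffer-size=4096"}))
}
```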

k8s-ci-robot · 2021-10-08T07:36:23Z

@pohly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-node-kubelet-serial-containerd | d18ec24dcdbda8abeb0aa995f71b0fe30c30494c | link | false | `/test pull-kubernetes-node-kubelet-serial-containerd` |
| pull-kubernetes-node-kubelet-serial | d18ec24dcdbda8abeb0aa995f71b0fe30c30494c | link | false | `/test pull-kubernetes-node-kubelet-serial` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

pohly · 2021-10-08T09:03:42Z

/retest

Unrelated test flake.

This implements the replacement of klog output to different files per level with optionally splitting JSON output into two streams: one for info messages on stdout, one for error messages on stderr. The info messages can get buffered to increase performance. Because stdout and stderr might be merged by the consumer, the info stream gets flushed before writing an error, to ensure that the order of messages is preserved. This also ensures that the following code pattern doesn't leak info messages:

    klog.ErrorS(err, ...)
    os.Exit(1)

Commands explicitly have to flush before exiting via logs.FlushLogs. Most already do. But buffered info messages can still get lost during an unexpected program termination, therefore buffering is off by default.

The new options get added to the v1alpha1 LoggingConfiguration with new command line flags. Because it is an alpha field, changing it inside the v1beta kubelet config should be okay as long as the fields are clearly marked as alpha.
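
The flush-before-error ordering described in this commit message can be sketched with standard library writers. This is an illustration under simplifying assumptions, not the code from the PR: the real implementation works with `zapcore.WriteSyncer`, while `flushingErrorWriter` and `demo` are hypothetical names, a `bufio.Writer` stands in for the buffered info stream, and no locking is done.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// flushingErrorWriter writes to the error stream, but only after the
// buffered info stream has been flushed, so that a consumer which merges
// both streams still sees messages in the order they were logged.
type flushingErrorWriter struct {
	info *bufio.Writer
	errs io.Writer
}

func (w flushingErrorWriter) Write(p []byte) (int, error) {
	if err := w.info.Flush(); err != nil {
		return 0, err
	}
	return w.errs.Write(p)
}

// demo sends both streams into one buffer, as a merging consumer might,
// and returns what arrived there.
func demo() string {
	var merged bytes.Buffer
	info := bufio.NewWriterSize(&merged, 4096)
	errs := flushingErrorWriter{info: info, errs: &merged}

	fmt.Fprintln(info, "info 1")  // stays in the buffer for now
	fmt.Fprintln(errs, "error 1") // flushes "info 1" first, then writes
	fmt.Fprintln(info, "info 2")  // buffered until the final flush
	info.Flush()                  // the explicit flush before exiting
	return merged.String()
}

func main() {
	fmt.Print(demo())
}
```

Without the final `Flush`, "info 2" would never reach the output, which is the loss on unexpected termination that the commit message gives as the reason for leaving buffering off by default.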

thockin · 2021-10-11T00:02:27Z

Thanks!

/lgtm
/approve

k8s-ci-robot · 2021-10-11T00:02:42Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/kubelet/apis/config/OWNERS~~ [thockin]
~~staging/src/k8s.io/apimachinery/pkg/OWNERS~~ [thockin]
~~staging/src/k8s.io/component-base/config/OWNERS~~ [thockin]
~~staging/src/k8s.io/component-base/logs/OWNERS~~ [thockin]
~~staging/src/k8s.io/kubelet/config/OWNERS~~ [thockin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot requested review from andrewsykim, caesarxuchao and a team September 9, 2021 14:32

k8s-ci-robot removed the sig/api-machinery label Sep 9, 2021

thockin reviewed Oct 7, 2021

k8s-ci-robot added the do-not-merge/hold label Oct 7, 2021

k8s-ci-robot added the needs-rebase label Oct 7, 2021

k8s-ci-robot removed the do-not-merge/hold label Oct 8, 2021

pohly force-pushed the json-output-stream branch from 19098e8 to d64da23 Compare October 8, 2021 06:47

k8s-ci-robot removed the needs-rebase label Oct 8, 2021

pohly force-pushed the json-output-stream branch from d64da23 to 56c2b58 Compare October 8, 2021 06:49

resource: support using Quantity as command line value

963d3c1


pohly force-pushed the json-output-stream branch from 56c2b58 to 6f1b990 Compare October 8, 2021 07:06

serathius mentioned this pull request Oct 8, 2021

Make info logging non-blocking kubernetes/klog#209

Closed

pohly force-pushed the json-output-stream branch from 6f1b990 to b22263d Compare October 9, 2021 08:10

k8s-ci-robot added the lgtm label Oct 11, 2021

k8s-ci-robot added the approved label Oct 11, 2021

k8s-ci-robot merged commit fb82a0d into kubernetes:master Oct 11, 2021

SIG Node CI/Test Board automation moved this from Archive-it to Done Oct 11, 2021

SIG Node PR Triage automation moved this from Needs Reviewer to Done Oct 11, 2021

k8s-ci-robot added this to the v1.23 milestone Oct 11, 2021

SIG Auth Old automation moved this from Needs Triage to Closed / Done Oct 11, 2021

liggitt removed this from Assigned in API Reviews Oct 28, 2021

serathius mentioned this pull request Nov 1, 2021

Remove klog specific command line arguments from Kubernetes components kubernetes/enhancements#2845

Closed
