Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON output streams #104873

Merged
merged 2 commits into from Oct 11, 2021
Merged

JSON output streams #104873

merged 2 commits into from Oct 11, 2021

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Sep 9, 2021

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

Split-routing and buffering increase the scalability of logging. See also kubernetes/enhancements#2912 (comment) and other comments in that PR.

Special notes for your reviewer:

Includes #105480

Does this PR introduce a user-facing change?

JSON log output is configurable and now supports writing info messages to stdout and error messages to stderr. Info messages can be buffered in memory. The default is to write both to stdout without buffering, as before.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components/README.md

New command line flags:

     --log-json-info-buffer-size quantity  [Experimental] In JSON format with split output streams, the info messages can be buffered for a while to increase performance. The default value of zero bytes disables buffering. The size can be specified as number of bytes (512), multiples of 1000 (1K), multiples of 1024 (2Ki), or powers of those (3M, 4G, 5Mi, 6Gi).
     --log-json-split-stream               [Experimental] In JSON format, write error messages to stderr and info messages to stdout. The default is to write a single stream to stdout.

New options for kubelet config:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
logging:
  format: json
  options:
    json:
      splitStream: true
      infoBufferSize: 4Mi

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 9, 2021
@k8s-ci-robot k8s-ci-robot added area/apiserver area/cloudprovider area/code-generation area/dependency Issues or PRs related to dependency changes area/kubectl sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/storage Categorizes an issue or PR as relevant to SIG Storage. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 9, 2021
@thockin
Copy link
Member

thockin commented Sep 9, 2021 via email

@fedebongio
Copy link
Contributor

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Sep 9, 2021
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this seems sane, given my limited depth in this subsystem. Nothing majorly obviously wrong that I see.

// config file API however always has them.
if _, err := registry.LogRegistry.Get("json"); err == nil {
fs.BoolVar(&c.Options.JSON.SplitStream, "log-json-split-stream", false, "In JSON format, write error messages to stderr and info messages to stdout. The default is to write a single stream to stdout.")
fs.Var(&c.Options.JSON.InfoBufferSize, "log-json-info-buffer-size", "In JSON format with split output streams, the info messages can be buffered for a while to increase performance.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specify the default behavior, and maybe an example to show quantity-parsing (e.g. examples: 4096, 16Ki, 1Mi)

Also in the release notes it says "int"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Int" was before I enabled using resource.Quantity. I've updated the description.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PR release note still says "int"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The release note is "JSON log output is configurable and now supports writing info messages to stdout and error messages to stderr. Info messages can be buffered in memory. The default is to write both to stdout without buffering, as before."

I had "int" in the "Additional documentation" example, but not anymore.

I've also updated the usage text.

return NewJSONLogger(infoStream, zapcore.Lock(os.Stderr))
}
out := zapcore.Lock(os.Stdout)
return NewJSONLogger(out, out)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion, optional: pass nil instead of duplicating out - it feels more natural and less perilous over time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done.

other zapcore.WriteSyncer
}

func (f flushTwoWriters) Write(bs []byte) (int, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments here would really help - why does this call Write on one but not the other?

return NewJSONLogger(out, out)
}

type flushTwoWriters struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, and renamed the struct.

@pohly
Copy link
Contributor Author

pohly commented Oct 7, 2021

throw a hold on it if you want it not to merge?

I'm undecided. We add new fields here without a clear indication of their stability level and I am wondering whether we should be doing this better somehow.

That they appear in v1alpha1.LoggingConfiguration doesn't help because that is not visible to the user. The discussion in #105448 is about that.

Let's finish the review of this change (coding style, naming) but not merge quite yet, so:

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 7, 2021
@pohly
Copy link
Contributor Author

pohly commented Oct 7, 2021

@thockin: do you have an opinion whether the approach from 4ca3a97 (= embed arbitrary, logging backend specific parameters) is better than the one from fae0253 (= hard-code support for JSON in the LoggingConfiguration, regardless whether that backend is registered)? The commit messages explain both.

You probably looked at the overall change, which is using the approach with hard-coding because that is the more recent commit. I'll squash once I know which approach is preferred.

@thockin
Copy link
Member

thockin commented Oct 7, 2021

I will almost always take a strongly-typed solution over a map. In this case, I have a hard time making up reasons why a map would be reasonable - we're not going to add a dozen formats, right?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 7, 2021
@pohly
Copy link
Contributor Author

pohly commented Oct 8, 2021

I will almost always take a strongly-typed solution over a map. In this case, I have a hard time making up reasons why a map would be reasonable - we're not going to add a dozen formats, right?

Probably not. It's mostly a conceptual issue: component-base/logs jumps through quite some hoops to be completely independent of the "json" format, and now this PR breaks that abstraction.

But I guess that's okay. In practice, there will only be binaries with "text+json" and those where "json" hasn't been enabled yet, which should become fewer over time. The --log-json-* command line flags only get added when "json" is enabled and those commands which have a config file also have "json" support, so the current PR should be fine.

@pohly
Copy link
Contributor Author

pohly commented Oct 8, 2021

Let's keep the discussion around versioning of the struct separate, i.e. proceed with merging this PR.

To ensure that there's no confusion about the stability level of the new command line flags and configuration fields, I added an [Experimental] tag. I chose [Experimental] for consistency with how sanitization is presented. [Alpha] would have been more consistent with how it is described elsewhere (the --logging-format help text uses "alpha" and also other Kubernetes APIs). 🤷

I did not insert experimental into the flag names, based on my own preference for not breaking users during graduation and the conformation for that position from neolit123 in #105448 (comment)

I rebased because the registry refactoring was merged earlier in a separate PR, so now this PR became a bit simpler because it only needs to change the LogFormatFactory.Create signature.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 8, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 8, 2021
The Quantity type itself cannot be used because the Set method has the wrong
signature. Embedding Quantity inside a new QuantityValue type makes it possible
to inherit most of the methods while overriding the Set method.
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Oct 8, 2021

@pohly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-node-kubelet-serial-containerd d18ec24dcdbda8abeb0aa995f71b0fe30c30494c link false /test pull-kubernetes-node-kubelet-serial-containerd
pull-kubernetes-node-kubelet-serial d18ec24dcdbda8abeb0aa995f71b0fe30c30494c link false /test pull-kubernetes-node-kubelet-serial

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@pohly
Copy link
Contributor Author

pohly commented Oct 8, 2021

/retest

Unrelated test flake.

This implements the replacement of klog output to different files per level
with optionally splitting JSON output into two streams: one for info messages
on stdout, one for error messages on stderr. The info messages can get buffered
to increase performance. Because stdout and stderr might be merged by the
consumer, the info stream gets flushed before writing an error, to ensure that
the order of messages is preserved.

This also ensures that the following code pattern doesn't leak info messages:
   klog.ErrorS(err, ...)
   os.Exit(1)

Commands explicitly have to flush before exiting via logs.FlushLogs. Most
already do. But buffered info messages can still get lost during an unexpected
program termination, therefore buffering is off by default.

The new options get added to the v1alpha1 LoggingConfiguration with new command
line flags. Because it is an alpha field, changing it inside the v1beta kubelet
config should be okay as long as the fields are clearly marked as alpha.
@thockin
Copy link
Member

thockin commented Oct 11, 2021

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 11, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 11, 2021
@k8s-ci-robot k8s-ci-robot merged commit fb82a0d into kubernetes:master Oct 11, 2021
SIG Node CI/Test Board automation moved this from Archive-it to Done Oct 11, 2021
SIG Node PR Triage automation moved this from Needs Reviewer to Done Oct 11, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Oct 11, 2021
SIG Auth Old automation moved this from Needs Triage to Closed / Done Oct 11, 2021
@liggitt liggitt removed this from Assigned in API Reviews Oct 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/code-generation area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging.
Projects
Archived in project
Archived in project
SIG Auth Old
Closed / Done
Development

Successfully merging this pull request may close these issues.

None yet

6 participants