Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add trace message output #79

Merged
merged 8 commits into from Jan 8, 2020
Merged

Add trace message output #79

merged 8 commits into from Jan 8, 2020

Conversation

scaat
Copy link

@scaat scaat commented Jul 31, 2019

What this PR does / why we need it:

Add trace message output when severity is fatalLog and -logtostderr has been specified.

When -logtostderr is specified, I called klog.Fatal("fatal msg"), only

F0731 10:36:27.954792   63349 main.go:30] fatal msg

will be output on the console, and no trace information will be output on the console. But more often, I need that.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:

klog now always prints a trace of all goroutines on stderr on klog.Fatal(...). Previously this wasn't the case, the stack trace was omitted when the -logtostderr flag was used. (#79)

@k8s-ci-robot
Copy link

Welcome @scaat!

It looks like this is your first PR to kubernetes/klog 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/klog has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 31, 2019
@scaat
Copy link
Author

scaat commented Jul 31, 2019

/check-cla

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jul 31, 2019
@scaat
Copy link
Author

scaat commented Jul 31, 2019

/assign @tallclair

@hoegaarden
Copy link

Some observations:

  • The original comment

    klog/klog.go

    Lines 812 to 814 in 6a023d6

    // First, make sure we see the trace for the current goroutine on standard error.
    // If -logtostderr has been specified, the loop below will do that anyway
    // as the first stack in the full dump.

    suggests that the trace should always be logged to stdErr.

    If -logtostderr has been specified, the loop below will do that anyway as the first stack in the full dump.

    I don't see where this is happening and cannot reproduce that locally. (Or am I misreading that comment?)

  • Right now, when -logtostderr is not specified, a stack trace (for the current goroutine) is printed to stdErr

  • Test coverage on different combinations of -alsologtostderr, -logtostderr, -log_dir, and -log_file and what should get printed where isn't great.

So I suggest (see below) to move to printing the trace to stdErr on Fatal -- always, no matter which flags are specified.

This is however a change to how klog behaves -- just calling that out.

What do you all think? @kubernetes/klog-maintainers


diff --git a/klog.go b/klog.go
index 17d2975..108e0b0 100644
--- a/klog.go
+++ b/klog.go
@@ -809,14 +809,13 @@ func (l *loggingT) output(s severity, buf *buffer, file string, line int, alsoTo
 			os.Exit(1)
 		}
 		// Dump all goroutine stacks before exiting.
-		// First, make sure we see the trace for the current goroutine on standard error.
-		// If -logtostderr has been specified, the loop below will do that anyway
-		// as the first stack in the full dump.
-		if !l.toStderr {
-			os.Stderr.Write(stacks(false))
-		}
-		// Write the stack trace for all goroutines to the files.
 		trace := stacks(true)
+
+		// We make sure we see the trace for all goroutines on standard
+		// error in any case.
+		os.Stderr.Write(trace)
+
+		// Write the stack trace for all goroutines to the files.
 		logExitFunc = func(error) {} // If we get a write error, we'll still exit below.
 		for log := fatalLog; log >= infoLog; log-- {
 			if f := l.file[log]; f != nil { // Can be nil if -logtostderr is set.
``

@hoegaarden
Copy link

Hello @scaat ,

Now with #80 merged, your PR won't pass the tests anymore. Can you please rebase & adapt the test-cases regarding to the new behaviour your change is introducing?

Thanks, Hannes

@scaat
Copy link
Author

scaat commented Aug 5, 2019

Hello @hoegaarden
The expected results of testing conflicts with the behavior I wanted.
With_logtostderr_only, with_log_dir_and_logtostderr, and default_flags do not want stackTraceRE to appear.
Now, what I want to confirm is whether the e2e test represents the final default behavior of klog. If not, what should I do? Thanks.

@hoegaarden
Copy link

I added the tests, so we have a harness to make deliberate changes to klog's behaviour. Specifically the printing of traces is a good example: It is not entirely clear when those are printed right now. I would like to see the traces on stderr either always or only if e.g. a flag tracesonstderr is provided -- but not the current case, where it sometimes prints traces to stderr and sometimes it doesn't. But ultimately I need to defer that to the approvers of this repo.

But in general, when there is consensus on how klog should behave, we can just:

  • encode that expected behaviour in the tests
  • adapt the implementations to the agreed upon behaviour
  • see the tests go green

@dims
Copy link
Member

dims commented Aug 8, 2019

/assign @hoegaarden

@hoegaarden
Copy link

@scaat -- Thinking a bit more about this, I think we should print the traces whenever we print a fatal log message. This is the behaviour that is consistent with logging to files (log_file & log_dir). So I think this is exactly what you implementation does in the first place. Therefore it should be enough if you just change the tests for that.

The only thing that I still don't really understand is -stderrthreshold: It seems to be ignored if either -logtostderr or -alsologtostderr are provided and only is considered if both of them are turned off / set to false ... 🤷

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 9, 2019
@k8s-ci-robot
Copy link

@scaat: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@scaat
Copy link
Author

scaat commented Aug 9, 2019

@hoegaarden Thanks.
I made some adjustments to make sure the test passed normally.

@hoegaarden
Copy link

Any of the approvers wanna approve this? Or is there still something to sort out?

@dims
Copy link
Member

dims commented Sep 6, 2019

@hoegaarden deferring to @DirectXMan12 for approve :)

@DirectXMan12
Copy link

Was waiting until we have a release note listed in the PR description.

@hoegaarden
Copy link

@scaat Can you please add a release note into the PR description ... either the one I proposed or something else/clearer/better.

@DirectXMan12 I don't think we have any tooling in place here in k/klog which picks up the ```release-note stuff yet. But I agree, we should use the same thing over here and try to get the same automation eventually.

@scaat
Copy link
Author

scaat commented Sep 20, 2019

@hoegaarden I have already added the one you proposed into the PR description. What else do I need to do?

@hoegaarden
Copy link

@hoegaarden I have already added the one you proposed into the PR description. What else do I need to do?

@DirectXMan12 are we good to go from your perspective?

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 2, 2020
@scaat scaat requested a review from hoegaarden January 3, 2020 03:22
@hoegaarden
Copy link

/assign @dims @DirectXMan12
for final approval, finally ;)

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 7, 2020
@dims
Copy link
Member

dims commented Jan 8, 2020

/approve
/lgtm

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, hoegaarden, scaat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@liggitt
Copy link
Member

liggitt commented Feb 9, 2022

This change made consumers of klog start outputting excessive amounts of stack trace logs when invoking a fatal error (xref kubernetes/kubernetes#107665 and kubernetes/kubernetes#94663)

That seems like a breaking change, or at least one that reduces usability for existing consumers.

@dims
Copy link
Member

dims commented Feb 9, 2022

@hoegaarden please see @liggitt 's concern above

cc @serathius @pohly for additional thoughts

@pohly
Copy link

pohly commented Feb 9, 2022

I agree that this change reduces the usefulness of klog. In the default configuration the stack dumps go to stderr. If that is a console and there are many goroutines, the console scrollback buffer overflows an the actual error at the beginning of the dump gets lost.

@pohly
Copy link

pohly commented Mar 23, 2022

Let's revert this. To handle the case where all backtraces are wanted, I am proposing some new API to include that. See #316.

pohly added a commit to pohly/klog that referenced this pull request Jun 13, 2022
This is a revert of kubernetes#79.

Dumping the stack backtraces of all goroutines to stderr is not useful for
processes which have many goroutines. The console buffer overflows and the
original error which got dumped earlier is no longer visible. Even in CI
systems where stderr is captured the full dump is often not useful.

This was pointed out as a potential problem during the original PR review but
only got more attention after the updated klog got merged into Kubernetes and
developers there started to see the longer output.
pohly added a commit to pohly/klog that referenced this pull request Jun 13, 2022
After reverting kubernetes#79 users who want the
full dump of all goroutines can still get it by calling

   github.com/go-logr/lib.Backtrace(lib.BacktraceAll(true)).
pohly added a commit to pohly/klog that referenced this pull request Jun 13, 2022
This is a revert of kubernetes#79.

Dumping the stack backtraces of all goroutines to stderr is not useful for
processes which have many goroutines. The console buffer overflows and the
original error which got dumped earlier is no longer visible. Even in CI
systems where stderr is captured the full dump is often not useful.

This was pointed out as a potential problem during the original PR review but
only got more attention after the updated klog got merged into Kubernetes and
developers there started to see the longer output.
pohly added a commit to pohly/klog that referenced this pull request Jun 13, 2022
After reverting kubernetes#79 users who want the
full dump of all goroutines can still get it by calling

   github.com/go-logr/lib.Backtrace(lib.BacktraceAll(true)).
pohly added a commit to pohly/klog that referenced this pull request Jun 14, 2022
After reverting kubernetes#79 users who want the
full dump of all goroutines can still get it by calling

   github.com/go-logr/lib.Backtrace(lib.BacktraceAll(true)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants