Add trace message output #79

scaat · 2019-07-31T02:45:55Z

What this PR does / why we need it:

Add trace message output when severity is fatalLog and -logtostderr has been specified.

When -logtostderr is specified and I call klog.Fatal("fatal msg"), only

F0731 10:36:27.954792   63349 main.go:30] fatal msg

is printed to the console, without any stack trace. More often than not, I need the trace.


Release note:

klog now always prints a trace of all goroutines on stderr on klog.Fatal(...). Previously, the stack trace was omitted when the -logtostderr flag was used. (#79)

k8s-ci-robot · 2019-07-31T02:45:56Z

Welcome @scaat!

It looks like this is your first PR to kubernetes/klog 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/klog has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2019-07-31T02:45:57Z

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: login-issues@jira.linuxfoundation.org

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

scaat · 2019-07-31T04:50:26Z

/check-cla

scaat · 2019-07-31T04:52:48Z

/assign @tallclair

hoegaarden · 2019-07-31T11:09:15Z

Some observations:

The original comment

klog/klog.go

Lines 812 to 814 in 6a023d6

    
    // First, make sure we see the trace for the current goroutine on standard error.
    // If -logtostderr has been specified, the loop below will do that anyway
    // as the first stack in the full dump.

suggests that the trace should always be logged to stderr.

> If -logtostderr has been specified, the loop below will do that anyway as the first stack in the full dump.

I don't see where this is happening and cannot reproduce that locally. (Or am I misreading that comment?)

Right now, when -logtostderr is not specified, a stack trace (for the current goroutine only) is printed to stderr.
Test coverage for the different combinations of -alsologtostderr, -logtostderr, -log_dir, and -log_file, and for what should get printed where, isn't great.

So I suggest (see below) always printing the trace to stderr on Fatal, no matter which flags are specified.

This is however a change to how klog behaves -- just calling that out.

What do you all think? @kubernetes/klog-maintainers

```diff
diff --git a/klog.go b/klog.go
index 17d2975..108e0b0 100644
--- a/klog.go
+++ b/klog.go
@@ -809,14 +809,13 @@ func (l *loggingT) output(s severity, buf *buffer, file string, line int, alsoTo
 			os.Exit(1)
 		}
 		// Dump all goroutine stacks before exiting.
-		// First, make sure we see the trace for the current goroutine on standard error.
-		// If -logtostderr has been specified, the loop below will do that anyway
-		// as the first stack in the full dump.
-		if !l.toStderr {
-			os.Stderr.Write(stacks(false))
-		}
-		// Write the stack trace for all goroutines to the files.
 		trace := stacks(true)
+
+		// We make sure we see the trace for all goroutines on standard
+		// error in any case.
+		os.Stderr.Write(trace)
+
+		// Write the stack trace for all goroutines to the files.
 		logExitFunc = func(error) {} // If we get a write error, we'll still exit below.
 		for log := fatalLog; log >= infoLog; log-- {
 			if f := l.file[log]; f != nil { // Can be nil if -logtostderr is set.
```
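
The diff above calls a helper named stacks with false (current goroutine only) and true (all goroutines), but its body is not shown. As a rough illustration of the difference between the two modes, here is a minimal standard-library sketch built on runtime.Stack. The buffer sizes, the retry count, and the countGoroutines helper are assumptions made for this example, not something taken from the PR.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
)

// stacks returns a formatted stack trace of the calling goroutine, or of all
// goroutines when all is true. The buffer is doubled a few times if the trace
// does not fit, so a very large dump may still come back truncated.
func stacks(all bool) []byte {
	n := 10000 // assumed starting size for a single goroutine
	if all {
		n = 100000 // assumed starting size for a full dump
	}
	var trace []byte
	for i := 0; i < 5; i++ {
		trace = make([]byte, n)
		nbytes := runtime.Stack(trace, all)
		if nbytes < len(trace) {
			return trace[:nbytes]
		}
		n *= 2
	}
	return trace
}

// countGoroutines counts the "goroutine N [state]:" header lines in a trace.
// It is a hypothetical helper used only to make the difference visible.
func countGoroutines(trace []byte) int {
	count := 0
	for _, line := range bytes.Split(trace, []byte("\n")) {
		if bytes.HasPrefix(line, []byte("goroutine ")) {
			count++
		}
	}
	return count
}

func main() {
	// Park a second goroutine so the full dump contains more than one stack.
	block := make(chan struct{})
	started := make(chan struct{})
	go func() {
		close(started)
		<-block
	}()
	<-started

	fmt.Println("stacks in current-goroutine trace:", countGoroutines(stacks(false)))
	fmt.Println("stacks in all-goroutines trace:", countGoroutines(stacks(true)))
	close(block)
}
```

The proposed change writes the all-goroutines variant to stderr, so the amount of output grows with the number of goroutines in the process.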

hoegaarden · 2019-08-05T08:29:34Z

Hello @scaat ,

Now that #80 is merged, your PR won't pass the tests anymore. Can you please rebase and adapt the test cases to the new behaviour your change introduces?

Thanks, Hannes

scaat · 2019-08-05T10:05:46Z

Hello @hoegaarden
The expected test results conflict with the behavior I want.
The test cases with_logtostderr_only, with_log_dir_and_logtostderr, and default_flags do not expect stackTraceRE to appear.
What I want to confirm now is whether the e2e test represents the final default behavior of klog. If not, what should I do? Thanks.

hoegaarden · 2019-08-05T10:27:24Z

I added the tests, so we have a harness to make deliberate changes to klog's behaviour. Specifically the printing of traces is a good example: It is not entirely clear when those are printed right now. I would like to see the traces on stderr either always or only if e.g. a flag tracesonstderr is provided -- but not the current case, where it sometimes prints traces to stderr and sometimes it doesn't. But ultimately I need to defer that to the approvers of this repo.

But in general, when there is consensus on how klog should behave, we can just:

- encode that expected behaviour in the tests
- adapt the implementations to the agreed upon behaviour
- see the tests go green
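
To illustrate the first step, here is a minimal sketch of how an expectation about traces on stderr could be written down as data and checked. Everything in it is hypothetical: the regular expression (named stackTraceRE after the one mentioned earlier in the thread, but not copied from klog's tests), the expectation type, and the expected values, which simply encode the "always print a trace" behaviour under discussion.

```go
package main

import (
	"fmt"
	"regexp"
	"runtime"
)

// stackTraceRE matches the header line of a goroutine stack, for example
// "goroutine 1 [running]:". The pattern is an assumption for this sketch.
var stackTraceRE = regexp.MustCompile(`(?m)^goroutine \d+ \[[^\]]+\]:$`)

// expectation describes what one flag combination should print on stderr.
type expectation struct {
	name        string
	flags       []string // flags the program under test would be started with
	wantTraceOn bool     // whether a stack trace is expected on stderr
}

// hasStackTrace reports whether captured stderr output contains a trace header.
func hasStackTrace(stderr string) bool {
	return stackTraceRE.MatchString(stderr)
}

// check compares captured stderr output against the expectation.
func check(e expectation, stderr string) error {
	if got := hasStackTrace(stderr); got != e.wantTraceOn {
		return fmt.Errorf("%s: stack trace on stderr = %v, want %v", e.name, got, e.wantTraceOn)
	}
	return nil
}

func main() {
	// Simulate captured stderr: the fatal line followed by a real trace.
	buf := make([]byte, 1<<16)
	trace := string(buf[:runtime.Stack(buf, false)])
	captured := "F0731 10:36:27.954792   63349 main.go:30] fatal msg\n" + trace

	cases := []expectation{
		{name: "with_logtostderr_only", flags: []string{"-logtostderr=true"}, wantTraceOn: true},
		{name: "default_flags", flags: nil, wantTraceOn: true},
	}
	for _, c := range cases {
		if err := check(c, captured); err != nil {
			fmt.Println("FAIL:", err)
			continue
		}
		fmt.Println("ok:", c.name)
	}
}
```

Changing the agreed behaviour then means flipping wantTraceOn for the affected cases first and adapting the implementation until the checks pass again.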

dims · 2019-08-08T10:39:50Z

/assign @hoegaarden

hoegaarden · 2019-08-08T13:54:05Z

@scaat -- Thinking a bit more about this, I think we should print the traces whenever we print a fatal log message. This is the behaviour that is consistent with logging to files (log_file & log_dir). So I think this is exactly what your implementation does in the first place. Therefore it should be enough if you just change the tests for that.

The only thing that I still don't really understand is -stderrthreshold: It seems to be ignored if either -logtostderr or -alsologtostderr is provided and is only considered if both of them are turned off / set to false ... 🤷
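
The flag interaction described in this comment can be written down as a small decision function. This is only a model of the observation above, not klog's actual code: the config type, the severity constants, and writesToStderr are names invented for the sketch.

```go
package main

import "fmt"

type severity int

const (
	infoLog severity = iota
	warningLog
	errorLog
	fatalLog
)

// config holds the three flags discussed above (hypothetical type).
type config struct {
	toStderr        bool     // -logtostderr
	alsoToStderr    bool     // -alsologtostderr
	stderrThreshold severity // -stderrthreshold
}

// writesToStderr models the observed behaviour: the threshold only matters
// when both -logtostderr and -alsologtostderr are off.
func writesToStderr(c config, s severity) bool {
	if c.toStderr || c.alsoToStderr {
		return true // the threshold is not consulted at all
	}
	return s >= c.stderrThreshold
}

func main() {
	// With -logtostderr, even an info message below the threshold is printed.
	fmt.Println(writesToStderr(config{toStderr: true, stderrThreshold: fatalLog}, infoLog))
	// With both flags off, the threshold decides.
	fmt.Println(writesToStderr(config{stderrThreshold: errorLog}, warningLog))
	fmt.Println(writesToStderr(config{stderrThreshold: errorLog}, fatalLog))
}
```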

k8s-ci-robot · 2019-08-09T03:45:46Z

@scaat: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

scaat · 2019-08-09T05:01:31Z

@hoegaarden Thanks.
I made some adjustments so that the tests pass.

hoegaarden · 2019-09-06T15:28:38Z

Any of the approvers wanna approve this? Or is there still something to sort out?

dims · 2019-09-06T18:40:20Z

@hoegaarden deferring to @DirectXMan12 for approve :)

DirectXMan12 · 2019-09-09T14:18:16Z

Was waiting until we have a release note listed in the PR description.

hoegaarden · 2019-09-20T09:42:14Z

@scaat Can you please add a release note into the PR description ... either the one I proposed or something else/clearer/better.

@DirectXMan12 I don't think we have any tooling in place here in k/klog which picks up the ```release-note stuff yet. But I agree, we should use the same thing over here and try to get the same automation eventually.

scaat · 2019-09-20T11:03:54Z

@hoegaarden I have already added the one you proposed into the PR description. What else do I need to do?

hoegaarden · 2019-10-04T07:57:03Z

> @hoegaarden I have already added the one you proposed into the PR description. What else do I need to do?

@DirectXMan12 are we good to go from your perspective?

fejta-bot · 2020-01-02T08:47:59Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

hoegaarden · 2020-01-07T11:54:37Z

/assign @dims @DirectXMan12
for final approval, finally ;)

/remove-lifecycle stale

dims · 2020-01-08T02:21:35Z

/approve
/lgtm

k8s-ci-robot · 2020-01-08T02:21:55Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, hoegaarden, scaat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [dims]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

liggitt · 2022-02-09T16:51:38Z

This change made consumers of klog start outputting excessive amounts of stack trace logs when invoking a fatal error (xref kubernetes/kubernetes#107665 and kubernetes/kubernetes#94663)

That seems like a breaking change, or at least one that reduces usability for existing consumers.

dims · 2022-02-09T17:21:47Z

@hoegaarden please see @liggitt 's concern above

cc @serathius @pohly for additional thoughts

pohly · 2022-02-09T19:33:31Z

I agree that this change reduces the usefulness of klog. In the default configuration the stack dumps go to stderr. If that is a console and there are many goroutines, the console scrollback buffer overflows and the actual error at the beginning of the dump gets lost.

pohly · 2022-03-23T15:34:49Z

Let's revert this. To handle the case where all backtraces are wanted, I am proposing a new API for that. See #316.

This is a revert of kubernetes#79. Dumping the stack backtraces of all goroutines to stderr is not useful for processes which have many goroutines. The console buffer overflows and the original error which got dumped earlier is no longer visible. Even in CI systems where stderr is captured the full dump is often not useful. This was pointed out as a potential problem during the original PR review but only got more attention after the updated klog got merged into Kubernetes and developers there started to see the longer output.

After reverting kubernetes#79 users who want the full dump of all goroutines can still get it by calling github.com/go-logr/lib.Backtrace(lib.BacktraceAll(true)).
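
The revert message points to a go-logr helper for callers who still want the full dump; its API is not reproduced here. As a standard-library-only sketch of the same opt-in idea, the example below prints just the message by default and adds the stacks of all goroutines only when asked. The names reportFatal and render are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"runtime/pprof"
)

// reportFatal writes msg to w and, only when fullDump is true, appends the
// stacks of all goroutines. Debug level 2 of the goroutine profile uses the
// same format as the trace printed for an unrecovered panic.
func reportFatal(w io.Writer, msg string, fullDump bool) error {
	if _, err := fmt.Fprintln(w, msg); err != nil {
		return err
	}
	if !fullDump {
		return nil
	}
	return pprof.Lookup("goroutine").WriteTo(w, 2)
}

// render returns what reportFatal would write, which is convenient for tests.
func render(msg string, fullDump bool) string {
	var buf bytes.Buffer
	_ = reportFatal(&buf, msg, fullDump)
	return buf.String()
}

func main() {
	// Short output by default: the error stays visible on a console.
	_ = reportFatal(os.Stderr, "fatal msg", false)

	// Explicit opt-in for the full dump, with the message printed first.
	_ = reportFatal(os.Stderr, "fatal msg", true)
}
```

Keeping the dump behind an explicit switch addresses the scrollback concern raised above while leaving the full trace available to callers that want it.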


Add trace message output when severity is fatalLog and -logtostderr has been specified

9c5b78c

k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 31, 2019

k8s-ci-robot requested review from hoegaarden and yagonobre July 31, 2019 02:46

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jul 31, 2019

k8s-ci-robot assigned tallclair Jul 31, 2019

hoegaarden mentioned this pull request Aug 1, 2019

Backfill integration tests for selecting log destinations #80

Merged

k8s-ci-robot assigned hoegaarden Aug 8, 2019

scaat force-pushed the master branch from f134ac3 to 9c5b78c Compare August 9, 2019 02:04

scaat added 2 commits August 9, 2019 10:08

Merge remote-tracking branch 'upstream/master'

ccbccb0

Add trace message output when severity is fatalLog and -logtostderr has been specified

8189c37

k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 9, 2019

Reformat the code

7707414

scaat added 2 commits August 9, 2019 12:47

Add trace message output

d5e1ba9

Use lockAndFlushAll instead of flushAll

b3e5612

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 2, 2020

scaat requested a review from hoegaarden January 3, 2020 03:22

k8s-ci-robot assigned dims Jan 7, 2020

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 7, 2020

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 8, 2020

k8s-ci-robot merged commit c4f7487 into kubernetes:master Jan 8, 2020

soltysh mentioned this pull request Sep 9, 2020

Change at which level klog.Fatal is invoked kubernetes/kubernetes#94663

Merged

knight42 mentioned this pull request Dec 24, 2020

kubectl get pods --sort-by=status makes kubectl crash (error not handled) kubernetes/kubectl#993

Closed

pohly mentioned this pull request Jan 24, 2022

kube-controller-manager yields whole stack trace when it loses leaderelection kubernetes/kubernetes#107665

Closed

pohly mentioned this pull request Mar 23, 2022

dumping stacks during klog.Fatal #316

Closed

pohly mentioned this pull request Jun 13, 2022

klog.Fatal backtrace revert #328

Merged
