Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

structured logging: replacing Fatal/Exit/etc. without loss of flushing #303

Merged
merged 1 commit into from Mar 14, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
69 changes: 69 additions & 0 deletions exit.go
@@ -0,0 +1,69 @@
// Go support for leveled logs, analogous to https://code.google.com/p/google-glog/
//
// Copyright 2013 Google Inc. All Rights Reserved.
dims marked this conversation as resolved.
Show resolved Hide resolved
// Copyright 2022 The Kubernetes Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package klog

import (
"fmt"
"os"
"time"
)

var (

// ExitFlushTimeout is the timeout that klog has traditionally used during
// calls like Fatal or Exit when flushing log data right before exiting.
// Applications that replace those calls and do not have some specific
// requirements like "exit immediately" can use this value as parameter
// for FlushAndExit.
//
// Can be set for testing purpose or to change the application's
// default.
ExitFlushTimeout = 10 * time.Second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are end users able to modify these directly? they shouldn't right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a bit need for modifying it, but also no good reason why we shouldn't allow it. Being able to modify them is mostly (just?) useful in tests.

This timeout can become a constant.

The OsExit function has to be a variable, but we could unexport it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of OsExit, if we allow modifying it, should Fatal and friends also call it?

There is a feature request where someone is asking for Fatal to not call os.Exit. We cannot change the default behavior and not call os.Exit (that behavior is part of the stable API), but if we export OsExit and use it in Fatal, then someone who knows his application can perhaps change it into a function which panics.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a version that uses OsExit everywhere instead of calling os.Exit directly and also updated the comments: https://github.com/kubernetes/klog/compare/dbc5aa051e765a4216e0fb89e0b54fae92ad8ab0..56b8a4ea35ba0fc3be9fc82182e71c2f0116adc1


// OsExit is the function called by FlushAndExit to terminate the program.
//
// Can be set for testing purpose or to change the application's
// default behavior. Note that the function should not simply return
// because callers of functions like Fatal will not expect that.
OsExit = os.Exit
)

// FlushAndExit flushes log data for a certain amount of time and then calls
// os.Exit. Combined with some logging call it provides a replacement for
// traditional calls like Fatal or Exit.
func FlushAndExit(flushTimeout time.Duration, exitCode int) {
timeoutFlush(flushTimeout)
OsExit(exitCode)
}

// timeoutFlush calls Flush and returns when it completes or after timeout
// elapses, whichever happens first. This is needed because the hooks invoked
// by Flush may deadlock when klog.Fatal is called from a hook that holds
// a lock. Flushing also might take too long.
func timeoutFlush(timeout time.Duration) {
done := make(chan bool, 1)
go func() {
Flush() // calls logging.lockAndFlushAll()
done <- true
}()
select {
case <-done:
case <-time.After(timeout):
fmt.Fprintln(os.Stderr, "klog: Flush took longer than", timeout)
}
}
47 changes: 47 additions & 0 deletions exit_test.go
@@ -0,0 +1,47 @@
// Copyright 2022 The Kubernetes Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package klog_test

import (
"flag"
"fmt"
"os"

"k8s.io/klog/v2"
)

func ExampleFlushAndExit() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we can't have a TestFlushAndExit (Similar to functions in TestErrorS in klog_test.go) ?

We can check against the following format.

format: "\"exiting...\"\nos.Exit(10)"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have such a test function, but an example is more useful:

  • it gets tested by go test
  • it gets included in the documentation

// Set up klog so that we can test it below.
var fs flag.FlagSet
klog.InitFlags(&fs)
fs.Set("skip_headers", "true")
defer flag.Set("skip_headers", "false")
klog.SetOutput(os.Stdout)
defer klog.SetOutput(nil)
klog.OsExit = func(exitCode int) {
fmt.Printf("os.Exit(%d)\n", exitCode)
}

// If we were to return or exit without flushing, this message would
// get lost because it is buffered in memory by klog when writing to
// files. Output to stderr is not buffered.
klog.InfoS("exiting...")
exitCode := 10
klog.FlushAndExit(klog.ExitFlushTimeout, exitCode)

// Output:
// "exiting..."
// os.Exit(10)
}
39 changes: 11 additions & 28 deletions klog.go
Expand Up @@ -847,8 +847,8 @@ func (l *loggingT) output(s severity.Severity, log *logr.Logger, buf *buffer.Buf
if atomic.LoadUint32(&fatalNoStacks) > 0 {
l.mu.Unlock()
isLocked = false
timeoutFlush(10 * time.Second)
os.Exit(1)
timeoutFlush(ExitFlushTimeout)
OsExit(1)
}
// Dump all goroutine stacks before exiting.
trace := stacks(true)
Expand All @@ -865,8 +865,8 @@ func (l *loggingT) output(s severity.Severity, log *logr.Logger, buf *buffer.Buf
}
l.mu.Unlock()
isLocked = false
timeoutFlush(10 * time.Second)
os.Exit(255) // C++ uses -1, which is silly because it's anded(&) with 255 anyway.
timeoutFlush(ExitFlushTimeout)
OsExit(255) // C++ uses -1, which is silly because it's anded with 255 anyway.
}
l.bufferCache.PutBuffer(buf)

Expand All @@ -876,23 +876,6 @@ func (l *loggingT) output(s severity.Severity, log *logr.Logger, buf *buffer.Buf
}
}

// timeoutFlush calls Flush and returns when it completes or after timeout
// elapses, whichever happens first. This is needed because the hooks invoked
// by Flush may deadlock when klog.Fatal is called from a hook that holds
// a lock.
func timeoutFlush(timeout time.Duration) {
done := make(chan bool, 1)
go func() {
Flush() // calls logging.lockAndFlushAll()
done <- true
}()
select {
case <-done:
case <-time.After(timeout):
fmt.Fprintln(os.Stderr, "klog: Flush took longer than", timeout)
}
}

// stacks is a wrapper for runtime.Stack that attempts to recover the data for all goroutines.
func stacks(all bool) []byte {
// We don't know how big the traces are, so grow a few times if they don't fit. Start large, though.
Expand Down Expand Up @@ -929,7 +912,7 @@ func (l *loggingT) exit(err error) {
return
}
l.flushAll()
os.Exit(2)
OsExit(2)
}

// syncBuffer joins a bufio.Writer to its underlying file, providing access to the
Expand Down Expand Up @@ -1518,7 +1501,7 @@ func ErrorSDepth(depth int, err error, msg string, keysAndValues ...interface{})
}

// Fatal logs to the FATAL, ERROR, WARNING, and INFO logs,
// including a stack trace of all running goroutines, then calls os.Exit(255).
// including a stack trace of all running goroutines, then calls OsExit(255).
// Arguments are handled in the manner of fmt.Print; a newline is appended if missing.
func Fatal(args ...interface{}) {
logging.print(severity.FatalLog, logging.logr, logging.filter, args...)
Expand All @@ -1531,7 +1514,7 @@ func FatalDepth(depth int, args ...interface{}) {
}

// Fatalln logs to the FATAL, ERROR, WARNING, and INFO logs,
// including a stack trace of all running goroutines, then calls os.Exit(255).
// including a stack trace of all running goroutines, then calls OsExit(255).
// Arguments are handled in the manner of fmt.Println; a newline is always appended.
func Fatalln(args ...interface{}) {
logging.println(severity.FatalLog, logging.logr, logging.filter, args...)
Expand All @@ -1544,7 +1527,7 @@ func FatallnDepth(depth int, args ...interface{}) {
}

// Fatalf logs to the FATAL, ERROR, WARNING, and INFO logs,
// including a stack trace of all running goroutines, then calls os.Exit(255).
// including a stack trace of all running goroutines, then calls OsExit(255).
// Arguments are handled in the manner of fmt.Printf; a newline is appended if missing.
func Fatalf(format string, args ...interface{}) {
logging.printf(severity.FatalLog, logging.logr, logging.filter, format, args...)
Expand All @@ -1560,7 +1543,7 @@ func FatalfDepth(depth int, format string, args ...interface{}) {
// It allows Exit and relatives to use the Fatal logs.
var fatalNoStacks uint32

// Exit logs to the FATAL, ERROR, WARNING, and INFO logs, then calls os.Exit(1).
// Exit logs to the FATAL, ERROR, WARNING, and INFO logs, then calls OsExit(1).
// Arguments are handled in the manner of fmt.Print; a newline is appended if missing.
func Exit(args ...interface{}) {
atomic.StoreUint32(&fatalNoStacks, 1)
Expand All @@ -1574,7 +1557,7 @@ func ExitDepth(depth int, args ...interface{}) {
logging.printDepth(severity.FatalLog, logging.logr, logging.filter, depth, args...)
}

// Exitln logs to the FATAL, ERROR, WARNING, and INFO logs, then calls os.Exit(1).
// Exitln logs to the FATAL, ERROR, WARNING, and INFO logs, then calls OsExit(1).
func Exitln(args ...interface{}) {
atomic.StoreUint32(&fatalNoStacks, 1)
logging.println(severity.FatalLog, logging.logr, logging.filter, args...)
Expand All @@ -1587,7 +1570,7 @@ func ExitlnDepth(depth int, args ...interface{}) {
logging.printlnDepth(severity.FatalLog, logging.logr, logging.filter, depth, args...)
}

// Exitf logs to the FATAL, ERROR, WARNING, and INFO logs, then calls os.Exit(1).
// Exitf logs to the FATAL, ERROR, WARNING, and INFO logs, then calls OsExit(1).
// Arguments are handled in the manner of fmt.Printf; a newline is appended if missing.
func Exitf(format string, args ...interface{}) {
atomic.StoreUint32(&fatalNoStacks, 1)
Expand Down
3 changes: 3 additions & 0 deletions klog_test.go
Expand Up @@ -1506,6 +1506,7 @@ func TestCallDepthLogrInfoS(t *testing.T) {
logger.resetCallDepth()
l := logr.New(logger)
SetLogger(l)
defer ClearLogger()

// Add wrapper to ensure callDepthTestLogr +2 offset is correct.
logFunc := func() {
Expand All @@ -1528,6 +1529,7 @@ func TestCallDepthLogrErrorS(t *testing.T) {
logger.resetCallDepth()
l := logr.New(logger)
SetLogger(l)
defer ClearLogger()

// Add wrapper to ensure callDepthTestLogr +2 offset is correct.
logFunc := func() {
Expand All @@ -1550,6 +1552,7 @@ func TestCallDepthLogrGoLog(t *testing.T) {
logger.resetCallDepth()
l := logr.New(logger)
SetLogger(l)
defer ClearLogger()
CopyStandardLogTo("INFO")

// Add wrapper to ensure callDepthTestLogr +2 offset is correct.
Expand Down