Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write custom errors package with stack trace functionality #5239

Merged
merged 19 commits into from May 24, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
142 changes: 142 additions & 0 deletions pkg/errors/errors.go
@@ -0,0 +1,142 @@
// Copyright (c) The Thanos Authors.
// Licensed under the Apache License 2.0.

//nolint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, why nolint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tast project

spellchecker thinks the project name should be Taste instead of the keyword Tast 🙃

// The idea of writing errors package in thanos is highly motivated from the Tast project of Chromium OS Authors. However, instead of
// copying the package, we end up writing our own simplified logic borrowing some ideas from the errors and github.com/pkg/errors.
// A big thanks to all of them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need to mention Chromium's license here too. I believe we do it for some Prometheus code that we use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for the review. That's what I am also wondering too. To be honest, the current program is not an exact replication (here and there I have got rid of some unnecessary codes, interfaces, used string Builder for efficient formatting, changed the return signature of a few functions) of the Chromium OS Package.
Here's their license https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast/LICENSE

What do you think we should do here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine if we just add a line pointing to this license here, as we don't copy but rather adapt code from Chromium. But I don't feel too strongly about this either way. 🙂
WDYT @matej-g?


// Package errors provides basic utilities to manipulate errors with a useful stacktrace. It combines the
// benefits of errors.New and fmt.Errorf world into a single package.
package errors

import (
//lint:ignore faillint Custom errors package needs to import standard library errors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏽

"errors"
"fmt"
"strings"
)

// base is the fundamental struct that implements the error interface and the acts as the backbone of this errors package.
type base struct {
// info contains the error message passed through calls like errors.Wrap, errors.New.
info string
// stacktrace stores information about the program counters - i.e. where this error was generated.
stack stacktrace
// err is the actual error which is being wrapped with a stacktrace and message information.
err error
}

// Error implements the error interface.
func (b *base) Error() string {
if b.err != nil {
return fmt.Sprintf("%s: %s", b.info, b.err.Error())
}
return b.info
}

// Unwrap implements the error Unwrap interface.
func (b *base) Unwrap() error {
return b.err
}

// Format implements the fmt.Formatter interface to support the formatting of an error chain with "%+v" verb.
// Whenever error is printed with %+v format verb, stacktrace info gets dumped to the output.
func (b *base) Format(s fmt.State, verb rune) {
if verb == 'v' && s.Flag('+') {
s.Write([]byte(formatErrorChain(b)))
return
}

s.Write([]byte(b.Error()))
}

// Newf formats according to a format specifier and returns a new error with a stacktrace
// with recent call frames. Each call to New returns a distinct error value even if the text is
// identical. An alternative of the errors.New function.
//
// If no args have been passed, it is same as `New` function without formatting. Character like
// '%' still has to be escaped in that scenario.
func Newf(format string, args ...interface{}) error {
return &base{
info: fmt.Sprintf(format, args...),
stack: newStackTrace(),
err: nil,
}
}

// Wrapf returns a new error by formatting the error message with the supplied format specifier
// and wrapping another error with a stacktrace containing recent call frames.
//
// If cause is nil, this is the same as fmt.Errorf. If no args have been passed, it is same as `Wrap`
// function without formatting. Character like '%' still has to be escaped in that scenario.
func Wrapf(cause error, format string, args ...interface{}) error {
return &base{
info: fmt.Sprintf(format, args...),
stack: newStackTrace(),
err: cause,
}
}

// Cause returns the result of repeatedly calling the Unwrap method on err, if err's
// type implements an Unwrap method. Otherwise, Cause returns the last encountered error.
// The difference between Unwrap and Cause is the first one performs unwrapping of one level
// but Cause returns the last err (whether it's nil or not) where it failed to assert
// the interface containing the Unwrap method.
// This is a replacement of errors.Cause without the causer interface from pkg/errors which
// actually can be sufficed through the errors.Is function. But considering some use cases
// where we need to peel off all the external layers applied through errors.Wrap family,
// it is useful ( where external SDK doesn't use errors.Is internally).
func Cause(err error) error {
for err != nil {
e, ok := err.(interface {
Unwrap() error
})
if !ok {
return err
}
err = e.Unwrap()
}
return nil
}

// formatErrorChain formats an error chain.
func formatErrorChain(err error) string {
var buf strings.Builder
for err != nil {
if e, ok := err.(*base); ok {
buf.WriteString(fmt.Sprintf("%s\n%v", e.info, e.stack))
err = e.err
} else {
buf.WriteString(fmt.Sprintf("%s\n", err.Error()))
err = nil
}
}
return buf.String()
}

// The functions `Is`, `As` & `Unwrap` provides a thin wrapper around the builtin errors
// package in go. Just for sake of completeness and correct autocompletion behaviors from
// IDEs they have been wrapped using functions instead of using variable to reference them
// as first class functions (eg: var Is = errros.Is ).

// Is is a wrapper of built-in errors.Is. It reports whether any error in err's
// chain matches target. The chain consists of err itself followed by the sequence
// of errors obtained by repeatedly calling Unwrap.
func Is(err, target error) bool {
return errors.Is(err, target)
}
Comment on lines +123 to +128
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder if alias var Is = errors.Is would not work? (and same for As).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would work, updating

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, sorry, I have reverted back to the old one.
Reason: Though using var Is = errors.Is would definitely make things work but users will have a slightly poor IDE experience as the ide will treat those functions as variables (as actually, they are in the modified pkg). So the auto-completion works a bit weird way
see the screenshot
image

I dug in a little bit to find an alternative approach to tackle this, but it seems popular packages do use the wrapping while exposing their client-side APIs from the internal packages.
ref: https://github.com/temporalio/sdk-go/blob/fd0d1eb548eb0621a5395581cfe2c418704b007c/client/client.go#L435-L476

I hope you are okay with it. @bwplotka?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, but worth adding comment why we did this this way 🙃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure : )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I know why it did not work for you. We can do type Is = errors.Is I think. Let's merge and iterate (:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi sure 👍 thanks for the suggestion, we can tackle it on the next pr. On second thought I doubt if Go will allow putting a new type over a function definition (for function signature it can be done).


// As is a wrapper of built-in errors.As. It finds the first error in err's
// chain that matches target, and if one is found, sets target to that error
// value and returns true. Otherwise, it returns false.
func As(err error, target interface{}) bool {
return errors.As(err, target)
}

// Unwrap is a wrapper of built-in errors.Unwrap. Unwrap returns the result of
// calling the Unwrap method on err, if err's type contains an Unwrap method
// returning error. Otherwise, Unwrap returns nil.
func Unwrap(err error) error {
return errors.Unwrap(err)
}
143 changes: 143 additions & 0 deletions pkg/errors/errors_test.go
@@ -0,0 +1,143 @@
// Copyright (c) The Thanos Authors.
// Licensed under the Apache License 2.0.

package errors

import (
//lint:ignore faillint Custom errors package tests need to import standard library errors.
stderrors "errors"
"fmt"
"regexp"
"strconv"
"testing"

"github.com/thanos-io/thanos/pkg/testutil"
)

const msg = "test_error_message"
const wrapper = "test_wrapper"

func TestNewf(t *testing.T) {
err := Newf(msg)
testutil.Equals(t, err.Error(), msg, "the root error message must match")

reg := regexp.MustCompile(msg + `[ \n]+> github\.com\/thanos-io\/thanos\/pkg\/errors\.TestNewf .*\/pkg\/errors\/errors_test\.go:\d+`)
testutil.Equals(t, reg.MatchString(fmt.Sprintf("%+v", err)), true, "matching stacktrace in errors.New")
}

func TestNewfFormatted(t *testing.T) {
fmtMsg := msg + " key=%v"
expectedMsg := msg + " key=value"

err := Newf(fmtMsg, "value")
testutil.Equals(t, err.Error(), expectedMsg, "the root error message must match")
reg := regexp.MustCompile(expectedMsg + `[ \n]+> github\.com\/thanos-io\/thanos\/pkg\/errors\.TestNewfFormatted .*\/pkg\/errors\/errors_test\.go:\d+`)
testutil.Equals(t, reg.MatchString(fmt.Sprintf("%+v", err)), true, "matching stacktrace in errors.New with format string")
}

func TestWrapf(t *testing.T) {
err := Newf(msg)
err = Wrapf(err, wrapper)

expectedMsg := wrapper + ": " + msg
testutil.Equals(t, err.Error(), expectedMsg, "the root error message must match")

reg := regexp.MustCompile(`test_wrapper[ \n]+> github\.com\/thanos-io\/thanos\/pkg\/errors\.TestWrapf .*\/pkg\/errors\/errors_test\.go:\d+
[[:ascii:]]+test_error_message[ \n]+> github\.com\/thanos-io\/thanos\/pkg\/errors\.TestWrapf .*\/pkg\/errors\/errors_test\.go:\d+`)

testutil.Equals(t, reg.MatchString(fmt.Sprintf("%+v", err)), true, "matching stacktrace in errors.Wrap")
}

func TestUnwrap(t *testing.T) {
// test with base error
err := Newf(msg)

for i, tc := range []struct {
err error
expected string
isNil bool
}{
{
// no wrapping
err: err,
isNil: true,
},
{
err: Wrapf(err, wrapper),
expected: "test_error_message",
},
{
err: Wrapf(Wrapf(err, wrapper), wrapper),
expected: "test_wrapper: test_error_message",
},
// check primitives errors
{
err: stderrors.New("std-error"),
isNil: true,
},
{
err: Wrapf(stderrors.New("std-error"), wrapper),
expected: "std-error",
},
{
err: nil,
isNil: true,
},
} {
t.Run("TestCase"+strconv.Itoa(i), func(t *testing.T) {
unwrapped := Unwrap(tc.err)
if tc.isNil {
testutil.Equals(t, unwrapped, nil)
return
}
testutil.Equals(t, unwrapped.Error(), tc.expected, "Unwrap must match expected output")
})
}
}

func TestCause(t *testing.T) {
// test with base error that implements interface containing Unwrap method
err := Newf(msg)

for i, tc := range []struct {
err error
expected string
isNil bool
}{
{
// no wrapping
err: err,
isNil: true,
},
{
err: Wrapf(err, wrapper),
isNil: true,
},
{
err: Wrapf(Wrapf(err, wrapper), wrapper),
isNil: true,
},
// check primitives errors
{
err: stderrors.New("std-error"),
expected: "std-error",
},
{
err: Wrapf(stderrors.New("std-error"), wrapper),
expected: "std-error",
},
{
err: nil,
isNil: true,
},
} {
t.Run("TestCase"+strconv.Itoa(i), func(t *testing.T) {
cause := Cause(tc.err)
if tc.isNil {
testutil.Equals(t, cause, nil)
return
}
testutil.Equals(t, cause.Error(), tc.expected, "Cause must match expected output")
})
}
}
53 changes: 53 additions & 0 deletions pkg/errors/stacktrace.go
@@ -0,0 +1,53 @@
// Copyright (c) The Thanos Authors.
// Licensed under the Apache License 2.0.

package errors

import (
"fmt"
"runtime"
"strings"
)

// stacktrace holds a snapshot of program counters.
type stacktrace []uintptr

// newStackTrace captures a stack trace. It skips first 3 frames to record the
// snapshot of the stack trace at the origin of a particular error. It tries to
// record maximum 16 frames (if available).
func newStackTrace() stacktrace {
const stackDepth = 16 // record maximum 16 frames (if available).

pc := make([]uintptr, stackDepth)
Copy link

@sthaha sthaha Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation has the same issue as pkg/errors i.e. the pc escapes to heap thus the entirety of the 16 uintptr is kept in memory even if only the stack is only 1 call frame deep. I have explained the issue in detail here; based on a real-world scenario.

Please try go build -gcflags='-m -m' ./pkg/errors/stacktrace.go and you would find

pkg/errors/stacktrace.go:21:12: make([]uintptr, stackDepth) escapes to heap:
pkg/errors/stacktrace.go:21:12:   flow: pc = &{storage for make([]uintptr, stackDepth)}:
pkg/errors/stacktrace.go:21:12:     from make([]uintptr, stackDepth) (spill) at pkg/errors/stacktrace.go:21:12
pkg/errors/stacktrace.go:21:12:     from pc := make([]uintptr, stackDepth) (assign) at 

The solution is to allocate the buffer pc on stack like the original implementation and then copy whats needed off it to prevent escaping to heap.

Suggested change
pc := make([]uintptr, stackDepth)
const maxDepth = 16
var pcs [maxDepth]uintptr. // allocate on stack
n := runtime.Callers(3, pcs[:])
st := make(stack, n)
copy(st, pcs[:n])
return st

Another thought is if we really need to hold on to the entire CallFrame until String() is called? At the expense of some compute, we could calculate Stacktrace string immediately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice nitpick and great catch @sthaha.

the pc escapes to heap thus the entirety of the 16 uintptr is kept in memory even if only the stack is only 1 call frame deep.

Yes, you are right - there is no point in holding an extra chunk of unused heap memory, And I think you are copying to another temporary slice because even though we are returning stacktrace(pc[:n]) the capacity of PC still remains 16 (allocated memory), right?

But this additional copy adds some extra complexity : ) (though it's very minor and negligible as we are dealing with length of 16) From escape analysis the function complexity gets increased from 88 -> 96

./stacktrace.go:20:6: cannot inline newStackTrace: function too complex: cost 96 exceeds budget 80
.
.
./stacktrace.go:20:6: cannot inline newStackTrace: function too complex: cost 88 exceeds budget 80

Since go1.2 we have 3 index slice capability where the third param can define the capacity of the newly created slice.

So just updating the return statement to return stacktrace(pc[:n:n]) would yield the same effect of the suggestion you proposed.

Thanks a lot. That was really awesome.

Another thought is if we really need to hold on to the entire CallFrame until String() is called? At the expense of some compute, we could calculate Stacktrace string immediately.

I think we shouldn't do it for the following reasons

  • Its compute expensive - runtime needs to retrieve caller information
  • More memory usage - the String method is meant for human consumption so naturally, this adds a lot of extra information, text etc compared to the 16 element slice of type unitptr. During error changing it would be worse.
  • Not every error needs the stacktrace. For specific format verbs like %+v stacktrace gets dumped recursively (only then String gets called).

Copy link

@sthaha sthaha Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since go1.2 we have 3 index slice capability where the third param can define the capacity of the newly created slice.

💯 .. Thank you, I wasn't aware :)

I think we shouldn't do it for the following reasons ...

Its compute expensive - runtime needs to retrieve caller information

Have you tried a benchmark?

More memory usage - the String method is meant for human consumption so naturally, this adds a lot of extra information, text etc compared to the 16 element slice of type unitptr. During error changing it would be worse.

What I meant was to only hold on to information that you need than have uintptr which now points to the callframe and IIUC, holding onto the callframe will now prevent GC from cleaning the callframe info until the error is garbage collected as well.

It may be worth seeing the inuse allocation if we don't hold on to the callframes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't do it for the following reasons ...

Its compute expensive - runtime needs to retrieve caller information

Have you tried a benchmark?

IIUC, in the new approach that you are suggesting we should call the String immediately at the expense of an extra call which might not be required if the error is not logged with an "%+v" format verb. So we are adhering to extra computation which might not be of any use. So obviously it will be expensive in time against the current approach as the current one only calls string when it's required.

More memory usage - the String method is meant for human consumption so naturally, this adds a lot of extra information, text etc compared to the 16 element slice of type unitptr. During error chaining it would be worse.

What I meant was to only hold on to information that you need than have uintptr which now points to the callframe and IIUC, holding onto the callframe will now prevent GC from cleaning the callframe info until the error is garbage collected as well.

Now coming to the memory footprint, IMHO storing unstructured data (the stack trace output string) is generally a bad idea. If we change the output string format from fmt.Sprintf("> %s\t%s:%d\n", frame.Func.Name(), frame.File, frame.Line) this to something else memory footprint gets changed. So I think this optimization won't be valid in longer run.

After giving it a thought, I think it's a tradeoff, why?
Memory usage:

  • current approach: 16*4 bytes uintptr + unreleased memory from gc (not sure what it stores on that PC address, might be some function related metadata to populate the callFrames. definitely releases all the resources used inside the function)
  • proposed approach: large unstructured string + still some unreleased memory (not sure when will the next cycle of go GC will kick off and it's dependent on the config).

I am not sure if we can benchmark this anyhow or not.

the error is garbage collected as well.

If you see, in Go world err has the comparatively shortest lifespan than anything else. In a well written program they gets handled immediately by being returned to the caller function or logged into the sink. So I think we are good here. After all it's an internal package and if we found some untoward runtime behaviour we can always do different optimization.

Thank you : )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you see, in Go world err has the comparatively shortest lifespan than anything else.

Not always. Check our fixed issue: #5257

Anyway, I asked @sthaha to remind us about the optimization emperor did - to make sure we are aware. As usually we can iterate over it, we won't do it perfectly over time - the main thing is to get APIs as best as possible - it's hard to change them later.

// using skip=3 for not to count the program counter address of
bisakhmondal marked this conversation as resolved.
Show resolved Hide resolved
// 1. the respective function from errors package (eg. errors.New)
// 2. newStacktrace itself
// 3. the function used in runtime.Callers
n := runtime.Callers(3, pc)

// this approach is taken to reduce long term memory footprint (obtained through escape analysis).
// We are returning a new slice by re-slicing the pc with the required length and capacity (when the
// no of returned callFrames is less that stackDepth). This uses less memory compared to pc[:n] as
// the capacity of new slice is inherited from the parent slice if not specified.
return stacktrace(pc[:n:n])
}

// String implements the fmt.Stringer interface to provide formatted text output.
func (s stacktrace) String() string {
var buf strings.Builder

// CallersFrames takes the slice of Program Counter addresses returned by Callers to
// retrieve function/file/line information.
cf := runtime.CallersFrames(s)
for {
// more indicates if the next call will be successful or not.
frame, more := cf.Next()
// used formatting scheme <`>`space><function name><tab><filepath><:><line><newline> for example:
// > testing.tRunner /home/go/go1.17.8/src/testing/testing.go:1259
buf.WriteString(fmt.Sprintf("> %s\t%s:%d\n", frame.Func.Name(), frame.File, frame.Line))
if !more {
break
}
}
return buf.String()
}
30 changes: 30 additions & 0 deletions pkg/errors/stacktrace_test.go
@@ -0,0 +1,30 @@
// Copyright (c) The Thanos Authors.
// Licensed under the Apache License 2.0.

package errors

import (
"strings"
"testing"
)

func caller() stacktrace {
return newStackTrace()
}

func TestStacktraceOutput(t *testing.T) {
st := caller()
expectedPhrase := "/pkg/errors/stacktrace_test.go:16"
if !strings.Contains(st.String(), expectedPhrase) {
t.Fatalf("expected %v phrase into the stacktrace, received stacktrace: \n%v", expectedPhrase, st.String())
}
}

func TestStacktraceProgramCounterLen(t *testing.T) {
st := caller()
output := st.String()
lines := len(strings.Split(strings.TrimSuffix(output, "\n"), "\n"))
if len(st) != lines {
t.Fatalf("output lines vs program counter size mismatch: program counter size %v, output lines %v", len(st), lines)
}
}