🎁 Deterministic exit code for zap's Fatal for sharedmain #2496

Closed

Conversation

cardil
Contributor

@cardil cardil commented Apr 22, 2022

Changes

  • 🎁 Deterministic exit code for zap's Fatal for sharedmain

/kind enhancement

Fixes #2495

@knative-prow knative-prow bot added the kind/enhancement and size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files) labels Apr 22, 2022
@knative-prow

knative-prow bot commented Apr 22, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cardil
To complete the pull request process, please assign julz after the PR has been reviewed.
You can assign the PR to them by writing /assign @julz in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cardil cardil force-pushed the feature/retcode-from-error-message branch from 8e3b800 to e5e9a4a Compare April 22, 2022 16:38
@codecov

codecov bot commented Apr 22, 2022

Codecov Report

Merging #2496 (e5e9a4a) into main (12be060) will increase coverage by 0.02%.
The diff coverage is 80.00%.

❗ Current head e5e9a4a differs from pull request most recent head 682c51d. Consider uploading reports for the commit 682c51d to get more accurate results

@@            Coverage Diff             @@
##             main    #2496      +/-   ##
==========================================
+ Coverage   81.71%   81.74%   +0.02%     
==========================================
  Files         163      165       +2     
  Lines        9653     9688      +35     
==========================================
+ Hits         7888     7919      +31     
- Misses       1529     1533       +4     
  Partials      236      236              
Impacted Files                     Coverage Δ
logging/exiting_zapcore.go         76.66% <76.66%> (ø)
system/posix/retcode/retcode.go    100.00% <100.00%> (ø)
controller/controller.go           88.78% <0.00%> (+0.71%) ⬆️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12be060...682c51d. Read the comment docs.

@cardil cardil force-pushed the feature/retcode-from-error-message branch from e5e9a4a to 682c51d Compare April 22, 2022 17:12
@cardil
Contributor Author

cardil commented Apr 25, 2022

GitHub's unit test failure looks like an unrelated flake.

@cardil
Contributor Author

cardil commented Apr 26, 2022

/assign @knative/technical-oversight-committee

@cardil
Contributor Author

cardil commented Apr 26, 2022

/assign @dprotaso
/assign @rhuss
/assign @julz
/assign @vagababov
/assign @vaikas
/assign @knative/knative-release-leads
/assign @nader-ziada
/assign @ikvmw

@@ -282,7 +282,7 @@ func MainWithConfig(ctx context.Context, component string, cfg *rest.Config, cto
}

func flush(logger *zap.SugaredLogger) {
-	logger.Sync()
+	_ = logger.Sync() // we care about stdout and stderr only
Member

I don't understand this change

@dprotaso
Member

zap's godoc says the Fatal exit code is 1

See: https://pkg.go.dev/go.uber.org/zap#Logger.Fatal

Are you seeing something different?

@dprotaso
Member

dprotaso commented Apr 26, 2022

I'm not particularly sold on this change, as the exit code would depend on the log message.

Plus, a range of exit codes is meant to be reserved:
https://man7.org/linux/man-pages/man1/exit.1p.html#RATIONALE

Comment on lines +51 to +53
if entry.Level >= zapcore.DPanicLevel {
return ce.AddCore(entry, r)
}
Member

Do we need to do this separately from the base?

Comment on lines +58 to +60
if err := r.base.Write(entry, fields); err != nil {
return err
}
Member

Other implementations won't exit during Write, correct?

})

type retcodeCore struct {
base zapcore.Core
Member

How much would you get for free from embedding zapcore.Core?

type retcodeCore struct {
  zapcore.Core
  fields []zapcore.Field
}
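To make the embedding question concrete, here is a minimal sketch of what method promotion buys. It deliberately avoids importing zap: `Core` below is a hypothetical, trimmed-down stand-in for `zapcore.Core`, and `baseCore` and the `writes` counter are invented for illustration only.

```go
package main

import "fmt"

// Core is a hypothetical, trimmed-down stand-in for zapcore.Core so that the
// sketch compiles with the standard library only.
type Core interface {
	Enabled(level int) bool
	Write(msg string) error
	Sync() error
}

// baseCore is a trivial Core that records every message it is asked to write.
type baseCore struct{ written []string }

func (b *baseCore) Enabled(level int) bool { return level >= 0 }
func (b *baseCore) Write(msg string) error { b.written = append(b.written, msg); return nil }
func (b *baseCore) Sync() error            { return nil }

// retcodeCore embeds Core, so Enabled and Sync are promoted from the wrapped
// value and need no hand-written forwarding. Only Write is overridden.
type retcodeCore struct {
	Core
	writes int // illustrative extra state, not taken from the PR
}

// Write counts the call and then delegates to the embedded Core.
func (r *retcodeCore) Write(msg string) error {
	r.writes++
	return r.Core.Write(msg)
}

func main() {
	base := &baseCore{}
	var c Core = &retcodeCore{Core: base}
	_ = c.Write("boom")
	// Enabled and Sync resolve to baseCore's methods through promotion.
	fmt.Println(c.Enabled(1), c.Sync() == nil, base.written)
}
```

One caveat if this is carried over to the real interface: promoted methods run against the embedded value, so any method that registers or returns the core itself (such as `Check` and `With` on `zapcore.Core`) would likely still need an explicit override in the wrapper.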


type retcodeCore struct {
base zapcore.Core
fields []zapcore.Field
Member

Can you add a comment on why we're tracking the fields here specially?

}
code := r.calculateRetcode(entry, fields)
_ = r.Sync()
exit.Exit(code)
Member

Do you want to call go.uber.org/zap/internal/exit to enable working with zap's testing spy?

Member

It looks like you've copied the zap spy into the pkg codebase, though I don't understand why we need that rather than adopting / improving the zap one.
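For readers unfamiliar with the "exit spy" idea discussed here, the following is a rough, standard-library-only sketch of the general pattern: route process termination through a replaceable function so tests can record the code instead of exiting. The names `exitFunc`, `fatal`, and `captureExit` are hypothetical and are not zap's or this PR's API.

```go
package main

import (
	"fmt"
	"os"
)

// exitFunc is a hypothetical indirection point. Production code leaves it
// set to os.Exit; tests replace it temporarily.
var exitFunc = os.Exit

// fatal reports msg on stderr and terminates through exitFunc.
func fatal(code int, msg string) {
	fmt.Fprintln(os.Stderr, msg)
	exitFunc(code)
}

// captureExit swaps exitFunc for a recorder while fn runs, restores the
// previous function afterwards, and returns the recorded exit codes.
func captureExit(fn func()) []int {
	var codes []int
	prev := exitFunc
	exitFunc = func(code int) { codes = append(codes, code) }
	defer func() { exitFunc = prev }()
	fn()
	return codes
}

func main() {
	codes := captureExit(func() { fatal(42, "simulated fatal") })
	fmt.Println("captured exit codes:", codes)
}
```

Because the stub does not actually stop execution, code after a stubbed `fatal` call keeps running, which is something a real spy has to account for.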


import "hash/crc32"

// Calc will calculate a POSIX retcode from an error.
Member

I'm not sure why we want to do this -- it looks like we're collapsing all errors into a range of code 1-255.

https://unix.stackexchange.com/a/418802 suggests that the range 1-125 might be more reasonable, or 1 - (2^31-1) if you're willing to work with waitid and some of the newer posix calls.

)

func TestCalc(t *testing.T) {
cases := testCases()
Member

Why not inline this declaration (as we do elsewhere)?

cases := []struct {
  name string
  err error
  want int
}{{
  name: "nil",
  err: nil,
  want: 0,
}, {
  ...
}}

Comment on lines +53 to +54
assert.Check(t, cmp.Contains(logs, `"message":"foo"`))
assert.Check(t, cmp.Contains(logs, `"error":"bar"`))
Member

I'm slightly concerned about adding a new dependency here to save a line of code:

if !strings.Contains(logs, `"message":"foo"`) {
  t.Errorf("Expected %q to contain message tag", logs)
}

(To put it simply, the cost of adding the dependency seems higher than the value that it adds in terms of clarity.)

@knative-prow-robot knative-prow-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jun 22, 2022
@knative-prow-robot
Contributor

@cardil: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rhuss
Contributor

rhuss commented Jun 30, 2022

Where do we stand with this PR? It seems to have been dormant for quite some time.

@dprotaso
Member

dprotaso commented Jun 30, 2022

I don't think we should merge this change. Changing the exit code based on log messages will, I think, just make things more confusing: people google exit code values, and here the same value would carry a different meaning.

We should probably write the correct log line to the k8s termination message path instead:

https://kubernetes.io/docs/tasks/debug/debug-application/determine-reason-pod-failure/

@dprotaso
Member

Going to close this out

@dprotaso dprotaso closed this Jul 27, 2022
Labels
kind/enhancement, needs-rebase (indicates a PR cannot be merged because it has merge conflicts with HEAD), size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

For sharedmain zap's Fatal should deterministically calculate the POSIX return code
10 participants