Fixes exited messages when leaderelection lost #107724

Merged
k8s-ci-robot merged 1 commit into kubernetes:master on Mar 28, 2022

kkkkun
Member

@kkkkun kkkkun commented Jan 24, 2022

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #107665

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added the kind/flake, size/S, do-not-merge/release-note-label-needed, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, and needs-ok-to-test labels Jan 24, 2022
@k8s-ci-robot
Contributor

Hi @kkkkun. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority label Jan 24, 2022
@k8s-ci-robot k8s-ci-robot added the area/cloudprovider, sig/api-machinery, sig/cloud-provider, and sig/scheduling labels and removed the do-not-merge/needs-sig label Jan 24, 2022
@binacs
Member

binacs commented Jan 25, 2022

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Jan 25, 2022
@kkkkun
Member Author

kkkkun commented Jan 25, 2022

/release-note-none

@k8s-ci-robot k8s-ci-robot added the release-note-none label and removed the do-not-merge/release-note-label-needed label Jan 25, 2022
@MadhavJivrajani
Contributor

/remove-kind flake
/kind cleanup
/priority backlog
@pohly FYI, since you had commented on the issue.

@k8s-ci-robot k8s-ci-robot added the kind/cleanup and priority/backlog labels and removed the kind/flake and needs-priority labels Jan 25, 2022
@@ -275,7 +275,8 @@ func Run(c *config.CompletedConfig, stopCh <-chan struct{}) error {
 				run(ctx, startSATokenController, initializersFunc)
 			},
 			OnStoppedLeading: func() {
-				klog.Fatalf("leaderelection lost")
+				klog.ErrorS(nil, "leaderelection lost")
+				os.Exit(255)
Contributor

How quickly do you want to exit here?

Originally I had implemented JSON output handling so that ErrorS flushes buffered info messages automatically. @serathius later removed that because flushing also did an fsync, which caused a performance issue.

If you want to ensure that info messages are written, you have to add klog.Flush before os.Exit, but only if a (probably small) delay is acceptable.
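
For illustration only, here is a stdlib-only sketch of the buffering concern raised above. It does not use klog: `bufio.Writer` stands in for a buffering logger, `bytes.Buffer` stands in for the log destination, and the `deliver` helper is a hypothetical name. The process exit is only simulated by returning from the function.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// deliver writes one buffered info message and then "exits".
// It returns whatever actually reached the sink at that point.
// The exit is simulated by returning; a real os.Exit would drop
// anything still sitting in the in-memory buffer.
func deliver(flushBeforeExit bool) string {
	var sink bytes.Buffer       // stands in for stderr or a log file
	w := bufio.NewWriter(&sink) // stands in for a buffering logger

	fmt.Fprintln(w, "info: shutting down")
	if flushBeforeExit {
		w.Flush() // push buffered bytes to the sink before exiting
	}
	return sink.String()
}

func main() {
	fmt.Printf("without flush: %q\n", deliver(false))
	fmt.Printf("with flush:    %q\n", deliver(true))
}
```

Without the flush, nothing reaches the sink, which is the state an immediate os.Exit would leave behind. The cost is the extra time the flush takes.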

Member Author

@kkkkun kkkkun Jan 25, 2022

I have hit this case several times.
When it crashes, I think the current message is enough. It doesn't need to print all goroutine stacks.

If you want to ensure that info messages are written, you have to add klog.Flush before os.Exit, but only if a (probably small) delay is acceptable.

Added.

Member

Could you take a look at the official migration guide here?
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#replacing-fatal-calls

If we really need a Flush, then we would have to add a flush everywhere, considering how many klog.Fatalf calls we had previously.

@serathius your input is greatly appreciated.

Member Author

Could you take a look at the official migration guide here? https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#replacing-fatal-calls

If we really need a Flush, then we would have to add a flush everywhere, considering how many klog.Fatalf calls we had previously.

@serathius your input is greatly appreciated.

Thanks. I fixed it according to the #replacing-fatal-calls section.

Contributor

If we really need a Flush, then we would have to add a flush everywhere, considering how many klog.Fatalf calls we had previously.

The question is whether you want a Flush. Fatalf flushes, with a timeout of up to 10 seconds, before exiting:
https://github.com/kubernetes/klog/blob/d78dad3276f1f4139a6614b08b02e4678d2c3393/klog.go#L1108-L1111

The migration guide doesn't cover that aspect; I don't know whether that was intentional or an oversight.

Contributor

the Exit APIs fits the bill

You mean klog.Exitf and friends? That is part of the old-style, unstructured logging API. We need a replacement for that when migrating to structured logging, just as we need a replacement for klog.Fatalf.

Member Author

@kkkkun kkkkun Mar 15, 2022

@pohly Hi. I noticed that kubernetes/klog#303 has been merged.
So here, should we exit with klog.FlushAndExit(klog.ExitFlushTimeout, 1)?
Maybe we need a new klog tag, and then Kubernetes needs to be updated to use it?

Contributor

So here, should we exit with klog.FlushAndExit(klog.ExitFlushTimeout, 1)?

Yes, that would be good.

Maybe we need a new klog tag, and then Kubernetes needs to be updated to use it?

I'll get to that as soon as possible. There's still kubernetes/klog#297 pending, then klog should be ready for a new release.

Member Author

I'll get to that as soon as possible. There's still kubernetes/klog#297 pending, then klog should be ready for a new release.

Looking forward to it!

Contributor

See #108725

@k8s-ci-robot k8s-ci-robot added the triage/accepted label and removed the needs-triage label Jan 25, 2022
@kkkkun
Member Author

kkkkun commented Feb 7, 2022

/assign deads2k
/assign mikedanese
/assign sig-scheduling-maintainers

Could you please approve this?

@k8s-ci-robot
Contributor

@kkkkun: GitHub didn't allow me to assign the following users: sig-scheduling-maintainers.

Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign deads2k
/assign mikedanese
/assign sig-scheduling-maintainers

Could you please approve this?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed the lgtm label Feb 8, 2022
@kkkkun kkkkun requested a review from jiahuif February 8, 2022 03:05
@kkkkun kkkkun changed the title from "Fix exit message when leaderelection lost" to "Fixes exited messages when leaderelection lost" Feb 8, 2022
@kkkkun
Member Author

kkkkun commented Mar 24, 2022

/test pull-kubernetes-unit

@kkkkun
Member Author

kkkkun commented Mar 24, 2022

/retest

Contributor

@pohly pohly left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Mar 24, 2022
@kkkkun
Member Author

kkkkun commented Mar 24, 2022

/retest

@kkkkun
Member Author

kkkkun commented Mar 24, 2022

The failing test case is not caused by this change.
I opened issue #108975 to track it.

@kkkkun
Member Author

kkkkun commented Mar 24, 2022

/test pull-kubernetes-unit

@kkkkun
Member Author

kkkkun commented Mar 28, 2022

/assign @liggitt @dims @jiahuif

Please cc

@dims
Member

dims commented Mar 28, 2022

/approve
/lgtm

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, kkkkun, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Mar 28, 2022
@k8s-ci-robot k8s-ci-robot merged commit b5f8d9e into kubernetes:master Mar 28, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Mar 28, 2022
Labels
approved, area/cloudprovider, cncf-cla: yes, kind/cleanup, lgtm, ok-to-test, priority/backlog, release-note-none, sig/api-machinery, sig/cloud-provider, sig/scheduling, size/S, triage/accepted
Development

Successfully merging this pull request may close these issues.

kube-controller-manager yields whole stack trace when it loses leaderelection