Promote kubelet graceful node shutdown to beta #99735

Merged
merged 1 commit into from Mar 6, 2021

Member

@bobbypage bobbypage commented Mar 3, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

Promoting kubelet graceful node shutdown to beta.

  • Change the feature gate from alpha to beta and enable it by default
    • Update a few of the unit tests because the feature gate is now enabled by default
  • Small refactor in nodeshutdown_manager that adds a featureEnabled function (which checks that the feature gate is enabled and that kubeletConfig.ShutdownGracePeriod > 0).
    • Use featureEnabled() to exit early from the shutdown manager when the feature is disabled
  • Update the kubelet config defaulting to be explicit that ShutdownGracePeriod and ShutdownGracePeriodCriticalPods default to zero, and update the godoc comments.

With this feature now in beta and the feature gate enabled by default, enabling graceful shutdown only requires configuring ShutdownGracePeriod and ShutdownGracePeriodCriticalPods in the kubelet config. If they are not configured, they default to zero and graceful shutdown is effectively disabled.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Enhancement issue: kubernetes/enhancements#2000

Does this PR introduce a user-facing change?

Kubelet Graceful Node Shutdown feature is now beta. 

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2000-graceful-node-shutdown/README.md

@k8s-ci-robot k8s-ci-robot added the following labels on Mar 3, 2021:
- `release-note`: Denotes a PR that will be considered when it comes time to generate release notes.
- `kind/feature`: Categorizes issue or PR as related to a new feature.
- `size/XS`: Denotes a PR that changes 0-9 lines, ignoring generated files.
- `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
- `do-not-merge/needs-sig`: Indicates an issue or PR lacks a `sig/foo` label and requires one.
- `needs-triage`: Indicates an issue or PR lacks a `triage/foo` label and requires one.
- `needs-priority`: Indicates a PR lacks a `priority/foo` label and requires one.

@bobbypage
Member Author

/triage accepted
/sig node
/cc @SergeyKanzhelev @mrunalp @wzshiming

@k8s-ci-robot k8s-ci-robot changed labels on Mar 3, 2021:
- added `triage/accepted`: Indicates an issue or PR is ready to be actively worked on.
- added `sig/node`: Categorizes an issue or PR as relevant to SIG Node.
- removed `needs-triage`: Indicates an issue or PR lacks a `triage/foo` label and requires one.
- removed `do-not-merge/needs-sig`: Indicates an issue or PR lacks a `sig/foo` label and requires one.

@bobbypage
Member Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 3, 2021
@SergeyKanzhelev
Member

/priority important-soon
/triage accepted
/lgtm

@k8s-ci-robot k8s-ci-robot changed labels on Mar 3, 2021:
- added `priority/important-soon`: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
- removed `needs-priority`: Indicates a PR lacks a `priority/foo` label and requires one.
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 3, 2021
@pacoxu
Member

pacoxu commented Mar 3, 2021

/retest
a kubectl timeout

@wzshiming
Member

/lgtm

@ehashman ehashman added this to Needs Approver in SIG Node PR Triage Mar 4, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 5, 2021
@bobbypage
Member Author

/assign @thockin

for API review (there are a few small changes)

/assign @derekwaynecarr

for approvals

@derekwaynecarr
Member

thanks for following up from slack

/approve
/lgtm

Member

@thockin thockin left a comment

I only looked at the API part. Seems OK.

@@ -379,10 +379,10 @@ type KubeletConfiguration struct {
     // EnableSystemLogHandler enables /logs handler.
     EnableSystemLogHandler bool
     // ShutdownGracePeriod specifies the total duration that the node should delay the shutdown and total grace period for pod termination during a node shutdown.
-    // Defaults to 30 seconds, requires GracefulNodeShutdown feature gate to be enabled.
+    // Defaults to 0 seconds.
Member

We're adding a new +featureGate=GracefulNodeShutdown tag - if you put it here, later tooling will use it automatically.

Member Author

@bobbypage bobbypage Mar 5, 2021

Thanks, I can add that.

Should the featureGate=GracefulNodeShutdown tag be added here or in https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/kubelet/config/v1beta1/types.go (where I see there are already existing tags like +optional)?

@@ -252,4 +252,10 @@ func SetDefaults_KubeletConfiguration(obj *kubeletconfigv1beta1.KubeletConfigura
     if obj.EnableDebugFlagsHandler == nil {
         obj.EnableDebugFlagsHandler = utilpointer.BoolPtr(true)
     }
+    if obj.ShutdownGracePeriod == zeroDuration {
+        obj.ShutdownGracePeriod = metav1.Duration{Duration: 0}
Member

How is metav1.Duration{Duration: 0} different from zeroDuration? They seem identical to me.

Member Author

@bobbypage bobbypage Mar 5, 2021

It should be the same. I just added it to be explicit that we actually want the default value here (since all the existing kubelet config time duration fields are compared against zeroDuration here and set to the appropriate default).

So I think it can be dropped, but I was thinking it would be better to be explicit about the default. Let me know what you think.

@thockin
Member

thockin commented Mar 5, 2021 via email

@thockin
Member

thockin commented Mar 5, 2021 via email

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 5, 2021
- Change the feature gate from alpha to beta and enable it by default

- Update a few of the unit tests because the feature gate is now enabled
  by default

- Small refactor in `nodeshutdown_manager` which adds `featureEnabled`
  function (which checks that the feature gate is enabled and that
  `kubeletConfig.ShutdownGracePeriod > 0`).

- Use `featureEnabled()` to exit early from shutdown manager in the case
  that the feature is disabled

- Update kubelet config defaulting to be explicit that
  `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` default to
  zero and update the godoc comments.

- Update defaults and add featureGate tag in api config godoc.

With this feature now in beta and the feature gate enabled by default,
to enable graceful shutdown all that will be required is to configure
`ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` in the
kubelet config. If not configured, they will be defaulted to zero, and
graceful shutdown will effectively be disabled.

@k8s-ci-robot k8s-ci-robot changed labels on Mar 5, 2021:
- added `kind/api-change`: Categorizes issue or PR as related to adding, removing, or otherwise changing an API
- removed `lgtm`: "Looks good to me", indicates that a PR is ready to be merged.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobbypage, derekwaynecarr, mrunalp, thockin, wzshiming

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bobbypage
Member Author

bobbypage commented Mar 5, 2021

Updated the PR by adding the +featureGate=GracefulNodeShutdown tag and removing the explicit time duration defaulting.

Member

@thockin thockin left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 5, 2021
@bobbypage
Member Author

thanks all!

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 6, 2021
@fejta-bot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

Labels
- `approved`: Indicates a PR has been approved by an approver from all required OWNERS files.
- `area/kubelet`
- `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
- `kind/api-change`: Categorizes issue or PR as related to adding, removing, or otherwise changing an API
- `kind/feature`: Categorizes issue or PR as related to a new feature.
- `lgtm`: "Looks good to me", indicates that a PR is ready to be merged.
- `priority/important-soon`: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
- `release-note`: Denotes a PR that will be considered when it comes time to generate release notes.
- `sig/node`: Categorizes an issue or PR as relevant to SIG Node.
- `size/M`: Denotes a PR that changes 30-99 lines, ignoring generated files.
- `triage/accepted`: Indicates an issue or PR is ready to be actively worked on.

9 participants