
kubelet: Configurable failed/term mode with GracefulNodeShutdown #113278

Open
dghubble opened this issue Oct 23, 2022 · 14 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@dghubble
Contributor

dghubble commented Oct 23, 2022

What happened?

When the Kubelet GracefulNodeShutdown feature (beta as of v1.25.3) evicts pods before shutdown, it marks them as "Failed" as part of its documented behavior. These failed Pods persist until they're eventually removed by pod garbage collection (default threshold: 12500 terminated pods).

Status:       Failed
Reason:       Terminated
Message:      Pod was terminated in response to imminent node shutdown.

Enabling the GracefulNodeShutdown feature means Failed pods accumulate on clusters (terminated-pod-gc-threshold defaults to 12500 pods). They need to be cleaned up manually, which is cumbersome at scale, especially since there isn't always much use in seeing pods that were simply evicted because the node rebooted. We've had to disable the GracefulNodeShutdown feature over this small detail, and others have seen the same (@rptaylor, kubernetes/enhancements#2000 (comment)).

Back in #104531 (comment), there was a plan to make the termination behavior configurable. #108991 and #108941 started in that direction, but stalled out. I'm not sure of the background. cc @bobbypage

@pacoxu can you add this to kubernetes/enhancements#2000 (comment)?
/sig node

What did you expect to happen?

Is there still a plan to toggle/configure the GracefulNodeShutdown termination behavior? The original plan sounded like there would be an option to just evict the Pods normally (without setting their status to Failed or persisting them), which would help a lot with adoptability.

Is there another UX approach for avoiding leaving all these failed pods around? GracefulNodeShutdown is so close to being awesome, save for this.

In this usage style, we effectively consider shutdowns a very normal behavior. Pods that are evicted due to shutdown aren't noteworthy or in need of persistence or further investigation.

How can we reproduce it (as minimally and precisely as possible)?

Enable the GracefulNodeShutdown feature in KubeletConfiguration. Shut down nodes and watch Failed pods accumulate. It's apparently part of the intended behavior at this time.

shutdownGracePeriod: 45s
shutdownGracePeriodCriticalPods: 30s
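
For reference, a more complete KubeletConfiguration sketch (field names per kubelet.config.k8s.io/v1beta1; the featureGates entry is only needed on versions where GracefulNodeShutdown isn't enabled by default):

# Minimal kubelet config sketch: graceful shutdown with a 45s total budget,
# of which the last 30s is reserved for critical pods.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  GracefulNodeShutdown: true
shutdownGracePeriod: 45s
shutdownGracePeriodCriticalPods: 30s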

Anything else we need to know?

No response

Kubernetes version

Kubernetes v1.25.3

Cloud provider

AWS, Azure, GCP, DigitalOcean

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@dghubble dghubble added the kind/bug Categorizes issue or PR as related to a bug. label Oct 23, 2022
@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Oct 23, 2022
@k8s-ci-robot
Contributor

@dghubble: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pacoxu
Member

pacoxu commented Oct 23, 2022

/cc @wzshiming

@pacoxu pacoxu added this to Triage in SIG Node Bugs Oct 24, 2022
@mwoodson-cb

I don't have more to add beyond what was originally posted, but we recently upgraded to 1.24.7. Many pods across the fleet go into a "Failed" status as replacement pods are created.

From "kubectl describe pod":

Status:           Failed
Reason:           Terminated
Message:          Pod was terminated in response to imminent node shutdown.

Version:

Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.7", GitCommit:"e6f35974b08862a23e7f4aad8e5d7f7f2de26c15", GitTreeState:"clean", BuildDate:"2022-10-12T10:50:21Z", GoVersion:"go1.18.7", Compiler:"gc", Platform:"linux/amd64"}

@dghubble
Contributor Author

dghubble commented Nov 1, 2022

@mwoodson-coinbasecloud I wrote about these rough edges and solutions in https://www.psdn.io/posts/kubelet-graceful-shutdown/ if it's of interest to you.

@SergeyKanzhelev
Member

/remove-kind bug
/kind feature

@bobbypage do we have this documented? I can't find it with a quick Google search.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Nov 9, 2022
@SergeyKanzhelev
Member

I found documentation on how to delete these pods in the GKE docs: https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms#graceful-shutdown. It may be worth copying to kubernetes.io.

In this usage style, we effectively consider shutdowns a very normal behavior. Pods that are evicted due to shutdown aren't noteworthy or in need of persistence or further investigation.

This is questionable; sometimes this information may be needed. I agree with the sentiment, though.

@dghubble
Contributor Author

dghubble commented Dec 4, 2022

I've seen GKE's workaround note, although these days GracefulNodeShutdown will generally leave a Pod Completed or Failed (as two of us have mentioned). It's important to look deeper when cleaning up gracefully terminated pods (as opposed to general failed pods).

The GKE workaround won't do what you want:

kubectl get pods --all-namespaces | grep -i NodeShutdown | awk '{print $1, $2}' | xargs -n2 kubectl delete pod -n
kubectl get pods --all-namespaces | grep -i Terminated | awk '{print $1, $2}' | xargs -n2 kubectl delete pod -n

You probably want to look for the actual message "Pod was terminated in response to imminent node shutdown":

kubectl get pods --all-namespaces -o go-template='{{range .items}}{{printf "%s %s %s\n" .metadata.namespace .metadata.name .status.message}}{{end}}' | grep -i "Pod was terminated in response to imminent node shutdown" | awk '{print $1, $2}' | xargs -n2 kubectl delete pod -n

Ultimately, we'd like Kubernetes to handle cleanup of the gracefully terminated pods, for the same reason CronJobs have job history limits, or the same reason a manual drain doesn't leave a bunch of Pods around in a failed state. In practice, when cluster users look at pods, they don't always need to see every Pod that has ever run on a machine that happened to reboot. With nodes always updating, over time you end up with hundreds of these "terminated in response to imminent node shutdown" Pods lying around.

For now, we use our own hacky mechanism to clean up the pods left around by GracefulNodeShutdown.
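
For anyone else needing a stopgap, here's a sketch of the kind of periodic cleanup that can be run in-cluster. Everything here (names, namespace, image, schedule) is illustrative, and the pod-cleaner ServiceAccount is assumed to already exist with RBAC permission to list and delete pods cluster-wide:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: shutdown-pod-cleaner
  namespace: kube-system
spec:
  schedule: "0 * * * *"   # hourly
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-cleaner   # assumed: list/delete on pods cluster-wide
          restartPolicy: OnFailure
          containers:
          - name: cleaner
            # any image with kubectl plus a POSIX shell works; this one is just an example
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              # delete pods whose status message indicates a graceful node shutdown
              kubectl get pods --all-namespaces -o go-template='{{range .items}}{{printf "%s %s %s\n" .metadata.namespace .metadata.name .status.message}}{{end}}' \
                | grep -i "Pod was terminated in response to imminent node shutdown" \
                | awk '{print $1, $2}' \
                | xargs -r -n2 kubectl delete pod -n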

@SergeyKanzhelev SergeyKanzhelev removed this from Triage in SIG Node Bugs Dec 14, 2022
@aslafy-z

aslafy-z commented Jan 27, 2023

We've started to observe this behavior on our recently upgraded clusters, after an upgrade to the bundled kube-prometheus-stack alerts (kubernetes-monitoring/kubernetes-mixin#784, prometheus-operator/kube-prometheus#1877, prometheus-community/helm-charts#2410).

Descheduler's RemoveFailedPods strategy can be used to clean up these terminated pods until something is done in Kubernetes (a policy sketch follows below).
Shouldn't these pods be set as Evicted instead of Completed/Failed?

I opened kubernetes-monitoring/kubernetes-mixin#821 to revert the KubePodNotReady change.
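
For reference, a sketch of the descheduler policy mentioned above, using the v1alpha1 policy format; the exact parameters (and the newer v1alpha2 profiles/plugins layout) are documented in the descheduler repo, so treat this as illustrative:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveFailedPods":
    enabled: true
    params:
      failedPods:
        reasons:
        - "Terminated"              # the pod status reason set by graceful node shutdown
        minPodLifetimeSeconds: 3600 # only remove pods that failed more than an hour ago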

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2023
@rptaylor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 29, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2024
@dghubble
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2024
@rptaylor

Any updates on this?

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2024