kubelet: Configurable failed/term mode with GracefulNodeShutdown #113278
Comments
@dghubble: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cc @wzshiming
I don't have more to add than originally posted, but we have recently upgraded to 1.24.7. We have many pods across the fleet that go into a "Failed" status as replacement pods are created. From `kubectl describe pod`:
Version:
@mwoodson-coinbasecloud I wrote about these rough edges and solutions in https://www.psdn.io/posts/kubelet-graceful-shutdown/ if it's of interest to you.
/remove-kind bug
@bobbypage do we have this documented? I cannot find it with a quick Google search.
I found documentation on how to delete these pods in the GKE docs: https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms#graceful-shutdown It may be worth copying to kubernetes.io.
This is questionable; sometimes this information may be needed. I agree with the sentiment though.
I've seen GKE's workaround note, although these days GracefulNodeShutdown will generally leave a Pod Completed or Failed (as two of us have mentioned). It's important to look deeper when cleaning up gracefully terminated pods (as opposed to general failed pods); the GKE workaround won't do what you want.
You probably want to be looking for the actual message "Pod was terminated in response to imminent node shutdown".
Ultimately, we'd like Kubernetes to handle cleanup of the gracefully terminated pods, for the same reason CronJobs have job history limits, or for the same reason a manual drain doesn't leave a bunch of Pods around in a failed state. In practice, when cluster users look at pods, they don't always need to see every Pod that ever ran on a machine that happened to reboot. With nodes always updating, over time you end up with hundreds of these "terminated in response to imminent node shutdown" Pods lying around. For now, we use our own hacky mechanism to clean up the pods left behind by GracefulNodeShutdown.
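For anyone in the same spot, here is a minimal sketch of that kind of cleanup, filtering on the message quoted above (assumes jq is available; adjust the match and delete flags to taste):

```sh
# Sketch: delete Failed pods whose status message indicates a graceful node shutdown.
# Assumes jq is installed and the kubelet still sets the message quoted above.
kubectl get pods --all-namespaces --field-selector=status.phase=Failed -o json \
  | jq -r '.items[]
      | select((.status.message // "") | test("terminated in response to imminent node shutdown"))
      | "\(.metadata.namespace) \(.metadata.name)"' \
  | while read -r ns pod; do
      kubectl delete pod -n "$ns" "$pod" --wait=false
    done
```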
We've started to observe this behavior on our recently upgraded clusters after an upgrade to the bundled kube-prometheus-stack alerts (kubernetes-monitoring/kubernetes-mixin#784). Descheduler's RemoveFailedPods can be used to clean up these terminated pods until something is done in Kubernetes. I opened kubernetes-monitoring/kubernetes-mixin#821 to revert the KubePodNotReady change.
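For reference, a sketch of what that descheduler policy could look like (v1alpha1 policy format; the reasons value is an assumption, so check the status.reason your shutdown-evicted pods actually report before relying on it):

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveFailedPods":
    enabled: true
    params:
      failedPods:
        # Assumption: pods evicted by GracefulNodeShutdown report this status.reason.
        # Verify with: kubectl get pod <name> -o jsonpath='{.status.reason}'
        reasons:
          - "Terminated"
        # Only evict failed pods older than an hour.
        minPodLifetimeSeconds: 3600
```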
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
Any updates on this? /remove-lifecycle stale |
What happened?
When the kubelet's GracefulNodeShutdown feature (beta as of v1.25.3) evicts pods before shutdown, it marks them as "Failed" as part of its documented behavior. These failed Pods persist until they're eventually removed by pod garbage collection (default threshold of 12500 pods).
Enabling the GracefulNodeShutdown feature means Failed pods accumulate on clusters (terminated-pod-gc-threshold defaults to 12500 pods). These need to be cleaned up manually, and at scale it's cumbersome, especially since there isn't always much use in seeing pods that were simply evicted because the node rebooted. We've had to disable the GracefulNodeShutdown feature over this small detail, and others have seen the same (@rptaylor kubernetes/enhancements#2000 (comment)).
Back in #104531 (comment), there was a plan to make the termination behavior configurable. #108991 and #108941 started in that direction, but stalled out. I'm not sure of the background. cc @bobbypage
@pacoxu can you add to kubernetes/enhancements#2000 (comment)
/sig node
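For completeness, the only built-in mitigation I'm aware of today is the controller manager's terminated-pod GC threshold, which is cluster-wide and doesn't distinguish shutdown-evicted pods from other failures; a sketch of lowering it (the value is illustrative):

```sh
# kube-controller-manager flag: garbage-collect terminated pods once more than
# this many exist cluster-wide (default 12500). The value here is illustrative.
kube-controller-manager --terminated-pod-gc-threshold=500
```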
What did you expect to happen?
Is there still a plan to toggle/configure GracefulNodeShutdown termination behavior? The original plan sounded like there would be an option to simply evict the Pods normally (without setting their status to Failed or persisting them), which would help a lot with adoptability.
Is there another UX approach to avoid leaving all these failed pods around? GracefulNodeShutdown is so close to being awesome, save for this.
In this usage style, we effectively consider shutdowns a very normal behavior. Pods that are evicted due to shutdown aren't noteworthy or in need of persistence or further investigation.
How can we reproduce it (as minimally and precisely as possible)?
Enable the GracefulNodeShutdown feature in KubeletConfiguration. Shut down nodes and watch Failed pods accumulate. It's apparently part of the intended behavior at this time.
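For example, a kubelet config along these lines (values are illustrative; the feature activates once shutdownGracePeriod is non-zero):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Illustrative values: total grace period given to pods on node shutdown,
# with a portion reserved for critical pods.
shutdownGracePeriod: 30s
shutdownGracePeriodCriticalPods: 10s
```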
Anything else we need to know?
No response
Kubernetes version
Cloud provider
AWS, Azure, GCP, DigitalOcean
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)