
kubelet: Configurable failed/term mode with GracefulNodeShutdown #113278

Open
dghubble opened this issue Oct 23, 2022 · 14 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@dghubble
Contributor

dghubble commented Oct 23, 2022

What happened?

When the Kubelet GracefulNodeShutdown feature (beta as of v1.25.3) evicts pods before shutdown, it marks them as "Failed" as part of its documented behavior. These failed Pods persist until they're eventually removed by pod garbage collection (default threshold: 12500 terminated pods).

Status:       Failed
Reason:       Terminated
Message:      Pod was terminated in response to imminent node shutdown.

Enabling the GracefulNodeShutdown feature means Failed pods accumulate on clusters (terminated-pod-gc-threshold defaults to 12500 pods). They need to be cleaned up manually, which is cumbersome at scale, especially since there isn't always much use in seeing pods that were simply evicted because the node rebooted. We've had to disable the GracefulNodeShutdown feature over this small detail, and others have seen the same (@rptaylor, kubernetes/enhancements#2000 (comment)).

Back in #104531 (comment), there was a plan to make the termination behavior configurable. #108991 and #108941 started in that direction, but stalled out. I'm not sure of the background. cc @bobbypage

@pacoxu can you add this to kubernetes/enhancements#2000 (comment)?
/sig node

What did you expect to happen?

Is there still a plan to toggle/configure the GracefulNodeShutdown termination behavior? The original plan sounded like there would be an option to just evict the Pods normally (without setting their status to Failed or persisting them), which would help a lot with adoptability.

Is there another UX approach for avoiding leaving all these failed pods around? GracefulNodeShutdown is so close to being awesome, save for this.

In this usage style, we effectively consider shutdowns a very normal behavior. Pods that are evicted due to shutdown aren't noteworthy or in need of persistence or further investigation.

How can we reproduce it (as minimally and precisely as possible)?

Enable the GracefulNodeShutdown feature in KubeletConfiguration. Shut down nodes and watch Failed pods accumulate. It's apparently part of the intended behavior at this time.

shutdownGracePeriod: 45s
shutdownGracePeriodCriticalPods: 30s
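
For reference, a more complete KubeletConfiguration sketch (field names per kubelet.config.k8s.io/v1beta1; the featureGates entry is only needed on versions where GracefulNodeShutdown isn't enabled by default):

# Minimal kubelet config sketch: graceful shutdown with a 45s total budget,
# of which the last 30s is reserved for critical pods.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  GracefulNodeShutdown: true
shutdownGracePeriod: 45s
shutdownGracePeriodCriticalPods: 30s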

Anything else we need to know?

No response

Kubernetes version

Kubernetes v1.25.3

Cloud provider

AWS, Azure, GCP, DigitalOcean

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@dghubble dghubble added the kind/bug Categorizes issue or PR as related to a bug. label Oct 23, 2022
@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Oct 23, 2022
@k8s-ci-robot
Contributor

@dghubble: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pacoxu
Member

pacoxu commented Oct 23, 2022

/cc @wzshiming

@pacoxu pacoxu added this to Triage in SIG Node Bugs Oct 24, 2022
@mwoodson-cb

I don't have more to add beyond what was originally posted, but we recently upgraded to 1.24.7. Many pods across the fleet go into a "Failed" status as replacement pods are created.

From "kubectl describe pod":

Status:           Failed
Reason:           Terminated
Message:          Pod was terminated in response to imminent node shutdown.

Version:

Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.7", GitCommit:"e6f35974b08862a23e7f4aad8e5d7f7f2de26c15", GitTreeState:"clean", BuildDate:"2022-10-12T10:50:21Z", GoVersion:"go1.18.7", Compiler:"gc", Platform:"linux/amd64"}

@dghubble
Contributor Author

dghubble commented Nov 1, 2022

@mwoodson-coinbasecloud I wrote about these rough edges and solutions in https://www.psdn.io/posts/kubelet-graceful-shutdown/ if it's of interest to you.

@SergeyKanzhelev
Member

/remove-kind bug
/kind feature

@bobbypage do we have this documented? I can't find it with a quick Google search.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Nov 9, 2022
@SergeyKanzhelev
Member

I found documentation on how to delete these pods in the GKE docs: https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms#graceful-shutdown. It may be worth copying to kubernetes.io.

In this usage style, we effectively consider shutdowns a very normal behavior. Pods that are evicted due to shutdown aren't noteworthy or in need of persistence or further investigation.

This is questionable; sometimes this information may be needed. I agree with the sentiment, though.

@dghubble
Contributor Author

dghubble commented Dec 4, 2022

I've seen GKE's workaround note, although these days GracefulNodeShutdown will generally leave a Pod Completed or Failed (as two of us have mentioned). It's important to look deeper when cleaning up gracefully terminated pods (as opposed to general failed pods).

The GKE workaround won't do what you want:

kubectl get pods --all-namespaces | grep -i NodeShutdown | awk '{print $1, $2}' | xargs -n2 kubectl delete pod -n
kubectl get pods --all-namespaces | grep -i Terminated | awk '{print $1, $2}' | xargs -n2 kubectl delete pod -n

You probably want to look for the actual message "Pod was terminated in response to imminent node shutdown":

kubectl get pods --all-namespaces -o go-template='{{range .items}}{{printf "%s %s %s\n" .metadata.namespace .metadata.name .status.message}}{{end}}' | grep -i "Pod was terminated in response to imminent node shutdown" | awk '{print $1, $2}' | xargs -n2 kubectl delete pod -n

Ultimately, we'd like Kubernetes to handle cleanup of the gracefully terminated pods, for the same reason CronJobs have job history limits, or the same reason a manual drain doesn't leave a bunch of Pods around in a failed state. In practice, when cluster users look at pods, they don't always need to see every Pod that has ever run on a machine that happened to reboot. With nodes always updating, over time you end up with hundreds of these "terminated in response to imminent node shutdown" Pods lying around.

For now, we use our own hacky mechanism to clean up the pods left around by GracefulNodeShutdown.
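
For anyone else needing a stopgap, here's a sketch of the kind of periodic cleanup that can be run in-cluster. Everything here (names, namespace, image, schedule) is illustrative, and the pod-cleaner ServiceAccount is assumed to already exist with RBAC permission to list and delete pods cluster-wide:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: shutdown-pod-cleaner
  namespace: kube-system
spec:
  schedule: "0 * * * *"   # hourly
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-cleaner   # assumed: list/delete on pods cluster-wide
          restartPolicy: OnFailure
          containers:
          - name: cleaner
            # any image with kubectl plus a POSIX shell works; this one is just an example
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              # delete pods whose status message indicates a graceful node shutdown
              kubectl get pods --all-namespaces -o go-template='{{range .items}}{{printf "%s %s %s\n" .metadata.namespace .metadata.name .status.message}}{{end}}' \
                | grep -i "Pod was terminated in response to imminent node shutdown" \
                | awk '{print $1, $2}' \
                | xargs -r -n2 kubectl delete pod -n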

@SergeyKanzhelev SergeyKanzhelev removed this from Triage in SIG Node Bugs Dec 14, 2022
@aslafy-z

aslafy-z commented Jan 27, 2023

We've started to observe this behavior on our recently upgraded clusters, after an upgrade to the bundled kube-prometheus-stack alerts (kubernetes-monitoring/kubernetes-mixin#784, prometheus-operator/kube-prometheus#1877, prometheus-community/helm-charts#2410).

Descheduler's RemoveFailedPods strategy can be used to clean up these terminated pods until something is done in Kubernetes (a policy sketch follows below).
Shouldn't these pods be set as Evicted instead of Completed/Failed?

I opened kubernetes-monitoring/kubernetes-mixin#821 to revert the KubePodNotReady change.
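
For reference, a sketch of the descheduler policy mentioned above, using the v1alpha1 policy format; the exact parameters (and the newer v1alpha2 profiles/plugins layout) are documented in the descheduler repo, so treat this as illustrative:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveFailedPods":
    enabled: true
    params:
      failedPods:
        reasons:
        - "Terminated"              # the pod status reason set by graceful node shutdown
        minPodLifetimeSeconds: 3600 # only remove pods that failed more than an hour ago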

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2023
@rptaylor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 29, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2024
@dghubble
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2024
@rptaylor

Any updates on this?

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2024