Pods fail with "NodeAffinity failed" after kubelet restarts #100467

Closed
ruiwen-zhao opened this issue Mar 23, 2021 · 28 comments
Labels: kind/bug, needs-triage, sig/node, triage/needs-information

Comments

@ruiwen-zhao (Contributor)

What happened:

The issue is basically the same as #92067.

With the fix #94087 in place, kubelet waits for the node lister to sync in GetNode().

However, in the case of kubelet restart, the pods scheduled on the node before the restart might still fail with "NodeAffinity failed" after the restart. Looking at the code, this is probably because the admit pod check (canAdmitPod()) might happen before GetNode().
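
A minimal Go sketch of the suspected ordering problem (hypothetical toy types; not the actual kubelet code, though the method names mirror kubelet's GetNode() and canAdmitPod()):

package main

import (
	"fmt"
	"time"
)

// Hypothetical stand-ins for kubelet internals; an illustration of the
// suspected ordering problem, not the real kubelet implementation.
type node struct{ labels map[string]string }

type nodeLister struct {
	synced  chan struct{} // closed once the informer cache has synced
	initial *node         // placeholder node visible right after restart
	full    *node         // fully-labeled node, available after sync
}

// getNode models the fixed GetNode(): it blocks until the lister syncs.
func (l *nodeLister) getNode() *node {
	<-l.synced
	return l.full
}

// canAdmitPod models the suspected bug: it does not wait for the sync,
// so right after a restart it can observe a node without its labels and
// reject a previously running pod with "Predicate NodeAffinity failed".
func (l *nodeLister) canAdmitPod(requiredLabel string) bool {
	select {
	case <-l.synced:
		return l.full.labels[requiredLabel] != ""
	default:
		return l.initial.labels[requiredLabel] != "" // stale, unlabeled view
	}
}

func main() {
	l := &nodeLister{
		synced:  make(chan struct{}),
		initial: &node{labels: map[string]string{}},
		full:    &node{labels: map[string]string{"kubernetes.io/hostname": "node-1"}},
	}

	// Simulate the informer finishing its sync shortly after the restart.
	go func() {
		time.Sleep(100 * time.Millisecond)
		close(l.synced)
	}()

	// Admission that runs before the sync fails the affinity check...
	fmt.Println("admitted before sync:", l.canAdmitPod("kubernetes.io/hostname")) // false

	// ...while a check gated on getNode() sees the labels and would pass.
	n := l.getNode()
	fmt.Println("admitted after sync:", n.labels["kubernetes.io/hostname"] != "") // true
}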

What you expected to happen:

After a kubelet restart, old pods (pods scheduled on the node before the restart) do not see "NodeAffinity failed".

How to reproduce it (as minimally and precisely as possible):

This issue does not happen all the time. To reproduce it, you will need to keep restarting the kubelet, and you might see a previously running Pod start to fail with "Predicate NodeAffinity failed".
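
For example, on a node where kubelet is managed by systemd, a loop along these lines will eventually trigger it (illustrative commands; adjust the service name and sleep interval for your setup):

# Run on the node: keep bouncing kubelet.
while true; do
  sudo systemctl restart kubelet
  sleep 30
done

# Run from a machine with cluster access: watch for failing pods.
kubectl get pods --all-namespaces -o wide -w | grep NodeAffinity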

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@ruiwen-zhao ruiwen-zhao added the kind/bug Categorizes issue or PR as related to a bug. label Mar 23, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 23, 2021
@k8s-ci-robot (Contributor)

@ruiwen-zhao: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

@BenTheElder (Member)

cc @neolit123 @ehashman
/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 23, 2021
@phantooom (Contributor)

/assign

@neolit123 (Member)

thank you for reporting this @ruiwen-zhao

@neolit123 (Member)

neolit123 commented Mar 23, 2021

However, in the case of kubelet restart, the pods scheduled on the node before the restart might still fail with "NodeAffinity failed" after the restart. Looking at the code, this is probably because the admit pod check (canAdmitPod()) might happen before GetNode().

from my understanding this should be resolved by #99336 in its current state?

@lwsanty

lwsanty commented Apr 27, 2021

The issue still reproduces in GKE 1.19.8-gke.1600.

@Keramblock

The issue still reproduces in GKE 1.19.8-gke.1600.

same

@neolit123 (Member)

GKE builds are proprietary.

for upstream Kubernetes, once these cherry-picks merge you should check the latest respective PATCH releases:
#99336 (comment)

@Keramblock

@neolit123 Still, they are based on open source, aren't they?

@neolit123 (Member)

my point is that i cannot give you a timeline of when the GKE builds will be available and you should consult with GKE support.

k8s patch releases should be out on the 12th of May:
https://groups.google.com/g/kubernetes-dev/c/H06vjjSzX44/m/rZcBO0_rAAAJ

@Keramblock

Oh, ok. Thank you =)

@primeroz

FYI this is also present in GKE 1.18.17-gke.700. I had hoped they would backport the patch, since the .700 release was promoted to the stable channel yesterday, but that is not the case.

Luckily for us, this is only an issue with preemptible nodes, since a preemption is effectively a node restart.

Will wait for 1.18.19 impatiently. 🤞

@sbocinec

We were first affected by this issue after upgrading our GKE cluster from v1.17.17-gke.2800 to v1.18.17-gke.700, for pods running on preemptible nodes. Is this specific to k8s 1.18+?

@primeroz

primeroz commented May 26, 2021

Same here; as far as I know this is fixed in 1.18.19.

The fix in #99336 (comment) was cherry-picked to 1.18 in #101343.

It also affects versions up to 1.21, btw; check that PR to see the commit for each version.

@pacoxu (Member)

pacoxu commented Jun 25, 2021

It should be fixed in v1.18.19, v1.19.10, v1.20.7 and v1.21.1.

As for upgrading GKE, I think that should be asked of GKE support.
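
Since the fix is release-dependent, a quick way to confirm what your nodes are actually running (it is the kubelet version that matters here):

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}'
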
/triage duplicate
/close

@k8s-ci-robot k8s-ci-robot added the triage/duplicate Indicates an issue is a duplicate of other open issue. label Jun 25, 2021
@k8s-ci-robot (Contributor)

@pacoxu: Closing this issue.

@username1366

After upgrading GKE to v1.18.19-gke.1700 I experienced the same issue: after node preemption, some of the pods moved to NodeAffinity status.

kubectl get pods -o wide --all-namespaces | grep NodeAffinity
app              app-cd5d5595f-tkw9p                          0/5     NodeAffinity

@primeroz

On GKE I also tested with v1.19.10-gke.1600 and I am getting plenty of NodeAffinity pods.

@sbocinec

sbocinec commented Jul 26, 2021

Tested with the latest v1.18.20-gke.900 and the issue is still present. @phantooom is it possible to reopen the issue, as the fix appears not to resolve it?

@miguepintor

On GKE v1.19.11-gke.2101 it is reproducible as well; please @phantooom consider reopening.

@alculquicondor (Member)

/reopen
@neolit123 any ideas?

@k8s-ci-robot k8s-ci-robot reopened this Aug 4, 2021
@k8s-ci-robot (Contributor)

@alculquicondor: Reopened this issue.

@neolit123 (Member)

sounds like something that regressed after the node sync changes, but the second change (the one i did) did not fix it.

the change that i did:
#99336

was technically a refactor on what was already established by the previous change:
#94087

However, in the case of kubelet restart, the pods scheduled on the node before the restart might still fail with "NodeAffinity failed" after the restart. Looking at the code, this is probably because the admit pod check (canAdmitPod()) might happen before GetNode().

this seems racy and should be brought to discussion at the SIG Node meeting
https://github.com/kubernetes/community/tree/master/sig-node#meetings

kubelet maintainers who are more savvy should be able to reproduce it:

This issue does not happen all the time. To reproduce it, you will need to keep restarting the kubelet, and you might see a previously running Pod started to fail with "Predicate NodeAffinity failed".

we have a lot of GKE reporters in this ticket. has anyone seen the problem on non-GKE clusters?

@ehashman (Member)

ehashman commented Aug 4, 2021

/remove-triage duplicate
/triage needs-information

@k8s-ci-robot k8s-ci-robot added triage/needs-information Indicates an issue needs more information in order to work on it. and removed triage/duplicate Indicates an issue is a duplicate of other open issue. labels Aug 4, 2021
@ehashman ehashman added this to Needs Information in SIG Node Bugs Aug 5, 2021
@Tfmenard

Tfmenard commented Aug 12, 2021

For affected GKE users, the graceful node termination feature fixes the issue; it is enabled on clusters running node pools on 1.20+.

Note that this issue has little to no impact on workloads. As long as the pod is backed by a controller (Deployment/StatefulSet, etc.), when a pod runs into the NodeAffinity issue a replacement pod is immediately created and rescheduled.
See https://issuetracker.google.com/185362914 for details.
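
The stale failed pods do stick around, though; a hedged cleanup, assuming the affected pods report phase Failed (which pods shown with a NodeAffinity status do):

# Remove failed pods in a namespace; their controllers have already
# created replacements, so only the stale objects are deleted.
kubectl delete pods -n <namespace> --field-selector=status.phase=Failed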

In case this issue comes back, to reproduce it simply create a cluster with a node pool that runs preemptible VMs and deploy the simple deployment [1] below, which uses a nodeSelector.
Then, to simulate a preemption, run:
gcloud compute instances simulate-maintenance-event <node-name> --zone <zone name>
If you're lucky the issue will occur on the first preemption, but it may only occur on the 10th one.
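
While the preemption plays out, a watch like this (assuming the na-test deployment below) surfaces the failures as they appear:

# Failed pods from the test deployment show STATUS "NodeAffinity".
kubectl get pods -l role=na-test -w | grep NodeAffinity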

I think we should close this one as we've only seen this occur on GKE and a fix is out now for GKE.
Please reopen if you experience the issue on GKE node pools running 1.20+ or on non-GKE clusters.

[1]

apiVersion: apps/v1
kind: Deployment
metadata:
  name: na-test
spec:
  replicas: 5
  selector:
    matchLabels:
      role: na-test
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1000
    type: RollingUpdate
  template:
    metadata:
      labels:
        role: na-test
    spec:
      containers:
      - image: busybox
        command:
          - sh
          - -c
          - 'echo "NodeAffinity test"; sleep 300;'
        imagePullPolicy: IfNotPresent
        name: busybox
      nodeSelector:
        cloud.google.com/gke-nodepool: <node pool name>

/close

@k8s-ci-robot (Contributor)

@Tfmenard: You can't close an active issue/PR unless you authored it or you are a collaborator.

@SergeyKanzhelev (Member)

/close

@k8s-ci-robot (Contributor)

@SergeyKanzhelev: Closing this issue.

SIG Node Bugs automation moved this from Needs Information to Done Aug 12, 2021