Handle Non-graceful Node Shutdown #108486

sonasingh46 · 2022-03-03T11:47:02Z

Signed-off-by: Ashutosh Kumar sonasingh46@gmail.com

Co-authored-by: Ashutosh Kumar sonasingh46@gmail.com

What type of PR is this?

Implements KEP https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown

/kind feature

What this PR does / why we need it:

This PR adds a feature that detaches volumes immediately, without waiting for the 6-minute timeout, after a non-graceful shutdown of a node to which the `node.kubernetes.io/out-of-service` taint has been added manually.

Adds the feature gate `NodeOutOfServiceVolumeDetach` for this feature; the gate is disabled by default.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Performed the following tests:

Used a Kubernetes cluster created on vSphere infrastructure with the vSphere CSI driver.
Kubernetes Version:

# kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:25:17Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:19:12Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}

The kube-controller-manager image was built from the current PR.

Test 1:

1. Deployed the kube-controller-manager with the current code changes and with
the non-graceful shutdown feature gate disabled (it is disabled by default).

2. Created a StatefulSet pod.

3. Shut down the node on which the pod was scheduled, using
`Shut Down Guest OS` from the vSphere UI (this is a non-graceful shutdown).

4. Observed that after 5 minutes the pod changed to the `Terminating` state.

5. Observed that even after a further 6 minutes (i.e. 5 + 6 = 11 minutes in total) the pod was still stuck in the `Terminating` state.

Test 2:

1. Deployed the kube-controller-manager with the current code changes and with
the non-graceful shutdown feature gate disabled (it is disabled by default).

2. Created a StatefulSet pod.

3. Shut down the node on which the pod was scheduled, using
`Shut Down Guest OS` from the vSphere UI (this is a non-graceful shutdown).

4. Observed that after 5 minutes the pod changed to the `Terminating` state.

5. Deleted the pod using `kubectl delete pod <pod-name> --force --grace-period 0`

6. The pod was immediately scheduled to a different healthy node but was stuck in the
`ContainerCreating` state for 6 minutes, after which it reached the `Running` state. It had the following events:
  Type     Reason                  Age    From                     Message
  ----     ------                  ----   ----                     -------
  Normal   Scheduled               6m10s  default-scheduler        Successfully assigned default/test-sts-0 to k8s-node-716-1644856168
  Warning  FailedAttachVolume      6m10s  attachdetach-controller  Multi-Attach error for volume "pvc-b83ddf01-3029-4666-aea5-f43c91d8ddf0" Volume is already exclusively attached to one node and can't be attached to another
  Warning  FailedMount             4m7s   kubelet                  Unable to attach or mount volumes: unmounted volumes=[test-volume], unattached volumes=[test-volume kube-api-access-48d69]: timed out waiting for the condition
  Warning  FailedMount             113s   kubelet                  Unable to attach or mount volumes: unmounted volumes=[test-volume], unattached volumes=[kube-api-access-48d69 test-volume]: timed out waiting for the condition
  Normal   SuccessfulAttachVolume  1s     attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-b83ddf01-3029-4666-aea5-f43c91d8ddf0"

Test 3:

1. Deployed the kube-controller-manager with the current code changes and with the
non-graceful shutdown feature gate enabled. The feature gate can be enabled by adding the
following to the kube-controller-manager manifest YAML:
spec:
  containers:
  - command:
    # Add the line below
    - --feature-gates=NodeOutOfServiceVolumeDetach=true

2. Created a StatefulSet pod.

3. Shut down the node on which the pod was scheduled, using
`Shut Down Guest OS` from the vSphere UI (this is a non-graceful shutdown).

4. Observed that after 5 minutes the pod changed to the `Terminating` state.

5. Tainted the node on which the pod was scheduled using the command:
kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=value1:NoExecute

6. The pod was immediately scheduled to a different healthy node and reached the `Running` state within a couple of seconds, without waiting for the 6-minute detach timeout.
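
For illustration, detecting the taint applied in step 5 could look like the sketch below. The `Taint` struct is a simplified stand-in for the Kubernetes core/v1 type, `hasOutOfServiceTaint` is a hypothetical name, and matching on the key alone is an assumption of this sketch rather than a statement about the PR's code.

```go
package main

import "fmt"

// Taint is a simplified stand-in for the Kubernetes core/v1 Taint type.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// outOfServiceKey is the taint key used in step 5 of Test 3.
const outOfServiceKey = "node.kubernetes.io/out-of-service"

// hasOutOfServiceTaint reports whether any taint carries the out-of-service
// key. Ignoring value and effect is an assumption made for this sketch.
func hasOutOfServiceTaint(taints []Taint) bool {
	for _, t := range taints {
		if t.Key == outOfServiceKey {
			return true
		}
	}
	return false
}

func main() {
	taints := []Taint{
		// An unrelated taint, included only to show that it is skipped.
		{Key: "node.kubernetes.io/unreachable", Effect: "NoExecute"},
		// The taint from the kubectl command in step 5.
		{Key: outOfServiceKey, Value: "value1", Effect: "NoExecute"},
	}
	fmt.Println(hasOutOfServiceTaint(taints)) // prints true
}
```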

Does this PR introduce a user-facing change?

Non graceful node shutdown handling.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown

k8s-ci-robot · 2022-03-03T11:47:11Z

Hi @sonasingh46. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

xing-yang · 2022-03-03T13:56:45Z

/assign

pkg/controller/podgc/gc_controller.go

pkg/controller/volume/attachdetach/attach_detach_controller.go

pkg/features/kube_features.go

pkg/kubelet/volumemanager/reconciler/reconciler.go

pkg/util/taints/taints.go

xing-yang · 2022-03-03T14:23:24Z

/ok-to-test

xing-yang · 2022-03-03T14:23:41Z

@sonasingh46 Can you add a release note?

xing-yang · 2022-03-03T14:24:42Z

/assign @jingxu97 @gnufied

pkg/controller/podgc/gc_controller.go

xing-yang · 2022-03-03T14:30:46Z

/assign @YuikoTakada

sonasingh46 · 2022-03-26T09:28:04Z

/retest

xing-yang · 2022-03-26T12:03:00Z

/lgtm

xing-yang · 2022-03-26T13:10:20Z

/test pull-kubernetes-node-e2e-containerd

xing-yang · 2022-03-26T13:24:44Z

/retest

sonasingh46 · 2022-03-26T15:46:29Z

/retest

tengqm · 2022-03-27T00:34:27Z

The new feature gate needs a docs website update.

xing-yang · 2022-03-27T02:39:15Z

Thanks @tengqm for the reminder.
@sonasingh46 Can you add the feature gate in your doc PR?
kubernetes/website#32406

Signed-off-by: Ashutosh Kumar <sonasingh46@gmail.com>
Co-authored-by: Ashutosh Kumar <sonasingh46@gmail.com>
Co-authored-by: xing-yang <xingyang105@gmail.com>

Kubernetes v1.24 has a new well-known taint, `node.kubernetes.io/out-of-service`, that enables automatic deletion of pods with attached persistent volumes on failed nodes. This patch makes the fencing controller add the taint to a fenced node right after the fencing job finishes successfully. See the following for more detail: kubernetes/enhancements#2268, kubernetes/kubernetes#108486
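
A controller that adds this taint after fencing would usually want the operation to be idempotent, so that repeating it does not create duplicates. The sketch below shows one way to do that under stated assumptions: `Taint` is a simplified stand-in for the Kubernetes type, `withOutOfServiceTaint` is a hypothetical name, and the `NoExecute` effect is taken from the `kubectl taint` command in the PR description. It is not the fencing controller's actual code.

```go
package main

import "fmt"

// Taint is a simplified stand-in for the Kubernetes core/v1 Taint type.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// withOutOfServiceTaint returns the given taints plus the out-of-service
// taint. If a taint with the same key and effect already exists, the input
// is returned unchanged. Hypothetical helper for illustration.
func withOutOfServiceTaint(taints []Taint) []Taint {
	want := Taint{Key: "node.kubernetes.io/out-of-service", Effect: "NoExecute"}
	for _, t := range taints {
		if t.Key == want.Key && t.Effect == want.Effect {
			return taints
		}
	}
	// Copy before appending so the caller's slice is never modified.
	out := make([]Taint, 0, len(taints)+1)
	out = append(out, taints...)
	return append(out, want)
}

func main() {
	once := withOutOfServiceTaint(nil)
	twice := withOutOfServiceTaint(once)
	fmt.Println(len(once), len(twice)) // prints 1 1
}
```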

k8s-ci-robot added the area/kubelet, sig/apps, and sig/node labels and removed the do-not-merge/needs-sig label Mar 3, 2022

k8s-ci-robot requested review from deads2k and gnufied March 3, 2022 11:48

k8s-ci-robot assigned xing-yang Mar 3, 2022

xing-yang reviewed Mar 3, 2022

k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Mar 3, 2022

k8s-ci-robot assigned gnufied and jingxu97 Mar 3, 2022

xing-yang reviewed Mar 3, 2022

k8s-ci-robot assigned YuikoTakada Mar 3, 2022

sftim mentioned this pull request Mar 4, 2022

Non-graceful node shutdown kubernetes/enhancements#2268

Closed

sonasingh46 force-pushed the nongraceful_shutdown branch from 0b5347a to e728d5c Compare March 6, 2022 11:24

k8s-ci-robot added the lgtm label Mar 26, 2022

k8s-ci-robot merged commit c009753 into kubernetes:master Mar 26, 2022

tengqm mentioned this pull request Mar 29, 2022

NodeOutOfServiceVolumeDetach feature introduced kubernetes/website#32595

Merged

SystemZ mentioned this pull request May 4, 2022

If kubelet is unavailable, AttachDetachController fails to force detach on pod deletion #65392

Closed

This was referenced Jun 21, 2022

Add note to Volume Lifecycle figure container-storage-interface/spec#513

Closed

Volume Lifecycle is not correspond to actual k8s behavior container-storage-interface/spec#512

Open

torredil mentioned this pull request Jul 7, 2022

PVC attaching takes much time kubernetes-sigs/aws-ebs-csi-driver#1302

Closed

This was referenced Jul 25, 2022

chore(e2e): add e2e test for non graceful node shutdown #111380

Merged

add nodeoutofservicevolumedetach e2e test kubernetes/test-infra#26947

Merged

sonasingh46 mentioned this pull request Sep 22, 2022

add nodeoutofservicevolumedetach e2e test kubernetes/test-infra#27600

Closed

yosshy mentioned this pull request Nov 28, 2022

Support a new fencing/mode "taint" kvaps/kube-fencing#28

Open

sonasingh46 mentioned this pull request Jun 24, 2023

add scale test for non graceful node shutdown #118848

Closed

sonasingh46 mentioned this pull request Jul 24, 2023

add non graceful shutdown integration test #119478

Open

liggitt removed this from Assigned in API Reviews Aug 17, 2023

liggitt removed the api-review label Aug 17, 2023
