Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle Non-graceful Node Shutdown #108486

Merged
merged 1 commit into from Mar 26, 2022

Conversation

sonasingh46
Copy link
Contributor

@sonasingh46 sonasingh46 commented Mar 3, 2022

Signed-off-by: Ashutosh Kumar sonasingh46@gmail.com

Co-authored-by: Ashutosh Kumar sonasingh46@gmail.com

What type of PR is this?

Implements KEP https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown

/kind feature

What this PR does / why we need it:

This PR adds a feature to detach volume immediately and does not wait for the 6 min timeout in case of a non graceful shutdown of a node that has an node.kubernetes.io/out-of-service taint added manually.

Adds feature gate NodeOutOfServiceVolumeDetach for this feature which is disabled by default.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Performed the following tests:

Used kubernetes cluster created on vSphere infra and vSphere CSI driver.
Kubernetes Version:

# kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:25:17Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:19:12Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}

Kube-controller-image was based out on the current PR

Test 1:
1. Deployed the kube-controller-manager with current code changes and disabled 
the non graceful shutdown feature gate. (It is disabled by default)

2. Created a statefulset pod. 

3. Shutdown the node on which the pod is scheduled. Node is shutdown using the 
`Shut Down Guest OS` from vSphere UI ( This shutdown is a non graceful shutdown ) 

4. Observed that after 5 mins, the pod changed to `Terminating` state. 

5. Observed that even after 6 mins, ( i.e total 6+5 = 11 mins ) the pod is stuck in `Terminating` state
Test 2:
1. Deployed the kube-controller-manager with current code changes and disabled 
the non graceful shutdown feature gate. (It is disabled by default).

2. Created a statefulset pod. 

3. Shutdown the node on which the pod is scheduled. Node is shutdown using the 
`Shut Down Guest OS` from vSphere UI ( This shutdown is a non graceful shutdown ) 

4. Observed that after 5 mins, the pod changed to `Terminating` state. 

5. Deleted the pod using `kubectl delete pod <pod-name> --force --grace-period 0`

6. The pod immediately got scheduled to a different healthy node but was stuck in `ContainerCreating` 
state for 6 mins. The pod came into `Running` state after 6 mins. It had the following events: 
 ----     ------                  ----   ----                     -------
  Normal   Scheduled               6m10s  default-scheduler        Successfully assigned default/test-sts-0 to k8s-node-716-1644856168
  Warning  FailedAttachVolume      6m10s  attachdetach-controller  Multi-Attach error for volume "pvc-b83ddf01-3029-4666-aea5-f43c91d8ddf0" Volume is already exclusively attached to one node and can't be attached to another
  Warning  FailedMount             4m7s   kubelet                  Unable to attach or mount volumes: unmounted volumes=[test-volume], unattached volumes=[test-volume kube-api-access-48d69]: timed out waiting for the condition
  Warning  FailedMount             113s   kubelet                  Unable to attach or mount volumes: unmounted volumes=[test-volume], unattached volumes=[kube-api-access-48d69 test-volume]: timed out waiting for the condition
  Normal   SuccessfulAttachVolume  1s     attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-b83ddf01-3029-4666-aea5-f43c91d8ddf0"
  
Test 3:
1. Deployed the kube-controller-manager with current code changes and enabled the 
non graceful shutdown feature gate. The feature gate can be enabled by adding the 
following to kube-controller-manager manifest yaml.
spec:
  containers:
  - command:
    // Add the below line
    - --feature-gates=NodeOutOfServiceVolumeDetach=true

2. Created a statefulset pod. 

3. Shutdown the node on which the pod is scheduled. Node is shutdown using the 
`Shut Down Guest OS` from vSphere UI ( This shutdown is a non graceful shutdown ) 

4. Observed that after 5 mins, the pod changed to `Terminating` state. 

5. Taint the node on which the pod was scheduled using the command: 
kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=value1:NoExecute

6. The pod immediately got scheduled to a different healthy and came into running state in next couple of seconds without waiting for the 6 mins detach timeout period.
  

Does this PR introduce a user-facing change?

Non graceful node shutdown handling.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 3, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @sonasingh46. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added area/kubelet sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 3, 2022
@xing-yang
Copy link
Contributor

/assign

pkg/controller/podgc/gc_controller.go Outdated Show resolved Hide resolved
pkg/controller/podgc/gc_controller.go Outdated Show resolved Hide resolved
pkg/features/kube_features.go Outdated Show resolved Hide resolved
pkg/features/kube_features.go Outdated Show resolved Hide resolved
pkg/kubelet/volumemanager/reconciler/reconciler.go Outdated Show resolved Hide resolved
pkg/util/taints/taints.go Outdated Show resolved Hide resolved
@xing-yang
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 3, 2022
@xing-yang
Copy link
Contributor

@sonasingh46 Can you add a release note?

@xing-yang
Copy link
Contributor

/assign @jingxu97 @gnufied

@xing-yang
Copy link
Contributor

/assign @YuikoTakada

@sonasingh46
Copy link
Contributor Author

/retest

@xing-yang
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 26, 2022
@xing-yang
Copy link
Contributor

/test pull-kubernetes-node-e2e-containerd

@xing-yang
Copy link
Contributor

/retest

1 similar comment
@sonasingh46
Copy link
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit c009753 into kubernetes:master Mar 26, 2022
@tengqm
Copy link
Contributor

tengqm commented Mar 27, 2022

The new feature gate needs a docs website update.

@xing-yang
Copy link
Contributor

Thanks @tengqm for the reminder.
@sonasingh46 Can you add the feature gate in your doc PR?
kubernetes/website#32406

muyangren2 pushed a commit to muyangren2/kubernetes that referenced this pull request Jul 14, 2022
Signed-off-by: Ashutosh Kumar <sonasingh46@gmail.com>

Co-authored-by: Ashutosh Kumar <sonasingh46@gmail.com>

Co-authored-by: xing-yang <xingyang105@gmail.com>
yosshy added a commit to yosshy/kube-fencing that referenced this pull request Oct 14, 2022
K8s v1.24 has a new well-known taint "node.kubernetes.io/out-of-service"
that enables automatic deletion of pv-attached pods on failed nodes.
This patch makes fencing-controller adding it to a fenced node just after
the fencing job was successfully finished.

See the pages below for more detail:
kubernetes/enhancements#2268
kubernetes/kubernetes#108486
yosshy added a commit to yosshy/kube-fencing that referenced this pull request Nov 26, 2022
K8s v1.24 has a new well-known taint "node.kubernetes.io/out-of-service"
that enables automatic deletion of pv-attached pods on failed nodes.
This patch makes fencing-controller adding it to a fenced node just after
the fencing job was successfully finished.

See the pages below for more detail:
kubernetes/enhancements#2268
kubernetes/kubernetes#108486
yosshy added a commit to yosshy/kube-fencing that referenced this pull request Nov 26, 2022
K8s v1.24 has a new well-known taint "node.kubernetes.io/out-of-service"
that enables automatic deletion of pv-attached pods on failed nodes.
This patch makes fencing-controller adding it to a fenced node just after
the fencing job was successfully finished.

See the pages below for more detail:
kubernetes/enhancements#2268
kubernetes/kubernetes#108486
@liggitt liggitt removed this from Assigned in API Reviews Aug 17, 2023
@liggitt liggitt removed the api-review Categorizes an issue or PR as actively needing an API review. label Aug 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

None yet