Failed to delete machinePool for unreachable cluster #10544
Comments
This issue is currently awaiting triage. CAPI contributors will take a look as soon as possible.
/priority awaiting-more-evidence cc @mboersma @willie-yao @Jont828 note: an unreachable cluster is handled "gracefully" in the Machine controller; I would expect the same to happen in MachinePools too.
/assign
Thanks for the follow-up. @fabriziopandini, you mentioned handling an unreachable cluster "gracefully" in the MachinePool controller — would you elaborate? I was thinking we should have an unreachable state for the Cluster CR and give it a timeout before actions like force delete. Does that align with what you mean?
@fabriziopandini is referring to the Machine controller code, which explicitly skips node deletion on certain errors: cluster-api/internal/controllers/machine/machine_controller.go Lines 347 to 356 in 47ec791
A similar behaviour could probably be implemented for the MachinePool case.
Agreed, that makes sense. So the fix I pushed checks here whether the cluster is deleted; if so, there is no need to create a client.
What steps did you take and what happened?
When uninstalling/deprovisioning a cluster while the cluster is unreachable, or while the cluster kubeconfig Secret has been deleted (as it is managed by the controlPlaneRef), the MachinePool controller raises the errors below.
The issue was initially raised against cluster-api-provider-aws as issue number 4936: the MachinePool CRs failed to clean up in a GitOps workflow using Argo CD.
The workaround is to clean up the MachinePool CR by removing its finalizer manually.
What did you expect to happen?
I expect to be able to delete the MachinePool CRs without needing to remove the finalizer manually.
Cluster API version
v1.6.3
registry.k8s.io/cluster-api/cluster-api-controller:v1.6.3
Kubernetes version
Kubernetes Version: v1.25.7+eab9cc9
Anything else you would like to add?
Looking at the MachinePool controller's reconcileDelete, the fix could be to check whether the error returned at line-263 is a "failed to create cluster accessor" error (the same error as in the logs above). The changes could be as below.
Alternatively, if the MachinePool has an InfrastructureRef, then the responsibility for deleting the nodes could be moved to the InfrastructureRef CR rather than the MachinePool (though this may raise backward-compatibility issues). The changes could be as below.
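This alternative could be sketched as a simple ownership check. Again a hypothetical illustration: `MachinePool` is reduced to the one field discussed, and `controllerOwnsNodeDeletion` is an assumed helper name, not CAPI code.

```go
package main

import "fmt"

// MachinePool is a minimal stand-in; only InfrastructureRef is modeled,
// as a string reference for illustration.
type MachinePool struct {
	InfrastructureRef *string // non-nil when an infra provider owns the pool
}

// controllerOwnsNodeDeletion sketches the alternative proposal: when an
// InfrastructureRef is set, node deletion is delegated to the referenced
// infrastructure CR, so the MachinePool controller skips it.
func controllerOwnsNodeDeletion(mp *MachinePool) bool {
	return mp.InfrastructureRef == nil
}

func main() {
	ref := "AWSManagedMachinePool/default/pool-0" // hypothetical reference
	mp := &MachinePool{InfrastructureRef: &ref}
	if !controllerOwnsNodeDeletion(mp) {
		fmt.Println("delegating node deletion to", *mp.InfrastructureRef)
	}
}
```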
Label(s) to be applied
/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.