Self-Hosted cluster deletion failing in scenarios when manual deletion is attempted first. #4609

Open
ardixit-msft-la opened this issue Feb 29, 2024 · 8 comments
Labels: kind/bug, priority/awaiting-more-evidence

@ardixit-msft-la

/kind bug

What steps did you take and what happened:
When manual deletion of a self-hosted cluster is attempted before deleting it through the kubectl command, the cluster never gets deleted. On reattempting the manual deletion, the resources are recreated while the provisioning state of the cluster is shown as Deleting.
kubectl --kubeconfig C:\Users\ardixit.kube\management get clusters

What did you expect to happen:
After deletion is attempted through the following command:

kubectl --kubeconfig C:\Users\ardixit.kube\management delete cluster flexiblec220849

The cluster should be deleted instead of hanging in the Deleting state.
The cluster should not be recreated.
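
To help tell a cluster that is hanging from one that is simply slow to delete, the object's deletion state can be captured in one line. This is a minimal sketch, not taken from the report: the function name `deletion_state` is hypothetical, and it assumes the Cluster object exposes `.status.phase`, `.metadata.deletionTimestamp`, and `.metadata.finalizers`, as Cluster API objects normally do.

```shell
# Print phase, deletion timestamp, and finalizers of a Cluster object,
# separated by tabs. A cluster that stays in "Deleting" with a finalizer
# still present is waiting on a controller to finish cleanup.
# Usage: deletion_state <management-kubeconfig> <cluster-name>
deletion_state() {
  kubectl --kubeconfig "$1" get cluster "$2" \
    --output jsonpath='{.status.phase}{"\t"}{.metadata.deletionTimestamp}{"\t"}{.metadata.finalizers}{"\n"}'
}
```

Running it a few minutes apart, for example `deletion_state <management-kubeconfig> flexiblec220849`, should show whether the finalizer list is changing or stuck.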

Anything else you would like to add:
This issue was reproduced with an AKS setup with 4 clusters, each having 5 user node pools.
The issue is intermittent, occurring in roughly 2 out of 10 runs.

Environment:

  • cluster-api-provider-azure version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
@k8s-ci-robot added the kind/bug label on Feb 29, 2024
@jackfrancis
Contributor

/assign @nawazkh

@ardixit-msft-la could you be more specific about what "manual deletion of self-hosted cluster is attempted before deleting through kubectl command" means, so that we can reproduce it?

  1. Create an AKS cluster managed by CAPZ
  2. Manually delete <something???>
  3. Attempt to delete cluster object from CAPI mgmt cluster
  4. Observe that cluster gets stuck in Deleting state

Specifically, we need more info on repro step 2 above.

Thanks!

cc @nojnhuh

@ardixit-msft-la
Author

Correct.

Here are the steps.

  1. Create a Self-Hosted cluster managed by CAPZ
  2. Manually delete the resource group hosting the self-hosted cluster from the Azure portal
  3. Attempt to delete cluster object from CAPI mgmt cluster
  4. Observe that cluster gets stuck in Deleting state
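
The steps above can be sketched as a script so that each run is performed the same way. This is only an illustration under assumptions: the function name `repro`, its arguments, and the `wait`/`no-wait` mode are hypothetical, and it uses `az group delete` in place of the Azure portal. The steps do not say whether step 3 follows step 2 immediately, so the mode makes that timing an explicit choice.

```shell
# Sketch of repro steps 2-4.
# Usage: repro <management-kubeconfig> <cluster-name> <resource-group> [wait|no-wait]
repro() {
  kubeconfig=$1
  cluster=$2
  resource_group=$3
  mode=${4:-wait}

  # Step 2: delete the resource group outside of CAPZ.
  if [ "$mode" = "no-wait" ]; then
    # Return immediately so step 3 overlaps with the Azure-side deletion.
    az group delete --name "$resource_group" --yes --no-wait
  else
    # Block until Azure reports the resource group as deleted.
    az group delete --name "$resource_group" --yes
  fi

  # Step 3: delete the Cluster object without blocking on finalizers.
  kubectl --kubeconfig "$kubeconfig" delete cluster "$cluster" --wait=false

  # Step 4: print the current phase; a stuck cluster keeps reporting "Deleting".
  kubectl --kubeconfig "$kubeconfig" get cluster "$cluster" \
    --output jsonpath='{.status.phase}'
}
```

Since the issue is reported as intermittent, running both modes several times may show whether the overlap between the two deletions matters.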

@nawazkh
Member

nawazkh commented Mar 14, 2024

@ardixit-msft-la Sorry for the delay.

I don't see the erroneous behavior when following the steps shared above. I am using CAPZ main for the repro with K8s v1.28.5.
Can you share the versions of CAPZ and Kubernetes where you see this error?

Also, can you describe the steps in more detail? Are steps 2 and 3 performed immediately one after the other, or do you wait for step 2 to finish before executing step 3?

@nawazkh
Member

nawazkh commented Mar 14, 2024

Retried with CAPZ v1.13.2 and K8s v1.28.5 but could not repro this issue.

@ardixit-msft-la
Author

ardixit-msft-la commented Mar 15, 2024

I am using K8s version 1.26.0, and I have a live environment where the issue occurs. Please let me know if you would like access to it; I can help with that.

@nawazkh
Member

nawazkh commented Mar 15, 2024

Can you please share the CAPZ version as well?

@ardixit-msft-la
Author

I am not sure where or how to find the CAPZ version. Can you please help?

@nawazkh
Member

nawazkh commented Mar 15, 2024

One way is to read the version suffixed to the capz-controller-manager-xyz pod in the management cluster.
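
Another option that may work is to read the tag of the controller's container image. This is a sketch under assumptions not confirmed in this thread: it supposes a default installation where the controller runs as the `capz-controller-manager` deployment in the `capz-system` namespace, and the function names `image_tag`, `capz_images`, and `capz_version` are hypothetical helpers.

```shell
# Extract the tag from an image reference of the form "<repository>:<tag>".
# Does not handle digest references ("<repository>@sha256:...").
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Print the container image(s) of the CAPZ controller deployment.
# Assumes the "capz-system" namespace and "capz-controller-manager" name.
# Usage: capz_images <management-kubeconfig>
capz_images() {
  kubectl --kubeconfig "$1" get deployment capz-controller-manager \
    --namespace capz-system \
    --output jsonpath='{.spec.template.spec.containers[*].image}'
}

# Print one tag per controller container image.
# Usage: capz_version <management-kubeconfig>
capz_version() {
  for image in $(capz_images "$1"); do
    image_tag "$image"
  done
}
```

If the deployment lives in a different namespace, `kubectl get deployments --all-namespaces` against the management cluster should help locate it.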

@mboersma added the priority/awaiting-more-evidence label on Mar 28, 2024
Project status: Wait-On-Author