Self-Hosted cluster deletion failing in scenarios when manual deletion is attempted first. #4609

ardixit-msft-la · 2024-02-29T18:53:09Z

/kind bug

[Before submitting an issue, have you checked the Troubleshooting Guide?]

What steps did you take and what happened:
[A clear and concise description of what the bug is.]
When a self-hosted cluster is deleted manually before it is deleted through the kubectl command, the cluster never gets deleted. When manual deletion is attempted again, the resources are recreated while the cluster's provisioning state still shows as Deleting.
kubectl --kubeconfig C:\Users\ardixit.kube\management get clusters

What did you expect to happen:
After deletion is attempted through the following command:

kubectl --kubeconfig C:\Users\ardixit.kube\management delete cluster flexiblec220849

The cluster should be deleted instead of hanging in the Deleting state.
The cluster should not be recreated.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
This issue was reproduced with an AKS setup with 4 clusters, each having 5 user node pools.
The issue is intermittent and occurs in approximately 2 out of 10 runs.

Environment:

cluster-api-provider-azure version:
Kubernetes version: (use kubectl version):
OS (e.g. from /etc/os-release):

jackfrancis · 2024-02-29T19:02:28Z

/assign @nawazkh

@ardixit-msft-la could you be more specific about what "manual deletion of self-hosted cluster is attempted before deleting through kubectl command" means? That way we can repro.

1. Create an AKS cluster managed by CAPZ
2. Manually delete <something???>
3. Attempt to delete cluster object from CAPI mgmt cluster
4. Observe that cluster gets stuck in Deleting state

Specifically, we need more info on repro step 2 above.

Thanks!

cc @nojnhuh

ardixit-msft-la · 2024-02-29T19:19:15Z

Correct.

Here are the steps.

1. Create a Self-Hosted cluster managed by CAPZ
2. Manually delete the resource group hosting the self-hosted cluster from Azure portal
3. Attempt to delete cluster object from CAPI mgmt cluster
4. Observe that cluster gets stuck in Deleting state
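
For anyone trying to reproduce this from a terminal, the steps above could be sketched roughly as follows. This is only a sketch under assumptions: the thread deletes the resource group through the Azure portal, so `az group delete` is a stand-in for that manual step, and the kubeconfig path, cluster name, and resource group name are hypothetical placeholders. The script defaults to a dry run that only prints the commands.

```shell
# Sketch of repro steps 2-4. Step 1 (creating the self-hosted cluster) is
# assumed to be done already. All argument values are hypothetical placeholders.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: repro <management-kubeconfig> <cluster-name> <resource-group>
repro() {
  kubeconfig="$1"
  cluster="$2"
  resource_group="$3"

  # Step 2: delete the resource group out of band (stand-in for the portal).
  # --no-wait returns immediately, so step 3 starts while Azure is still deleting.
  run az group delete --name "$resource_group" --yes --no-wait

  # Step 3: delete the Cluster object from the management cluster without blocking.
  run kubectl --kubeconfig "$kubeconfig" delete cluster "$cluster" --wait=false

  # Step 4: read the Cluster phase to see whether it stays in Deleting.
  run kubectl --kubeconfig "$kubeconfig" get cluster "$cluster" -o jsonpath='{.status.phase}'
}
```

Calling `repro ./management demo-cluster demo-rg` prints the three commands. Dropping `--no-wait` from the `az group delete` line would instead make step 2 finish before step 3 begins, which may matter given that the issue is intermittent.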

nawazkh · 2024-03-14T20:59:52Z

@ardixit-msft-la Sorry for the delay.

I don't see the erroneous behavior when following the steps shared above. I am using CAPZ main for the repro, with K8s v1.28.5.
Can you share the versions of CAPZ and Kubernetes where you see this error?

Also, can you describe the steps in more detail? Are steps 2 and 3 performed immediately one after the other, or are you waiting for step 2 to finish before executing step 3?

nawazkh · 2024-03-14T23:30:57Z

Retried with CAPZ v1.13.2 and K8s v1.28.5 but could not repro this issue.

ardixit-msft-la · 2024-03-15T19:59:20Z

I am using k8s version 1.26.0, and I have a live environment that shows the issue. Please let me know if you need it; I can help you with that.

nawazkh · 2024-03-15T20:51:18Z

Can you please share the CAPZ version as well?

ardixit-msft-la · 2024-03-15T20:53:12Z

I am not sure where or how I can find the CAPZ version. Can you please help?

nawazkh · 2024-03-15T21:34:53Z

One way is to get the version from the capz-controller-manager-xyz pod in the management cluster.
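
A minimal sketch of one way to read that version, assuming a default install where the controller runs as the `capz-controller-manager` deployment in the `capz-system` namespace and the version appears as the container image tag. The namespace, the helper function names, and the kubeconfig argument are assumptions for illustration, not something confirmed in this thread.

```shell
# Print the tag portion of an image reference (the text after the last ':').
# Assumption: the image is referenced by tag, not by digest.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# List the container images of the CAPZ controller deployment, one per line.
# Assumption: default namespace "capz-system" and deployment name
# "capz-controller-manager"; adjust both if your install differs.
capz_images() {
  kubectl --kubeconfig "$1" -n capz-system get deployment capz-controller-manager \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.image}{"\n"}{end}'
}

# Print the tag of every container image in the deployment.
# Usage: capz_version <management-kubeconfig>
capz_version() {
  capz_images "$1" | while IFS= read -r image; do
    image_tag "$image"
  done
}
```

Running `capz_version` with the management cluster kubeconfig prints one tag per container. If the deployment has more than one container, the tag of the controller image is the one to report.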

k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 29, 2024

k8s-ci-robot assigned nawazkh Feb 29, 2024

nawazkh mentioned this issue Mar 15, 2024

CAPZ stays stuck in deleting mode when resources are all actually gone #4570

Closed

mboersma added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Mar 28, 2024
