ROSA: RosaMachinePool and MachinePool CRs stuck at de-provision ROSA-HCP #4936

serngawy · 2024-04-18T14:44:32Z

/kind bug

What steps did you take and what happened:
We are using a GitOps workflow to provision ROSA-HCP. The required CRs (RosaControlPlane, RosaCluster, RosaMachinePool, Cluster and MachinePool) are stored in a git repo, and ArgoCD is used to sync the CRs to the installer cluster.
To de-provision the ROSA-HCP cluster:
1- Delete all the required CRs from the git repo.
2- Let ArgoCD sync the git repo status and delete all the required CRs from the installer cluster.
3- The ROSA-HCP cluster starts the uninstall process.
4- The RosaControlPlane and RosaCluster CRs are deleted; however, the RosaMachinePool, MachinePool and Cluster CRs get stuck and are never deleted.
5- Checking the AWS console and the Red Hat console, all the ROSA-HCP and AWS resources are destroyed.
6- After manually removing the finalizers from the RosaMachinePool and MachinePool CRs, the Cluster CR is deleted.
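
For anyone reading along, the manual workaround in step 6 boils down to dropping one entry from each object's finalizer list, RosaMachinePool first and MachinePool second. Below is a minimal sketch of that operation in plain Go. It is not CAPA code: `removeFinalizer` is a hypothetical helper and the finalizer strings are placeholders, not the real finalizer names.

```go
package main

import "fmt"

// removeFinalizer returns a copy of finalizers without the named entry.
// Hypothetical helper for illustration only; it does not talk to a cluster.
func removeFinalizer(finalizers []string, name string) []string {
	kept := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	return kept
}

func main() {
	// Placeholder finalizer names (assumptions, not the real values).
	rosaMachinePool := []string{"example.io/rosamachinepool"}
	machinePool := []string{"example.io/machinepool"}

	// Same order as in step 6: RosaMachinePool first, then MachinePool.
	rosaMachinePool = removeFinalizer(rosaMachinePool, "example.io/rosamachinepool")
	machinePool = removeFinalizer(machinePool, "example.io/machinepool")

	// Once both lists are empty the API server can finish deleting the
	// objects, and the Cluster CR no longer has descendants.
	fmt.Println(len(rosaMachinePool), len(machinePool))
}
```

On a real cluster the same change would be made by editing `metadata.finalizers` on each CR.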

What did you expect to happen:
We expect all the CRs to be deleted when the cluster is uninstalled.

Anything else you would like to add:
Checking the logs of the capa-controller-manager deployment during the uninstall process, the log below is shown:

E0415 22:22:23.974724       1 controller.go:329] "Reconciler error" err="Node pools can only be deleted on clusters in 'ready' state, cluster requested is in 'uninstalling' state." controller="rosamachinepool" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ROSAMachinePool" ROSAMachinePool="ns-rosa-hcp/workers-ex" namespace="ns-rosa-hcp" name="workers-ex" reconcileID="4a8dfbec-4fb6-48ff-aa07-bf75ba7cd31b"

After the AWS and ROSA-HCP resources are destroyed, the log below is shown:

I0415 22:42:21.241631       1 rosamachinepool_controller.go:142] "Failed to retrieve ControlPlane from MachinePool" controller="rosamachinepool" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ROSAMachinePool" ROSAMachinePool="ns-rosa-hcp/workers-ex" namespace="ns-rosa-hcp" name="workers-ex" reconcileID="78e9b21b-937c-41fe-9550-34b974be85dc" MachinePool="ns-rosa-hcp/workers-ex" cluster="ns-rosa-hcp/rosa-hcp-2"
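
The two CAPA logs above suggest the delete path has two cases it cannot get past: the cluster is in 'uninstalling' state (so the node pool delete call is rejected), or the control plane is already gone. One way a delete reconcile could handle both is sketched below. This is only an illustration of the idea, not the actual CAPA code and not necessarily what a fix would look like; `decideDelete` and the action names are hypothetical, and only the 'ready' and 'uninstalling' state strings come from the log.

```go
package main

import "fmt"

// deleteAction is what a delete reconcile could do next (hypothetical type).
type deleteAction string

const (
	actionDeleteNodePool  deleteAction = "delete-node-pool"
	actionRemoveFinalizer deleteAction = "remove-finalizer"
	actionRequeue         deleteAction = "requeue"
)

// decideDelete picks the next step for a RosaMachinePool that is being deleted.
// controlPlaneFound says whether the owning control plane could be retrieved;
// clusterState is the state reported for the ROSA cluster.
func decideDelete(controlPlaneFound bool, clusterState string) deleteAction {
	if !controlPlaneFound {
		// Nothing left to call the node pool API against, so the
		// finalizer is the only thing still holding the CR.
		return actionRemoveFinalizer
	}
	switch clusterState {
	case "ready":
		// Node pools can only be deleted in this state.
		return actionDeleteNodePool
	case "uninstalling":
		// The cluster uninstall removes the node pools itself.
		return actionRemoveFinalizer
	default:
		// Any other state: wait and try again later.
		return actionRequeue
	}
}

func main() {
	fmt.Println(decideDelete(true, "uninstalling"))
	fmt.Println(decideDelete(false, ""))
}
```

With something like this, the RosaMachinePool would not stay blocked on the "Node pools can only be deleted on clusters in 'ready' state" error once the uninstall has started.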

In the logs of the capi-controller-manager deployment during the uninstall process, the logs below are shown:

I0415 22:49:51.074671       1 cluster_controller.go:269] "Cluster still has descendants - need to requeue" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="ns-rosa-hcp/rosa-hcp-2" namespace="ns-rosa-hcp" name="rosa-hcp-2" reconcileID="6d892381-4d67-402d-97da-c4a2f76c7a57" descendants="Machine pools: workers-ex" indirect descendants count=0
E0415 22:49:54.492574       1 controller.go:329] "Reconciler error" err="failed to create cluster accessor: error fetching REST client config for remote cluster \"ns-rosa-hcp/rosa-hcp-2\": failed to retrieve kubeconfig secret for Cluster ns-rosa-hcp/rosa-hcp-2: Secret \"rosa-hcp-2-kubeconfig\" not found" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="ns-rosa-hcp/workers-ex" namespace="ns-rosa-hcp" name="workers-ex" reconcileID="c2c91b09-2e38-4dfd-9a2f-c0478de02ac1"

capi-logs.txt
capa-logs.txt
rosa-hcp-2.txt

Environment:

Cluster-api-provider-aws version: v2.4.2
Kubernetes version: (use kubectl version): v1.27.3
OS (e.g. from /etc/os-release):

k8s-ci-robot · 2024-04-18T14:44:41Z

This issue is currently awaiting triage.

If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

nrb · 2024-04-18T15:54:35Z

> 6- After manually cleaning the finalizers from RosamachinePool and MachinePool CRs the Cluster CR is deleted.

Can you share which finalizer specifically?

serngawy · 2024-04-22T12:39:06Z

> 6- After manually cleaning the finalizers from RosamachinePool and MachinePool CRs the Cluster CR is deleted.
>
> Can you share which finalizer specifically?

RosaMachinePool finalizer then MachinePool finalizer

k8s-ci-robot added the kind/bug and needs-priority labels Apr 18, 2024

k8s-ci-robot added the needs-triage label Apr 18, 2024

serngawy changed the title ~~RosaMachinePool and MachinePool CRs stuck at de-provision ROSA-HCP~~ ROSA: RosaMachinePool and MachinePool CRs stuck at de-provision ROSA-HCP Apr 22, 2024

nrb added the area/provider/rosa label Apr 22, 2024

This was referenced May 1, 2024

Failed to delete machinePool for unreachable cluster kubernetes-sigs/cluster-api#10544

Open

🐛 ROSA: Fix issue-4936 delete rosaMachinePool and rosaControlPlane #4953

Merged

k8s-ci-robot closed this as completed in #4953 May 30, 2024
