ROSA: RosaMachinePool and MachinePool CRs stuck at de-provision ROSA-HCP #4936

Closed
serngawy opened this issue Apr 18, 2024 · 3 comments · Fixed by #4953
Labels
area/provider/rosa Issues or PRs related to Red Hat ROSA provider kind/bug Categorizes issue or PR as related to a bug. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@serngawy
Contributor

/kind bug

What steps did you take and what happened:
We are using a GitOps workflow to provision ROSA-HCP. The required CRs (RosaControlPlane, RosaCluster, RosaMachinePool, Cluster and MachinePool) are stored in a git repo, and ArgoCD is used to sync the CRs to the installer cluster.
To de-provision the ROSA-HCP cluster:
1- Delete all the required CRs from the git repo.
2- Let ArgoCD sync the git repo state and delete all the required CRs from the installer cluster.
3- The ROSA-HCP cluster starts the uninstall process.
4- The RosaControlPlane and RosaCluster CRs are deleted; however, the RosaMachinePool, MachinePool and Cluster CRs get stuck and are never deleted.
5- Checking the AWS console and the Red Hat console shows that all the ROSA-HCP and AWS resources are destroyed.
6- After manually removing the finalizers from the RosaMachinePool and MachinePool CRs, the Cluster CR is deleted.

What did you expect to happen:
We expect all the CRs to be deleted when the cluster is uninstalled.

Anything else you would like to add:
While the uninstall is in progress, the capa-controller-manager deployment logs the following error:

E0415 22:22:23.974724       1 controller.go:329] "Reconciler error" err="Node pools can only be deleted on clusters in 'ready' state, cluster requested is in 'uninstalling' state." controller="rosamachinepool" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ROSAMachinePool" ROSAMachinePool="ns-rosa-hcp/workers-ex" namespace="ns-rosa-hcp" name="workers-ex" reconcileID="4a8dfbec-4fb6-48ff-aa07-bf75ba7cd31b"
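
The error above suggests that the node pool delete request is rejected while the cluster is in the `uninstalling` state, and that the controller keeps retrying it as a failure. As a minimal sketch of one way a delete handler could treat that state as "nothing left to delete", consider the Go snippet below. All names in it (`deleteDecision`, `decideNodePoolDelete`, and the three actions) are hypothetical; they are not taken from the CAPA code base and do not claim to describe the fix in #4953. Only the state strings `ready` and `uninstalling` come from the log line.

```go
package main

import "fmt"

// deleteDecision is a hypothetical outcome of a ROSAMachinePool delete reconcile.
type deleteDecision int

const (
	// callDeleteAPI: the cluster is ready, so ask the service to delete the node pool.
	callDeleteAPI deleteDecision = iota
	// removeFinalizer: the cluster is going away or gone, so release the CR.
	removeFinalizer
	// requeue: any other state, wait and try again later.
	requeue
)

// decideNodePoolDelete maps the observed cluster state to an action.
// Assumption: an uninstalling or missing cluster takes its node pools with it,
// so there is nothing left for the machine pool controller to delete.
func decideNodePoolDelete(clusterFound bool, state string) deleteDecision {
	switch {
	case !clusterFound:
		return removeFinalizer
	case state == "uninstalling":
		return removeFinalizer
	case state == "ready":
		return callDeleteAPI
	default:
		return requeue
	}
}

func main() {
	// The situation from the log: the cluster exists but is uninstalling.
	fmt.Println(decideNodePoolDelete(true, "uninstalling") == removeFinalizer)
}
```

With a rule like this, the delete reconcile would not return an error for a cluster that is already being torn down. Whether that is safe depends on the service really removing the node pools as part of the uninstall, which this sketch assumes.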

After the AWS and ROSA-HCP resources are destroyed, it logs the following:

I0415 22:42:21.241631       1 rosamachinepool_controller.go:142] "Failed to retrieve ControlPlane from MachinePool" controller="rosamachinepool" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ROSAMachinePool" ROSAMachinePool="ns-rosa-hcp/workers-ex" namespace="ns-rosa-hcp" name="workers-ex" reconcileID="78e9b21b-937c-41fe-9550-34b974be85dc" MachinePool="ns-rosa-hcp/workers-ex" cluster="ns-rosa-hcp/rosa-hcp-2"

During the uninstall process, the capi-controller-manager deployment logs the following:

I0415 22:49:51.074671       1 cluster_controller.go:269] "Cluster still has descendants - need to requeue" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="ns-rosa-hcp/rosa-hcp-2" namespace="ns-rosa-hcp" name="rosa-hcp-2" reconcileID="6d892381-4d67-402d-97da-c4a2f76c7a57" descendants="Machine pools: workers-ex" indirect descendants count=0
E0415 22:49:54.492574       1 controller.go:329] "Reconciler error" err="failed to create cluster accessor: error fetching REST client config for remote cluster \"ns-rosa-hcp/rosa-hcp-2\": failed to retrieve kubeconfig secret for Cluster ns-rosa-hcp/rosa-hcp-2: Secret \"rosa-hcp-2-kubeconfig\" not found" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="ns-rosa-hcp/workers-ex" namespace="ns-rosa-hcp" name="workers-ex" reconcileID="c2c91b09-2e38-4dfd-9a2f-c0478de02ac1"
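
Read together, these log lines are consistent with a chain of blocked deletions: the Cluster waits for its MachinePool descendant, the MachinePool reconcile fails because the kubeconfig secret is gone, and the ROSAMachinePool reconcile cannot find its control plane, so neither finalizer is ever released. The Go sketch below models that reading as a tiny dependency graph. It is an interpretation of the logs, not the controllers' actual code; the `object` type and the `sweep` and `issueState` helpers are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// object is a hypothetical stand-in for a CR with a deletion timestamp set.
type object struct {
	stuckFinalizer bool     // true while a controller fails to release its finalizer
	waitsFor       []string // objects that must be gone before this one can go
}

// issueState models the state described in the issue after the cloud
// resources are destroyed (assumption drawn from the logs above).
func issueState() map[string]object {
	return map[string]object{
		"Cluster":         {waitsFor: []string{"MachinePool"}},
		"MachinePool":     {stuckFinalizer: true},
		"RosaMachinePool": {stuckFinalizer: true},
	}
}

// sweep deletes every object that has no stuck finalizer and no remaining
// dependency, repeating until nothing changes. It returns the sorted names
// of the objects that are still present.
func sweep(objs map[string]object) []string {
	for changed := true; changed; {
		changed = false
		for name, o := range objs {
			if o.stuckFinalizer {
				continue
			}
			blocked := false
			for _, dep := range o.waitsFor {
				if _, ok := objs[dep]; ok {
					blocked = true
				}
			}
			if !blocked {
				delete(objs, name)
				changed = true
			}
		}
	}
	left := make([]string, 0, len(objs))
	for name := range objs {
		left = append(left, name)
	}
	sort.Strings(left)
	return left
}

func main() {
	objs := issueState()
	fmt.Println("before manual cleanup:", sweep(objs))

	// Step 6 of the report: remove both finalizers by hand.
	for _, name := range []string{"RosaMachinePool", "MachinePool"} {
		o := objs[name]
		o.stuckFinalizer = false
		objs[name] = o
	}
	fmt.Println("after manual cleanup:", sweep(objs))
}
```

In this model nothing is deleted until both finalizers are cleared, after which the Cluster goes too, which matches steps 4 and 6 of the report.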

capi-logs.txt
capa-logs.txt
rosa-hcp-2.txt

Environment:

  • Cluster-api-provider-aws version: v2.4.2

  • Kubernetes version: (use kubectl version): v1.27.3

  • OS (e.g. from /etc/os-release):

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-priority labels Apr 18, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 18, 2024
@nrb
Contributor

nrb commented Apr 18, 2024

6- After manually cleaning the finalizers from RosaMachinePool and MachinePool CRs the Cluster CR is deleted.

Can you share which finalizer specifically?

@serngawy serngawy changed the title RosaMachinePool and MachinePool CRs stuck at de-provision ROSA-HCP ROSA: RosaMachinePool and MachinePool CRs stuck at de-provision ROSA-HCP Apr 22, 2024
@serngawy
Contributor Author

6- After manually cleaning the finalizers from RosaMachinePool and MachinePool CRs the Cluster CR is deleted.

Can you share which finalizer specifically?

I removed the RosaMachinePool finalizer first, then the MachinePool finalizer.

3 participants