
Mixed VR and Volsync workloads fail on Relocate to a cluster that the workload was relocated or failed over from #1327

Open

ShyamsundarR opened this issue Apr 9, 2024 · 0 comments

The test is to run a workload that uses one each of an RBD and a CephFS PVC, fail it over, and then relocate it back to the preferredCluster.

On the initial Failover, because the VRG is not deleted for VolSync cases, the VR and PVC remain on the preferredCluster, with the PVC in the Terminating state. Thus, on a future Relocate to this cluster (or a Failover, for that matter), ClusterDataReady is never reported as True, as the PVC is still Terminating and the restore of the PVC from the s3 store fails.

This leaves the action stuck, making no forward progress.
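For context, a minimal sketch (a hypothetical helper, not Ramen's actual code) of the condition that blocks the restore: the stale PVC has its DeletionTimestamp set, but the finalizers left behind by VR protection keep it in Terminating, so a PVC of the same name can never be restored from the s3 store:

```go
// Sketch only: illustrates the stuck-Terminating condition described above.
package sketch

import (
	corev1 "k8s.io/api/core/v1"
)

// restoreBlocked reports whether restoring a PVC of this name from the s3
// store would be rejected: the stale object is marked for deletion, but
// finalizers (e.g. from VR protection) keep it from completing deletion.
func restoreBlocked(existing *corev1.PersistentVolumeClaim) bool {
	return existing != nil &&
		existing.DeletionTimestamp != nil && // marked for deletion...
		len(existing.Finalizers) > 0 // ...but held in Terminating by finalizers
}
```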

Thoughts on fixes:

  • Handle VR deletion and PVC finalizer removal as part of the VRG moving to Secondary, so these stale resources are garbage collected as needed (see the sketch below)
  • Delete a Secondary VRG and, once it is deleted, recreate it for VolSync needs

The former is preferable, as it allows the VRG to shift between Primary and Secondary as the case may be, rather than enforcing, for VR, a VRG movement of Primary->Secondary->Delete followed by a recreate.
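A minimal sketch of the preferred approach, assuming the VRG reconciler uses controller-runtime client semantics; the helper name, the finalizer name, and the VR API import path below are illustrative assumptions, not Ramen's actual identifiers:

```go
// Sketch of option 1: when reconciling a VRG as Secondary, delete the stale
// VR and drop the PVC protection finalizer so the Terminating PVC can be
// garbage collected, unblocking a later Relocate or Failover back here.
package sketch

import (
	"context"

	volrep "github.com/csi-addons/kubernetes-csi-addons/apis/replication.storage/v1alpha1" // assumed import path
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// pvcProtectionFinalizer is a hypothetical name for the finalizer that
// protects replicated PVCs.
const pvcProtectionFinalizer = "ramendr.openshift.io/pvc-vr-protection"

// cleanupAsSecondary garbage collects the stale VR and releases the PVC
// finalizer as part of the VRG moving to Secondary, so a future restore of
// the PVC from the s3 store can succeed on this cluster.
func cleanupAsSecondary(ctx context.Context, c client.Client, pvc *corev1.PersistentVolumeClaim) error {
	// Delete the VolumeReplication resource still referencing this PVC;
	// a missing VR is fine, anything else is a real error.
	vr := &volrep.VolumeReplication{}
	vr.Name = pvc.Name
	vr.Namespace = pvc.Namespace
	if err := c.Delete(ctx, vr); err != nil && !errors.IsNotFound(err) {
		return err
	}

	// Drop the protection finalizer so the Terminating PVC can finish
	// deleting, clearing the way for a clean restore of the same PVC name.
	if controllerutil.RemoveFinalizer(pvc, pvcProtectionFinalizer) {
		return c.Update(ctx, pvc)
	}

	return nil
}
```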
