Is your feature request related to a problem? Please describe.
We want alerting in place to let us know when Flux resources are left over from failed E2E test runs. Currently there's a race condition with HelmRelease resources: the metrics for a specific HelmRelease may never be removed or updated, so the alert keeps firing indefinitely. This happens because, as part of our cluster uninstall logic, we force delete those resources by removing their finalizers. We do this because Flux is unable to uninstall the charts once the cluster itself is no longer accessible.
Describe the solution you'd like
After looking through the code we discovered that Flux will not attempt the uninstall step if a resource is marked as suspended. If suspend: true is set on a HelmRelease, then on deletion Flux will simply remove the resource, correctly clean up its metrics, and remove the finalizer itself.
We'd like to propose that our cleanup-helmreleases-hook-job.yaml in the Cluster chart be updated to set suspend: true instead of force removing the finalizer.
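As a rough illustration of the proposed change, the sketch below builds the two kubectl invocations the hook job could run in place of the finalizer removal: a merge patch that sets spec.suspend to true, followed by a normal delete. It assumes the job shells out to kubectl, which this issue does not confirm, and the release name `my-app` and namespace `org-example` are hypothetical placeholders. The helper names are invented for this example and are not part of the Cluster chart.

```python
import json
import shlex

# Fully qualified resource name for Flux HelmRelease objects.
HELMRELEASE = "helmreleases.helm.toolkit.fluxcd.io"


def suspend_patch() -> dict:
    """Return the merge patch that marks a HelmRelease as suspended."""
    return {"spec": {"suspend": True}}


def suspend_args(name: str, namespace: str) -> list[str]:
    """Build the kubectl command that sets spec.suspend to true."""
    return [
        "kubectl", "patch", HELMRELEASE, name,
        "-n", namespace,
        "--type", "merge",
        "-p", json.dumps(suspend_patch()),
    ]


def delete_args(name: str, namespace: str) -> list[str]:
    """Build the kubectl command that deletes the HelmRelease.

    The finalizer is left in place so that Flux can remove it itself.
    """
    return ["kubectl", "delete", HELMRELEASE, name, "-n", namespace]


if __name__ == "__main__":
    # Hypothetical release and namespace, used only for illustration.
    for args in (suspend_args("my-app", "org-example"),
                 delete_args("my-app", "org-example")):
        print(shlex.join(args))
```

The commands are only printed here, not executed. The ordering is the point: the suspend patch is applied before the delete, so that Flux sees the resource as suspended when it handles the deletion.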
Describe alternatives you've considered
Restarting the helm-controller on the MC is enough for the metrics to be recalculated correctly, but this isn't a viable solution.
Additional context
Related issue: https://github.com/giantswarm/giantswarm/issues/29504