Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partially replace cleanup with finalizers #2431

Merged
merged 17 commits into from
May 29, 2024
Merged

Conversation

weyfonk
Copy link
Contributor

@weyfonk weyfonk commented May 14, 2024

This implements finalizers on GitRepos, bundles and bundle deployments, removing the corresponding cleanup controller logic for those resources. That cleanup is no longer necessary, as those resources can no longer be orphaned, ie deleted when no Fleet controller is running.

This is tested by end-to-end tests, by temporarily scaling down the (unsharded) fleet-controller deployment and checking that deleting one of those resources has no effect other than setting their deletion timestamps.

In general, adding and removing finalizers on a resource is handled by that resource's reconciler, with the following exception: the GitRepo reconciler takes care of removing bundle deployment finalizers, when purging bundle deployments. This prevents the agent from dealing with finalizers, which would require a change in permissions and thereby a possible security risk.

Out of scope

This does not add finalizers to RBAC resources created for GitRepos (eg. service accounts, roles, role bindings, config maps), as those are native k8s resources which can be created in more use cases than GitRepo lifecycle management (eg. agent management), therefore cleanup logic is still needed there.

Content resources do not bear finalizers either, because cleaning them up responds to a different process according to reference counts, ie how many bundles refer to a given Content resource.

Refers to #1980

@weyfonk weyfonk requested a review from a team as a code owner May 14, 2024 15:42
@weyfonk weyfonk marked this pull request as draft May 14, 2024 15:51
@weyfonk weyfonk marked this pull request as ready for review May 20, 2024 15:51
Copy link
Contributor

@0xavi0 0xavi0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of questions. And, aside from preventing the user to delete bundledeployments I don't see why we need to set a finalizer in those.

In the version without finalisers the user is able to delete a bundledeployment which is is almos instantly recreated by Fleet.

With this the user can't delete bundledeployments as the DeleteTimestamp is set but the finaliser is never removed.
I don't know if keeping the bundledeployment with the delete timestamp set and alive is the right thing to do.

There is no problem deleting bundles or gitrepos (I've tested also those cases with this branch)

Copy link
Contributor

@0xavi0 0xavi0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of questions. And, aside from preventing the user to delete bundledeployments I don't see why we need to set a finalizer in those.

In the version without finalisers the user is able to delete a bundledeployment which is is almos instantly recreated by Fleet.

With this the user can't delete bundledeployments as the DeleteTimestamp is set but the finaliser is never removed.
I don't know if keeping the bundledeployment with the delete timestamp set and alive is the right thing to do.

There is no problem deleting bundles or gitrepos (I've tested also those cases with this branch)

)
Expect(err).ToNot(HaveOccurred())

_, _ = env.Kubectl.Namespace(bdNamespace).Delete(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to "manually" delete the bundledeployment?
It should be gone after deleting the bundle, right?

In fact this should be part of the test... checking that when deleting a bundle or gitrepo the bundledeployments related are also gone

Copy link
Contributor Author

@weyfonk weyfonk May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed this, as the bundle deployment will indeed have been deleted by each test case, either manually or through previous cleanup steps, ie scaling the Fleet controller back up, which would trigger GitRepo and bundle deletion.

As to testing that bundle deployments are actually deleted when deleting a bundle or GitRepo within this test suite, as noted as a response to your next comment, I am not sure that this is the right place to do it as that deletion does not depend on finalizers.

I could take the opportunity to expand gitrepo_test.go though, where I think those checks should live.

Expect(err).NotTo(HaveOccurred())

// These resources should be deleted when the GitRepo is deleted.
By("checking that the auxiliary resources don't exist anymore")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also check that the bundles and bundledeployments are deleted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not depend on finalizers though, but rather on relationships between a GitRepo, bundles and bundle deployments. We cover some of it here.

internal/cmd/controller/reconciler/bundle_controller.go Outdated Show resolved Hide resolved
@weyfonk weyfonk force-pushed the 1980-finalizers branch 2 times, most recently from 8927736 to 549e84f Compare May 27, 2024 14:13
@weyfonk weyfonk requested a review from 0xavi0 May 27, 2024 14:34
0xavi0
0xavi0 previously approved these changes May 28, 2024
weyfonk added 17 commits May 28, 2024 16:37
The bundle reconciler creates bundle deployments with finalizers, which
are removed when bundle deployments are purged by either of the bundle
or `GitRepo` reconcilers.
This prevents bundle deployments from being deleted when no Fleet
controller is running.
A bundle now receives a finalizer on its first reconciler loop.
This prevents bundles from being deleted when no Fleet controller is
running.
This prevents `GitRepo`s from being deleted when no Fleet controller is
running.
This should prevent conflict errors in controller logs, which cause some
gitrepo tests to fail.
If a GitRepo is not found, there is no point in adding or deleting its
finalizer anyway.
This enables Fleet to be uninstalled. Without patching those resources
manually, deleting namespace `fleet-local` would hang because of
finalizers still being present on them.
Cleanup is now handled via finalizers for those resources.
When the GitRepo reconciler processes a GitRepo deletion, imagescans are
deleted. Finalizers prevent a GitRepo from being deleted outside of that
reconciler, therefore orphan imagescans should no longer exist.
When dealing with deletion of a GitRepo, end-to-end tests now check
whether the service account, config map, role and role binding exist.
This validates that finalizers mean that those resources can no longer
be orphaned, unless one deliberately, manually removes a GitRepo's
finalizers through patching.
Memory aliasing does not matter in this loop since we do not store the
pointer.
This brings consistency with other test cases. Still, this does not seem
to resolve conflicts at `GitRepo` creation time, which seem to be caused
by limitations of the `setenv` framework.
This commit also skips deletion of the test namespace, due to those
limitations [1].

[1]: https://book.kubebuilder.io/reference/envtest#namespace-usage-limitation
This should allow end-to-end tests with finalizers to pass without race
conditions.
For some reason, this makes them pass even with finalizers, eliminating
errors about a `GitRepo` already existing when it is created just once,
with a random name.
This makes use of a single `Eventually` call, which prevents flakiness
caused by a `GitRepo`'s status not having been updated before its
resource version.
Checking a deletion timestamp on an object which has just been created
is superfluous.
Bundle deployments are deleted anyway, either manually or through bundle
deletion.
A new parameter has been added to `CreateGitRepo`, meaning that an empty
argument must be added for working in unsharded contexts.
@weyfonk weyfonk merged commit 5bc51b0 into rancher:main May 29, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants