Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finalizers #207

Merged
merged 3 commits into from May 6, 2022
Merged

Finalizers #207

merged 3 commits into from May 6, 2022

Conversation

scothis
Copy link
Contributor

@scothis scothis commented Apr 17, 2022

Finalizers allow a reconciler to clean up state for a resource that has
been deleted by a client, and not yet fully removed. Resources with
finalizers will stay in this terminating state until all finalizers are
cleared from the resource.

ParentReconcilers will now reconcile resources that are terminating as
long as they also have a pending finalizer.

SyncReconcilers have a new, optional, Finalize method that is called
when the resource is terminating. The Sync method will not be called
unless SyncDuringFinalization is true. Users are responsible to add and
clear finalizers from resources.

The AddParentFinalizer and ClearParentFinalizer functions will patch the
resource. These method work with cast parents objects and use the same
client the ParentReconciler used to originally load the parent resource,
so they can be called inside SubReconcilers that use a different client.

ChildReconcilers have a Finalizer field that when set will ensure that
value is set on the parent resource finalizers before creating/updating
a child resource. It will also clear that finalizer after the child is
deleted. When the parent is terminating, the DesiredChild method is not
called and existing children are deleted.

Signed-off-by: Scott Andrews andrewssc@vmware.com

@codecov
Copy link

codecov bot commented Apr 17, 2022

Codecov Report

Merging #207 (7f2cfff) into main (bb1d838) will increase coverage by 8.84%.
The diff coverage is 94.77%.

@@            Coverage Diff             @@
##             main     #207      +/-   ##
==========================================
+ Coverage   62.09%   70.94%   +8.84%     
==========================================
  Files           9        9              
  Lines         905     1277     +372     
==========================================
+ Hits          562      906     +344     
- Misses        325      351      +26     
- Partials       18       20       +2     
Impacted Files Coverage Δ
reconcilers/reconcilers.go 84.90% <94.77%> (+4.94%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb1d838...7f2cfff. Read the comment docs.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@mamachanko
Copy link
Collaborator

Took your branch for a spin. Here's what I found.

To unit test adding of a finalizer, the parent needs to have a ResourceVersion(e.g. "123") and the expected parent
will have that incremented(e.g. "123"). That's not a problem, but while it's putting the burden on the implementer of
the tests they may benefit from better docs here. The test should expect the patch of the parent to add the finalizer
and an additional event "FinalizerPatched".

Unit testing clearing of a finalizer happens by setting the DeletionTimestamp. That may be helpful to document as
well. The patch's body then looks something like:

//...
ExpectPatches: []rtesting.PatchRef{
    {
        Group:     "my.group",
        Kind:      "Thing",
        Namespace: "test-ns",
        Name:      "test-client",
        PatchType: "application/strategic-merge-patch+json",
        Patch:     []byte(`{"metadata":{"finalizers":null,"resourceVersion":"123"}}`),
    },
},
//...

In action, however, reconcilers.AddParentFinalizer breaks down for me with

{
	"level": "error",
	"ts": "2022-04-21T10:07:03.738701466Z",
	"logger": "controller.thing",
	"msg": "Reconciler error",
	"reconciler group": "my.group",
	"reconciler kind": "Thing",
	"name": "max",
	"namespace": "default",
	"error": "the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json, application/apply-patch+yaml",
	"stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:227"
}

I think that's because PatchType: "application/strategic-merge-patch+json" is not available for CRDs, or is it?

@scothis
Copy link
Contributor Author

scothis commented Apr 21, 2022

To unit test adding of a finalizer, the parent needs to have a ResourceVersion(e.g. "123") and the expected parent will have that incremented(e.g. "123"). That's not a problem, but while it's putting the burden on the implementer of the tests they may benefit from better docs here. The test should expect the patch of the parent to add the finalizer and an additional event "FinalizerPatched".

The ResourceVersion is used as an optimistic concurrency lock. If someone else modified the resource since we loaded it, the request will fail. Since all resources on the api server have a resource version, the test client will default the resource version to "999", we could do the same thing for the parent resource since it is also assumed to exist on the api server. This version shows up in the patch bytes, and the expected parent will have a bumped resource version, so it's still something a user will need to have some knowledge of.

Unit testing clearing of a finalizer happens by setting the DeletionTimestamp. That may be helpful to document as well.

This is how k8s defines termination, but yea, I agree it's worth documenting here.

I think that's because PatchType: "application/strategic-merge-patch+json" is not available for CRDs, or is it?

Yep, the strategic merge patch isn't available for CRDs, switching to regular merge patch.

Finalizers allow a reconciler to clean up state for a resource that has
been deleted by a client, and not yet fully removed. Resources with
finalizers will stay in this terminating state until all finalizers are
cleared from the resource.

ParentReconcilers will now reconcile resources that are terminating as
long as they also have a pending finalizer.

SyncReconcilers have a new, optional, Finalize method that is called
when the resource is terminating. The Sync method will not be called
unless SyncDuringFinalization is true. Users are responsible to add and
clear finalizers from resources.

The AddParentFinalizer and ClearParentFinalizer functions will patch the
resource. These method work with cast parents objects and use the same
client the ParentReconciler used to originally load the parent resource,
so they can be called inside SubReconcilers that use a different client.

ChildReconcilers have a Finalizer field that when set will ensure that
value is set on the parent resource finalizers before creating/updating
a child resource. It will also clear that finalizer after the child is
deleted. When the parent is terminating, the DesiredChild method is not
called and existing children are deleted.

Signed-off-by: Scott Andrews <andrewssc@vmware.com>
Comment on lines +870 to +876
Metadata: map[string]interface{}{
"SubReconciler": func(t *testing.T, c reconcilers.Config) reconcilers.SubReconciler {
r := defaultChildReconciler(c)
r.Finalizer = testFinalizer
return r
},
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question]: The doc for Metadata says "Metadata contains arbitrary value that are stored with the test case". Why attach a function here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this allows us to define the reconciler under test dynamically. The alternative is to setup a new table for each test case. This is a bit of a special case as we're trying to test the subreconciler edge behavior and not a user's business logic.

Comment on lines +862 to +869
Name: "delete child, clearing finalizer",
Parent: resourceReady.
MetadataDie(func(d *diemetav1.ObjectMetaDie) {
d.Finalizers(testFinalizer)
}),
GivenObjects: []client.Object{
configMapGiven,
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question]: Can you help me understand why this is a deletion case? I can't tell by looking at the preconditions Parent and GivenObjects. 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this behavior is part of the business logic for the particular reconciler under test. The test case immediate above results in an existing child resource being no longer desired, and deleted without involving a finalizer. This test case is laregly a copy of that test adding in a finalizer.

The delete behavior is driven from

if len(parent.Spec.Fields) == 0 {
return nil, nil
}

@mamachanko
Copy link
Collaborator

[off-topic]:

  • Had to catch up on config and logger being in the context now. The config-related commit says "SubReconcilers that need to interact with the parent resource should use the parent config via RetrieveParentConfig
    method."
    , but it's fine to use config.Client for generic, parent-unrelated API interaction too, isn't it?

[on-topic]:

  • Had to adjust RBAC for deletion of unwanted children. Before I was simply relying on garbage collection to clean away unwanted children. I am not sure I understand how rr figures out when to delete a child.
  • Had to adjust the child reconciler which was using ReflectChildStatusOnParent. Once the unwanted child got deleted,
    the child passed to ReflectChildStatusOnParent is nil and the controller may fall over. It's reasonable to check for a
    nil child, but it becomes much more important now I think. On the other hand I wonder if a ChildReconciler should be concerned about deletion at all.
  • Ouch, finalizers bite. Once the controller starts falling over, k8s needs some negotiating to delete the resource. kubectl edit ftw.
  • It works, @scothis! ⭐

@scothis
Copy link
Contributor Author

scothis commented Apr 23, 2022

  • Had to catch up on config and logger being in the context now.

Yea, I rebased the PR

The config-related commit says "SubReconcilers that need to interact with the parent resource should use the parent config via RetrieveParentConfig
method."
, but it's fine to use config.Client for generic, parent-unrelated API interaction too, isn't it?

That's will work when the clients are the same, which is the default. The parent and current client may be different if the child resource is being created using a client with a different service account, or on a different cluster. In these cases using the correct client will be important. Because it's a subtile change that may bite you later when the context shifts, loading the "correct" config now is recommended.

  • Had to adjust RBAC for deletion of unwanted children. Before I was simply relying on garbage collection to clean away unwanted children. I am not sure I understand how rr figures out when to delete a child.

The ChildReconciler should always have delete permission to the child resource. Before this PR, the ChildReconciler will delete child resources when there are multiple potential children (rare but can happen if the informers are lagging and there are two controllers running without leader election), or when the desired child is nil. After this change, the ChildReconciler will also delete the child resource if the parent resource is terminating as the desired child is implicitly nil.

We can document this better, opened #217 to track.

  • Had to adjust the child reconciler which was using ReflectChildStatusOnParent. Once the unwanted child got deleted,
    the child passed to ReflectChildStatusOnParent is nil and the controller may fall over. It's reasonable to check for a
    nil child, but it becomes much more important now I think. On the other hand I wonder if a ChildReconciler should be concerned about deletion at all.

At the point the parent resource is terminating, the status won't be updated, so there's no point in calling ReflectChildStatusOnParent, we can skip that.

  • Ouch, finalizers bite. Once the controller starts falling over, k8s needs some negotiating to delete the resource. kubectl edit ftw.

Yea, don't use finalizers unless you really need them. Something else to be careful of, when uninstalling a controller, delete the CRD before the controller, sa, and rbac permissions to make sure that any existing resources can be deleted cleanly. kapp change groups are a great way to handle this.

We can add that warning to the readme.

@@ -75,6 +76,8 @@ The [`SubReconciler`](https://pkg.go.dev/github.com/vmware-labs/reconciler-runti

The [`SyncReconciler`](https://pkg.go.dev/github.com/vmware-labs/reconciler-runtime/reconcilers#SyncReconciler) is the minimal type-aware sub reconciler. It is used to manage a portion of the parent's reconciliation that is custom, or whose behavior is not covered by another sub reconciler type. Common uses include looking up reference data for the reconciliation, or controlling resources that are not kubernetes resources.

When a resource is deleted that has pending finalizers, the Finalize method is called instead of the Sync method. If the SyncDuringFinalization field is true, the Sync method will also by called. If creating state that must be manually cleaned up, it is the users responsibility to define and clear finalizers. Using the [parent finalizer helper methods](#finalizers) is strongly encouraged with working under a [ParentReconciler](#parentreconciler).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: I'm unsure of the need for the SyncDuringFinalization field, can you give me some usecases?

Is it to save folks from manually calling Sync during Finalize? Which given they would probably have compatible signatures, would be fairly easy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main intent is to avoid needing to duplicate the same function for both Sync and Finalize.

You'd want to use the same logic when the sync reconciler is looking up reference data that is used during regular processing and finalization. Imagine a child reconciler that creates a resource in a different cluster than the parent, and the credentials are in a secret referenced by the parent resource. A sync reconciler can lookup the credentials and then stash the value to be used later. Since this logic is needed for both the Sync (create/update) and Finalize (delete) phases, it's cleaner to define it once and then call the method for both.

Signed-off-by: Scott Andrews <andrewssc@vmware.com>
@scothis scothis marked this pull request as ready for review May 3, 2022 15:07
Copy link
Collaborator

@mamachanko mamachanko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deployed our kit with this branch. It checks out! Looking forward to using it in production.

LGTM @scothis 👌🏻

reconcilers/reconcilers.go Outdated Show resolved Hide resolved

// AggregateResults combines multiple results into a single result. If any result requests
// requeue, the aggregate is requeued. The shortest non-zero requeue after is the aggregate value.
func AggregateResults(results ...ctrl.Result) ctrl.Result {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question, non-blocking]: Why is AggregateResults exported? When would a consumer of rr use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be useful to someone implementing a composed SubReconciler

Co-authored-by: Max Brauer <mamachanko@users.noreply.github.com>
@scothis scothis merged commit af55ebc into vmware-labs:main May 6, 2022
@scothis scothis deleted the finalize branch May 6, 2022 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants