Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report Pending status despite hub template error #139

Conversation

JustinKuli
Copy link
Member

A previous change allowing hub-templates for all policy types, and handling any hub-template errors in this controller, caused a change in behavior. Before that change, when a policy both had a hub-template error and an unsatisfied dependency, the policy would be marked as Pending. The change made the hub-template error have priority.

This commit gives the Pending status priority, and adds a test for that specific behavior.

Refs:

@@ -676,6 +678,17 @@ func (r *PolicyReconciler) Reconcile(ctx context.Context, request reconcile.Requ
continue
}

// check for hub template error
if errAnno := metaObj.GetAnnotations()[hubTmplErrorKey]; errAnno != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly, the ConfigurationPolicy that already exists would remain on the cluster even if there was a hub template error. This was also previously true in previous OCM versions, however, the config-policy-controller would not process these invalid ConfigurationPolicies. So I think this may be an unintended change in behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. And your comment exposes the bug in the implementation just before this commit: while it prevents creation when there is a hub-template error, it doesn't remove an existing instance.

So it seems like it should delete the existing object in this situation. But now I'm wondering if we need to be very careful with configuration policies here, because of PruneObjectBehavior? What do you think

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JustinKuli good point. It's hacky, but we could unset pruneObjectBehavior before deleting it. You lose the metadata on the status but at least it doesn't delete the object.

Alternatively, we could have template sync not handle ConfigurationPolicy hub template errors but handle all other policy kinds. Then the config-policy-controller would continue to handle them like before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newly pushed change deals with it in the template-sync, and specially sets pruneObjectBehavior on ConfigurationPolicies in this situation.

A previous change allowing hub-templates for all policy types, and
handling any hub-template errors in this controller, caused a change in
behavior. Before that change, when a policy both had a hub-template
error *and* an unsatisfied dependency, the policy would be marked as
Pending. The change made the hub-template error have priority.

This commit gives the Pending status priority, and adds a test for that
specific behavior. It also handles ConfigurationPolicies particularly
carefully: setting `PruneObjectBehavior` to `None` when the template
will be deleted specifically because of a hub-template error.

Refs:
 - https://issues.redhat.com/browse/ACM-11530

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
@JustinKuli JustinKuli force-pushed the 11530-pending-over-hubtmplerr branch from 131a2b5 to bd644e6 Compare May 13, 2024 16:35
resultError = emitErr
// not found should consider creating it
if k8serrors.IsNotFound(err) {
if len(dependencyFailures) > 0 {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section was moved inside the IsNotFound part: previously, it was possible that other errors would not be re-queued if the template was supposed to be Pending. Since those errors are rare, and this is already a complicated situation, I'm not sure if this ever caused any strange problems, but it's possible.

Copy link

openshift-ci bot commented May 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JustinKuli, mprahl

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants