Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Backoff-based error policy #314

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

nightkr
Copy link
Member

@nightkr nightkr commented Aug 30, 2020

This adds an error_policy based on the backoff crate, which gives us the ability to use stateful error policies. It also comes with some useful default implementations, such as exponential backoff, retry immediately, and constant retry.

This also has some side effects with breaking changes. Notably:

  1. error_policy is now told about successful reconciliations. This is
    required so that stateful error policies can reset upon successes.

  2. error_policy is now told the reference to the reconciled object.

  3. controller::Error is now keyed by the object type. This isn't strictly
    required, but allows for some internal cleanup.

  4. Controller::run takes a Backoff factory rather than a raw error_policy.
    This provides a much cleaner interface for custom policies, and the backoff
    crate provides both constant and exponential policies. If you need the extra
    control that the old interface provided, use the "hard-mode"
    controller::applier API.

Fixes #297. Arguably fixes #34.

This also has some side effects with breaking changes. Notably:

1. `error_policy` is now told about successful reconciliations. This is
required so that stateful error policies can reset upon successes.

2. `error_policy` is now told the reference to the reconciled object.

3. `controller::Error` is now keyed by the object type. This isn't strictly
required, but allows for some internal cleanup.

4. `Controller::run` takes a `Backoff` factory rather than a raw `error_policy`.
This provides a much cleaner interface for custom policies, and the `backoff`
crate provides both constant and exponential policies. If you need the extra
control that the old interface provided, use the "hard-mode"
`controller::applier` API.
@nightkr nightkr requested a review from clux August 30, 2020 01:38
The signature was getting unwieldy, and with the recent changes it was hard to
know whether all instances had "kept up".

This also gets rid of the weirdness where `error_policy` could rewrite the
`ReconcilerAction` of a successful reconciliation.
@nightkr
Copy link
Member Author

nightkr commented Aug 30, 2020

One downside of this PR is that it worsens the ergonomics of error policies that differ depending on the error case...

clux
clux previously approved these changes Aug 31, 2020
Copy link
Member

@clux clux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments as I was reading through. It looks solid, even if it makes the interface sightly more awkward. I think therefore we should put the error policy on a separate instruction to make that less clumsy. Internals seem super solid though.

kube-runtime/src/controller.rs Show resolved Hide resolved
examples/configmapgen_controller.rs Show resolved Hide resolved
#[derive(Debug)]
pub struct BackoffErrorPolicy<MkBackoff, B, K: RuntimeResource, Err, Ctx> {
make_backoff: MkBackoff,
backoffs: HashMap<ObjectRef<K>, B>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh. One back off per object. Interesting. I thought we would have one global one. This makes sense though. Probably smarter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I /think/ we should encourage backoff-per-object by default. There are cases where global backoffs make sense (gating access to some central resource for example), but you probably want to gate those by specific error cases..

kube-runtime/src/controller.rs Show resolved Hide resolved
kube-runtime/src/controller.rs Show resolved Hide resolved
kube-runtime/src/controller.rs Show resolved Hide resolved
kube-runtime/src/controller.rs Show resolved Hide resolved
@@ -261,7 +360,7 @@ where
/// let cms = Api::<ConfigMap>::all(client.clone());
/// Controller::new(cmgs, ListParams::default())
/// .owns(cms, ListParams::default())
/// .run(reconcile, error_policy, context)
/// .run(reconcile, || backoff::backoff::Constant::new(Duration::from_secs(60)), context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could have a .backoff builder on Controller here for readability?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside there is that we either need to introduce a new generic or box the Backoff.

Copy link
Member

@clux clux Sep 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, does it get awkward if we try to do something like:

Controller::(..)
    .error(|| backoff::Constant::new(Duration::from_secs(60)))
    .run(reconcile, context)

I guess I wanted to have a default like what you have there without having to be explicit about the lambda.
With maybe https://docs.rs/backoff/0.2.1/backoff/type.ExponentialBackoff.html as the default if omitted.

@clux
Copy link
Member

clux commented Sep 28, 2020

Sorry for the slow responses here. This is ultimately great. Have approved it so merge if you are happy, otherwise there's a bit of bikeshedding in the comments (closed all my exploratory ones).

@clux
Copy link
Member

clux commented Nov 11, 2020

Having thought about this some more. You seem unsure about this as well (since it is still here), and there are some downsides to this that is making me more sceptical:

  • the user ergonomics, as you say, feel a bit awkward with a lambda in the signature
  • it is somewhat complex internally
  • backoff constitutes an extra dependency (with runtime deps) that no one but me actually asked for
  • maybe this is more heavyweight than required (given that most controller-runtime based controllers always just have a flat ~300s)
  • not clear what a default backoff retry should be after a first failure, 1s, 10s, 100s?

I think our lack of defining a good default start time here is the most problematic. All those values (1s/10s/100s), feel sensible depending on what needs to be created of the back of a CRD. So users probably want to set a minimum time to avoid a bunch of initial errors in logs from early retries.

However, doing so in ExponentialBackoff looks pretty awkward to me. Manually creating the instance, selecting a generic clock, possibly using its Default impl.

I'm starting to think this is probably not worth merging at the moment. Not sure what you think?
Maybe this is something we put on the back-burner for now until we have more info?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants