
Refactor of retryWhen to switch to a Spec/Builder model #1979

Merged
merged 3 commits into master from 1978-retryWhenReset on Mar 18, 2020

Conversation

simonbasle
Member

@simonbasle simonbasle commented Nov 28, 2019

This big commit is a large refactor of the `retryWhen` operator in order
to add several features.

Fixes #1978
Fixes #1905
Fixes #2063
Fixes #2052
Fixes #2064

• Expose more state to `retryWhen` companion (#1978)

This introduces a `retryWhen` variant based on a `Retry` functional
interface. This "function" deals not with a Flux of `Throwable` but of
`RetrySignal`. This allows the retry function to check whether there was
some success (onNext) since the last retry attempt, in which case the
current attempt can be interpreted as if it were the first error.

This is especially useful for cases where exponential backoff delays
should be reset, for long-lived sequences that only see intermittent
bursts of errors (transient errors).

We take that opportunity to offer a builder for such a function that
could take transient errors into account.

• the `Retry` builders

Inspired by the `Retry` builder in addons, we introduce two classes:
`RetrySpec` and `RetryBackoffSpec`. We name them Spec and not Builder
because they don't require calling a `build()` method. Rather, each
configuration step produces A) a new instance (copy on write) that B)
is by itself already a `Retry`.
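
As a minimal, hedged sketch of that point (the class name and the error-producing source are placeholders; `Retry.max(long)` is the factory shown later in this thread):

```java
import reactor.core.publisher.Flux;
import reactor.util.retry.Retry;
import reactor.util.retry.RetrySpec;

public class SpecIsARetrySketch {

    public static void main(String[] args) {
        // The spec returned by the factory is itself a Retry: no build() call needed,
        // and each further configuration step returns a new (copy-on-write) instance.
        RetrySpec strategy = Retry.max(2);

        Flux.<String>error(new IllegalStateException("boom"))
            .retryWhen(strategy) // the spec is passed directly as the strategy
            .subscribe(
                value -> System.out.println("value: " + value),
                error -> System.out.println("gave up: " + error));
    }
}
```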

The `Retry` + `xxxSpec` approach lets us offer two standard
strategies that both support transient error handling, while letting
users write their own strategy (either as a standalone `Retry` concrete
implementation, or as a builder/spec that builds one).

Both specs allow handling transient errors via `transientErrors(boolean)`,
which when true relies on the extra state exposed by the `RetrySignal`.
For the simple case, this means that the remaining number of retries is
reset in case of onNext. For the exponential case, this means the retry
delay is reset to its minimum after an onNext (#1978).
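
A sketch of that configuration, as I read the description above (the flaky source, class name and timings are hypothetical):

```java
import java.time.Duration;
import reactor.core.publisher.Flux;
import reactor.util.retry.Retry;
import reactor.util.retry.RetryBackoffSpec;

public class TransientErrorsSketch {

    public static void main(String[] args) {
        // Exponential backoff; with transientErrors(true), an onNext between
        // errors resets the delay (and the attempt count) back to the beginning.
        RetryBackoffSpec strategy = Retry.backoff(3, Duration.ofMillis(100))
                .transientErrors(true);

        flakyStream()                // hypothetical long-lived source with error bursts
                .retryWhen(strategy)
                .blockLast();        // blocking only for the sake of the demo
    }

    // Placeholder standing in for a source that errors intermittently.
    private static Flux<String> flakyStream() {
        return Flux.just("a", "b", "c");
    }
}
```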

Additionally, the introduction of the specs lets us add more
features and support some features in more combinations, as detailed below.

• `filter` exceptions (#1905)

Previously we could only filter exceptions to be retried on the simple
long-based `retry` methods. With the specs we can `filter` in both the
immediate and the exponential backoff retry strategies.
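
A hedged sketch of a filtered backoff strategy (class name hypothetical; `backoff` and `filter` as named in this PR):

```java
import java.io.IOException;
import java.time.Duration;
import reactor.util.retry.Retry;
import reactor.util.retry.RetryBackoffSpec;

public class FilteredBackoffSketch {

    public static void main(String[] args) {
        // Only IOExceptions are retried (with exponential backoff);
        // any other error is propagated downstream immediately.
        RetryBackoffSpec onlyIo = Retry.backoff(5, Duration.ofMillis(200))
                .filter(error -> error instanceof IOException);

        System.out.println("strategy configured: " + onlyIo);
    }
}
```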

• Add pre/post attempt hooks (#2063)

The specs let the user configure two types of pre/post hooks.
Note that if the retry attempt is denied (e.g. we've reached the maximum
number of attempts), these hooks are NOT executed.

Synchronous hooks (`doBeforeRetry` and `doAfterRetry`) are side effects
that should not block for too long; they are executed right before and
right after the retry trigger is sent by the companion publisher.

Asynchronous hooks (`doBeforeRetryAsync` and `doAfterRetryAsync`) are
composed into the companion publisher which generates the triggers, and
they both delay the emission of said trigger in a non-blocking and
asynchronous fashion. Having pre and post hooks allows a user to better
manage the order in which these asynchronous side effects are performed.
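
A sketch combining the four hook variants named above (class name and messages are illustrative only):

```java
import java.time.Duration;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;
import reactor.util.retry.RetryBackoffSpec;

public class RetryHooksSketch {

    public static void main(String[] args) {
        RetryBackoffSpec withHooks = Retry.backoff(5, Duration.ofMillis(100))
                // synchronous hooks: quick side effects around the retry trigger
                .doBeforeRetry(signal -> System.out.println("about to retry after: " + signal.failure()))
                .doAfterRetry(signal -> System.out.println("retry trigger sent"))
                // asynchronous hooks: the returned Mono<Void> delays the trigger until it completes
                .doBeforeRetryAsync(signal -> Mono.delay(Duration.ofMillis(50)).then())
                .doAfterRetryAsync(signal ->
                        Mono.fromRunnable(() -> System.out.println("async post-retry hook done")));

        System.out.println("strategy configured: " + withHooks);
    }
}
```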

• Retry exhausted meaningful exception (#2052)

The `Retry` function implemented by both specs throws a `RuntimeException`
with a meaningful message when the configured maximum number of attempts
is reached. That exception can be pinpointed by calling the utility
`Exceptions.isRetryExhausted` method.

For further customization, users can replace that default with their
own custom exception via `onRetryExhaustedThrow`. The `BiFunction` lets
the user access the Spec, which has public final fields that can be
used to produce a meaningful message.
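
A sketch of that customization, assuming the `maxAttempts` public field and the `RetrySignal` accessors referenced elsewhere in this PR (class name hypothetical):

```java
import reactor.core.Exceptions;
import reactor.core.publisher.Flux;
import reactor.util.retry.Retry;

public class RetryExhaustedSketch {

    public static void main(String[] args) {
        // Replace the default "retry exhausted" exception: the BiFunction receives
        // the spec (with its public final fields) and the last RetrySignal.
        Retry strategy = Retry.max(2)
                .onRetryExhaustedThrow((spec, signal) ->
                        new IllegalStateException("Gave up after " + signal.totalRetries()
                                + " retries (max " + spec.maxAttempts + ")", signal.failure()));

        Flux.error(new RuntimeException("boom"))
                .retryWhen(strategy)
                .subscribe(
                        value -> { },
                        error -> System.out.println("custom exhaustion error: " + error.getMessage()));

        // Without onRetryExhaustedThrow, the default exception can be detected with:
        System.out.println(Exceptions.isRetryExhausted(new RuntimeException())); // false here
    }
}
```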

• Ensure retry hooks completion is taken into account (#2064)

The old `retryBackoff` would internally use a `flatMap`, which can
cause issues (notably, asynchronous hook completion not being awaited,
see #2064). The Spec functions use `concatMap`.
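
This is not the operator's internals, just a generic illustration of the difference: `concatMap` waits for each inner publisher to complete before subscribing to the next, so a retry trigger cannot overtake the completion of an asynchronous hook (class name hypothetical):

```java
import java.time.Duration;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class ConcatMapOrderingSketch {

    public static void main(String[] args) {
        // Inner publishers complete out of order time-wise (30ms, 20ms, 10ms),
        // yet concatMap emits 1, 2, 3 because it subscribes to them one at a time.
        // flatMap would typically emit 3, 2, 1 here.
        Flux.just(1, 2, 3)
            .concatMap(i -> Mono.delay(Duration.ofMillis(10L * (4 - i))).thenReturn(i))
            .toStream()
            .forEach(System.out::println);
    }
}
```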

/!\ CAVEAT

This commit deprecates all of the retryBackoff methods as well as the
original retryWhen (based on Throwable companion publisher) in order
to introduce the new RetrySignal based signature.

The use of the explicit `Retry` type lifts any ambiguity when using the Spec,
but using a lambda instead will raise some ambiguity at `retryWhen` call sites.

We deem that acceptable given that the migration is quite easy
(turn `e -> whatever(e)` into `(Retry) rs -> whatever(rs.failure())`).
Furthermore, `retryWhen` is an advanced operator, and we expect most
uses to be combined with the retry builder in reactor-extra, which lifts
the ambiguity itself.

@simonbasle simonbasle added the for/merge-with-rebase label ("Reminder to merge the PR in rebase mode (several semantic commits)") Nov 28, 2019
RetryWhenOtherSubscriber other = new RetryWhenOtherSubscriber();
Subscriber<Throwable> signaller = Operators.serialize(other.completionSignal);

signaller.onSubscribe(Operators.emptySubscription());

CoreSubscriber<T> serial = Operators.serialize(s);

RetryWhenMainSubscriber<T> main =
new RetryWhenMainSubscriber<>(serial, signaller, source);
Runnable otherReset = () -> {
Contributor

Allocation police (lol 😂):
WDYT about avoiding this Runnable and introducing a more specialised RetryWhenMainSubscriber, OR passing a boolean (and other necessary variables) to the existing one?

Member Author

I'll look into passing the arguments directly to the ctor rather than relying on the Runnable capturing that state. That said, allocation-wise it is only one allocation per original subscription (not on the critical data path).

Member Author

I initially started with passing these elements, but it was cumbersome

@simonbasle
Copy link
Member Author

@smaldini I will need to clean up commits in preparation for the rebase-merge once this is approved

Contributor

@smaldini smaldini left a comment

Interesting, could you elaborate on why you wait for the first onNext and not just onSubscribe?

@simonbasle
Member Author

simonbasle commented Dec 2, 2019

@smaldini if you're referring to resetTriggerOnNextElement, that boolean is there to avoid re-applying the when Function on each subsequent onNext. However we must react to the first onNext, not merely onSubscribe, because onSubscribe could very well be immediately followed by onError.

One typical scenario we want to cover is roughly:

onNext
onError -> retry 1s
onError -> retry 2s (no reset here otherwise it is not exponential backoff)
onError -> retry 4s
onNext -> reset
onNext (no unnecessary reset)
onNext
onNext
onError -> retry 1s, not 8s
onNext -> reset
onComplete

@codecov-io

codecov-io commented Dec 4, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@a98d6fa).
The diff coverage is 81.3%.

@@            Coverage Diff            @@
##             master    #1979   +/-   ##
=========================================
  Coverage          ?   81.93%           
  Complexity        ?     4086           
=========================================
  Files             ?      380           
  Lines             ?    31214           
  Branches          ?     5796           
=========================================
  Hits              ?    25575           
  Misses            ?     4060           
  Partials          ?     1579
Impacted Files | Coverage Δ | Complexity Δ
...ain/java/reactor/core/publisher/MonoRetryWhen.java | 100% <100%> (ø) | 2 <0> (?)
...or-core/src/main/java/reactor/core/Exceptions.java | 88.66% <100%> (ø) | 61 <3> (?)
.../java/reactor/util/retry/ImmutableRetrySignal.java | 100% <100%> (ø) | 6 <6> (?)
...r-core/src/main/java/reactor/util/retry/Retry.java | 100% <100%> (ø) | 3 <3> (?)
...ore/src/main/java/reactor/core/publisher/Flux.java | 95.87% <22.72%> (ø) | 527 <3> (?)
...ore/src/main/java/reactor/core/publisher/Mono.java | 89.18% <30%> (ø) | 271 <3> (?)
...main/java/reactor/util/retry/RetryBackoffSpec.java | 87.8% <87.8%> (ø) | 27 <27> (?)
...ain/java/reactor/core/publisher/FluxRetryWhen.java | 89.58% <90%> (ø) | 4 <0> (?)
...re/src/main/java/reactor/util/retry/RetrySpec.java | 93.22% <93.22%> (ø) | 29 <29> (?)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a98d6fa...754b5e9.

@simonbasle
Member Author

@smaldini I tried quite a different approach. Also relates to #1905

Big refactor: introduce builder and change retryWhen companion

The builder is simpler than the one in addons, but covers some good
ground. It allows a predicate on either the exponential backoff strategy
or the simple retry strategy.

We introduce Retry.State to replace Flux<Throwable> with Flux<State>
internally, allowing both strategies produced by the builder to look
at the number of consecutive failures as the index rather than the
global failure index. This effectively allows "reset by onNext".

Old `retryWhen` decorates the user-provided function to only look at
the exception.

We have only one builder, which switches from simple to backoff as soon
as one of the backoff configuration methods is invoked. One cannot
easily switch back. Factory methods help select the backoff strategy
right away.

@simonbasle simonbasle marked this pull request as ready for review December 6, 2019 16:04
@simonbasle
Member Author

made it depend on #2039 (rebased) and squashed the rest of the commits with a detailed explanation in commit body.

@simonbasle simonbasle requested review from bsideup and rstoyanchev and removed request for smaldini February 7, 2020 17:18
@simonbasle
Member Author

@rstoyanchev I'd be interested in your opinion about the Retry.Builder API, and notably the naming / documentation of the transientError method.

@simonbasle simonbasle changed the title "1978 retry when reset" to "fix #1978 retry when reset" Feb 7, 2020
@simonbasle simonbasle linked an issue Feb 7, 2020 that may be closed by this pull request
@simonbasle simonbasle removed the for/merge-with-rebase label ("Reminder to merge the PR in rebase mode (several semantic commits)") Feb 17, 2020
@simonbasle
Member Author

(updated the PR today with removal of the need for a new SwapSubscription)

@rstoyanchev
Contributor

@simonbasle too bad but good you found out at this stage. I'm personally still not in favor of Supplier. I'd rather see a longer name like retryStrategy until deprecated methods are removed at which point a more convenient variant can be introduced.

One other thought. What if the Retry shortcuts were exposed on Flux and Mono:

public final Flux<T> retryMax(long max, Consumer<RetrySpec> consumer) {}

public final Flux<T> retryBackOff(long max, Duration min, Consumer<RetryBackoffSpec> consumer) {}

If a custom Retry isn't critical initially, it could be added later when it becomes possible.

@simonbasle
Member Author

> @simonbasle too bad but good you found out at this stage. I'm personally still not in favor of Supplier. I'd rather see a longer name like retryStrategy until deprecated methods are removed at which point a more convenient variant can be introduced.
>
> One other thought. What if the Retry shortcuts were exposed on Flux and Mono:
>
> public final Flux<T> retryMax(long max, Consumer<RetrySpec> consumer) {}
>
> public final Flux<T> retryBackOff(long max, Duration min, Consumer<RetryBackoffSpec> consumer) {}
>
> If a custom Retry isn't critical initially, it could be added later when it becomes possible.

The retryMax / retryBackoff idea is basically going back to the current status quo with overloads, and it doesn't work well with the Spec either (we would need to reintroduce a build() or similar method just for the purpose of producing the Flux). We do need Retry because we got requests around customizing the exception, using a predicate with backoff, introducing transient error handling, etc...

The Supplier indeed feels weird, but it is the only way to introduce the API without breakage, either now or later. We could add a different suffix, but then it would have to stay in 3.4. The When suffix echoes other operators that work with companions, like repeatWhen :(

@simonbasle
Member Author

@rstoyanchev @bsideup updated the PR with one last commit getting rid of the Supplier<Retry> in favor of a plain Retry parameter.

@simonbasle
Member Author

Note that my intent is now to squash the commits in this PR. The resulting commit will be quite big, but at this point I doubt it is worth the trouble to try and re-split it into meaningful commits per issue that this PR fixes.

@bsideup
Contributor

bsideup commented Mar 11, 2020

@simonbasle aren't we breaking the source compatibility here?

@simonbasle
Member Author

> why not to use {@code ... } instead of <code> so you do not have to escape html special symbols?

done, thanks for the suggestion

@simonbasle
Member Author

> @simonbasle aren't we breaking the source compatibility here?

@bsideup not really, no. As discussed off-band, that change may raise complaints from the compiler if a lambda is used with retryWhen in a codebase, but that is a tradeoff we're willing to make because A) we expect that retryWhen(Function) is not widely used and B) such usage is probably encapsulated in a Function-generating method or helper (like addons' Retry), which lifts the ambiguity.

japicmp should catch source compatibility breakage.

/**
* Set the generator for the {@link Exception} to be propagated when the maximum amount of retries
* is exhausted. By default, throws an {@link Exceptions#retryExhausted(String, Throwable)} with the
* message reflecting the total attempt index, transien attempt index and maximum retry count.
Contributor

Typo: "transien": -> "transient".

simonbasle added a commit that referenced this pull request Mar 18, 2020
@simonbasle
Member Author

[old PR body]:

FluxRetryWhen now allows optionally re-applying the when Function whenever a first onNext is encountered after a re-subscription, by exposing a RetrySignal state that indicates how many times there was an error in a row.
This necessitates a new API (we can't rely solely on the Flux<Throwable> companion like before). (fixes #1978)

A Retry.Builder is introduced to allow the creation of Functions that deal with this new state. It further allows predicates in the retry-with-backoff case (fixes #1905)

This PR also explores adding hooks before/after the standard retry Mono, both in a synchronous doOn* style and an asynchronous style. Note that the latter modifies the shape of the retry, i.e. it can introduce further delays beyond what a backoff without such hooks does. (fixes reactor/reactor-addons#220)

I have now squashed the changes into a single commit with detailed body.

Contributor

@bsideup bsideup left a comment

👍

@simonbasle don't forget to update the license headers

Contributor

@rstoyanchev rstoyanchev left a comment

Looks good, with a few minor comments/suggestions.

reactor-core/src/main/java/reactor/util/retry/Retry.java (outdated, resolved)
* @see RetryBackoffSpec#maxAttempts(long)
* @see RetryBackoffSpec#minBackoff(Duration)
*/
static RetryBackoffSpec backoff(long maxAttempts, Duration minBackoff) {
Contributor

Could this have a marble diagram like the retryBackOff methods that are now deprecated and will go away? This will become the main entry point for the backoff strategy.

* @return the builder for further configuration
* @see RetrySpec#maxAttempts(long)
*/
static RetrySpec max(long max) {
Contributor

Likewise here, a marble diagram like the one on retry(long)?

@simonbasle simonbasle changed the title "Introduce Retry.Builder, allows reset on transient errors / predicates / retry hooks" to "Refactor of retryWhen to switch to a Spec/Builder model" Mar 18, 2020
@simonbasle simonbasle merged commit c0b6001 into master Mar 18, 2020
@simonbasle simonbasle deleted the 1978-retryWhenReset branch March 18, 2020 19:08
@simonbasle simonbasle added this to the 3.3.4.RELEASE milestone Mar 18, 2020
simonbasle added a commit that referenced this pull request Mar 19, 2020
Retry is now an abstract class, which lifts the lambda ambiguity
(since this disallows lambda usage). To help express simple
Retry strategies as lambdas, a Retry.from(Function) factory is provided.

A couple of factories have been added:
 - indefinitely() is a RetrySpec configured to always immediately
retry (but can be further configured)
 - fixedDelay(maxAttempts, duration) is a RetryBackoffSpec
configured with a minBackoff equal to maxBackoff and no
jitter, effectively emulating a fixed delay retry

Additionally, a few polishing touches are applied to the original PR:
 - javadocs have been improved
 - marble diagrams have been added to the Retry factories
 - for RetryBackoffSpec, one can now reset the default Scheduler
by passing null. That default is also lazily resolved, making it
compatible with StepVerifier#withVirtualTime
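
A quick, hedged sketch of these additions (the `socketStream()` source and class name are placeholders):

```java
import java.time.Duration;
import reactor.core.publisher.Flux;
import reactor.util.retry.Retry;

public class FollowUpFactoriesSketch {

    public static void main(String[] args) {
        // Retry immediately and indefinitely (still further configurable).
        Flux<String> a = socketStream().retryWhen(Retry.indefinitely());

        // At most 5 attempts with a constant 2s delay (minBackoff == maxBackoff, no jitter).
        Flux<String> b = socketStream().retryWhen(Retry.fixedDelay(5, Duration.ofSeconds(2)));

        // Lambda-style strategies now go through Retry.from, since Retry is an abstract class.
        Flux<String> c = socketStream().retryWhen(
                Retry.from(signals -> signals.map(Retry.RetrySignal::totalRetries)));

        System.out.println(a + " " + b + " " + c); // assembly only, nothing subscribed here
    }

    // Placeholder standing in for a long-lived source.
    private static Flux<String> socketStream() {
        return Flux.just("frame1", "frame2");
    }
}
```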