Provide an async do on retry callback #2063

Closed
remk opened this issue Jan 24, 2020 · 5 comments · Fixed by #1979
Labels
type/enhancement A general enhancement
Milestone
3.3.4.RELEASE

Comments

@remk

remk commented Jan 24, 2020

Motivation

Retry.doOnRetry allows registering a synchronous callback. At the moment, if we want to run an asynchronous operation in this callback that must finish before the next retry attempt, we have to block on it. This can cause deadlocks and is not really usable. This was reported in the reactor/reactor Gitter channel by @mbaechler.

Hi there, I have a problem with using retryWhen on my Mono. Basically, what I want to do is:

    put a REST resource somewhere
    if it fails with a "parent resource doesn't exist" error, retry using Retry.onlyIf, with doOnRetry creating the parent resource

However, doOnRetry is called synchronously, so the REST request I perform there requires a call to block(), which in some cases triggers starvation of my threads.

Do you know any way to implement that in a different way?

Desired solution

It would therefore be good to be able to register an asynchronous callback, with an equivalent to Retry.doOnRetry, for example:

 public Retry<T> onRetryWithMono(Function<? super RetryContext<T>, Mono<?>> onRetryMono) 
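For illustration, a rough usage sketch of how the proposed method could combine with the existing reactor-extra builder. The method does not exist yet, and putResource, isParentMissing and createParentResource are hypothetical placeholders for the caller's own publisher, predicate and async fix-up call:

 putResource.retryWhen(
 	Retry.<String>onlyIf(ctx -> isParentMissing(ctx.exception())) // existing reactor-extra builder
 	     .onRetryWithMono(ctx -> createParentResource())          // proposed async callback
 	     .retryMax(3));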
remk referenced this issue in linagora/reactor-addons Jan 24, 2020
Add Retry.onRetryWithMono which is the async equivalent to the
Retry.doOnRetry operation.
This allows performing an async operation on retry without having to block,
which could cause deadlocks.
@rouazana

Here is a similar issue: https://stackoverflow.com/questions/55924849/on-error-do-another-call-and-retry-in-webflux/55937290

This would avoid the ugly workaround shown in the answers.

@simonbasle
Member

At first this felt like bending the intent of the Retry builder, which was initially designed around the combination of counters + delays. doOnRetry could also be seen as a side effect, and side effects are not supposed to introduce blocking behavior / latency.

That said, why not offer this way of further customizing the retry attempt. Two caveats:

  1. we need to come up with a better name for it
  2. the documentation must be very explicit that this changes the delay between two attempts, e.g. if you use it together with linearBackoff(Duration.ofSeconds(1)), the backoff is not guaranteed to be 1s anymore (depending on how long the custom Mono takes to terminate).

In balancing whether or not to add it, let's also consider that:

  1. the "ugly" code in the stackoverflow question stems from a more complex retry case than the gitter one, and can probably be made a bit more palatable by putting the Function in a constant, or as a class method reference
  2. Simpler cases where only an action + single retry is needed, the onErrorResume approach below can be used:
operation.retry(e -> isTransient(e), 10)
    .onErrorResume(PreconditionException.class, e -> prepareOperation().then(operation));

@simonbasle
Member

simonbasle commented Jan 24, 2020

The two alternatives with pure core operators:

//companion factory for retryWhen: on a precondition error run fixPrecondition,
//retry transient errors up to max times, otherwise propagate the error
public final Function<Flux<Throwable>, Publisher<String>> retryOrFixPrecondition(int max, Predicate<Throwable> canRetry,
		Predicate<Throwable> isPreconditionError, Mono<String> fixPrecondition) {
	return ft -> ft
			.index()
			.flatMap(idxError -> {
				if (isPreconditionError.test(idxError.getT2())) {
					return fixPrecondition;
				}
				if (!canRetry.test(idxError.getT2()) || idxError.getT1() > max) {
					throw Exceptions.propagate(idxError.getT2());
				}
				return Mono.just("retry");
			});
}

@Test
public void retrySpecialErrorWithPrecondition() {
	//simulate a source with transient errors + a precondition error that needs
	//specific handling then retry
	AtomicInteger count = new AtomicInteger(0);
	AtomicBoolean prerequisite = new AtomicBoolean();
	Mono<String> operation = Mono.defer(() -> {
		int c = count.getAndIncrement();
		if (c < 2) {
			throw new IllegalStateException("TRANSIENT FAILURE " + c);
		}
		if (prerequisite.get()) {
			return Mono.just("SUCCESS");
		}
		throw new IllegalArgumentException("PREREQUISITE");
	});

	//simulate a long running asynchronous operation that updates state to match
	//the precondition
	Mono<String> prepare = Mono.delay(Duration.ofSeconds(3))
	                           .doOnNext(ignored -> prerequisite.set(true))
	                           .thenReturn("retry");

	//predicate for errors that can be retried
	Predicate<Throwable> shouldRetry = p -> p instanceof IllegalStateException
			&& p.getMessage() != null
			&& p.getMessage().startsWith("TRANSIENT");

	//predicate for the error that reports precondition is not met
	Predicate<Throwable> isMissingPrerequisite = p -> p instanceof IllegalArgumentException
			&& "PREREQUISITE".equals(p.getMessage());

	//APPROACH 1, capable of retrying further in case of transient failures after
	//the prerequisite state is updated
	Mono<String> retryWhen = operation
			.retryWhen(retryOrFixPrecondition(10, shouldRetry, isMissingPrerequisite, prepare));

	//APPROACH 2, if you want to retry other exceptions but update prerequisite state
	// and re-attempt the operation only once
	Mono<String> retryResume = operation
			.retry(10, shouldRetry)
			.onErrorResume(isMissingPrerequisite, e -> prepare.then(operation));


	//testing all this
	retryWhen.as(StepVerifier::create)
	         .expectNext("SUCCESS")
	         .verifyComplete();
	count.set(0); prerequisite.set(false);
	retryResume.as(StepVerifier::create)
	           .expectNext("SUCCESS")
	           .verifyComplete();
}

@remk
Author

remk commented Jan 27, 2020

Thank you for your suggestions, but it means we would have to re-implement the backoff and jitter we currently rely on in Retry.

@simonbasle
Member

@remk I see. We'll discuss this. Also note that we're thinking about bringing the builder pattern into the core library in parallel, so maybe that new feature should be part of this effort core-side rather than addons-side.

@simonbasle simonbasle transferred this issue from reactor/reactor-addons Mar 2, 2020
@simonbasle simonbasle added the type/enhancement A general enhancement label Mar 2, 2020
simonbasle added a commit that referenced this issue Mar 2, 2020
Also use concatMap instead of flatMap to be 100% sure and set a sane
precedent.
simonbasle added a commit that referenced this issue Mar 2, 2020
simonbasle added a commit that referenced this issue Mar 9, 2020
simonbasle added a commit that referenced this issue Mar 10, 2020
simonbasle added a commit that referenced this issue Mar 16, 2020
simonbasle added a commit that referenced this issue Mar 18, 2020
This big commit is a large refactor of the `retryWhen` operator in order
to add several features.

Fixes #1978
Fixes #1905
Fixes #2063
Fixes #2052
Fixes #2064

 * Expose more state to `retryWhen` companion (#1978)

This introduces a retryWhen variant based on a `Retry` functional
interface. This "function" deals not with a Flux of `Throwable` but with a
Flux of `RetrySignal`. This allows the retry function to check whether there
was some success (onNext) since the last retry attempt, in which case the
current attempt can be interpreted as if it were the first ever error.

This is especially useful for cases where exponential backoff delays
should be reset, for long lived sequences that only see intermittent
bursts of errors (transient errors).

We take that opportunity to offer a builder for such a function that
could take transient errors into account.

 * the `Retry` builders

Inspired by the `Retry` builder in addons, we introduce two classes:
`RetrySpec` and `RetryBackoffSpec`. We name them Spec and not Builder
because they don't require calling a `build()` method. Rather, each
configuration step produces A) a new instance (copy on write) that B)
is by itself already a `Retry`.

The `Retry` + `xxxSpec` approach allows us to offer 2 standard
strategies that both support transient error handling, while letting
users write their own strategy (either as a standalone `Retry` concrete
implementation, or as a builder/spec that builds one).

Both specs allow handling of transient errors via `transientErrors(boolean)`,
which when true relies on the extra state exposed by the `RetrySignal`. For
the simple case, this means that the remaining number of retries is reset in
case of an onNext. For the exponential case, this means the retry delay is
reset to its minimum after an onNext (#1978).
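For example, a minimal sketch of such a spec (assuming the new `reactor.util.retry.Retry` entry point; `source` stands for any failing sequence):

Retry strategy = Retry.backoff(5, Duration.ofMillis(100)) //exponential backoff, 100ms minimum
                      .transientErrors(true);             //delay resets to the minimum after each onNext
source.retryWhen(strategy);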

Additionally, the introduction of the specs allows us to add more
features and support some features on more combinations, see below.

 * `filter` exceptions (#1905)

 Previously we could only filter exceptions to be retried on the simple
 long-based `retry` methods. With the specs we can `filter` in both
 immediate and exponential backoff retry strategies.
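A sketch of the same filter applied to both flavors (the exception type is only an example):

Retry simple  = Retry.max(3)
                     .filter(e -> e instanceof TimeoutException);
Retry backoff = Retry.backoff(3, Duration.ofMillis(100))
                     .filter(e -> e instanceof TimeoutException);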

 * Add pre/post attempt hooks (#2063)

The specs let the user configure two types of pre/post hooks.
Note that if the retry attempt is denied (e.g. we've reached the maximum
number of attempts), these hooks are NOT executed.

Synchronous hooks (`doBeforeRetry` and `doAfterRetry`) are side effects
that should not block for too long and are executed right before and
right after the retry trigger is sent by the companion publisher.

Asynchronous hooks (`doBeforeRetryAsync` and `doAfterRetryAsync`) are
composed into the companion publisher which generates the triggers, and
they both delay the emission of said trigger in non-blocking and
asynchronous fashion. Having pre and post hooks allows a user to better
manage the order in which these asynchronous side effects should be
performed.
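As an illustration of this issue's original use case, a sketch of an async pre-retry hook (log and createParentResource() are placeholders for the user's own logger and asynchronous fix-up call):

Retry withHooks = Retry.max(3)
        .doBeforeRetry(rs -> log.info("retrying after " + rs.failure()))  //synchronous side effect
        .doBeforeRetryAsync(rs -> createParentResource().then());         //retry trigger waits for completion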

 * Retry exhausted meaningful exception (#2052)

The `Retry` function implemented by both specs throws a `RuntimeException`
with a meaningful message when the configured maximum number of attempts
is reached. That exception can be pinpointed by calling the utility
`Exceptions.isRetryExhausted` method.

For further customization, users can replace that default with their
own custom exception via `onRetryExhaustedThrow`. The BiFunction lets the
user access the Spec, which has public final fields that can be
used to produce a meaningful message.
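For instance, a sketch replacing the default exhaustion exception (message and exception type are just examples):

Retry strategy = Retry.backoff(3, Duration.ofMillis(100))
        .onRetryExhaustedThrow((spec, rs) ->
                new IllegalStateException("failed after " + rs.totalRetries() + " retries", rs.failure()));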

 * Ensure retry hooks completion is taken into account (#2064)

The old `retryBackoff` would internally use a `flatMap`, which can
cause issues. The Spec functions use `concatMap`.

 /!\ CAVEAT

This commit deprecates all of the retryBackoff methods as well as the
original `retryWhen` (based on Throwable companion publisher) in order
to introduce the new `RetrySignal` based signature.

Using the explicit `Retry` type lifts any ambiguity when using the Spec,
but using a lambda instead will raise some ambiguity at call sites of
`retryWhen`.

We deem that acceptable given that the migration is quite easy
(turn `e -> whatever(e)` to `(Retry) rs -> whatever(rs.failure())`).
Furthermore, `retryWhen` is an advanced operator, and we expect most
uses to be combined with the retry builder in reactor-extra, which lifts
the ambiguity itself.
@simonbasle simonbasle added this to the 3.3.4.RELEASE milestone Mar 18, 2020