Repeated checkpoints added when using retry #2774

osi · 2021-09-17T01:31:57Z

Expected Behavior

There should be a single checkpoint

Actual Behavior

There are multiple checkpoints.

Steps to Reproduce

@Test
void reproCase() {
        StepVerifier.create(Flux.error(() -> new RuntimeException("rawr"))
                                .cache()
                                .checkpoint("nom")
                                .retryWhen(Retry.max(3))
                    )
                    .expectErrorSatisfies(t -> assertThat(Arrays.asList(t.getCause().getSuppressed()))
                        .hasSize(1)
                        .first(THROWABLE)
                        .hasMessage("\n" +
                                    "Error has been observed at the following site(s):\n" +
                                    "\t|_ checkpoint ⇢ nom\n" +
                                    "Stack trace:"))
                    .verify(Duration.ofSeconds(10));
}

It is generating multiple checkpoints. Only a single one is expected

	|_ checkpoint ⇢ nom
	|_____ checkpoint ⇢ nom
	|_________ checkpoint ⇢ nom
	|_____________ checkpoint ⇢ nom

Your Environment

Reactor version(s) used: 3.4.9

I observed this in our full application where the checkpointed stream is ultimately a r2dbc-postgresql notification stream prefixed by a query using mergeSequential. There was a problem connecting to the DB, and that was being retried. The same behavior was observed, and this is a distillation of that. As I don't have a full reproduction with our entire stack, my hope is that fixing this also fixes what we observed.

The text was updated successfully, but these errors were encountered:

smaldini · 2021-09-17T01:35:29Z

In the FluxAssembly implementation, the operator is checking for the presence of checkpoint exception tuple. Problem is that i keeps being incremented and contains will always fail the check. I can't remember the rationale behind the increment but if the checkpoint is exactly identical (same parent, description) there shouldnt be a need for increment as part of the tuple. Without i, the tuple would match the correct hashcode.

simonbasle · 2021-09-21T10:38:54Z

without much tests nor comments nor git commit comments to provide more context, I have trouble understanding the rationale behind this i either :(

removing it doesn't seem to fail any test.

On a side note, the additional entries to the backtrace are only there because the "rawr" exception is effectively passed to checkpoint's onError multiple time due to the cache().
It would have been equivalent to do Flux.onError(new RuntimeException("rawr")).

This commit removes the `i` counter that was effectively guaranteeing duplicated entries in the backtrace (eg. with a checkpoint) in the case where the triggering exception was reused. The only differentiator was then the indentation (driven by `i`). This is especially visible when using a `checkpoint(String)` operator along a `retry`. Provided the same exception instance is emitted on each retry loop, `checkpoint` will add to the backtrace on each loop. This commit removes the `i` from the tracking of existing backtraces and thus ensures that for the same parent subscriber, prefix and description there will be only one entry in the backtrace. Fixes #2774.

simonbasle · 2021-09-21T12:01:50Z

actually, this one test fails:

reactor-core/reactor-core/src/test/java/reactor/core/publisher/HooksTest.java

Line 738 in 9996658

public void testMultiReceiver() throws Exception {

are we losing any information by removing these extra-indented "FluxPublish -> at ..." entries ?

simonbasle · 2021-09-24T08:04:20Z

I have a change that better cater to multiple paths in the Error has been observed at the following site(s) section, but I wonder how disruptive it it to change the representation of the tracebacks 🤔

To the best of my knowledge I don't know of a tool that would parse the traceback, but you never know.

Any insight to provide @osi @smaldini ?

(let's also cc @bsideup who worked on the agent and on tracebacks in the past, he might also have insights)

simonbasle · 2021-09-24T08:06:20Z

to clarify, from #2780 the current proposal looks like this:

Assembly trace from producer [reactor.core.publisher.MonoDefer] :
	reactor.core.publisher.Mono.defer(Mono.java:213)
	reactor.core.publisher.HooksTest.testMultiReceiver(HooksTest.java:747)
Error has been observed at the following site(s):
	*____Mono.defer ⇢ at reactor.core.publisher.HooksTest.testMultiReceiver(HooksTest.java:747)
	*_____Flux.then ⇢ at reactor.core.publisher.HooksTest.testMultiReceiver(HooksTest.java:747)
	|_    Mono.flux ⇢ at reactor.core.publisher.HooksTest.testMultiReceiver(HooksTest.java:749)
	|_ Flux.publish ⇢ at reactor.core.publisher.HooksTest.testMultiReceiver(HooksTest.java:749) (observed 2 times)
	|_     Flux.map ⇢ at reactor.core.publisher.HooksTest.testMultiReceiver(HooksTest.java:751)
	*_____Flux.then ⇢ at reactor.core.publisher.HooksTest.testMultiReceiver(HooksTest.java:747)
	|_    Mono.flux ⇢ at reactor.core.publisher.HooksTest.testMultiReceiver(HooksTest.java:749)
	|_ Flux.publish ⇢ at reactor.core.publisher.HooksTest.testMultiReceiver(HooksTest.java:749) (observed 2 times)
	|_  Flux.filter ⇢ at reactor.core.publisher.HooksTest.testMultiReceiver(HooksTest.java:761)

osi · 2021-09-24T21:41:06Z

@simonbasle I'm not parsing tracebacks, and don't have awareness of any tooling around it.

The new form looks great!!

simonbasle · 2021-09-27T13:57:25Z

this will be re-targeted to 3.4.x only

This commit reworks how lines are added to the traceback second section, which displays callsites where an exception has been propagated. It rework the presentation of that section to cater for more complex cases. As a benefit, it fixes a situation where an unbounded number of retries over a singleton exception + a checkpoint leads to infinitely growing traceback. It also polishes HooksTraceTest to result in less spammy logs, and updates the reference guide documentation to reflect the new style (as well as fixing some outdated content). The change especially helps when the exception is instantiated at publisher level but is propagated to several subscribe paths (eg. multiple chains assembled from the same source, or a single chain that resubscribe like with a retry operator). Previously, lines where effectively guaranteed to be added because the stored Tuple4 included a `i` variable that was always incremented. Now these traceback lines are stored in a `Map`, and effort is made to detect "multicasting" of the error. Lines that appear more than once are counted and this is reflected in the traceback line after the callsite (` (observed x times)`). An effort is made to group operators that belong to distinct chains together under a "root", which get repeated and is denoted by a `*_` prefix and `_` left-padding. Instead of: ``` |_ rootOp -> callsite1 |_ chainOneOp1 -> callsite2 |_ chainOneOp2 -> callsite3 |_ chainTwoOp1 -> callsite4 ``` We now get: ``` |*______rootOp -> callsite1 (observed 2 times) |_ chainOneOp1 -> callsite2 |_ chainOneOp2 -> callsite3 |*______rootOp -> callsite1 (observed 2 times) |_ chainTwoOp1 -> callsite4 ``` This makes it clearer that the error propagates through `rootOp`, which is used to construct two distinct chains (and subscriptions to both chains end up seeing the same instance of the error). Fixes #2774.

reactorbot added the ❓need-triage This issue needs triage, hasn't been looked at by a team member yet label Sep 17, 2021

simonbasle added status/need-decision This needs a decision from the team status/need-investigation This needs more in-depth investigation and removed ❓need-triage This issue needs triage, hasn't been looked at by a team member yet labels Sep 20, 2021

simonbasle mentioned this issue Sep 21, 2021

Prevent unnecessary duplicated lines in backtraces #2780

Merged

1 task

simonbasle linked a pull request Sep 22, 2021 that will close this issue

Prevent unnecessary duplicated lines in backtraces #2780

Merged

1 task

simonbasle added type/enhancement A general enhancement and removed status/need-investigation This needs more in-depth investigation labels Sep 24, 2021

simonbasle added this to the 3.3.21.RELEASE milestone Sep 24, 2021

simonbasle modified the milestones: 3.3.21.RELEASE, 3.4.11 Sep 27, 2021

simonbasle removed the status/need-decision This needs a decision from the team label Sep 27, 2021

simonbasle closed this as completed in #2780 Oct 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repeated checkpoints added when using retry #2774

Repeated checkpoints added when using retry #2774

osi commented Sep 17, 2021

smaldini commented Sep 17, 2021

simonbasle commented Sep 21, 2021 •

edited

simonbasle commented Sep 21, 2021

simonbasle commented Sep 24, 2021 •

edited

simonbasle commented Sep 24, 2021

osi commented Sep 24, 2021

simonbasle commented Sep 27, 2021

Repeated checkpoints added when using retry #2774

Repeated checkpoints added when using retry #2774

Comments

osi commented Sep 17, 2021

Expected Behavior

Actual Behavior

Steps to Reproduce

Your Environment

smaldini commented Sep 17, 2021

simonbasle commented Sep 21, 2021 • edited

simonbasle commented Sep 21, 2021

simonbasle commented Sep 24, 2021 • edited

simonbasle commented Sep 24, 2021

osi commented Sep 24, 2021

simonbasle commented Sep 27, 2021

simonbasle commented Sep 21, 2021 •

edited

simonbasle commented Sep 24, 2021 •

edited