Repeated checkpoints added when using retry #2774
Comments
In the FluxAssembly implementation, the operator checks for the presence of a checkpoint exception tuple. The problem is that
Without many tests, comments, or git commit messages to provide more context, I have trouble understanding the rationale behind this; removing it doesn't seem to fail any test. On a side note, the additional entries to the backtrace are only there because the
This commit removes the `i` counter that was effectively guaranteeing duplicated entries in the backtrace (e.g. with a checkpoint) in the case where the triggering exception was reused. The only differentiator was then the indentation (driven by `i`).

This is especially visible when using a `checkpoint(String)` operator along with a `retry`. Provided the same exception instance is emitted on each retry loop, `checkpoint` will add to the backtrace on each loop.

This commit removes the `i` from the tracking of existing backtraces and thus ensures that for the same parent subscriber, prefix and description there will be only one entry in the backtrace.

Fixes #2774.
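To illustrate the deduplication argument above, here is a minimal sketch in plain Java. It is not Reactor's code: the class `BacktraceDedupSketch`, its methods, and the sample values (`"subscriber-1"`, `"my checkpoint"`) are hypothetical stand-ins. It only shows that a tracking key that includes an ever-incrementing `i` can never match an earlier entry, whereas a key made of parent subscriber, prefix and description collapses repeats into one entry.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class BacktraceDedupSketch {

    // Key includes the counter `i` (hypothetical model of the old tracking).
    static int entriesWithCounter(int retries) {
        Set<List<Object>> seen = new LinkedHashSet<>();
        int i = 0;
        for (int attempt = 0; attempt < retries; attempt++) {
            // Same parent, prefix and description on every retry, but `i` differs,
            // so the key is always new and an entry is always added.
            seen.add(Arrays.<Object>asList("subscriber-1", "checkpoint", "my checkpoint", i++));
        }
        return seen.size();
    }

    // Key without the counter (hypothetical model of the fixed tracking).
    static int entriesWithoutCounter(int retries) {
        Set<List<Object>> seen = new LinkedHashSet<>();
        for (int attempt = 0; attempt < retries; attempt++) {
            // Identical key on every retry: the set keeps a single entry.
            seen.add(Arrays.<Object>asList("subscriber-1", "checkpoint", "my checkpoint"));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(entriesWithCounter(3));    // prints 3
        System.out.println(entriesWithoutCounter(3)); // prints 1
    }
}
```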
Actually, this one test fails:
Are we losing any information by removing these extra-indented "FluxPublish -> at ..." entries?
I have a change that better caters to multiple paths in the
I don't know of any tool that parses the traceback, but you never know. Any insight to provide, @osi @smaldini? (Let's also cc @bsideup, who worked on the agent and on tracebacks in the past; he might also have insights.)
to clarify, from #2780 the current proposal looks like this:
@simonbasle I'm not parsing tracebacks, and don't have awareness of any tooling around it. The new form looks great!!
This will be re-targeted to 3.4.x only.
This commit reworks how lines are added to the second section of the traceback, which displays callsites where an exception has been propagated. It reworks the presentation of that section to cater for more complex cases.

As a benefit, it fixes a situation where an unbounded number of retries over a singleton exception + a checkpoint leads to an infinitely growing traceback.

It also polishes HooksTraceTest to result in less spammy logs, and updates the reference guide documentation to reflect the new style (as well as fixing some outdated content).

The change especially helps when the exception is instantiated at publisher level but is propagated to several subscribe paths (e.g. multiple chains assembled from the same source, or a single chain that resubscribes, like with a retry operator).

Previously, lines were effectively guaranteed to be added because the stored Tuple4 included an `i` variable that was always incremented.

Now these traceback lines are stored in a `Map`, and effort is made to detect "multicasting" of the error. Lines that appear more than once are counted and this is reflected in the traceback line after the callsite (` (observed x times)`).

An effort is made to group operators that belong to distinct chains together under a "root", which gets repeated and is denoted by a `*_` prefix and `_` left-padding.

Instead of:
```
|_ rootOp -> callsite1
|_ chainOneOp1 -> callsite2
|_ chainOneOp2 -> callsite3
|_ chainTwoOp1 -> callsite4
```

We now get:
```
|*______rootOp -> callsite1 (observed 2 times)
|_ chainOneOp1 -> callsite2
|_ chainOneOp2 -> callsite3
|*______rootOp -> callsite1 (observed 2 times)
|_ chainTwoOp1 -> callsite4
```

This makes it clearer that the error propagates through `rootOp`, which is used to construct two distinct chains (and subscriptions to both chains end up seeing the same instance of the error).

Fixes #2774.
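The counting idea described in this commit message can be sketched in plain Java as follows. This is an assumption-laden illustration, not the Reactor implementation: the class `ObservedCountSketch` and its `record`/`render` methods are hypothetical, the key is simplified to "operator -> callsite", and the root grouping (`*_` prefix and `_` padding) is deliberately left out. It only shows how storing lines in a `Map` keeps the traceback bounded while still reporting how often a line was seen.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ObservedCountSketch {

    // Insertion-ordered map from traceback line to number of observations.
    private final Map<String, Integer> observed = new LinkedHashMap<>();

    // Record one propagation of the error through an operator (hypothetical API).
    void record(String operator, String callsite) {
        observed.merge(operator + " -> " + callsite, 1, Integer::sum);
    }

    // Render one line per distinct entry, with a count suffix for repeats.
    List<String> render() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : observed.entrySet()) {
            String suffix = entry.getValue() > 1
                    ? " (observed " + entry.getValue() + " times)"
                    : "";
            lines.add("|_ " + entry.getKey() + suffix);
        }
        return lines;
    }

    public static void main(String[] args) {
        ObservedCountSketch trace = new ObservedCountSketch();
        // Three retry loops hit the same checkpoint with the same exception instance.
        for (int attempt = 0; attempt < 3; attempt++) {
            trace.record("checkpoint", "callsite1");
        }
        trace.record("retry", "callsite2");
        trace.render().forEach(System.out::println);
    }
}
```

With this structure the number of rendered lines depends on the number of distinct operator/callsite pairs, not on the number of retries.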
Expected Behavior
There should be a single checkpoint
Actual Behavior
There are multiple checkpoints.
Steps to Reproduce
It is generating multiple checkpoints. Only a single one is expected
Your Environment
I observed this in our full application, where the checkpointed stream is ultimately a `r2dbc-postgresql` notification stream prefixed by a query using `mergeSequential`. There was a problem connecting to the DB, and that was being retried. The same behavior was observed, and this is a distillation of that. As I don't have a full reproduction with our entire stack, my hope is that fixing this also fixes what we observed.