-
Notifications
You must be signed in to change notification settings - Fork 556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cancel job with incident when canceling the process instance #9219
Cancel job with incident when canceling the process instance #9219
Conversation
3bab30a
to
8fb143e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question/concern.
Please also create follow-up issues to address the general change you alluded to in your description (resolving incidents should not activate the Job)
engine/src/main/java/io/camunda/zeebe/engine/processing/bpmn/behavior/BpmnJobBehavior.java
Outdated
Show resolved
Hide resolved
This adds a regression test against activating jobs of cancelled process instances, by making sure they are cancelled (i.e. deleted). In this specific case, an uncaught error was thrown on a job which led to an UNHANDLED_ERROR_EVENT incident. When the process instance is cancelled, the job was not called, but instead was made activatable again. This happened, because the incident was resolved as part of the process instance termination logic. Instead, we should guarantee that the job is cancelled. A cancelled job can no longer be made re-activatable. This is a terminal state for jobs, because a cancelled job is a deleted job. See #8588 for more details.
This makes it easier to add more values to the predicate, while staying understandable.
Jobs in the error_thrown state should also be able to be cancelled. By adding it to this list, we make sure that the job with error_thrown state can be cancelled by the engine when the element it belongs to is being terminated.
9da335b
to
d612677
Compare
We need to make sure the job is cancelled immediately (synchronously) because otherwise the IncidentResolvedApplier may still make the job activatable again. To cancel the job immediately we simply need to use the statewriter to write Job Cancelled, instead of using the commandwriter to write Cancel Job. The statewriter will immediately apply the event when writing it.
Since the command is no longer written, the test cases that were verifying that the command was written should be updated to verify that the job canceled event is written in the same spot. Note that in the BoundaryEventTest it highlights the difference between asynchronously canceling the job with a command and synchronously (immediately) canceling the job with an event.
The ability to read a metric value will be useful in future Metrics tests.
This adds tests that verify that the job metrics are counted when expected. Note that the `shouldCountCanceled` case fails, because canceling the job was no longer counted in the metrics.
This behavior was broken in an earlier commit, when we switched from writing a Cancel Job command to a Job Canceled event.
d612677
to
0aac0d7
Compare
It's important to highlight duplication, because if either changes, the other should be changed as well.
@pihme Please have another look, I'll open an issue for the follow-up work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good job!
bors merge |
Backport failed for Please cherry-pick the changes locally. git fetch origin stable/1.3
git worktree add -d .worktree/backport-9219-to-stable/1.3 origin/stable/1.3
cd .worktree/backport-9219-to-stable/1.3
git checkout -b backport-9219-to-stable/1.3
ancref=$(git merge-base 2f4d8c859cfde20f7d203fea9141f6ed9ccfd305 25fae05a803c17cc5bb9a76620f3e25c7194cc06)
git cherry-pick -x $ancref..25fae05a803c17cc5bb9a76620f3e25c7194cc06 |
Backport failed for Please cherry-pick the changes locally. git fetch origin stable/8.0
git worktree add -d .worktree/backport-9219-to-stable/8.0 origin/stable/8.0
cd .worktree/backport-9219-to-stable/8.0
git checkout -b backport-9219-to-stable/8.0
ancref=$(git merge-base 2f4d8c859cfde20f7d203fea9141f6ed9ccfd305 25fae05a803c17cc5bb9a76620f3e25c7194cc06)
git cherry-pick -x $ancref..25fae05a803c17cc5bb9a76620f3e25c7194cc06 |
9250: [Backport 1.3] Cancel job with incident when canceling the process instance r=korthout a=korthout ## Description <!-- Link to the PR that is back ported --> Backport of #9219 ## Related issues <!-- Link to the related issues of the origin PR --> closes #8588 Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
9251: [Backport 8.0] Cancel job with incident when canceling the process instance r=korthout a=korthout ## Description <!-- Link to the PR that is back ported --> Backport of #9219 ## Related issues <!-- Link to the related issues of the origin PR --> closes #8588 Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
9251: [Backport 8.0] Cancel job with incident when canceling the process instance r=korthout a=korthout ## Description <!-- Link to the PR that is back ported --> Backport of #9219 ## Related issues <!-- Link to the related issues of the origin PR --> closes #8588 Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
Description
This fixes a bug where canceling a process instance could lead to a job becoming activatable even though the process instance is terminated. See #8588 for more details.
The bug is fixed by making sure the job is immediately canceled (i.e. deleted from the state) when we terminate the related element instance. We achieve this by writing a Job Canceled event when the TERMINATE_ELEMENT command is being processed.
This PR fixes the bug in a way that can be backported because it doesn't introduce too many changes. Alternatively, the bug could be fixed by tackling the main issue: applying the Incident Resolved event should not affect the state of a job. Instead, the job should be made activatable by writing a separate Job event in the Resolve Incident Processor. That would be a larger change, with more impact on the event appliers than what this PR proposes. Changes to event appliers should generally be avoided.
Related issues
closes #8588
Definition of Done
Not all items need to be done depending on the issue and the pull request.
Code changes:
backport stable/1.3
) to the PR, in case that fails you need to create backports manually.Testing:
Documentation:
Please refer to our review guidelines.