Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancel job with incident when canceling the process instance #9219

Merged
merged 9 commits into from
Apr 28, 2022

Conversation

korthout
Copy link
Member

Description

This fixes a bug where canceling a process instance could lead to a job becoming activatable even though the process instance is terminated. See #8588 for more details.

The bug is fixed by making sure the job is immediately canceled (i.e. deleted from the state) when we terminate the related element instance. We achieve this by writing a Job Canceled event when the TERMINATE_ELEMENT command is being processed.

This PR fixes the bug in a way that can be backported because it doesn't introduce too many changes. Alternatively, the bug could be fixed by tackling the main issue: applying the Incident Resolved event should not affect the state of a job. Instead, the job should be made activatable by writing a separate Job event in the Resolve Incident Processor. That would be a larger change, with more impact on the event appliers than what this PR proposes. Changes to event appliers should generally be avoided.

Related issues

closes #8588

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

  • The changes are backwards compatibility with previous versions
  • If it fixes a bug then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/1.3) to the PR, in case that fails you need to create backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criterias of the issue
  • New tests are written to ensure backwards compatibility with further versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • New content is added to the release announcement
  • If the PR changes how BPMN processes are validated (e.g. support new BPMN element) then the Camunda modeling team should be informed to adjust the BPMN linting.

Please refer to our review guidelines.

@korthout korthout force-pushed the korthout-8588-cancel-job-with-incident branch from 3bab30a to 8fb143e Compare April 22, 2022 13:34
@pihme pihme self-requested a review April 25, 2022 12:39
Copy link
Contributor

@pihme pihme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question/concern.

Please also create follow-up issues to address the general change you alluded to in your description (resolving incidents should not activate the Job)

This adds a regression test against activating jobs of cancelled process
instances, by making sure they are cancelled (i.e. deleted).

In this specific case, an uncaught error was thrown on a job which led
to an UNHANDLED_ERROR_EVENT incident. When the process instance is
cancelled, the job was not called, but instead was made activatable
again. This happened, because the incident was resolved as part of the
process instance termination logic. Instead, we should guarantee that
the job is cancelled. A cancelled job can no longer be made
re-activatable. This is a terminal state for jobs, because a cancelled
job is a deleted job.

See #8588 for more details.
This makes it easier to add more values to the predicate, while staying
understandable.
Jobs in the error_thrown state should also be able to be cancelled. By
adding it to this list, we make sure that the job with error_thrown
state can be cancelled by the engine when the element it belongs to is
being terminated.
@korthout korthout force-pushed the korthout-8588-cancel-job-with-incident branch 2 times, most recently from 9da335b to d612677 Compare April 28, 2022 09:13
We need to make sure the job is cancelled immediately (synchronously)
because otherwise the IncidentResolvedApplier may still make the job
activatable again.

To cancel the job immediately we simply need to use the statewriter to
write Job Cancelled, instead of using the commandwriter to write Cancel
Job. The statewriter will immediately apply the event when writing it.
Since the command is no longer written, the test cases that were
verifying that the command was written should be updated to verify that
the job canceled event is written in the same spot.

Note that in the BoundaryEventTest it highlights the difference between
asynchronously canceling the job with a command and synchronously
(immediately) canceling the job with an event.
The ability to read a metric value will be useful in future Metrics
tests.
This adds tests that verify that the job metrics are counted when
expected.

Note that the `shouldCountCanceled` case fails, because canceling the
job was no longer counted in the metrics.
This behavior was broken in an earlier commit, when we switched from
writing a Cancel Job command to a Job Canceled event.
@korthout korthout force-pushed the korthout-8588-cancel-job-with-incident branch from d612677 to 0aac0d7 Compare April 28, 2022 11:47
It's important to highlight duplication, because if either changes, the
other should be changed as well.
@korthout korthout requested a review from pihme April 28, 2022 11:57
@korthout
Copy link
Member Author

@pihme Please have another look, I'll open an issue for the follow-up work.

Copy link
Contributor

@pihme pihme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good job!

@korthout
Copy link
Member Author

bors merge

@zeebe-bors-camunda
Copy link
Contributor

@zeebe-bors-camunda zeebe-bors-camunda bot merged commit 053ff2c into main Apr 28, 2022
@zeebe-bors-camunda zeebe-bors-camunda bot deleted the korthout-8588-cancel-job-with-incident branch April 28, 2022 12:52
@github-actions
Copy link
Contributor

Backport failed for stable/1.3, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin stable/1.3
git worktree add -d .worktree/backport-9219-to-stable/1.3 origin/stable/1.3
cd .worktree/backport-9219-to-stable/1.3
git checkout -b backport-9219-to-stable/1.3
ancref=$(git merge-base 2f4d8c859cfde20f7d203fea9141f6ed9ccfd305 25fae05a803c17cc5bb9a76620f3e25c7194cc06)
git cherry-pick -x $ancref..25fae05a803c17cc5bb9a76620f3e25c7194cc06

@github-actions
Copy link
Contributor

Backport failed for stable/8.0, because it was unable to create a new branch.

Please cherry-pick the changes locally.

git fetch origin stable/8.0
git worktree add -d .worktree/backport-9219-to-stable/8.0 origin/stable/8.0
cd .worktree/backport-9219-to-stable/8.0
git checkout -b backport-9219-to-stable/8.0
ancref=$(git merge-base 2f4d8c859cfde20f7d203fea9141f6ed9ccfd305 25fae05a803c17cc5bb9a76620f3e25c7194cc06)
git cherry-pick -x $ancref..25fae05a803c17cc5bb9a76620f3e25c7194cc06

zeebe-bors-camunda bot added a commit that referenced this pull request Apr 28, 2022
9250: [Backport 1.3] Cancel job with incident when canceling the process instance r=korthout a=korthout

## Description

<!-- Link to the PR that is back ported -->

Backport of #9219 

## Related issues

<!-- Link to the related issues of the origin PR -->

closes #8588


Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
zeebe-bors-camunda bot added a commit that referenced this pull request Apr 28, 2022
9251: [Backport 8.0] Cancel job with incident when canceling the process instance r=korthout a=korthout

## Description

<!-- Link to the PR that is back ported -->

Backport of #9219

## Related issues

<!-- Link to the related issues of the origin PR -->

closes #8588 


Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
zeebe-bors-camunda bot added a commit that referenced this pull request Apr 28, 2022
9251: [Backport 8.0] Cancel job with incident when canceling the process instance r=korthout a=korthout

## Description

<!-- Link to the PR that is back ported -->

Backport of #9219

## Related issues

<!-- Link to the related issues of the origin PR -->

closes #8588 


Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Job of cancelled instance can be activated if an error was thrown on it
3 participants