Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stable/8.0] Allow retried DMN resource distribution #9883

Merged

Conversation

korthout
Copy link
Member

@korthout korthout commented Jul 25, 2022

Description

This patches a critical bug related to the retry mechanism of deployment distribution. If the retried deployment distribution contains a DMN resource, then this would trigger the consistency checks.

This patch is only available for stable/8.0, as we want to provide a different solution on main. See #9877 (comment).

Related issues

closes #9877

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

  • The changes are backwards compatibility with previous versions
  • If it fixes a bug then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/1.3) to the PR, in case that fails you need to create backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criterias of the issue
  • New tests are written to ensure backwards compatibility with further versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • New content is added to the release announcement
  • If the PR changes how BPMN processes are validated (e.g. support new BPMN element) then the Camunda modeling team should be informed to adjust the BPMN linting.

Please refer to our review guidelines.

@korthout korthout marked this pull request as ready for review July 25, 2022 16:52
@korthout
Copy link
Member Author

korthout commented Jul 25, 2022

@oleschoenburg Do we want to forward port this to main or should we wait for the post-incident review?

Note, we can always retro-actively forward port this.

lenaschoenburg and others added 2 commits July 25, 2022 19:23
A critical bug was found related to redistributing DMN resources. When
the distribution isn't acknowledged, it is retried until it is. This
retry also occurs on the restart of the deployment partition, if the
deployment distribution hasn't been completed for that partition yet.

This retry results in an additional distribute deployment command on the
other partition. Once processed it writes the DISTRIBUTED event which
when applied inserts the decision and drg into the state. The second
time the event is written again and applied again, in which case the
insert fails due to consistency checks of the insert method.

This test doesn't check the implementation of the insert (using the
insert method), but checks the behavior. If the insert fails then it
also doesn't write the DISTRIBUTED event, as the exception will make
sure the entire transaction is rolled back. This test verifies that the
DISTRIBUTED event is correctly written to the log and so without errors.
It actually checks that this event is written twice, after the command
was written twice. After this we can be sure that the retried
distribution of the dmn resource in the deployment was succesfully
processed.

Note that this solution differs from the alternative: rejecting the
retried DISTRIBUTE command if the resources are already known. However,
this solution is more complex and not suitable for patching a critical
bug. We can have a look at that solution in future versions.

Co-authored-by: Nico korthout <nico.korthout@camunda.com>
This fixes critical a critical bug, where a decision and/or drg can be
inserted twice due to retry logic for inter-partition distribution of
deployments. To counter act, this allows an overwrite of existing
values.

See #9877 and DeploymentClusteredTest.shouldDistributeDmnResourceOnRetry.
@korthout korthout force-pushed the korthout-9877-idempotent-dmn-distribution branch from db84c51 to 0e60752 Compare July 25, 2022 17:23
Copy link
Member

@lenaschoenburg lenaschoenburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @korthout ⛑️
Nice that you found a way to write this test cleanly 🙈

I'd say we can cherry-pick this on main, there shouldn't be any conflicts.

@korthout korthout added the backport main Forward-port a pull request to main label Jul 26, 2022
@korthout
Copy link
Member Author

bors merge

@zeebe-bors-camunda
Copy link
Contributor

@zeebe-bors-camunda zeebe-bors-camunda bot merged commit f22dcfb into stable/8.0 Jul 26, 2022
@zeebe-bors-camunda zeebe-bors-camunda bot deleted the korthout-9877-idempotent-dmn-distribution branch July 26, 2022 08:14
@github-actions
Copy link
Contributor

Successfully created backport PR #9887 for main.

zeebe-bors-camunda bot added a commit that referenced this pull request Jul 26, 2022
9885: Support logging partition id in compact record logger r=remcowesterhoud a=korthout

## Description

<!-- Please explain the changes you made here. -->
The compact record logger could already log the encoded partition id of keys when multiple partitions were used, but sometimes we write records on other partitions without a key (-1), or we write records on other partitions with a key that encodes a different partition.

This PR adds support for printing the partition id at the start of each log line. The partition id is omitted if the records don't contain any reference to other partitions (neither `partitionId` nor encoded in the `key`). 

I ran into this while working on a test case for #9877.

## Related issues

<!-- Which issues are closed by this PR or are related -->

relates #9877 



9887: [Backport main] [stable/8.0] Allow retried DMN resource distribution r=korthout a=github-actions[bot]

# Description
Backport of #9883 to `main`.

relates to #9877

Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport main Forward-port a pull request to main version:8.1.0-alpha4 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants