fix(raft): do not handle response if role is already closed #10640

Merged
merged 2 commits into from Oct 10, 2022

@deepthidevaki deepthidevaki commented Oct 7, 2022

Description

A response to a request that was sent while the node was leader in one term was still processed when it arrived in a later term. This resulted in the following behavior:

  1. The leader sends an AppendRequest to a follower. Before sending, it updates RaftMemberContext#inFlightAppendCount to 1.
  2. The leader steps down and becomes leader again.
  3. The leader resets the follower's member context, so inFlightAppendCount is reset to 0.
  4. The leader sends a new append request to the follower; inFlightAppendCount is set to 1.
  5. The first request times out. Handling its response decrements inFlightAppendCount to 0.
  6. The second request times out. Handling its response decrements inFlightAppendCount to -1.
  7. The next time the leader attempts to send an append request, RaftMemberContext#canAppend returns false because inFlightAppendCount != 0. As a result, the leader never sends another heartbeat or append request to the follower.
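
The sequence above can be replayed with a small stand-alone sketch. It is not the actual RaftMemberContext code: the class and method names (MemberContextSketch, startAppend, completeAppend, reset) are hypothetical, and canAppend is assumed to be a plain `== 0` check, as step 7 implies.

```java
final class StaleResponseDemo {

  /** Hypothetical stand-in for the per-follower bookkeeping; not the real RaftMemberContext. */
  static final class MemberContextSketch {
    private int inFlightAppendCount;

    // Assumption: appending is allowed only when nothing is in flight.
    boolean canAppend() {
      return inFlightAppendCount == 0;
    }

    void startAppend() {
      inFlightAppendCount++;
    }

    void completeAppend() {
      inFlightAppendCount--;
    }

    void reset() {
      inFlightAppendCount = 0;
    }

    int inFlightAppendCount() {
      return inFlightAppendCount;
    }
  }

  /** Replays steps 1 to 6 and returns the resulting counter value. */
  static int replay() {
    MemberContextSketch follower = new MemberContextSketch();
    follower.startAppend();    // step 1: first request sent, count = 1
    follower.reset();          // steps 2-3: re-election resets the context, count = 0
    follower.startAppend();    // step 4: second request sent, count = 1
    follower.completeAppend(); // step 5: stale response from the old term, count = 0
    follower.completeAppend(); // step 6: response to the second request, count = -1
    return follower.inFlightAppendCount();
  }

  public static void main(String[] args) {
    System.out.println("inFlightAppendCount = " + replay());
  }
}
```

Once the counter is negative, canAppend in this sketch stays false, which mirrors step 7.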

To fix this, the role now checks that it is still open before processing the response. As a precaution, other requests whose responses were handled without this check are updated as well.
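
A minimal sketch of that guard, under stated assumptions: LeaderRoleSketch, sendAppend and the shared AtomicInteger counter are hypothetical names for illustration, and timeouts are modelled by completing a CompletableFuture exceptionally. The actual change in the Raft code may be structured differently.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

final class RoleGuardDemo {

  /** Hypothetical role object; one instance per term in which the node is leader. */
  static final class LeaderRoleSketch {
    private final AtomicInteger inFlightAppendCount; // per-follower counter shared across roles
    private boolean open = true;

    LeaderRoleSketch(AtomicInteger inFlightAppendCount) {
      this.inFlightAppendCount = inFlightAppendCount;
      inFlightAppendCount.set(0); // a newly elected leader resets the member context
    }

    void close() {
      open = false;
    }

    /** Counts the request as in flight and attaches a guarded response handler. */
    void sendAppend(CompletableFuture<Void> response) {
      inFlightAppendCount.incrementAndGet();
      response.whenComplete((ok, error) -> {
        if (!open) {
          return; // the role that sent this request is closed: ignore the response
        }
        inFlightAppendCount.decrementAndGet();
      });
    }
  }

  /** Replays the failing scenario with the guard in place and returns the counter. */
  static int replay() {
    AtomicInteger counter = new AtomicInteger();

    LeaderRoleSketch first = new LeaderRoleSketch(counter);
    CompletableFuture<Void> firstResponse = new CompletableFuture<>();
    first.sendAppend(firstResponse); // count = 1

    first.close(); // leader steps down
    LeaderRoleSketch second = new LeaderRoleSketch(counter); // count reset to 0
    CompletableFuture<Void> secondResponse = new CompletableFuture<>();
    second.sendAppend(secondResponse); // count = 1

    firstResponse.completeExceptionally(new TimeoutException("first request"));   // ignored
    secondResponse.completeExceptionally(new TimeoutException("second request")); // count = 0
    return counter.get();
  }

  public static void main(String[] args) {
    System.out.println("inFlightAppendCount = " + replay());
  }
}
```

With the guard, the stale response from the closed role no longer touches the counter, so it returns to 0 instead of dropping to -1.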

Related issues

closes #10545

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

  • The changes are backwards compatible with previous versions
  • If it fixes a bug then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/1.3) to the PR, in case that fails you need to create backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criteria of the issue
  • New tests are written to ensure backwards compatibility with further versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • New content is added to the release announcement
  • If the PR changes how BPMN processes are validated (e.g. support new BPMN element) then the Camunda modeling team should be informed to adjust the BPMN linting.

Please refer to our review guidelines.

github-actions bot commented Oct 7, 2022

Test Results

   936 files  ±  0     936 suites  ±0   1h 45m 45s ⏱️ + 1m 15s
7 415 tests  - 38  7 409 ✔️  - 38  6 💤 ±0  0 ±0 
7 605 runs   - 38  7 597 ✔️  - 38  8 💤 ±0  0 ±0 

Results for commit 11a58e4. ± Comparison against base commit 56dd345.

♻️ This comment has been updated with latest results.

@oleschoenburg oleschoenburg left a comment

Nice catch! 🚀

@deepthidevaki

bors merge

@zeebe-bors-camunda

Build succeeded:

@zeebe-bors-camunda zeebe-bors-camunda bot merged commit ce68a1d into main Oct 10, 2022
@zeebe-bors-camunda zeebe-bors-camunda bot deleted the dd-10545-fix-raft branch October 10, 2022 14:42
@backport-action

Backport failed for stable/8.0, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin stable/8.0
git worktree add -d .worktree/backport-10640-to-stable/8.0 origin/stable/8.0
cd .worktree/backport-10640-to-stable/8.0
git checkout -b backport-10640-to-stable/8.0
ancref=$(git merge-base 56dd345b2aae4ff4a39193a6f8c4e707e7afea69 11a58e45538b8fc51222eb2e5760bd8897a74fda)
git cherry-pick -x $ancref..11a58e45538b8fc51222eb2e5760bd8897a74fda

@backport-action

Successfully created backport PR #10656 for stable/8.1.

zeebe-bors-camunda bot added a commit that referenced this pull request Oct 10, 2022
10656: [Backport stable/8.1] fix(raft): do not handle response if role is already closed r=deepthidevaki a=backport-action

# Description
Backport of #10640 to `stable/8.1`.

closes #10545

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
zeebe-bors-camunda bot added a commit that referenced this pull request Oct 10, 2022
10657: [Backport stable/8.0] ci: merge deploy and auto-merge workflows into unified CI workflow r=oleschoenburg a=oleschoenburg

manual backport of #10616

10659: [Backport stable/8.0] fix(raft): do not handle response if the role is closed r=oleschoenburg a=deepthidevaki

## Description

Backports #10640 

Changes to the `ControlledRaftContext` are not backported as the original code does not exist in this version.

closes #10545 

Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
@korthout korthout added the version:8.1.1 Marks an issue as being completely or in parts released in 8.1.1 label Oct 13, 2022
Development

Successfully merging this pull request may close these issues.

RandomizedRaftTest.livenessTestWithNoSnapshot fails because member is ACTIVE not READY
4 participants