Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Follower cannot receive new entries because it did not reset the log on receiving snapshot #10202

Closed
deepthidevaki opened this issue Aug 26, 2022 · 0 comments · Fixed by #10183
Assignees
Labels
kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog version:1.3.14 Marks an issue as being completely or in parts released in 1.3.14 version:8.1.0-alpha5 Marks an issue as being completely or in parts released in 8.1.0-alpha5 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0

Comments

@deepthidevaki
Copy link
Contributor

Here is what happens:

15:22:09.891 [] INFO  io.atomix.raft.impl.RaftContext - RaftServer{0-partition-1} - Found leader 2
15:22:09.891 [] TRACE io.atomix.raft.impl.RaftContext - RaftServer{0-partition-1} - Set leader 2
15:22:09.891 [] TRACE io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Sending ConfigureResponse{status=OK}
15:22:09.891 [] TRACE io.atomix.raft.impl.RandomizedElectionTimer - Election timeout scheduled for PT4.468S
15:22:09.891 [] TRACE io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Received AppendRequest{term=74, leader=2, prevLogIndex=2, prevLogTerm=37, entries=0, commitIndex=2}
15:22:09.891 [] DEBUG io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Rejected AppendRequest{term=74, leader=2, prevLogIndex=2, prevLogTerm=37, entries=0, commitIndex=2}: Previous index (2) is greater than the local log's last index (1)
15:22:09.891 [] TRACE io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Sending AppendResponse{status=OK, term=74, succeeded=false, lastLogIndex=1, lastSnapshotIndex=0}

Node 0 log has one entry which is at index 1.

15:22:09.892 [] TRACE io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Received AppendRequest{term=74, leader=2, prevLogIndex=1, prevLogTerm=19, entries=2, commitIndex=2}
15:22:09.892 [] DEBUG io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Rejected AppendRequest{term=74, leader=2, prevLogIndex=1, prevLogTerm=19, entries=2, commitIndex=2}: Previous entry term (19) does not equal the local log's last term (5)
15:22:09.893 [] TRACE io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Sending AppendResponse{status=OK, term=74, succeeded=false, lastLogIndex=0, lastSnapshotIndex=0}

Node 0 log's entry at index 1 has term 5. Leader's index 1 has term 19.

15:22:09.895 [] TRACE io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Received InstallRequest{currentTerm=74, leader=2, index=1, term=19, version=1, chunkId=HeapByteBuffer{position=0, remaining=7, limit=7, capacity=7, mark=java.nio.HeapByteBuffer[pos=0 lim=7 cap=7], hash=263875441}, nextChunkId=null, data=HeapByteBuffer{position=0, remaining=57, limit=57, capacity=57, mark=java.nio.HeapByteBuffer[pos=0 lim=57 cap=57], hash=1020455180}, initial=false, complete=true}
15:22:09.895 [] DEBUG io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Received snapshot 1 chunk from 2
15:22:09.895 [] DEBUG io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Committing snapshot io.atomix.raft.snapshot.InMemorySnapshot@176fb1
15:22:09.895 [] INFO  io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Committed snapshot io.atomix.raft.snapshot.InMemorySnapshot@176fb1
15:22:09.895 [] TRACE io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Sending InstallResponse{status=OK}

Leader sends snapshot at index 1. Node 0 commits it. But does not reset the log. (Bug!) We expect the follower to reset the log when ever it receives a new snapshot from the leader.

Now Node 0 has snapshot at index 1 and term 19 and log with entry at index 1 and term 5.(Bug!!)

15:22:09.896 [] TRACE io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Received AppendRequest{term=74, leader=2, prevLogIndex=1, prevLogTerm=19, entries=2, commitIndex=3}
15:22:09.896 [] DEBUG io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Rejected AppendRequest{term=74, leader=2, prevLogIndex=1, prevLogTerm=19, entries=2, commitIndex=3}: Previous entry term (19) does not equal the local log's last term (5)
15:22:09.896 [] TRACE io.atomix.raft.roles.FollowerRole - RaftServer{0-partition-1}{role=FOLLOWER} - Sending AppendResponse{status=OK, term=74, succeeded=false, lastLogIndex=0, lastSnapshotIndex=1}

Node 0 receives entry at index 2. Expected behavior => there is a snapshot at index 1, so it can append entry at 2. But since we have an entry at index 1 in the log, it is using that to verify the previous entry. So it fails. And this repeats. 

Fix for this is to ensure that the log is reset when a follower receives a snapshot from the leader.

Why is the log not reset?

Can we remove this check?

  • This check was needed because of the snapshot replication outside of raft. We can remove this now. But simply removing this does not work. The same code is also used to reset the log in the init of a role. This is needed to guarantee that if the node crashes after snapshot is committed, but before resetting the log, the log is reset immediately after restart.

Proposed solution:

  1. Remove this check and reset the log always when the follower receives a snapshot
  2. Reset the log before the snapshot is committed.
  3. Remove the reset of the log during init. No need for that because of Step 2

Originally posted by @deepthidevaki in #10183 (comment)

@deepthidevaki deepthidevaki added kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog labels Aug 26, 2022
@deepthidevaki deepthidevaki self-assigned this Aug 26, 2022
zeebe-bors-camunda bot added a commit that referenced this issue Aug 29, 2022
10210: [Backport stable/8.0] fix(raft): follower resets pending snapshot after rejecting install request r=oleschoenburg a=deepthidevaki

## Description

Backport #10183 

closes #10180 #10202 

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@users.noreply.github.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 29, 2022
10211: [Backport stable/1.3] fix(raft): follower resets pending snapshot after rejecting install request r=oleschoenburg a=deepthidevaki

## Description

Backport #10183

closes #10180 #10202

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@users.noreply.github.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 29, 2022
10211: [Backport stable/1.3] fix(raft): follower resets pending snapshot after rejecting install request r=deepthidevaki a=deepthidevaki

## Description

Backport #10183

closes #10180 #10202

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@users.noreply.github.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 30, 2022
10211: [Backport stable/1.3] fix(raft): follower resets pending snapshot after rejecting install request r=deepthidevaki a=deepthidevaki

## Description

Backport #10183

closes #10180 #10202

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@users.noreply.github.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 30, 2022
10211: [Backport stable/1.3] fix(raft): follower resets pending snapshot after rejecting install request r=deepthidevaki a=deepthidevaki

## Description

Backport #10183

closes #10180 #10202

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@users.noreply.github.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 30, 2022
10211: [Backport stable/1.3] fix(raft): follower resets pending snapshot after rejecting install request r=deepthidevaki a=deepthidevaki

## Description

Backport #10183

closes #10180 #10202

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@users.noreply.github.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 30, 2022
10211: [Backport stable/1.3] fix(raft): follower resets pending snapshot after rejecting install request r=deepthidevaki a=deepthidevaki

## Description

Backport #10183

closes #10180 #10202

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@users.noreply.github.com>
@saig0 saig0 added release/8.0.8 version:1.3.14 Marks an issue as being completely or in parts released in 1.3.14 labels Sep 1, 2022
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog version:1.3.14 Marks an issue as being completely or in parts released in 1.3.14 version:8.1.0-alpha5 Marks an issue as being completely or in parts released in 8.1.0-alpha5 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants