Follower cannot receive snapshot because "chunk received out of order" #10180

Closed
deepthidevaki opened this issue Aug 25, 2022 · 1 comment · Fixed by #10183 or #10211
Assignees
Labels
kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog support Marks an issue as related to a customer support request version:1.3.14 Marks an issue as being completely or in parts released in 1.3.14 version:8.1.0-alpha5 Marks an issue as being completely or in parts released in 8.1.0-alpha5 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0

Comments

@deepthidevaki
Contributor

deepthidevaki commented Aug 25, 2022

Describe the bug

 io.atomix.raft.roles.LeaderAppender - RaftServer{raft-partition-partition-2} - Failed to send InstallRequest{currentTerm=191, leader=2, index=6790469, term=188, version=1, chunkId=HeapByteBuffer{position=0, remaining=10, limit=10, capacity=10, mark=java.nio.HeapByteBuffer[pos=0 lim=10 cap=10], hash=1628710428}, nextChunkId=HeapByteBuffer{position=0, remaining=10, limit=10, capacity=10, mark=java.nio.HeapByteBuffer[pos=0 lim=10 cap=10], hash=1777397309}, data=HeapByteBuffer{position=0, remaining=8821580, limit=8821580, capacity=8821580, mark=java.nio.HeapByteBuffer[pos=0 lim=8821580 cap=8821580], hash=2050789216}, initial=true, complete=false} to member 1, with RaftError{type=ILLEGAL_MEMBER_STATE, message=Request chunk is was received out of order}. Restart sending snapshot.

Broker 1 is not ready because Partition-2 has not caught up with the leader. The leader is Broker 2. Initially there are a lot of timeouts, which lead to frequent leader elections, and Broker 2 becomes leader again and again. This happens for a few minutes. After that there are no more timeouts and no more leader elections; Broker 2 stays leader without interruption.

Now Broker 2 is trying to replicate the snapshot to Broker 1, but it is caught in a loop with the error "message=Request chunk is was received out of order}. Restart sending snapshot." The brokers are expected to recover from this error, but that did not happen. The leader keeps restarting the snapshot transfer, and the follower rejects each attempt with the same error response. No new snapshot is taken on the leader, so the leader keeps trying to replicate the same snapshot.

When the follower was restarted, it came out of this error loop and was able to complete its startup.
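
The loop described above can be illustrated with a minimal sketch. This is not the actual Atomix/Zeebe code: the class `SnapshotChunkReceiver`, its `resetOnReject` flag, and the string chunk ids are hypothetical names made up for illustration, and the real root cause may differ in detail. The sketch only assumes that the follower remembers which chunk it expects next, and that a leader restarting the transfer always begins again with the first chunk.

```java
// Hypothetical model of a follower receiving snapshot chunks.
// Not the real Atomix/Zeebe implementation; all names are illustrative.
final class SnapshotChunkReceiver {
  enum Result { ACCEPTED, REJECTED_OUT_OF_ORDER }

  // When true, a rejected request also discards the pending snapshot.
  private final boolean resetOnReject;
  // Id of the chunk expected next; null means no snapshot is pending.
  private String expectedChunkId;

  SnapshotChunkReceiver(boolean resetOnReject) {
    this.resetOnReject = resetOnReject;
  }

  Result onInstallRequest(String chunkId, String nextChunkId, boolean initial) {
    if (expectedChunkId == null) {
      // Nothing pending: only the first chunk of a snapshot may start one.
      if (!initial) {
        return Result.REJECTED_OUT_OF_ORDER;
      }
    } else if (!expectedChunkId.equals(chunkId)) {
      // A snapshot is pending and this is not the chunk we were waiting for.
      if (resetOnReject) {
        // Forget the pending snapshot so a restarted transfer can succeed.
        expectedChunkId = null;
      }
      return Result.REJECTED_OUT_OF_ORDER;
    }
    // Chunk accepted; a null nextChunkId marks the last chunk.
    expectedChunkId = nextChunkId;
    return Result.ACCEPTED;
  }
}
```

With `resetOnReject` set to `false`, a follower that has accepted the first chunk keeps expecting the second one, so every restarted transfer is rejected and only a restart of the follower clears the state. With `resetOnReject` set to `true`, the first rejection clears the pending snapshot and the next restarted transfer is accepted.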

To Reproduce

(Trying to write a reproducer test for it).

Expected behavior

Brokers can recover from this error without a restart.

Related support case: SUPPORT-14296

Version: 8.0.4 (and previous versions)

@deepthidevaki deepthidevaki added kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog support Marks an issue as related to a customer support request labels Aug 25, 2022
@deepthidevaki deepthidevaki self-assigned this Aug 25, 2022
@Zelldon
Member

Zelldon commented Aug 25, 2022

@deepthidevaki can you add a version and config to the issue please?

zeebe-bors-camunda bot added a commit that referenced this issue Aug 29, 2022
10210: [Backport stable/8.0] fix(raft): follower resets pending snapshot after rejecting install request r=oleschoenburg a=deepthidevaki

## Description

Backport #10183 

closes #10180 #10202 

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@users.noreply.github.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 29, 2022
10211: [Backport stable/1.3] fix(raft): follower resets pending snapshot after rejecting install request r=oleschoenburg a=deepthidevaki

## Description

Backport #10183

closes #10180 #10202

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@users.noreply.github.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 29, 2022
10211: [Backport stable/1.3] fix(raft): follower resets pending snapshot after rejecting install request r=deepthidevaki a=deepthidevaki

## Description

Backport #10183

closes #10180 #10202

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@users.noreply.github.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 30, 2022
10211: [Backport stable/1.3] fix(raft): follower resets pending snapshot after rejecting install request r=deepthidevaki a=deepthidevaki

## Description

Backport #10183

closes #10180 #10202

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@users.noreply.github.com>
@saig0 saig0 added release/8.0.8 version:1.3.14 Marks an issue as being completely or in parts released in 1.3.14 labels Sep 1, 2022
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
3 participants