Partition without leader where all brokers are listed as followers #8978

Closed
romansmirnov opened this issue Mar 24, 2022 · 2 comments · Fixed by #8994
Labels: kind/bug, scope/broker, severity/mid, support, version:8.1.0-alpha1, version:8.1.0

Comments

@romansmirnov
Member

romansmirnov commented Mar 24, 2022

Describe the bug

According to the heap dump, Broker 1 is the leader for partition 2 in the Raft layer.

However, when checking the cluster state with zbctl, there is no leader for partition 2; instead, all brokers are listed as followers:

```
Cluster size: 4
Partitions count: 4
Replication factor: 4
Gateway version: 1.3.5
Brokers:
  Broker 0
    Version: 1.3.5
    Partition 2 : Follower, Healthy
  Broker 1
    Version: 1.3.5
    Partition 2 : Follower, Healthy
  Broker 2
    Version: 1.3.5
    Partition 2 : Follower, Healthy
  Broker 3
    Version: 1.3.5
    Partition 2 : Follower, Healthy
```

According to the logs, the ZeebePartition on Broker 1 tried to transition to LEADER, but that transition was canceled by a transition to INACTIVE, followed by a transition to FOLLOWER. As a result, the ZeebePartition ends up in the FOLLOWER role while the Raft layer is still in the LEADER role.

The following events led to this state:

  1. During the bootstrap phase, Broker 1 receives a snapshot for partition 2 from Broker 2:
2022-03-24 09:47:58.934 [] [raft-server-1-raft-partition-partition-2] INFO 
      io.atomix.raft.roles.FollowerRole - RaftServer{raft-partition-partition-2}{role=FOLLOWER} - Started receiving new snapshot FileBasedReceivedSnapshot{directory=/usr/local/zeebe/data/raft-partition/partitions/2/pending/1416-1-1595-1594-1, snapshotStore=io.camunda.zeebe.snapshots.impl.FileBasedSnapshotStore@46da365a, metadata=FileBasedSnapshotMetadata{index=1416, term=1, processedPosition=1595, exporterPosition=1594}} from 2
2022-03-24 09:47:59.026 [Broker-1-SnapshotStore-2] [Broker-1-zb-fs-workers-0] INFO 
      io.camunda.zeebe.snapshots.impl.FileBasedSnapshotStore - Committed new snapshot 1416-1-1595-1594
  2. Broker 1 starts an election and becomes leader for partition 2:
2022-03-24 10:28:42.856 [] [raft-server-1-raft-partition-partition-2] INFO 
      io.atomix.raft.roles.CandidateRole - RaftServer{raft-partition-partition-2}{role=CANDIDATE} - Starting election
...
2022-03-24 10:28:42.868 [] [raft-server-1-raft-partition-partition-2] INFO 
      io.atomix.raft.impl.RaftContext - RaftServer{raft-partition-partition-2} - Transitioning to LEADER
  3. Eventually, the Raft partition server for partition 4 starts successfully, which triggers the start of the ZeebePartitions:
2022-03-24 10:29:45.669 [] [raft-server-1-raft-partition-partition-4] INFO 
      io.atomix.raft.partition.impl.RaftPartitionServer - RaftPartitionServer{raft-partition-partition-4} - Successfully started server for partition PartitionId{id=4, group=raft-partition} in 2509718ms
  4. Registering the RoleChangeListener and the SnapshotReplicationListener results in the following transitions:
2022-03-24 10:29:45.916 [Broker-1-ZeebePartition-2] [Broker-1-zb-actors-1] INFO 
      io.camunda.zeebe.broker.system - Transition to LEADER on term 4 requested.
2022-03-24 10:29:45.920 [Broker-1-ZeebePartition-2] [Broker-1-zb-actors-1] INFO 
      io.camunda.zeebe.broker.system - Transition to INACTIVE on term 0 requested.
2022-03-24 10:29:45.920 [Broker-1-ZeebePartition-2] [Broker-1-zb-actors-1] INFO 
      io.camunda.zeebe.broker.system - Received cancel signal for transition to LEADER on term 4
...
2022-03-24 10:29:45.928 [Broker-1-ZeebePartition-2] [Broker-1-zb-actors-1] INFO 
      io.camunda.zeebe.broker.system - Cancelling transition to INACTIVE on term 0
2022-03-24 10:29:45.928 [Broker-1-ZeebePartition-2] [Broker-1-zb-actors-1] INFO 
      io.camunda.zeebe.broker.system - Prepare transition from INACTIVE on term 0 to FOLLOWER
...
2022-03-24 10:29:47.718 [Broker-1-ZeebePartition-2] [Broker-1-zb-actors-1] INFO 
      io.camunda.zeebe.broker.system - Transition to FOLLOWER on term 4 completed

The listeners are registered in the following order:

```java
private void registerListeners() {
  context.getRaftPartition().addRoleChangeListener(this);
  context.getComponentHealthMonitor().addFailureListener(this);
  context.getRaftPartition().getServer().addSnapshotReplicationListener(this);
}
```

This means that registering the listeners submits three transitions to the ZeebePartition:

  1. to LEADER,
  2. to INACTIVE, and
  3. to FOLLOWER
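The sequence can be illustrated with a minimal, hypothetical model. It is not the actual ZeebePartition code: it assumes that each new transition request cancels the transition still in flight, so only the last request completes. The names `ListenerOrderSketch`, `PartitionModel`, `requestTransition`, and `completePending` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not the real ZeebePartition: a newer transition request
// cancels the transition still in flight, so only the last request completes.
public class ListenerOrderSketch {

  enum Role { LEADER, FOLLOWER, INACTIVE }

  static final class PartitionModel {
    private final List<String> log = new ArrayList<>();
    private Role current = Role.INACTIVE;
    private Role inFlight = null;

    void requestTransition(Role target) {
      if (inFlight != null) {
        log.add("cancel " + inFlight); // the pending transition is abandoned
      }
      log.add("request " + target);
      inFlight = target;
    }

    // Completes whichever transition survived all cancellations.
    void completePending() {
      if (inFlight != null) {
        current = inFlight;
        log.add("completed " + current);
        inFlight = null;
      }
    }

    Role role() { return current; }

    List<String> log() { return log; }
  }

  // Notifications in the order implied by registerListeners(): the role change
  // listener reports LEADER, then the snapshot listener replays the missed
  // "started" (-> INACTIVE) and "completed" (-> FOLLOWER) events.
  static PartitionModel simulate() {
    PartitionModel partition = new PartitionModel();
    partition.requestTransition(Role.LEADER);   // 1. RoleChangeListener
    partition.requestTransition(Role.INACTIVE); // 2. onSnapshotReplicationStarted
    partition.requestTransition(Role.FOLLOWER); // 3. onSnapshotReplicationCompleted
    partition.completePending();
    return partition;
  }

  public static void main(String[] args) {
    PartitionModel partition = simulate();
    System.out.println(partition.log());
    System.out.println("Raft: LEADER, ZeebePartition: " + partition.role());
  }
}
```

Under this assumption the model ends in FOLLOWER while the Raft role stays LEADER, which matches the mismatch reported by zbctl.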

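Transitions 2 and 3 are caused by the snapshot received during bootstrap: because the replication had already completed when the listener was registered, addSnapshotReplicationListener immediately replays both missed events (started, then completed):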

```java
public void addSnapshotReplicationListener(
    final SnapshotReplicationListener snapshotReplicationListener) {
  threadContext.execute(
      () -> {
        snapshotReplicationListeners.add(snapshotReplicationListener);
        // Notify listener immediately if it registered during an ongoing replication.
        // This is to prevent missing necessary state transitions.
        switch (missedSnapshotReplicationEvents) {
          case STARTED -> snapshotReplicationListener.onSnapshotReplicationStarted();
          case COMPLETED -> {
            snapshotReplicationListener.onSnapshotReplicationStarted();
            snapshotReplicationListener.onSnapshotReplicationCompleted(term);
          }
          default -> {}
        }
      });
}
```
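The replay can be reproduced with a small, self-contained model. Apart from the two listener callbacks and the STARTED/COMPLETED states, every name here (`RaftServerModel`, `RecordingListener`, `snapshotReplicationStarted`, `snapshotReplicationCompleted`) is hypothetical. The model assumes the recorded state is never cleared once set, so a listener that registers long after the replication finished (about 42 minutes later in the logs above) still receives both callbacks, regardless of the server's current role.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of recording and replaying missed replication events.
// Only the callback names mirror the issue; everything else is illustrative.
public class MissedEventReplaySketch {

  enum MissedEvents { NONE, STARTED, COMPLETED }

  interface SnapshotReplicationListener {
    void onSnapshotReplicationStarted();

    void onSnapshotReplicationCompleted(long term);
  }

  static final class RaftServerModel {
    private final List<SnapshotReplicationListener> listeners = new ArrayList<>();
    private MissedEvents missed = MissedEvents.NONE; // assumed never reset
    private final long term;

    RaftServerModel(long term) { this.term = term; }

    void snapshotReplicationStarted() {
      missed = MissedEvents.STARTED;
      listeners.forEach(SnapshotReplicationListener::onSnapshotReplicationStarted);
    }

    void snapshotReplicationCompleted() {
      missed = MissedEvents.COMPLETED;
      listeners.forEach(l -> l.onSnapshotReplicationCompleted(term));
    }

    // Mirrors the switch above: a late listener is told about whatever was
    // recorded, without checking whether the server is still a follower.
    void addSnapshotReplicationListener(SnapshotReplicationListener listener) {
      listeners.add(listener);
      switch (missed) {
        case STARTED -> listener.onSnapshotReplicationStarted();
        case COMPLETED -> {
          listener.onSnapshotReplicationStarted();
          listener.onSnapshotReplicationCompleted(term);
        }
        default -> {}
      }
    }
  }

  static final class RecordingListener implements SnapshotReplicationListener {
    final List<String> calls = new ArrayList<>();

    @Override
    public void onSnapshotReplicationStarted() { calls.add("started"); }

    @Override
    public void onSnapshotReplicationCompleted(long term) { calls.add("completed@" + term); }
  }

  public static void main(String[] args) {
    RaftServerModel server = new RaftServerModel(4);
    server.snapshotReplicationStarted();   // during bootstrap, no listeners yet
    server.snapshotReplicationCompleted();
    RecordingListener late = new RecordingListener(); // registered much later
    server.addSnapshotReplicationListener(late);
    System.out.println(late.calls); // [started, completed@4]
  }
}
```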

Impact

  • The stream processor is not in processing mode, so nothing on partition 2 gets processed.
  • In-flight process instances on that partition are stuck: none of their activatable jobs get activated.
  • Incoming commands (requests) such as create process instance or activate jobs are distributed across the other partitions.

Expected behavior

The ZeebePartition transitions to the LEADER role successfully.

Possible Solutions:

  • Change the order in which the listeners are registered, for instance, register the SnapshotReplicationListener first and the RoleChangeListener second.
  • Ensure that SnapshotReplicationListener#onSnapshotReplicationCompleted() triggers a transition back to the previous role.
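A minimal sketch of the second suggestion, under two assumptions: the last transition request wins, and "previous role" means the role most recently requested before replication started. All class, field, and method names are hypothetical. As the follow-up comment points out, this alone does not cover every case; in this sketch, for example, two consecutive "started" notifications would remember INACTIVE.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "Possible Solution 2": remember the role requested
// before snapshot replication started and restore it on completion.
// Assumes the last transition request wins (earlier ones are cancelled).
public class RestorePreviousRoleSketch {

  enum Role { LEADER, FOLLOWER, INACTIVE }

  static final class PartitionModel {
    private final List<Role> requests = new ArrayList<>();
    private Role roleBeforeReplication = Role.FOLLOWER; // default if nothing requested

    void onRoleChange(Role raftRole) {
      requests.add(raftRole);
    }

    void onSnapshotReplicationStarted() {
      // Remember the most recently requested role, if any.
      if (!requests.isEmpty()) {
        roleBeforeReplication = requests.get(requests.size() - 1);
      }
      requests.add(Role.INACTIVE);
    }

    void onSnapshotReplicationCompleted() {
      // Restore the remembered role instead of hard-coding FOLLOWER.
      requests.add(roleBeforeReplication);
    }

    // Last request wins; earlier in-flight transitions are cancelled.
    Role finalRole() {
      return requests.isEmpty() ? Role.INACTIVE : requests.get(requests.size() - 1);
    }
  }

  public static void main(String[] args) {
    PartitionModel partition = new PartitionModel();
    partition.onRoleChange(Role.LEADER);        // RoleChangeListener registered first
    partition.onSnapshotReplicationStarted();   // replayed missed events
    partition.onSnapshotReplicationCompleted();
    System.out.println(partition.finalRole());  // LEADER
  }
}
```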

related to

romansmirnov added the kind/bug, scope/broker, severity/mid, and support labels on Mar 24, 2022
@romansmirnov
Member Author

romansmirnov commented Mar 24, 2022

I discussed some possible solutions with @oleschoenburg, and we agreed that the "Possible Solutions" suggested in the description do not cover all cases.

@deepthidevaki
Contributor

I think there are also other things to consider:

  1. The SnapshotReplicationListener should be invoked only on followers. This was implicit while the listener was only invoked when a snapshot was received, because only a follower can receive a snapshot. The extra notification on registration breaks that assumption.
  2. missedSnapshotReplicationEvents must be reset after each role transition (I think so, but I haven't thought through the edge cases yet).
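Both points can be sketched as two guards in a hypothetical model of the Raft server: reset the recorded events on every role transition, and never replay them unless the server is a follower. This is an illustration under those assumptions, not the actual fix from #8994; `RaftServerModel`, `recordReplicationCompleted`, `transition`, and `RecordingListener` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two constraints above; not the actual fix.
public class FollowerOnlyReplaySketch {

  enum Role { FOLLOWER, CANDIDATE, LEADER }

  enum MissedEvents { NONE, STARTED, COMPLETED }

  interface SnapshotReplicationListener {
    void onSnapshotReplicationStarted();

    void onSnapshotReplicationCompleted(long term);
  }

  static final class RaftServerModel {
    private final List<SnapshotReplicationListener> listeners = new ArrayList<>();
    private MissedEvents missed = MissedEvents.NONE;
    private Role role = Role.FOLLOWER;
    private long term = 1;

    // Only a follower receives snapshots, so this is only called as FOLLOWER.
    void recordReplicationCompleted() {
      missed = MissedEvents.COMPLETED;
    }

    // Point 2: forget missed events whenever the role changes.
    void transition(Role newRole, long newTerm) {
      role = newRole;
      term = newTerm;
      missed = MissedEvents.NONE;
    }

    void addSnapshotReplicationListener(SnapshotReplicationListener listener) {
      listeners.add(listener);
      // Point 1: never replay snapshot events on a non-follower.
      if (role != Role.FOLLOWER) {
        return;
      }
      switch (missed) {
        case STARTED -> listener.onSnapshotReplicationStarted();
        case COMPLETED -> {
          listener.onSnapshotReplicationStarted();
          listener.onSnapshotReplicationCompleted(term);
        }
        default -> {}
      }
    }
  }

  static final class RecordingListener implements SnapshotReplicationListener {
    final List<String> calls = new ArrayList<>();

    @Override
    public void onSnapshotReplicationStarted() { calls.add("started"); }

    @Override
    public void onSnapshotReplicationCompleted(long term) { calls.add("completed@" + term); }
  }

  public static void main(String[] args) {
    RaftServerModel server = new RaftServerModel();
    server.recordReplicationCompleted(); // snapshot received while FOLLOWER
    server.transition(Role.LEADER, 4);   // later wins an election
    RecordingListener listener = new RecordingListener();
    server.addSnapshotReplicationListener(listener);
    System.out.println(listener.calls);  // [] (no INACTIVE/FOLLOWER requests)
  }
}
```

In the scenario from the description, either guard alone already suppresses the replay; a follower that registers while a replay is still pending would still receive both callbacks.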

deepthidevaki added the version:8.1.0-alpha1 label on May 3, 2022
Zelldon added the version:8.1.0 label on Oct 4, 2022