
fix: notify new SnapshotReplicationListeners about missed replications #8834

Merged
3 commits merged into main from 8830-fix-missed-transition
Feb 28, 2022

Conversation

lenaschoenburg
Member

@lenaschoenburg lenaschoenburg commented Feb 24, 2022

Description

We are using SnapshotReplicationListeners to transition to inactive when a snapshot replication starts. Snapshot replication can start before any listeners have registered, which means that a listener may only be notified about snapshot replication finishing, triggering a transition to follower without first transitioning to inactive.

This PR introduces a new flag in the RaftContext that keeps track of ongoing snapshot replication so that newly registered listeners can be notified immediately.
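
As an illustration of the mechanism (a minimal sketch only, not the merged code), assuming a listener interface with started/completed callbacks roughly like Zeebe's `SnapshotReplicationListener`:

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Minimal sketch: the real RaftContext and listener interface differ in detail.
// The point is that the context remembers whether a snapshot replication is
// ongoing and replays the "started" event to listeners that register late.
final class RaftContextSketch {

  interface SnapshotReplicationListener {
    void onSnapshotReplicationStarted();

    void onSnapshotReplicationCompleted(long term);
  }

  private final Set<SnapshotReplicationListener> listeners = new CopyOnWriteArraySet<>();
  private volatile boolean ongoingSnapshotReplication = false;

  void addSnapshotReplicationListener(final SnapshotReplicationListener listener) {
    listeners.add(listener);
    // A listener that registers after replication started would otherwise only
    // see the "completed" event; notify it about the ongoing replication now.
    if (ongoingSnapshotReplication) {
      listener.onSnapshotReplicationStarted();
    }
  }

  void onSnapshotReplicationStarted() {
    ongoingSnapshotReplication = true;
    listeners.forEach(SnapshotReplicationListener::onSnapshotReplicationStarted);
  }

  void onSnapshotReplicationCompleted(final long term) {
    ongoingSnapshotReplication = false;
    listeners.forEach(l -> l.onSnapshotReplicationCompleted(term));
  }
}
```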

Related issues

closes #8830

We are using `SnapshotReplicationListener`s to transition to inactive
when a snapshot replication starts. Snapshot replication can start
before any listeners have registered, which means that the listener will
only be notified about snapshot replication finishing, triggering a
transition to follower without first transitioning to inactive.

Here we are keeping track of ongoing snapshot replication to immediately
notify listeners when they are registering.
@lenaschoenburg lenaschoenburg marked this pull request as ready for review February 25, 2022 11:00
@lenaschoenburg lenaschoenburg force-pushed the 8830-fix-missed-transition branch 2 times, most recently from b62c8a8 to 2c88fed on February 25, 2022 12:43
@lenaschoenburg lenaschoenburg changed the title from "fix: notify new SnapshotReplicationListeners about ongoing replication" to "fix: notify new SnapshotReplicationListeners about missed replications" on Feb 25, 2022
Previously, we only notified new listeners about ongoing replication.
This was not enough in cases where snapshot replication finished before
the listener was registered.
@romansmirnov
Member

romansmirnov commented Feb 25, 2022

@oleschoenburg, thanks for the PR. As already discussed, there is still at least one scenario in which the ZeebePartition transitions too late into the INACTIVE role.

Basically, when a snapshot is received during the Broker startup procedure, this PR ensures that three transitions happen in the expected order (and thus a transition to INACTIVE always happens); a minimal sketch of the listener side follows the list:

  1. Initial transition to FOLLOWER
  2. On snapshot replication start, transition to INACTIVE
  3. On snapshot replication completed, transition to FOLLOWER
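
As mentioned above, here is an illustrative sketch of a listener that produces this ordering. The `transitionTo` hook is a placeholder for ZeebePartition's actual transition mechanism, and the listener interface is the one sketched in the PR description above; both are assumptions, not the real Zeebe classes:

```java
import java.util.function.Consumer;

// Illustrative only: transition the partition to INACTIVE while a snapshot is
// being replicated and back to FOLLOWER once replication has completed.
final class TransitionOnSnapshotReplication
    implements RaftContextSketch.SnapshotReplicationListener {

  private final Consumer<String> transitionTo; // placeholder for the real transition machinery

  TransitionOnSnapshotReplication(final Consumer<String> transitionTo) {
    this.transitionTo = transitionTo;
  }

  @Override
  public void onSnapshotReplicationStarted() {
    transitionTo.accept("INACTIVE"); // step 2: pause processing while the snapshot is replicated
  }

  @Override
  public void onSnapshotReplicationCompleted(final long term) {
    transitionTo.accept("FOLLOWER"); // step 3: resume as follower from the freshly installed snapshot
  }
}
```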

Now, what could happen is that the second transition to INACTIVE happens too late, and in the meantime the log has already been compacted, for instance:

  1. First, the initial transition to FOLLOWER installs the latest snapshot and starts a stream processor in replay mode.
  2. In the meantime, the last snapshot chunk is received, which triggers log compaction (i.e., the entire log gets deleted if the follower is lagging behind a lot):
    https://github.com/camunda-cloud/zeebe/blob/b470b4e8e9f821775d2237ce8e91b28d070ab586/atomix/cluster/src/main/java/io/atomix/raft/roles/PassiveRole.java#L764-L776
  3. After the snapshot replication completes, Raft starts receiving append requests, which append new entries to the log.
  4. The stream processor obtains a log stream reader and starts reading from the log to replay events into the runtime database.

At this stage, the snapshot installed during the first transition is no longer in sync with the log stream. That means the stream processor might replay events that expect a certain element to be present in the (runtime) database, which may not be the case. As a result, the stream processor might fail with a NullPointerException.

With this PR, this situation should resolve itself: the transition to INACTIVE (and afterwards back to FOLLOWER) is now guaranteed to happen. With that, the ZeebePartition recovers from the failure and the installed snapshot is in sync with the log stream again.

However, as Zeebe is able to recover from that state on its own, I think we can accept that scenario. @npepinpe, do you agree?

Please note: in theory, while the installed snapshot and the log stream are out of sync, and before an NPE is thrown and the transition to INACTIVE begins, Zeebe could take its own snapshot for that partition which is newer than the replicated snapshot (so that the replicated snapshot eventually gets deleted). As a consequence, on the next transition to FOLLOWER it would recover from that "new" snapshot and continue replaying the events from the log stream. This would result in a potentially corrupted snapshot that does not contain the expected state. But given that the first snapshot is taken at the earliest one minute after the ZeebePartition started, it is very likely that the transition to INACTIVE and back to FOLLOWER has already happened by then. So I guess this scenario is not likely to happen.

Member

@romansmirnov romansmirnov left a comment


🚀

@npepinpe
Member

I'm not 100% sure it's acceptable, if I understood correctly. Is there a chance the follower will persist its incorrect state? After all, when we take a snapshot, we simply take the runtime and save that, without really checking if that runtime was tied to a particular term or anything. So if we'd applied the wrong things, is there a chance we could snapshot it, before we transition to inactive? I don't think so, but it's not 100% sure to me.

@romansmirnov
Member

@npepinpe,

So if we'd applied the wrong things, is there a chance we could snapshot it, before we transition to inactive? I don't think so, but it's not 100% sure to me.

No, it won't snapshot it. Only if the transition to INACTIVE happens after the firstTimeSnapshot could it potentially snapshot it:

https://github.com/camunda-cloud/zeebe/blob/34c4014405165b5c77c8cb72223fe8a5b59d0cac/broker/src/main/java/io/camunda/zeebe/broker/system/partitions/impl/AsyncSnapshotDirector.java#L104-L106

But how likely is it? To my understanding, this is not very likely and should not happen. Otherwise, I would agree that snapshotting wrong things is not acceptable.
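
The likelihood argument rests on a timing detail mentioned earlier in the thread: the first snapshot is taken at the earliest one minute after the partition starts. A rough, self-contained sketch of that scheduling, with hypothetical names (not the actual AsyncSnapshotDirector code):

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Rough sketch of the timing argument: the first snapshot of a partition is
// only attempted after a delay of at least one minute, so the INACTIVE ->
// FOLLOWER round trip normally completes before any snapshot could persist an
// inconsistent runtime. Hypothetical names, not the real Zeebe scheduler.
final class FirstSnapshotTimingSketch {

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void scheduleFirstSnapshot(final Duration snapshotPeriod, final Runnable takeSnapshot) {
    final long minDelayMs = Duration.ofMinutes(1).toMillis();
    final long maxDelayMs = Math.max(minDelayMs + 1, snapshotPeriod.toMillis());
    // Pick a delay of at least one minute before the very first snapshot.
    final long delayMs = ThreadLocalRandom.current().nextLong(minDelayMs, maxDelayMs);
    scheduler.schedule(takeSnapshot, delayMs, TimeUnit.MILLISECONDS);
  }
}
```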

Assuming we can be sure that the wrong things/state won't be snapshotted, do you agree that we can accept a potential NPE and/or applying wrong things to the runtime database? To me, this would be a transient situation that gets resolved by a transition to INACTIVE and subsequently to FOLLOWER. By transitioning to FOLLOWER, it recovers from the latest snapshot and replays the events correctly. WDYT?

Should we maybe create a separate issue for further discussion, if necessary?

@lenaschoenburg
Member Author

bors r+

@ghost ghost merged commit 7c20b26 into main Feb 28, 2022
@ghost ghost deleted the 8830-fix-missed-transition branch February 28, 2022 15:12
@github-actions
Contributor

Successfully created backport PR #8853 for stable/1.2.

@github-actions
Contributor

Successfully created backport PR #8854 for stable/1.3.

@github-actions
Contributor

Successfully created backport PR #8855 for release-1.4.0-alpha2.

ghost pushed a commit that referenced this pull request Feb 28, 2022
8855: [Backport release-1.4.0-alpha2] fix: notify new `SnapshotReplicationListener`s about missed replications r=oleschoenburg a=github-actions[bot]

# Description
Backport of #8834 to `release-1.4.0-alpha2`.

relates to #8830

Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
ghost pushed a commit that referenced this pull request Feb 28, 2022
8854: [Backport stable/1.3] fix: notify new `SnapshotReplicationListener`s about missed replications r=oleschoenburg a=github-actions[bot]

# Description
Backport of #8834 to `stable/1.3`.

relates to #8830

Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
ghost pushed a commit that referenced this pull request Feb 28, 2022
8853: [Backport stable/1.2] fix: notify new `SnapshotReplicationListener`s about missed replications r=oleschoenburg a=github-actions[bot]

# Description
Backport of #8834 to `stable/1.2`.

relates to #8830

Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
@Zelldon Zelldon added Release: 1.2.11 version:1.3.5 Marks an issue as being completely or in parts released in 1.3.5 labels Mar 1, 2022
This pull request was closed.
Labels
version:1.3.5 Marks an issue as being completely or in parts released in 1.3.5
Development

Successfully merging this pull request may close these issues.

NPE during replay
4 participants