
fix: notify new SnapshotReplicationListeners about missed replications #8834

Merged
3 commits merged into main from 8830-fix-missed-transition
Feb 28, 2022

Conversation

lenaschoenburg
Member

@lenaschoenburg lenaschoenburg commented Feb 24, 2022

Description

We are using SnapshotReplicationListeners to transition to inactive when a snapshot replication starts. Snapshot replication can start before any listeners have registered, which means that a listener may only be notified about snapshot replication finishing, triggering a transition to follower without first transitioning to inactive.

This PR introduces a new flag in the RaftContext that keeps track of ongoing snapshot replication so that newly registered listeners can be notified immediately.
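
As an illustration of the mechanism (a minimal sketch only, not the merged code), assuming a listener interface with started/completed callbacks roughly like Zeebe's `SnapshotReplicationListener`:

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Minimal sketch: the real RaftContext and listener interface differ in detail.
// The point is that the context remembers whether a snapshot replication is
// ongoing and replays the "started" event to listeners that register late.
final class RaftContextSketch {

  interface SnapshotReplicationListener {
    void onSnapshotReplicationStarted();

    void onSnapshotReplicationCompleted(long term);
  }

  private final Set<SnapshotReplicationListener> listeners = new CopyOnWriteArraySet<>();
  private volatile boolean ongoingSnapshotReplication = false;

  void addSnapshotReplicationListener(final SnapshotReplicationListener listener) {
    listeners.add(listener);
    // A listener that registers after replication started would otherwise only
    // see the "completed" event; notify it about the ongoing replication now.
    if (ongoingSnapshotReplication) {
      listener.onSnapshotReplicationStarted();
    }
  }

  void onSnapshotReplicationStarted() {
    ongoingSnapshotReplication = true;
    listeners.forEach(SnapshotReplicationListener::onSnapshotReplicationStarted);
  }

  void onSnapshotReplicationCompleted(final long term) {
    ongoingSnapshotReplication = false;
    listeners.forEach(l -> l.onSnapshotReplicationCompleted(term));
  }
}
```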

Related issues

closes #8830

We are using `SnapshotReplicationListener`s to transition to inactive
when a snapshot replication starts. Snapshot replication can start
before any listeners have registered, which means that the listener will
only be notified about snapshot replication finishing, triggering a
transition to follower without first transitioning to inactive.

Here we are keeping track of ongoing snapshot replication to immediately
notify listeners when they are registering.
@lenaschoenburg lenaschoenburg marked this pull request as ready for review February 25, 2022 11:00
@lenaschoenburg lenaschoenburg force-pushed the 8830-fix-missed-transition branch 2 times, most recently from b62c8a8 to 2c88fed on February 25, 2022 12:43
@lenaschoenburg lenaschoenburg changed the title from "fix: notify new SnapshotReplicationListeners about ongoing replication" to "fix: notify new SnapshotReplicationListeners about missed replications" on Feb 25, 2022
Previously, we only notified new listeners about ongoing replication.
This was not enough in cases where snapshot replication finished before
the listener was registered.
@romansmirnov
Member

romansmirnov commented Feb 25, 2022

@oleschoenburg, thanks for the PR. As already discussed, there is still at least one scenario in which the ZeebePartition transitions too late into the INACTIVE role.

Basically, when a snapshot is received during the Broker startup procedure, this PR ensures that three transitions happen in the expected order (and thus a transition to INACTIVE always happens); a minimal sketch of the listener side follows the list:

  1. Initial transition to FOLLOWER
  2. On snapshot replication start, transition to INACTIVE
  3. On snapshot replication completed, transition to FOLLOWER
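
As mentioned above, here is an illustrative sketch of a listener that produces this ordering. The `transitionTo` hook is a placeholder for ZeebePartition's actual transition mechanism, and the listener interface is the one sketched in the PR description above; both are assumptions, not the real Zeebe classes:

```java
import java.util.function.Consumer;

// Illustrative only: transition the partition to INACTIVE while a snapshot is
// being replicated and back to FOLLOWER once replication has completed.
final class TransitionOnSnapshotReplication
    implements RaftContextSketch.SnapshotReplicationListener {

  private final Consumer<String> transitionTo; // placeholder for the real transition machinery

  TransitionOnSnapshotReplication(final Consumer<String> transitionTo) {
    this.transitionTo = transitionTo;
  }

  @Override
  public void onSnapshotReplicationStarted() {
    transitionTo.accept("INACTIVE"); // step 2: pause processing while the snapshot is replicated
  }

  @Override
  public void onSnapshotReplicationCompleted(final long term) {
    transitionTo.accept("FOLLOWER"); // step 3: resume as follower from the freshly installed snapshot
  }
}
```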

Now, what could happen is that the second transition to INACTIVE happens too late, and in the meantime the log has already been compacted, for instance:

  1. First, the initial transition to FOLLOWER installs the latest snapshot and starts a stream processor in replay mode.
  2. In the meantime, the last snapshot chunk is received, which triggers log compaction (i.e., the entire log gets deleted if the follower is lagging behind a lot):
    https://github.com/camunda-cloud/zeebe/blob/b470b4e8e9f821775d2237ce8e91b28d070ab586/atomix/cluster/src/main/java/io/atomix/raft/roles/PassiveRole.java#L764-L776
  3. After the snapshot replication completes, Raft starts receiving append requests, which append new entries to the log.
  4. The stream processor obtains a log stream reader and starts reading from the log to replay events into the runtime database.

At this stage, the snapshot installed during the first transition is no longer in sync with the log stream. That means the stream processor might replay events that expect a certain element to be present in the (runtime) database, which may not be the case. As a result, the stream processor might fail with a NullPointerException.

With this PR, this situation should resolve itself: the transition to INACTIVE (and afterwards back to FOLLOWER) is now guaranteed to happen. With that, the ZeebePartition recovers from the failure and the installed snapshot is in sync with the log stream again.

However, as Zeebe is able to recover from that state on its own, I think we can accept that scenario. @npepinpe, do you agree?

Please note: in theory, while the installed snapshot and the log stream are out of sync, and before an NPE is thrown and the transition to INACTIVE begins, Zeebe could take its own snapshot for that partition which is newer than the replicated snapshot (so that the replicated snapshot eventually gets deleted). As a consequence, on the next transition to FOLLOWER it would recover from that "new" snapshot and continue replaying the events from the log stream. This would result in a potentially corrupted snapshot that does not contain the expected state. But given that the first snapshot is taken at the earliest one minute after the ZeebePartition started, it is very likely that the transition to INACTIVE and back to FOLLOWER has already happened by then. So I guess this scenario is not likely to happen.

Member

@romansmirnov romansmirnov left a comment


🚀

@npepinpe
Member

I'm not 100% sure it's acceptable, if I understood correctly. Is there a chance the follower will persist its incorrect state? After all, when we take a snapshot, we simply take the runtime and save that, without really checking if that runtime was tied to a particular term or anything. So if we'd applied the wrong things, is there a chance we could snapshot it, before we transition to inactive? I don't think so, but it's not 100% sure to me.

@romansmirnov
Member

@npepinpe,

So if we'd applied the wrong things, is there a chance we could snapshot it, before we transition to inactive? I don't think so, but it's not 100% sure to me.

No, it won't snapshot it. Only if the transition to INACTIVE happens after the firstTimeSnapshot could it potentially snapshot it:

https://github.com/camunda-cloud/zeebe/blob/34c4014405165b5c77c8cb72223fe8a5b59d0cac/broker/src/main/java/io/camunda/zeebe/broker/system/partitions/impl/AsyncSnapshotDirector.java#L104-L106

But how likely is it? To my understanding, this is not very likely and should not happen. Otherwise, I would agree that snapshotting wrong things is not acceptable.
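
The likelihood argument rests on a timing detail mentioned earlier in the thread: the first snapshot is taken at the earliest one minute after the partition starts. A rough, self-contained sketch of that scheduling, with hypothetical names (not the actual AsyncSnapshotDirector code):

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Rough sketch of the timing argument: the first snapshot of a partition is
// only attempted after a delay of at least one minute, so the INACTIVE ->
// FOLLOWER round trip normally completes before any snapshot could persist an
// inconsistent runtime. Hypothetical names, not the real Zeebe scheduler.
final class FirstSnapshotTimingSketch {

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void scheduleFirstSnapshot(final Duration snapshotPeriod, final Runnable takeSnapshot) {
    final long minDelayMs = Duration.ofMinutes(1).toMillis();
    final long maxDelayMs = Math.max(minDelayMs + 1, snapshotPeriod.toMillis());
    // Pick a delay of at least one minute before the very first snapshot.
    final long delayMs = ThreadLocalRandom.current().nextLong(minDelayMs, maxDelayMs);
    scheduler.schedule(takeSnapshot, delayMs, TimeUnit.MILLISECONDS);
  }
}
```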

Assuming we can be sure that the wrong things/state won't be snapshotted, do you agree that we can accept a potential NPE and/or applying wrong things to the runtime database? To me, this would be a transient situation that gets resolved by a transition to INACTIVE and subsequently to FOLLOWER. By transitioning to FOLLOWER, it recovers from the latest snapshot and replays the events correctly. WDYT?

Should we maybe create a separate issue for further discussion, if necessary?

@lenaschoenburg
Member Author

bors r+

@ghost ghost merged commit 7c20b26 into main Feb 28, 2022
@ghost ghost deleted the 8830-fix-missed-transition branch February 28, 2022 15:12
@github-actions
Contributor

Successfully created backport PR #8853 for stable/1.2.

@github-actions
Contributor

Successfully created backport PR #8854 for stable/1.3.

@github-actions
Contributor

Successfully created backport PR #8855 for release-1.4.0-alpha2.

ghost pushed a commit that referenced this pull request Feb 28, 2022
8855: [Backport release-1.4.0-alpha2] fix: notify new `SnapshotReplicationListener`s about missed replications r=oleschoenburg a=github-actions[bot]

# Description
Backport of #8834 to `release-1.4.0-alpha2`.

relates to #8830

Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
ghost pushed a commit that referenced this pull request Feb 28, 2022
8854: [Backport stable/1.3] fix: notify new `SnapshotReplicationListener`s about missed replications r=oleschoenburg a=github-actions[bot]

# Description
Backport of #8834 to `stable/1.3`.

relates to #8830

Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
ghost pushed a commit that referenced this pull request Feb 28, 2022
8853: [Backport stable/1.2] fix: notify new `SnapshotReplicationListener`s about missed replications r=oleschoenburg a=github-actions[bot]

# Description
Backport of #8834 to `stable/1.2`.

relates to #8830

Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
@Zelldon Zelldon added Release: 1.2.11 version:1.3.5 Marks an issue as being completely or in parts released in 1.3.5 labels Mar 1, 2022
This pull request was closed.
Labels
version:1.3.5 Marks an issue as being completely or in parts released in 1.3.5
Development

Successfully merging this pull request may close these issues.

NPE during replay
4 participants