fix: notify new `SnapshotReplicationListener`s about missed replications
#8834
We are using `SnapshotReplicationListener`s to transition to inactive when a snapshot replication starts. Snapshot replication can start before any listeners have registered, which means that the listener will only be notified about snapshot replication finishing, triggering a transition to follower without first transitioning to inactive. Here we are keeping track of ongoing snapshot replication to immediately notify listeners when they are registering.
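The idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ReplicationListener` and `ReplicationTracker` are hypothetical stand-ins for `SnapshotReplicationListener` and the component that owns the listeners, and the method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for SnapshotReplicationListener.
interface ReplicationListener {
  void onSnapshotReplicationStarted();

  void onSnapshotReplicationCompleted();
}

// Hypothetical stand-in for the component that owns the listeners.
final class ReplicationTracker {
  private final List<ReplicationListener> listeners = new ArrayList<>();
  // Remembers whether a snapshot replication is currently in progress.
  private boolean ongoingSnapshotReplication;

  synchronized void addListener(final ReplicationListener listener) {
    listeners.add(listener);
    // A listener that registers late is told right away about the replication
    // that started before it registered.
    if (ongoingSnapshotReplication) {
      listener.onSnapshotReplicationStarted();
    }
  }

  synchronized void notifyStarted() {
    ongoingSnapshotReplication = true;
    listeners.forEach(ReplicationListener::onSnapshotReplicationStarted);
  }

  synchronized void notifyCompleted() {
    ongoingSnapshotReplication = false;
    listeners.forEach(ReplicationListener::onSnapshotReplicationCompleted);
  }
}
```

With this sketch, a listener added after `notifyStarted()` but before `notifyCompleted()` sees a "started" event followed by a "completed" event, so it can transition to inactive before transitioning to follower.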
Previously, we only notified new listeners about ongoing replication. This was not enough in cases where snapshot replication finished before the listener completed.
@oleschoenburg, thanks for the PR. As already discussed, there is still at least one scenario in which the Basically, when a snapshot is received during the Broker startup procedure, this PR ensures that three transitions happen in the expected order (and thus, there is always a transition to
Now, what could happen is that the second transition to
At this stage, the snapshot installed during the first transition is no longer in sync with the logstream. This means the stream processor might replay events that expect a certain element to be present in the (runtime) database, which need not be the case. As a result, the stream processor might fail with a With this PR, this might get resolved on its own. In this situation, it is certain that the transition to However, as Zeebe is able to recover from that state on its own, I think we can accept that scenario. @npepinpe, do you agree? Please note: In theory, when the installed snapshot and the logstream are out of sync, and before an
🚀
I'm not 100% sure it's acceptable, if I understood correctly. Is there a chance the follower will persist its incorrect state? After all, when we take a snapshot, we simply take the
No, it won't snapshot it. Unless the transition to But how likely is it? To my understanding, this is not very likely and should not happen. Otherwise, I would agree that snapshotting wrong things is not acceptable. Assuming we can be sure that the wrong things/state won't be snapshotted, do you agree that we can accept a potential Should we maybe create a separate issue for further discussion, if necessary?
bors r+
Successfully created backport PR #8853 for
Successfully created backport PR #8854 for
Successfully created backport PR #8855 for
Description
We are using `SnapshotReplicationListener`s to transition to inactive when a snapshot replication starts. Snapshot replication can start before any listeners have registered, which means that the listener will only be notified about snapshot replication finishing, triggering a transition to follower without first transitioning to inactive.
This PR introduces a new flag in the `RaftContext` to keep track of ongoing snapshot replications so that newly registered listeners can be notified about them immediately.
Related issues
closes #8830