Failed to take snapshot in leader because index entry is not found #9761
Labels
- `kind/bug`: Categorizes an issue or PR as a bug
- `severity/mid`: Marks a bug as having a noticeable impact but with a known workaround
- `version:8.1.1`: Marks an issue as being completely or in parts released in 8.1.1
- `version:8.2.0-alpha1`: Marks an issue as being completely or in parts released in 8.2.0-alpha1
- `version:8.2.0`: Marks an issue as being completely or in parts released in 8.2.0
Comments
deepthidevaki added the `kind/bug` and `severity/mid` labels on Jul 12, 2022
zeebe-bors-camunda bot added a commit that referenced this issue on Oct 6, 2022:

10611: fix: take snapshot if nothing was exported since last snapshot (r=oleschoenburg, a=oleschoenburg)

When figuring out where to take the next snapshot, we determine the snapshot position as the minimum of the processing and exporter positions. There was an edge case where a leader could process but not export. In that case it would use the exporter position as the snapshot position and try to find a log entry at that position. If the log starts with the exporter position, for example because the same broker previously received a snapshot and compacted the log, no entry could be found, which led to a failed snapshot. We now try to use the latest snapshot's term and index if the log entry cannot be found. This ensures that new snapshots can be taken even if nothing was exported since the last snapshot. Closes #9761

Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
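The fix described in the commit above can be sketched as follows. This is an illustrative, simplified model under stated assumptions, not the actual Zeebe code; the class and method names are hypothetical.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the fix in #10611; names are illustrative,
// not the real Zeebe classes.
class SnapshotPositionSketch {

    // The snapshot position is the minimum of the last processed and
    // last exported record positions. If the leader processes but does
    // not export, the (stale) exporter position wins.
    static long snapshotPosition(long processedPosition, long exportedPosition) {
        return Math.min(processedPosition, exportedPosition);
    }

    // Resolving the raft index for the snapshot position: if no log
    // entry exists at that position (e.g. it was compacted away after a
    // snapshot was replicated to this broker), fall back to the latest
    // snapshot's index instead of failing the snapshot.
    static long resolveSnapshotIndex(
            OptionalLong indexOfEntryAtPosition, long latestSnapshotIndex) {
        return indexOfEntryAtPosition.orElse(latestSnapshotIndex);
    }
}
```

Before the fix, the empty-lookup case had no fallback, which is what produced the "index entry is not found" failure in this issue.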
This was referenced Oct 6, 2022
zeebe-bors-camunda bot added a commit that referenced this issue on Oct 7, 2022:

- 10493: [Backport stable/8.0] fix(raft): handle exceptions on partition server init (r=oleschoenburg, a=megglos). Backport of #10450 to `stable/8.0`.
- 10566: [Backport stable/8.0] fix(helm): rename podSecurityContext to containerSecurityContext (r=oleschoenburg, a=backport-action). Backport of #10556 to `stable/8.0`. Relates to camunda/camunda-platform-helm#374.
- 10624: [Backport stable/8.0] fix: take snapshot if nothing was exported since last snapshot (r=oleschoenburg, a=backport-action). Backport of #10611 to `stable/8.0`. Relates to #9761.
- 10638: [Backport stable/8.0] test: fix unfinished stubbing of command response writer (r=oleschoenburg, a=backport-action). Backport of #10605 to `stable/8.0`. Relates to #10604.

Co-authored-by: Meggle (Sebastian Bathke) <sebastian.bathke@camunda.com>
Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
korthout added the `version:8.1.1` label on Oct 13, 2022
korthout added the `version:8.2.0-alpha1` and `release/8.0.8` labels on Nov 1, 2022
npepinpe added the `version:8.2.0` label on Apr 5, 2023
Describe the bug
This occurred in the leader.
Related to #7911. But in that issue it was followers that could not take a snapshot, which is a valid scenario. Leaders, however, should always be able to take a snapshot.
The impact of this bug is low (see the following section). There is no data inconsistency or loss. The snapshot is taken later, once new events are exported. However, if exporting does not resume, no new snapshot will be taken, so reprocessing time will be high.
To Reproduce
The following events happened in the broker, leading to the above error.
Hypothesis
It fails to get the index because it is trying to get the "previous" entry, i.e. the entry at index 494998838. Since there is already a snapshot at that index, the raft leader would only have replicated events from index 494998839. The record with position 1494387075 is at index 494998839. So there is no inconsistency in the logs, but the leader cannot take a snapshot because the previous index is already compacted.
Expected behavior
The leader should take a snapshot if the exporter position has not changed but it has processed new events.
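The compaction edge case from the hypothesis above can be illustrated with a minimal sketch (hypothetical names; this is not Zeebe's actual log abstraction):

```java
import java.util.OptionalLong;

// Minimal sketch of the compaction edge case: a log whose entries below
// firstIndex were compacted away after a snapshot cannot serve lookups
// below that index.
class CompactedLogSketch {
    final long firstIndex; // first index still present in the log

    CompactedLogSketch(long firstIndex) {
        this.firstIndex = firstIndex;
    }

    // Returns the index if the entry is still in the log, empty if it
    // was compacted away.
    OptionalLong entryAt(long index) {
        return index >= firstIndex ? OptionalLong.of(index) : OptionalLong.empty();
    }
}
```

With the latest snapshot at index 494998838, the log's first remaining index is 494998839; looking up the "previous" entry at 494998838 therefore returns empty, which is exactly the failed lookup the leader hit in this issue.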
Log/Stacktrace
Logs
Full Stacktrace