Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: take snapshot if nothing was exported since last snapshot #10611

Merged
merged 2 commits into from Oct 6, 2022

Conversation

oleschoenburg
Copy link
Member

When figuring out where to take the next snapshot, we determine the snapshot position as the minimum of processing and exporter positions.

There was an edge case where a leader could process but no export.
In that case it'd use the exporter position as snapshot position and tryand find a log entry at that position.
If the log starts with the exporter position, for example because the same broker previously received a snapshot and compacted the log, no entry could be found which led to a failed snapshot.

We now try and use the latest snapshot's term and index if the log entry could not be found. This ensures that new snapshots can be taken even if nothing was exported since the last snapshot.

closes #9761

In cases where the log contains no entry at the snapshot position but a
previous snapshot exists and taking a new snapshot is desirable because
the processed position changed, verify that a new snapshot is taken that
has the same term and index as the previous snapshot but a higher
processed position.
@github-actions
Copy link
Contributor

github-actions bot commented Oct 5, 2022

Test Results

   934 files  ±    0     934 suites  ±0   2h 4m 13s ⏱️ - 1m 15s
7 224 tests  - 289  7 218 ✔️  - 289  6 💤 ±0  0 ±0 
7 414 runs   - 289  7 406 ✔️  - 289  8 💤 ±0  0 ±0 

Results for commit ef090ab. ± Comparison against base commit ac6ede5.

♻️ This comment has been updated with latest results.

When figuring out where to take the next snapshot, we determine the
snapshot position as the minimum of processing and exporter positions.

There was an edge case where a leader could process but no export.
In that case it'd use the exporter position as snapshot position and try
and find a log entry at that position.
If the log starts with the exporter position, for example
because the same broker previously received a snapshot and compacted the
log, no entry could be found which led to a failed snapshot.

We now try and use the latest snapshot's term and index if the log entry
could not be found. This ensures that new snapshots can be taken even if
nothing was exported since the last snapshot.
@oleschoenburg oleschoenburg force-pushed the 9761-snapshot-with-truncated-log branch from 52bef12 to ef090ab Compare October 5, 2022 12:57
@oleschoenburg oleschoenburg marked this pull request as ready for review October 5, 2022 13:45
Copy link
Contributor

@deepthidevaki deepthidevaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. 🎉

@oleschoenburg
Copy link
Member Author

bors r+

zeebe-bors-camunda bot added a commit that referenced this pull request Oct 6, 2022
10611: fix: take snapshot if nothing was exported since last snapshot r=oleschoenburg a=oleschoenburg

When figuring out where to take the next snapshot, we determine the snapshot position as the minimum of processing and exporter positions.

There was an edge case where a leader could process but no export.
In that case it'd use the exporter position as snapshot position and tryand find a log entry at that position.
If the log starts with the exporter position, for example because the same broker previously received a snapshot and compacted the log, no entry could be found which led to a failed snapshot.

We now try and use the latest snapshot's term and index if the log entry could not be found. This ensures that new snapshots can be taken even if nothing was exported since the last snapshot.

closes #9761 

Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
@zeebe-bors-camunda
Copy link
Contributor

Build failed:

@oleschoenburg
Copy link
Member Author

bors retry

@zeebe-bors-camunda
Copy link
Contributor

Build succeeded:

@backport-action
Copy link
Collaborator

Successfully created backport PR #10624 for stable/8.0.

@backport-action
Copy link
Collaborator

Successfully created backport PR #10625 for stable/8.1.

zeebe-bors-camunda bot added a commit that referenced this pull request Oct 6, 2022
10624: [Backport stable/8.0] fix: take snapshot if nothing was exported since last snapshot r=oleschoenburg a=backport-action

# Description
Backport of #10611 to `stable/8.0`.

relates to #9761

Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
zeebe-bors-camunda bot added a commit that referenced this pull request Oct 7, 2022
10625: [Backport stable/8.1] fix: take snapshot if nothing was exported since last snapshot r=oleschoenburg a=backport-action

# Description
Backport of #10611 to `stable/8.1`.

relates to #9761

Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
zeebe-bors-camunda bot added a commit that referenced this pull request Oct 7, 2022
10493: [Backport stable/8.0] fix(raft): handle exceptions on partition server init r=oleschoenburg a=megglos

# Description
Backport of #10450 to `stable/8.0`.

10566: [Backport stable/8.0] fix(helm): rename podSecurityContext to containerSecurityContext r=oleschoenburg a=backport-action

# Description
Backport of #10556 to `stable/8.0`.

relates to camunda/camunda-platform-helm#374

10624: [Backport stable/8.0] fix: take snapshot if nothing was exported since last snapshot r=oleschoenburg a=backport-action

# Description
Backport of #10611 to `stable/8.0`.

relates to #9761

10638: [Backport stable/8.0] test: fix unfinished stubbing of command response writer r=oleschoenburg a=backport-action

# Description
Backport of #10605 to `stable/8.0`.

relates to #10604

Co-authored-by: Meggle (Sebastian Bathke) <sebastian.bathke@camunda.com>
Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
zeebe-bors-camunda bot added a commit that referenced this pull request Oct 7, 2022
10493: [Backport stable/8.0] fix(raft): handle exceptions on partition server init r=oleschoenburg a=megglos

# Description
Backport of #10450 to `stable/8.0`.

10566: [Backport stable/8.0] fix(helm): rename podSecurityContext to containerSecurityContext r=oleschoenburg a=backport-action

# Description
Backport of #10556 to `stable/8.0`.

relates to camunda/camunda-platform-helm#374

10624: [Backport stable/8.0] fix: take snapshot if nothing was exported since last snapshot r=oleschoenburg a=backport-action

# Description
Backport of #10611 to `stable/8.0`.

relates to #9761

10638: [Backport stable/8.0] test: fix unfinished stubbing of command response writer r=oleschoenburg a=backport-action

# Description
Backport of #10605 to `stable/8.0`.

relates to #10604

Co-authored-by: Meggle (Sebastian Bathke) <sebastian.bathke@camunda.com>
Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
@korthout korthout added the version:8.1.1 Marks an issue as being completely or in parts released in 8.1.1 label Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
version:8.1.1 Marks an issue as being completely or in parts released in 8.1.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failed to take snapshot in leader because index entry is not found
4 participants