-
Notifications
You must be signed in to change notification settings - Fork 556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend RandomizedRaftTest
with snapshot and data loss operations
#9837
Labels
area/test
Marks an issue as improving or extending the test coverage of the project
component/raft
kind/toil
Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc.
version:8.1.0
Marks an issue as being completely or in parts released in 8.1.0
version:8.2.0-alpha5
Marks an issue as being completely or in parts released in 8.2.0-alpha5
version:8.2.0
Marks an issue as being completely or in parts released in 8.2.0
Comments
lenaschoenburg
added
the
kind/toil
Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc.
label
Jul 19, 2022
marked it with the hot backup epic as it might be valuable to verify the system |
10 tasks
zeebe-bors-camunda bot
added a commit
that referenced
this issue
Aug 29, 2022
10183: fix(raft): follower reset pendingsnapshot after rejecting install request r=deepthidevaki a=deepthidevaki ## Description - Extended RandomizedRaftTest to include snapshotting and compaction. This test could reproduce the bug #10180 - Without this fix, when a follower rejects a snapshot install request because it receives a duplicate chunk, the leader resets the snapshot replication and restart sending the same snapshot. But the follower has not reset its state, so it is still expecting a different chunk and reject the request again. The fix is to reset the pending snapshot when follower rejects a request on any error. Extended RandomizedRaftTest partially covers #9837 ## Related issues closes #10180 closes #10202 Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com> Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@users.noreply.github.com>
Is this done via #10183 ? |
No. It only added snapshot. We still have to add restarts and restarts with dataloss. |
zeebe-bors-camunda bot
added a commit
that referenced
this issue
Sep 2, 2022
10249: Add restarts to `RandomizedRaftTest` r=oleschoenburg a=oleschoenburg Adds the ability to restart raft members without data loss. After the restart, the same data directory and snapshot store is used. Adds additional randomized tests for default operations + restarts and default operations + snapshots + restarts. relates to #9837 Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
Zelldon
added
the
version:8.1.0
Marks an issue as being completely or in parts released in 8.1.0
label
Oct 4, 2022
Zelldon
added
component/raft
area/test
Marks an issue as improving or extending the test coverage of the project
labels
Jan 2, 2023
zeebe-bors-camunda bot
added a commit
that referenced
this issue
Feb 13, 2023
11547: fix: append an empty expired message command r=remcowesterhoud a=romansmirnov ## Description To discard an expired message the processor and applier only need the message key. All other information is not necessary to process a expire message command. That's why only the `messageKey` is submitted to the command to expire a message. This prevents the message checker to stop too early when collecting expired messages and not being able to collect as many as necessary expired messages in one batch to prevent building up state. ## Related issues closes #10643 11581: Create `decisionRequirementsKeyByIdAndVersion` ColumnFamily r=koevskinikola a=remcowesterhoud ## Description <!-- Please explain the changes you made here. --> When we delete a DRG it is possible that this is the "latest" DRG. When this is the case we must be able to find the previous version of the DRG to make this the new "latest" DRG. This PR introduces a new ColumnFamily: `decisionRequirementsKeyByIdAndVersion`. The key of this ColumnFamily is a composite of the `drgId` and the `drgVersion`. This makes sure the key is unique for all DRGs, whilst allowing us to iterate over all entries using the `drgId`. The value of the ColumnFamily is the `drgKey`. A new method has been introduced to iterate over this ColumnFamily. It takes a `drgId` and a `currentVersion`. Using this information it will find the previous version (relative to the given `currentVersion`) of this specific `drgId`. If it found something it will return the key of this DRG. As part of another issue this method will be used and the key will be stored in the `latestDecisionRequirementsKeysById` ColumnFamily. When a new DRG is stored in the state it will be inserted into this ColumnFamily accordingly. Last but not least we must make sure we migrate all DRGs in the state to initially fill this new ColumnFamily. A new migration for this has been added. It will iterate over the `decisionRequirementsByKey` ColumnFamily and use these DRGs to insert new entries into the new ColumnFamily. ## Related issues <!-- Which issues are closed by this PR or are related --> closes #11541 11632: Add randomized test for dataloss scenario r=deepthidevaki a=deepthidevaki ## Description Add restart with dataloss to the set of input operations in randomized raft tests. The operations ensures that we can only inject data loss of one node at a time. It waits until the restarted node have recovered and caught up before proceeding with the other operations. This PR also adds additional verification to ensure that there is no dataloss, by keeping track of committed entries and verifying that it is not overwritten. ## Related issues closes #9837 Co-authored-by: Roman <roman.smirnov@camunda.com> Co-authored-by: Remco Westerhoud <remco@westerhoud.nl> Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
zeebe-bors-camunda bot
added a commit
that referenced
this issue
Feb 13, 2023
11581: Create `decisionRequirementsKeyByIdAndVersion` ColumnFamily r=koevskinikola a=remcowesterhoud ## Description <!-- Please explain the changes you made here. --> When we delete a DRG it is possible that this is the "latest" DRG. When this is the case we must be able to find the previous version of the DRG to make this the new "latest" DRG. This PR introduces a new ColumnFamily: `decisionRequirementsKeyByIdAndVersion`. The key of this ColumnFamily is a composite of the `drgId` and the `drgVersion`. This makes sure the key is unique for all DRGs, whilst allowing us to iterate over all entries using the `drgId`. The value of the ColumnFamily is the `drgKey`. A new method has been introduced to iterate over this ColumnFamily. It takes a `drgId` and a `currentVersion`. Using this information it will find the previous version (relative to the given `currentVersion`) of this specific `drgId`. If it found something it will return the key of this DRG. As part of another issue this method will be used and the key will be stored in the `latestDecisionRequirementsKeysById` ColumnFamily. When a new DRG is stored in the state it will be inserted into this ColumnFamily accordingly. Last but not least we must make sure we migrate all DRGs in the state to initially fill this new ColumnFamily. A new migration for this has been added. It will iterate over the `decisionRequirementsByKey` ColumnFamily and use these DRGs to insert new entries into the new ColumnFamily. ## Related issues <!-- Which issues are closed by this PR or are related --> closes #11541 11632: Add randomized test for dataloss scenario r=deepthidevaki a=deepthidevaki ## Description Add restart with dataloss to the set of input operations in randomized raft tests. The operations ensures that we can only inject data loss of one node at a time. It waits until the restarted node have recovered and caught up before proceeding with the other operations. This PR also adds additional verification to ensure that there is no dataloss, by keeping track of committed entries and verifying that it is not overwritten. ## Related issues closes #9837 Co-authored-by: Remco Westerhoud <remco@westerhoud.nl> Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
deepthidevaki
added
the
version:8.2.0-alpha5
Marks an issue as being completely or in parts released in 8.2.0-alpha5
label
Mar 7, 2023
npepinpe
added
the
version:8.2.0
Marks an issue as being completely or in parts released in 8.2.0
label
Apr 5, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/test
Marks an issue as improving or extending the test coverage of the project
component/raft
kind/toil
Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc.
version:8.1.0
Marks an issue as being completely or in parts released in 8.1.0
version:8.2.0-alpha5
Marks an issue as being completely or in parts released in 8.2.0-alpha5
version:8.2.0
Marks an issue as being completely or in parts released in 8.2.0
Recent issues with snapshots have highlighted a need to improve test coverage. We can extend the
RandomizedRaftTest
by adding additional operations for taking snapshots and for losing data.The text was updated successfully, but these errors were encountered: