-
Notifications
You must be signed in to change notification settings - Fork 641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Backport stable/8.0] Only check that the topology reports all replicas #10084
Merged
+676
−282
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Backports the expanded TopologyAssert from the main branch to stable/8.0, as well as its tests.
Instead of using the helper `isComplete`, manually check that the topology contains all replicas, omitting the check for the partition health. While this would be safer, there is a race condition where the partitions may be unhealthy for a short time on start up. For this test, we only really want to check that the nodes find each other, not that processing works, as it should be largely unaffected.
The IPv6 test is sometimes flaky because it may take a while for a full cluster to start: 3 brokers and one gateway. To test IPv6, however, we don't really need so many nodes, and having 1 broker and 1 gateway is enough: all ports are tested properly (except the management port, which is tested already by Spring). Additionally, this fixes that if the test fails once, the network may not be removed properly afterwards, which causes further iterations to fail. We ensure the network is closed here, and also we print out a proper error if we fail to start the network on the next run. Finally I took the time to clean up the test using new APIs provided by zeebe-test-container.
Previously, the rule only waited for a leader of partition 1 before returning from `startBroker`. With this change, the leader must also report as healthy. This is intended to fix some flaky tests where after the startup, commands send to the leader of partition 1 failed because no command-api handler was registered. Since registration of handlers is required for a healthy status, this should hopefully prevent these kinds of flaky tests.
The `ClusterRule` uses the topology to figure out if the cluster is ready before a test starts. There was a flaky condition where it might start when all partitions are present, but before they are healthy. This change was pulled from #9502 - however in that PR, it was only done in the broker module's `EmbeddedBrokerRule`. At the same time, the rule is updated such that everywhere it was waiting for the topology, now also uses `TopologyAssert`.
10 tasks
@Zelldon - it's a lot of changes, but as I said, it's all backports with no conflicts. The only additional commit was the last one, which I've also forward ported. |
ghost
pushed a commit
that referenced
this pull request
Aug 17, 2022
10086: Avoid race condition when polling for topology r=oleschoenburg a=npepinpe ## Description This PR fixes a potential race condition when starting the `EmbeddedBrokerRule`. We poll for the topology and wait until it's complete. Unfortunately, if we're too fast and the request is rejected, the `.join()` will throw, which will be rethrown by `Awaitility` (as when using `untilAsserted` it only catches `AssertionError`), failing the test. By ignoring exceptions here we can focus on just failing when the topology is never complete (Awaitility will print out the last failure in that case). ## Related issues related to #10084 Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
ChrisKujawa
approved these changes
Aug 17, 2022
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great thanks!
bors merge |
This pull request was closed.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
version:8.1.0-alpha5
Marks an issue as being completely or in parts released in 8.1.0-alpha5
version:8.1.0
Marks an issue as being completely or in parts released in 8.1.0
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
This PR backports #10082 to stable/8.0. At the same time, I backported two other flaky test fixes, since one included the
TopologyAssert
helpers used here (and possibly in further flaky test fixes).There were no conflicts by the way.
Related issues
backports #10082, #9596, #9502
Definition of Done
Not all items need to be done depending on the issue and the pull request.
Code changes:
backport stable/1.3
) to the PR, in case that fails you need to create backports manually.Testing:
Documentation:
Please refer to our review guidelines.