Refactor DynamicConfigSlowPreJoinBouncingTest [HZ-978] #21255

ramizdundar · 2022-04-19T13:45:18Z

There is 1 stable member (the master), 3 bouncing members and 1 driver in the test. The stable member and the driver should never bounce. The test was failing because the master was kicking the driver out of the cluster before the driver could broadcast the dynamic config changes.

The driver was kicked out because the master wasn't able to handle the driver's heartbeats, which led the master to conclude that the driver was dead.

The master can't handle the driver's heartbeat for one of two reasons:

The master's operation threads are blocked.
The master can't acquire the lock in ClusterHeartbeatManager.handleHeartbeat().

The master is under pressure because 4 members (the 3 bouncing members and the driver) try to join it at the start of the test. Each member resends its join request every second that the master doesn't respond. The master needs at least one second to respond, because before responding it must call NodeEngineImpl.getPreJoinOperations() at least once, and the test makes that call sleep. All of these calls are made while holding ClusterJoinManager.clusterServiceLock, which is the same lock ClusterHeartbeatManager uses.
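
The effect of sharing one lock can be sketched as follows. This is a minimal, hypothetical model, not Hazelcast code: `SharedLockSketch`, `handleJoinRequest` and `heartbeatWaitMillis` are illustrative names, a fair `ReentrantLock` stands in for the shared cluster service lock, and a `Thread.sleep` stands in for the slowed-down pre-join work. It queues four join requests on the lock and measures how long a heartbeat arriving behind them has to wait.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class SharedLockSketch {
    // Hypothetical stand-in for the one lock that both join handling and heartbeat
    // handling acquire. Fair mode serves waiters in arrival order, so the demo is deterministic.
    private final ReentrantLock clusterServiceLock = new ReentrantLock(true);

    // One join request: the slowed-down pre-join step runs while the lock is held.
    private void handleJoinRequest(long preJoinMillis) {
        clusterServiceLock.lock();
        try {
            Thread.sleep(preJoinMillis); // models the sleep the test adds to pre-join work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            clusterServiceLock.unlock();
        }
    }

    // Queues `joiners` join requests, then measures how long a heartbeat waits for the lock.
    static long heartbeatWaitMillis(int joiners, long preJoinMillis) {
        SharedLockSketch master = new SharedLockSketch();
        Thread[] handlers = new Thread[joiners];
        try {
            master.clusterServiceLock.lock(); // hold the lock so every join request queues first
            for (int i = 0; i < joiners; i++) {
                handlers[i] = new Thread(() -> master.handleJoinRequest(preJoinMillis));
                handlers[i].start();
            }
            while (master.clusterServiceLock.getQueueLength() < joiners) {
                Thread.sleep(1); // wait until every join handler is blocked on the lock
            }
            long start = System.nanoTime();
            master.clusterServiceLock.unlock();
            // Heartbeat handling: with a fair lock it queues behind all pending join requests.
            master.clusterServiceLock.lock();
            long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            master.clusterServiceLock.unlock();
            for (Thread t : handlers) {
                t.join();
            }
            return waited;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 4 join requests (3 bouncing members + the driver), 100 ms of simulated pre-join work each
        System.out.println("heartbeat waited about " + heartbeatWaitMillis(4, 100) + " ms");
    }
}
```

With 100 ms of simulated work per join request, the heartbeat waits at least about 400 ms, i.e. the sum of all queued join work; at the test's one second per call the same queue would delay it by several seconds.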

If the master slows down for any reason, every member waiting to join sends more and more join requests, which congests the master even further: each of these operations takes 1 second (because of the sleep in the test), and the lock prevents them from running in parallel.
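
The feedback loop can be illustrated with a toy discrete-time simulation. It is a simplified, hypothetical model (`JoinBacklogSketch` and `lockBusySeconds` are illustrative names): every member that hasn't been answered resends a join request each second, the master handles queued requests one at a time while holding the lock for `costSeconds` each, and stale duplicate requests still cost the full time. It ignores anything the real join path may do to batch or de-duplicate requests.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class JoinBacklogSketch {
    // Returns the total number of seconds the (hypothetical) shared lock is held
    // before the join-request backlog fully drains.
    static int lockBusySeconds(int joiners, int costSeconds) {
        if (costSeconds < 1) {
            throw new IllegalArgumentException("costSeconds must be >= 1");
        }
        Deque<Integer> queue = new ArrayDeque<>();
        boolean[] answered = new boolean[joiners];
        int unanswered = joiners;
        Integer current = null; // member whose request is being handled, or null if idle
        int remaining = 0;      // seconds left on the current request
        int busy = 0;
        while (unanswered > 0 || current != null || !queue.isEmpty()) {
            // Each member still waiting for a response sends another join request this second.
            for (int m = 0; m < joiners; m++) {
                if (!answered[m]) {
                    queue.addLast(m);
                }
            }
            // The master picks up the next request if it isn't already handling one.
            if (current == null && !queue.isEmpty()) {
                current = queue.pollFirst();
                remaining = costSeconds;
            }
            // Spend this second holding the lock; stale duplicates still cost the full time.
            if (current != null) {
                busy++;
                if (--remaining == 0) {
                    if (!answered[current]) {
                        answered[current] = true;
                        unanswered--;
                    }
                    current = null;
                }
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        for (int cost = 1; cost <= 3; cost++) {
            // prints 10, 40 and 90 seconds for 4 joiners
            System.out.println("cost " + cost + " s/request -> lock busy "
                    + lockBusySeconds(4, cost) + " s");
        }
    }
}
```

In this model, member k (0-based) sends (k + 1)·c requests before it is answered, so N joiners produce c·N(N+1)/2 requests that each hold the lock for c seconds. Doubling the per-request cost therefore quadruples the total lock time, which is the "more and more congested" behaviour described above.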

As a result, the master is either blocked by the join requests or unable to acquire the lock in ClusterHeartbeatManager. Either way, it can't process the driver's heartbeat, so it kicks the driver out.
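
Why an unprocessed heartbeat is enough to evict a healthy member can be seen in a minimal timeout-based failure detector. This is a hypothetical sketch, not ClusterHeartbeatManager's actual logic: `HeartbeatTimeoutSketch`, its methods and the 5-second timeout are illustrative assumptions. The key point is that the master only records heartbeats it has actually processed, so heartbeats stuck behind the lock count as missing.

```java
import java.util.HashMap;
import java.util.Map;

public class HeartbeatTimeoutSketch {
    private final long timeoutMillis;
    // Member id -> time (ms) at which the master last *processed* a heartbeat from that member.
    private final Map<String, Long> lastProcessedMillis = new HashMap<>();

    HeartbeatTimeoutSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called only once the master has actually handled the heartbeat (e.g. after getting the lock).
    void onHeartbeatProcessed(String member, long nowMillis) {
        lastProcessedMillis.put(member, nowMillis);
    }

    // The master suspects a member once no heartbeat from it has been processed within the
    // timeout, even if the member kept sending heartbeats that are stuck waiting for the lock.
    boolean isSuspected(String member, long nowMillis) {
        Long last = lastProcessedMillis.get(member);
        return last == null || nowMillis - last > timeoutMillis;
    }

    public static void main(String[] args) {
        HeartbeatTimeoutSketch master = new HeartbeatTimeoutSketch(5_000); // hypothetical 5 s timeout
        master.onHeartbeatProcessed("driver", 0);
        // The driver keeps sending heartbeats, but none is processed while the lock is busy.
        System.out.println(master.isSuspected("driver", 4_000));  // false: still within the timeout
        System.out.println(master.isSuspected("driver", 10_000)); // true: the master gives up on it
    }
}
```

If the lock stays busy with join work for longer than the timeout, the driver is suspected even though it is alive and sending heartbeats.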

This test doesn't cover any scenario that DynamicConfigBouncingTest doesn't already cover, so this PR removes it.

Fixes #19785

Checklist:

Labels (Team:, Type:, Source:, Module:) and Milestone set
Label Add to Release Notes or Not Release Notes content set
Request reviewers if possible


ramizdundar · 2022-05-09T12:55:59Z

* Refactor test
* Revert "Refactor test"
  This reverts commit 8668208.
* Delete DynamicConfigSlowPreJoinBouncingTest

(cherry picked from commit 17b92e6)

Backport of: #21255 (cherry picked from commit 17b92e6)

Refactor test

8668208

ramizdundar self-assigned this Apr 19, 2022

ramizdundar requested review from ufukyilmaz and vbekiaris April 19, 2022 14:25

ramizdundar added the Team: Core, Source: Internal, Not Release Notes content, Module: Cluster and Type: Test-Failure labels Apr 19, 2022

ramizdundar added this to the 5.2 milestone Apr 19, 2022

AyberkSorgun changed the title from Refactor DynamicConfigSlowPreJoinBouncingTest to Refactor DynamicConfigSlowPreJoinBouncingTest [HZ-978] May 9, 2022

ramizdundar added 2 commits May 9, 2022 15:44

Revert "Refactor test"

e43c7e9

This reverts commit 8668208.

Delete DynamicConfigSlowPreJoinBouncingTest

56ea89e

ramizdundar marked this pull request as ready for review May 9, 2022 12:56

vbekiaris approved these changes May 9, 2022


ufukyilmaz approved these changes May 16, 2022


ramizdundar merged commit 17b92e6 into hazelcast:master May 30, 2022

ramizdundar deleted the replace_prejoin_test branch May 30, 2022 12:18

ramizdundar mentioned this pull request Jun 21, 2022

[BACKPORT 5.1.z] Refactor DynamicConfigSlowPreJoinBouncingTest [HZ-978] #21656

Merged


ramizdundar added a commit that referenced this pull request Jun 23, 2022

Delete DynamicConfigSlowPreJoinBouncingTest [HZ-978] (#21255) (#21656)

7bb03a5

