Tests for CP Membership restart issues #24903

lprimak · 2023-06-27T02:31:22Z

Demos CP member restart and rejoin issues
Relates to #24897

devOpsHazelcast · 2023-06-27T02:31:26Z

Can one of the admins verify this patch?

arodionov · 2023-08-16T09:51:41Z

If a cluster loses the majority of its members, it is blocked and has to be recovered manually; see https://docs.hazelcast.com/hazelcast/5.3/cp-subsystem/management#handling-a-lost-majority
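
To make "lost the majority" concrete, here is a minimal sketch of the usual Raft-style quorum arithmetic in plain Java. It does not use any Hazelcast API; the class and method names are hypothetical, and the group size of 3 is only an assumption for illustration.

```java
// Hypothetical helper, not part of Hazelcast: illustrates majority arithmetic only.
class MajorityCheck {
    // A Raft-style majority is more than half of the configured group members.
    static int majority(int groupSize) {
        return groupSize / 2 + 1;
    }

    // True when the reachable members can still form a majority.
    static boolean hasMajority(int groupSize, int reachable) {
        return reachable >= majority(groupSize);
    }

    public static void main(String[] args) {
        int groupSize = 3; // assumed group size for illustration
        for (int reachable = groupSize; reachable >= 0; reachable--) {
            System.out.printf("reachable=%d/%d majority=%b%n",
                    reachable, groupSize, hasMajority(groupSize, reachable));
        }
    }
}
```

With a group of 3, losing one member still leaves a majority (2 of 3), while losing two does not, which is the blocked state described above.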

lprimak · 2023-08-17T01:26:39Z

@arodionov I think it's premature to close this for the following reasons:

There is no event sent by Hazelcast when members shut down normally and majority is lost, thus there is no concrete way to find out when the "unsafe state" occurs.
There is no way (even manually) to recover the cluster to a working state unless at least 3 members exist and are functioning.
The "unrecoverable state" is dubious at best.
100% CPU usage is seen under certain circumstances.
Requiring manual recovery is also dubious.

I would suggest reopening this PR.

arodionov · 2023-08-17T10:48:32Z

@lprimak thanks for your points!

Regarding,

There is no event sent by Hazelcast when members shut down normally and majority is lost, thus there is no concrete way to find out when the "unsafe state" occurs.

there are CP Group Availability Listeners: https://docs.hazelcast.com/hazelcast/5.3/cp-subsystem/management#cp-group-availability-listeners

I'll copy the other points to #24912.

lprimak · 2023-08-17T16:02:50Z

Thanks @arodionov. All of this is already described in #24897.

there is a CP Group Availability Listeners https://docs.hazelcast.com/hazelcast/5.3/cp-subsystem/management#cp-group-availability-listeners

I just want to reiterate that those listeners are not called when members are shut down properly, only when they die or freeze unexpectedly. This is why they cannot be relied upon currently.

If you run https://github.com/flowlogix/hazelcast-issues in 3 terminals (stop/restart the members, use Ctrl-Z), you will see all of those issues in action within about 10 minutes.

devOpsHazelcast · 2024-04-16T22:03:09Z

PR closed by Hazelcast automation as no activity (>6 months). Please reopen with comments, if necessary. Thank you for using Hazelcast and your valuable contributions

lprimak · 2024-04-16T23:49:35Z

Please reopen. Still valid

added contractAndReExpandRaftGroup* tests

d73eb4f

hz-devops-test added the Source: Community label Jun 27, 2023

lprimak mentioned this pull request Jun 27, 2023

CP Subsystem/Raft: Instability when restarting members #24897

Open

lprimak added 2 commits June 26, 2023 22:14

failover test

109fe94

renamed FailoverTest to SplitBrainTest

528b85c

lprimak changed the title ~~Tests for RAFT issues~~ Tests for CP Membership restart issues Jun 27, 2023

arodionov closed this Aug 16, 2023

arodionov mentioned this pull request Aug 17, 2023

Align k8s pods auto-restart with CP members removal and promotion #24912

Open

arodionov reopened this Aug 17, 2023

lprimak added 4 commits September 14, 2023 01:21

Merge branch 'master' into RAFT_ISSUES

a929f11

added test for missing majorityLost messages upon normal termination

e1094ce

Merge branch 'master' into RAFT_ISSUES

c2e18ff

revert CPGroupAvailabilityListenerTest.java - no longer necessary

c2ab24b

devOpsHazelcast closed this Apr 16, 2024

devOpsHazelcast added the Automation: PR auto closed label Apr 16, 2024

arodionov reopened this Apr 17, 2024
