Revert conversion to Lite member during Split-Brain healing #12691

ahmetmircik · 2018-03-22T11:42:36Z

EE counterpart: https://github.com/hazelcast/hazelcast-enterprise/pull/2021

Reverts the conversion to Lite member during split-brain healing.
Disposes stores upon merge.
Unifies the HD merge code with the heap merge code; the EE-side merge code will be removed.

closes https://github.com/hazelcast/hazelcast-mono/issues/1427
closes #12405

mdogan · 2018-03-29T10:37:11Z

hazelcast/src/main/java/com/hazelcast/internal/cluster/impl/ClusterMergeTask.java

@@ -106,19 +100,6 @@ private void disposeTasks(Collection<Runnable>... tasks) {
        }
    }

-    private void tryToPromoteLocalLiteMember() {
-        if (wasLiteMember) {


wasLiteMember field can be deleted too.

Donnerbart · 2018-03-29T15:05:20Z

hazelcast/src/main/java/com/hazelcast/cache/impl/AbstractCacheService.java

@@ -228,7 +227,7 @@ public DistributedObject createDistributedObject(String cacheNameWithPrefix) {
                    cacheConfig.setManagerPrefix(HazelcastCacheManager.CACHE_MANAGER_PREFIX);
                }

-                CacheMergePolicyProvider mergePolicyProvider = splitBrainHandlerService.getMergePolicyProvider();
+                CacheMergePolicyProvider mergePolicyProvider = new CacheMergePolicyProvider(nodeEngine);


We should have a single CacheMergePolicyProvider instance per HZ node, used for both the config check and the split-brain merging. The provider class caches the merge policy instances it creates, so with a single instance we know for sure that the merge policy instance that survived the config check is the one used later in the split-brain process. It also saves us a bunch of Class.forName() calls and garbage.

That's why the legacy merge policy providers and all other data structures use the single instance of SplitBrainMergePolicyProvider from the NodeEngine.

The same applies to IMap and ReplicatedMap.

So basically the services or map service context should store the created merge policy provider and make it accessible to the merge process.

Maybe we can define a LegacySplitBrainMergePolicyProvider interface with an Object getMergePolicy(String mergePolicyName) method. Then we can store a single instance in the AbstractSplitBrainHandlerService and make it easily accessible for the config checks and merge runnables.
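To make the suggestion concrete, here is a minimal sketch of a caching provider behind the proposed interface. It is not the Hazelcast implementation: the class names `MergePolicyProviderSketch` and `CachingMergePolicyProvider` are hypothetical, and `java.lang.Object` stands in for a real merge policy class name. Only the `getMergePolicy(String)` signature comes from the comment above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MergePolicyProviderSketch {

    /** Hypothetical interface, following the signature proposed in the review comment. */
    interface LegacySplitBrainMergePolicyProvider {
        Object getMergePolicy(String mergePolicyName);
    }

    /** Caches one policy instance per class name so every caller sees the same object. */
    static final class CachingMergePolicyProvider implements LegacySplitBrainMergePolicyProvider {
        private final ConcurrentMap<String, Object> policies = new ConcurrentHashMap<>();

        @Override
        public Object getMergePolicy(String mergePolicyName) {
            Object policy = policies.get(mergePolicyName);
            if (policy == null) {
                Object created = instantiate(mergePolicyName);
                // If another thread won the race, keep its instance and drop ours.
                Object existing = policies.putIfAbsent(mergePolicyName, created);
                policy = existing != null ? existing : created;
            }
            return policy;
        }

        // Reflection is only needed on a cache miss, which is where the saving comes from.
        private static Object instantiate(String className) {
            try {
                return Class.forName(className).getDeclaredConstructor().newInstance();
            } catch (Exception e) {
                throw new IllegalArgumentException("Invalid merge policy: " + className, e);
            }
        }
    }

    public static void main(String[] args) {
        LegacySplitBrainMergePolicyProvider provider = new CachingMergePolicyProvider();
        // Assumption: a plain JDK class stands in for a merge policy implementation.
        Object atConfigCheck = provider.getMergePolicy("java.lang.Object");
        Object atMerge = provider.getMergePolicy("java.lang.Object");
        System.out.println(atConfigCheck == atMerge); // prints "true"
    }
}
```

With one shared provider, the instance validated by the config check is the same object handed to the merge; with a new provider per call, each lookup would create a fresh instance.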

Changed it to one instance per service.

mdogan

I just scanned lite-member related parts. 👍

mmedenjak · 2018-04-03T06:51:07Z

hazelcast/src/main/java/com/hazelcast/cache/impl/AbstractCacheService.java

-    // this method is overridden on ee
-    protected CacheSplitBrainHandlerService newSplitBrainHandlerService(NodeEngine nodeEngine) {
-        return new CacheSplitBrainHandlerService(nodeEngine, configs, segments);
+    public CacheMergePolicyProvider getMergePolicyProvider() {


Minor: This can be package private if the test is moved to the same package. Also, if the field is package-private, we can remove the getter altogether.

Kept this as is; it is used by different packages on the EE side.

mmedenjak · 2018-04-03T06:53:02Z

hazelcast/src/main/java/com/hazelcast/spi/impl/merge/AbstractSplitBrainHandlerService.java


-                        if (!isDiscardPolicy(mergePolicy)) {


Now we will collect even the stores that will be discarded and discard them later in the runnable. Any special reason for doing so? Can we revert to the old version on this?

Moved these checks to the split-brain handler service.
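A rough sketch of the behaviour discussed in this thread, assuming the check runs while stores are being collected: stores whose policy is a discard policy are disposed right away and never reach the merge runnable. The `Store` class, the policy name string, and `collectStoresToMerge` are hypothetical stand-ins, not the code from this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DiscardFilterSketch {

    /** Hypothetical stand-in for a per-partition store and its configured merge policy. */
    static final class Store {
        final String name;
        final String mergePolicy;
        boolean destroyed;

        Store(String name, String mergePolicy) {
            this.name = name;
            this.mergePolicy = mergePolicy;
        }
    }

    // Assumption: the policy is identified by a simple name for the sake of the example.
    static boolean isDiscardPolicy(String mergePolicy) {
        return "DiscardMergePolicy".equals(mergePolicy);
    }

    /** Returns the stores that need merging; discard-policy stores are destroyed immediately. */
    static List<Store> collectStoresToMerge(List<Store> partitionStores) {
        List<Store> toMerge = new ArrayList<>();
        for (Store store : partitionStores) {
            if (isDiscardPolicy(store.mergePolicy)) {
                store.destroyed = true; // nothing to merge, so release it now
            } else {
                toMerge.add(store);
            }
        }
        return toMerge;
    }

    public static void main(String[] args) {
        List<Store> stores = Arrays.asList(
                new Store("orders", "PutIfAbsentMergePolicy"),
                new Store("sessions", "DiscardMergePolicy"));
        List<Store> toMerge = collectStoresToMerge(stores);
        System.out.println(toMerge.size() + " " + stores.get(1).destroyed); // prints "1 true"
    }
}
```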

mmedenjak · 2018-04-03T07:22:08Z

hazelcast/src/main/java/com/hazelcast/spi/impl/merge/AbstractMergeRunnable.java

+    }
+
+    private void asyncDestroyStores(final int partitionID, final Collection<Store> stores) {
+        operationExecutor.execute(new PartitionSpecificRunnable() {


Can it happen that the partition is migrating at this point (e.g. because more nodes are joining the cluster)? If so, can you use the InvocationUtil#executeLocallyWithRetry?

mmedenjak · 2018-04-03T08:20:17Z

hazelcast/src/main/java/com/hazelcast/map/impl/recordstore/RecordStore.java

+     * Like {@link #destroy()} but does not touch state on other services
+     * like lock service or event journal service.
+     */
+    void destroyInternals();


It's getting a bit confusing. We have destroy (destroys the internals and the connected services), destroyInternals (destroys only the map internals), reset (I'm not sure how this one fits in; it also resets some data but keeps other things, like indexes and connected services such as locks and the journal) and clearPartition (just clears everything, including connected services).
I don't see a simple way of improving this. Maybe a task for a separate PR.

Couldn't agree more 💯

This is needed for this PR to pass the tests, but the problem is a bit tricky, so I filed a separate improvement issue: #12779
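For readers following the thread, the four methods can be laid side by side in a toy model. This is only a sketch built from the descriptions in the comment above: the `ToyRecordStore` class and its fields are hypothetical, and the real `RecordStore` methods differ in ways this model does not capture.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RecordStoreLifecycleSketch {

    /** Toy model only; field names are hypothetical and not taken from the real RecordStore. */
    static final class ToyRecordStore {
        final Map<String, String> records = new HashMap<>();
        final Set<String> indexes = new HashSet<>();
        // Stands in for state held by connected services (locks, event journal).
        final Set<String> connectedServiceState = new HashSet<>();

        // Destroys only the map internals.
        void destroyInternals() {
            records.clear();
            indexes.clear();
        }

        // Destroys the internals and the state on connected services.
        void destroy() {
            destroyInternals();
            connectedServiceState.clear();
        }

        // Resets data but keeps indexes and connected-service state.
        void reset() {
            records.clear();
        }

        // Clears everything, including connected-service state.
        void clearPartition() {
            records.clear();
            indexes.clear();
            connectedServiceState.clear();
        }
    }

    /** Builds a store with one record, one index and one lock entry. */
    static ToyRecordStore populated() {
        ToyRecordStore store = new ToyRecordStore();
        store.records.put("key", "value");
        store.indexes.add("index-on-key");
        store.connectedServiceState.add("lock-on-key");
        return store;
    }

    public static void main(String[] args) {
        ToyRecordStore store = populated();
        store.destroyInternals();
        // Records are gone, lock state is untouched.
        System.out.println(store.records.isEmpty() + " " + store.connectedServiceState.isEmpty());
    }
}
```

In this toy model destroy and clearPartition end up identical, which mirrors the overlap the comment points out.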

ahmetmircik · 2018-04-04T06:44:54Z

run-lab-run

pveentjer · 2018-04-05T05:37:45Z

@ahmetmircik can you rebase?

mmedenjak · 2018-04-05T14:01:11Z

hazelcast/src/main/java/com/hazelcast/spi/impl/merge/BaseSplitBrainHandlerService.java

+    }
+
+    void asyncDestroyStores(final Collection<Store> stores, final int partitionID) {
+        operationExecutor.execute(new PartitionSpecificRunnable() {


Can the partition be migrating? Do we need to retry? See - #12691 (comment)

No need to retry. A PartitionSpecificRunnable runs locally on its own, without any checks such as is-migration-in-progress.

You're right. Thanks!
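The point made in this thread can be illustrated with a toy executor that routes a task purely by its partition ID. This is a sketch under assumptions, not the Hazelcast operation executor: `ToyOperationExecutor`, `shutdownAndWait` and the `String` store type are hypothetical, and the local `PartitionSpecificRunnable` interface is redeclared here so the example is self-contained.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PartitionExecutorSketch {

    /** Toy counterpart of a runnable that is bound to one partition. */
    interface PartitionSpecificRunnable extends Runnable {
        int getPartitionId();
    }

    /** Hypothetical executor: one single-threaded worker per partition thread. */
    static final class ToyOperationExecutor {
        private final ExecutorService[] partitionThreads;

        ToyOperationExecutor(int threadCount) {
            partitionThreads = new ExecutorService[threadCount];
            for (int i = 0; i < threadCount; i++) {
                partitionThreads[i] = Executors.newSingleThreadExecutor();
            }
        }

        // Routes the task by partition ID only; no migration or ownership checks happen here.
        void execute(PartitionSpecificRunnable task) {
            int index = Math.abs(task.getPartitionId() % partitionThreads.length);
            partitionThreads[index].execute(task);
        }

        // Stops accepting tasks and waits for queued ones; returns false on timeout or interrupt.
        boolean shutdownAndWait() {
            for (ExecutorService thread : partitionThreads) {
                thread.shutdown();
            }
            try {
                for (ExecutorService thread : partitionThreads) {
                    if (!thread.awaitTermination(5, TimeUnit.SECONDS)) {
                        return false;
                    }
                }
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    // Mirrors the shape of asyncDestroyStores in the diff; clearing the collection stands in for destroying stores.
    static void asyncDestroyStores(ToyOperationExecutor executor,
                                   final Collection<String> stores, final int partitionId) {
        executor.execute(new PartitionSpecificRunnable() {
            @Override
            public void run() {
                stores.clear();
            }

            @Override
            public int getPartitionId() {
                return partitionId;
            }
        });
    }

    public static void main(String[] args) {
        ToyOperationExecutor executor = new ToyOperationExecutor(2);
        Collection<String> stores = new CopyOnWriteArrayList<>(Arrays.asList("store-a", "store-b"));
        asyncDestroyStores(executor, stores, 7);
        executor.shutdownAndWait();
        System.out.println(stores.isEmpty()); // prints "true"
    }
}
```

Because the task is pinned to the thread that serves its partition, it is serialized with other work on that partition without any invocation-level retry logic.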

mmedenjak · 2018-04-05T14:10:54Z

hazelcast/src/main/java/com/hazelcast/spi/impl/merge/AbstractMergeRunnable.java

+    }
+
+    protected void onMerge(String dataStructureName) {
+        // override to take action on prepare of merging data structures


Minor: this is executed on merge. You can fix this in a separate PR.

mmedenjak · 2018-04-05T15:53:02Z

hazelcast/src/main/java/com/hazelcast/spi/impl/merge/BaseSplitBrainHandlerService.java

+    }
+
+    void asyncDestroyStores(final Collection<Store> stores, final int partitionID) {
+        operationExecutor.execute(new PartitionSpecificRunnable() {


You're right. Thanks!

hz-devops-test · 2024-04-22T12:45:48Z

The job Hazelcast-pr-compiler of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

--------------------------
---------SUMMARY----------
--------------------------
[ERROR] COMPILATION ERROR : 
--------------------------
[ERROR] error: Source option 6 is no longer supported. Use 7 or later.
--------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hazelcast: Compilation failure: Compilation failure: 
--------------------------
---------ERRORS-----------
--------------------------
[ERROR] COMPILATION ERROR : 
--------------------------
[ERROR] error: Source option 6 is no longer supported. Use 7 or later.
--------------------------
[ERROR] error: Target option 6 is no longer supported. Use 7 or later.
--------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hazelcast: Compilation failure: Compilation failure: 
--------------------------
[ERROR] error: Source option 6 is no longer supported. Use 7 or later.
--------------------------
[ERROR] error: Target option 6 is no longer supported. Use 7 or later.
--------------------------
[ERROR] -> [Help 1]
--------------------------
[ERROR] 
--------------------------
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
--------------------------
[ERROR] 
--------------------------
[ERROR] For more information about the errors and possible solutions, please read the following articles:
--------------------------
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
--------------------------
[ERROR] 
--------------------------
[ERROR] After correcting the problems, you can resume the build with the command
--------------------------
[ERROR]   mvn  -rf :hazelcast
--------------------------

hz-devops-test · 2024-04-22T12:46:42Z

The job Hazelcast-pr-EE-compiler of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

--------------------------
---------SUMMARY----------
--------------------------
[ERROR] COMPILATION ERROR : 
--------------------------
[ERROR] error: Source option 6 is no longer supported. Use 7 or later.
--------------------------
---------ERRORS-----------
--------------------------
[ERROR] COMPILATION ERROR : 
--------------------------
[ERROR] error: Source option 6 is no longer supported. Use 7 or later.
--------------------------
[ERROR] error: Target option 6 is no longer supported. Use 7 or later.
--------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hazelcast: Compilation failure: Compilation failure: 
--------------------------
[ERROR] error: Source option 6 is no longer supported. Use 7 or later.
--------------------------
[ERROR] error: Target option 6 is no longer supported. Use 7 or later.
--------------------------
[ERROR] -> [Help 1]
--------------------------
[ERROR] 
--------------------------
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
--------------------------
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
--------------------------
[ERROR] 
--------------------------
[ERROR] For more information about the errors and possible solutions, please read the following articles:
--------------------------
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
--------------------------
[ERROR] 
--------------------------
[ERROR] After correcting the problems, you can resume the build with the command
--------------------------
[ERROR]   mvn  -rf :hazelcast
--------------------------

hz-devops-test · 2024-04-22T12:47:22Z

The job Hazelcast-pr-builder of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

---------ERRORS-----------
--------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project hazelcast: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test failed: java.lang.NoClassDefFoundError: java/sql/Timestamp: java.sql.Timestamp -> [Help 1]
--------------------------
[ERROR] 
--------------------------
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
--------------------------
[ERROR] 
--------------------------
[ERROR] For more information about the errors and possible solutions, please read the following articles:
--------------------------
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
--------------------------
[ERROR] 
--------------------------
[ERROR] After correcting the problems, you can resume the build with the command
--------------------------
[ERROR]   mvn  -rf :hazelcast
--------------------------

ahmetmircik added this to the 3.10 milestone Mar 22, 2018

ahmetmircik force-pushed the fix/3.10/icmpFix branch 4 times, most recently from 12485c5 to 39f80fc Compare March 28, 2018 20:48

ahmetmircik changed the title ~~WIP~~ Revert conversion to lite member during heal Mar 29, 2018

ahmetmircik force-pushed the fix/3.10/icmpFix branch 3 times, most recently from 3fa85e6 to 8ab074c Compare March 29, 2018 10:30

ahmetmircik added Type: Defect Team: Core labels Mar 29, 2018

ahmetmircik requested review from mmedenjak, mdogan and Donnerbart March 29, 2018 10:34

mdogan reviewed Mar 29, 2018

ahmetmircik force-pushed the fix/3.10/icmpFix branch from 8ab074c to 071a875 Compare March 29, 2018 10:47

Donnerbart reviewed Mar 29, 2018

ahmetmircik force-pushed the fix/3.10/icmpFix branch 3 times, most recently from 5c11902 to 4c5d483 Compare April 2, 2018 08:00

mdogan approved these changes Apr 2, 2018

mmedenjak reviewed Apr 3, 2018

Donnerbart changed the title ~~Revert conversion to lite member during heal~~ Revert conversion to Lite member during Split-Brain healing Apr 3, 2018

Donnerbart mentioned this pull request Apr 3, 2018

Split-Brain Merge Policies for Additional Data Structures #11969

Open

ahmetmircik force-pushed the fix/3.10/icmpFix branch from 4c5d483 to e447279 Compare April 3, 2018 21:48

tkountis approved these changes Apr 4, 2018

ahmetmircik force-pushed the fix/3.10/icmpFix branch from e447279 to f3bc2e6 Compare April 5, 2018 08:12

mmedenjak reviewed Apr 5, 2018

mmedenjak approved these changes Apr 5, 2018

ahmetmircik force-pushed the fix/3.10/icmpFix branch from f3bc2e6 to 1eb4c10 Compare April 5, 2018 19:56

ahmetmircik added 2 commits April 5, 2018 23:21

Revert conversion to lite member during heal

0ca109b

Dispose store immediately when unneeded during merge

8d23020

ahmetmircik force-pushed the fix/3.10/icmpFix branch from 1eb4c10 to 8d23020 Compare April 5, 2018 20:21

ahmetmircik merged commit 8e2e01f into hazelcast:master Apr 6, 2018

ahmetmircik deleted the fix/3.10/icmpFix branch April 6, 2018 07:41

mmedenjak added the Source: Internal PR or issue was opened by an employee label Apr 13, 2020
