Revert conversion to Lite member during Split-Brain healing #12691
Conversation
Force-pushed 12485c5 to 39f80fc
Force-pushed 3fa85e6 to 8ab074c
@@ -106,19 +100,6 @@ private void disposeTasks(Collection<Runnable>... tasks) {
        }
    }

    private void tryToPromoteLocalLiteMember() {
        if (wasLiteMember) {
wasLiteMember field can be deleted too.
done
Force-pushed 8ab074c to 071a875
@@ -228,7 +227,7 @@ public DistributedObject createDistributedObject(String cacheNameWithPrefix) {
            cacheConfig.setManagerPrefix(HazelcastCacheManager.CACHE_MANAGER_PREFIX);
        }

        CacheMergePolicyProvider mergePolicyProvider = splitBrainHandlerService.getMergePolicyProvider();
        CacheMergePolicyProvider mergePolicyProvider = new CacheMergePolicyProvider(nodeEngine);
We should have a single CacheMergePolicyProvider instance per HZ node, which is used for the config check and the split-brain merging. The provider class caches the created merge policy instances, so with a single instance we know for sure that the merge policy instance which survived the config check is used in the split-brain process later. It also saves us a bunch of Class.forName() calls and garbage.

That's why the legacy merge policy providers and all other data structures use the single instance of SplitBrainMergeProvider from the NodeEngine.

Same applies for IMap and ReplicatedMap.
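The caching behaviour described above can be sketched as a small memoizing provider. This is an illustrative stand-in, not Hazelcast's actual CacheMergePolicyProvider: the class name MergePolicyProviderSketch and its internals are assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of a memoizing merge-policy provider: with one
// instance per node, the policy object that passed the config check is the
// same object used during split-brain merging, and Class.forName() runs at
// most once per policy name.
public class MergePolicyProviderSketch {
    private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();

    public Object getMergePolicy(String mergePolicyName) {
        return cache.computeIfAbsent(mergePolicyName, name -> {
            try {
                return Class.forName(name).getDeclaredConstructor().newInstance();
            } catch (Exception e) {
                throw new IllegalArgumentException("Invalid merge policy: " + name, e);
            }
        });
    }

    public static void main(String[] args) {
        MergePolicyProviderSketch provider = new MergePolicyProviderSketch();
        Object first = provider.getMergePolicy("java.lang.StringBuilder");
        Object second = provider.getMergePolicy("java.lang.StringBuilder");
        // the second lookup returns the cached instance
        System.out.println(first == second); // prints true
    }
}
```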
So basically the services or map service context should store the created merge policy provider and make it accessible to the merge process.
Maybe we can define a LegacySplitBrainMergePolicyProvider interface with an Object getMergePolicy(String mergePolicyName) method. Then we can store a single instance in the AbstractSplitBrainHandlerService and make it easily accessible for the config checks and merge runnables.
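A minimal sketch of that proposal, assuming the names from the comment (LegacySplitBrainMergePolicyProvider, AbstractSplitBrainHandlerService); the bodies are illustrative, not the actual Hazelcast types:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical shape of the proposed interface; the names come from the
// review comment, the implementations here are illustrative only.
public class LegacyProviderSketch {

    interface LegacySplitBrainMergePolicyProvider {
        Object getMergePolicy(String mergePolicyName);
    }

    // The handler service holds the single provider instance, so config
    // checks and merge runnables resolve policies through the same cache.
    abstract static class AbstractSplitBrainHandlerService {
        final LegacySplitBrainMergePolicyProvider mergePolicyProvider;

        AbstractSplitBrainHandlerService(LegacySplitBrainMergePolicyProvider provider) {
            this.mergePolicyProvider = provider;
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();
        LegacySplitBrainMergePolicyProvider provider =
                name -> cache.computeIfAbsent(name, n -> new Object());
        Object fromConfigCheck = provider.getMergePolicy("PutIfAbsentMergePolicy");
        Object fromMergeRunnable = provider.getMergePolicy("PutIfAbsentMergePolicy");
        System.out.println(fromConfigCheck == fromMergeRunnable); // prints true
    }
}
```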
changed it to one-instance-per-service.
Force-pushed 5c11902 to 4c5d483
I just scanned lite-member related parts. 👍
// this method is overridden on ee
protected CacheSplitBrainHandlerService newSplitBrainHandlerService(NodeEngine nodeEngine) {
    return new CacheSplitBrainHandlerService(nodeEngine, configs, segments);
public CacheMergePolicyProvider getMergePolicyProvider() {
Minor: This can be package private if the test is moved to the same package. Also, if the field is package-private, we can remove the getter altogether.
Kept this as is; it is being used by different packages on the ee side.
if (!isDiscardPolicy(mergePolicy)) {
Now we will collect even the stores that will be discarded and discard them later in the runnable. Any special reason for doing so? Can we revert to the old version on this?
Moved these checks to the split-brain handler service.
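The trade-off under discussion can be sketched as follows: filtering with a hypothetical isDiscardPolicy check at collection time keeps to-be-discarded stores out of the merge runnable entirely. Store and DiscardMergePolicy are stand-in names here, not the actual Hazelcast classes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: only stores with a non-discard merge policy are
// collected for merging; discard-policy stores are never handed to the
// merge runnable in the first place.
public class DiscardFilterSketch {
    static class DiscardMergePolicy {}

    static class Store {
        final Object mergePolicy;
        Store(Object mergePolicy) { this.mergePolicy = mergePolicy; }
    }

    static boolean isDiscardPolicy(Object mergePolicy) {
        return mergePolicy instanceof DiscardMergePolicy;
    }

    static List<Store> collectMergeableStores(List<Store> all) {
        List<Store> mergeable = new ArrayList<>();
        for (Store store : all) {
            if (!isDiscardPolicy(store.mergePolicy)) {
                mergeable.add(store);
            }
        }
        return mergeable;
    }

    public static void main(String[] args) {
        List<Store> stores = new ArrayList<>();
        stores.add(new Store(new DiscardMergePolicy()));
        stores.add(new Store(new Object()));
        // only the non-discard store survives collection
        System.out.println(collectMergeableStores(stores).size()); // prints 1
    }
}
```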
}

private void asyncDestroyStores(final int partitionID, final Collection<Store> stores) {
    operationExecutor.execute(new PartitionSpecificRunnable() {
Can it happen that the partition is migrating at this point (e.g. because more nodes are joining the cluster)? If so, can you use InvocationUtil#executeLocallyWithRetry?
 * Like {@link #destroy()} but does not touch state on other services
 * like lock service or event journal service.
 */
void destroyInternals();
It's getting a bit confusing - we have destroy (destroys the internals and the connected services), destroyInternals (destroys only the map internals), reset (I'm not sure how this one fits in; it also resets some data but keeps some other state, like indexes and connected services such as locks and journal) and clearPartition (just clears everything, also connected services).
I don't see a simple way of improving this. Maybe a task for a separate PR.
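For reference, the four operations as described in the comment could be summarized in one documented interface. This is a hedged reading of the reviewer's description, not Hazelcast's actual RecordStore contract; only the method names come from the discussion.

```java
// Hypothetical summary of the four teardown operations discussed above; the
// javadoc is a best-effort reading of the review comment, not the real
// Hazelcast semantics.
public interface RecordStoreTeardownSketch {

    /** Destroys the internals and the connected services (locks, event journal). */
    void destroy();

    /** Destroys only the map internals, leaving connected services untouched. */
    void destroyInternals();

    /** Resets some data but keeps other state, e.g. indexes and connected services. */
    void reset();

    /** Clears everything in the partition, including connected services. */
    void clearPartition();
}
```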
Couldn't agree more 💯
This is needed for this PR to pass the tests, but I filed an improvement issue since this problem is a bit tricky: #12779
Force-pushed 4c5d483 to e447279
run-lab-run
@ahmetmircik can you rebase?
Force-pushed e447279 to f3bc2e6
}

void asyncDestroyStores(final Collection<Store> stores, final int partitionID) {
    operationExecutor.execute(new PartitionSpecificRunnable() {
Can the partition be migrating? Do we need to retry? See - #12691 (comment)
No need to retry; a PartitionSpecificRunnable can run locally on its partition thread without doing any checks like is-migration-in-progress.
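A toy model of why no retry is needed: if every partition has a dedicated thread and all tasks for a partition are serialized on that thread, a local destroy task cannot interleave with other operations on that partition. PartitionTask here stands in for Hazelcast's PartitionSpecificRunnable (run() plus getPartitionId()); the executor is an assumption for illustration, not the real OperationExecutor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model of partition-thread execution: one single-threaded executor per
// partition serializes all tasks for that partition, so a local task needs
// no migration check or retry loop.
public class PartitionExecutorSketch {
    interface PartitionTask extends Runnable {
        int getPartitionId();
    }

    private final ExecutorService[] partitionThreads;

    PartitionExecutorSketch(int partitionCount) {
        partitionThreads = new ExecutorService[partitionCount];
        for (int i = 0; i < partitionCount; i++) {
            partitionThreads[i] = Executors.newSingleThreadExecutor();
        }
    }

    void execute(PartitionTask task) {
        partitionThreads[task.getPartitionId()].execute(task);
    }

    void shutdown() throws InterruptedException {
        for (ExecutorService e : partitionThreads) {
            e.shutdown();
            e.awaitTermination(1, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PartitionExecutorSketch executor = new PartitionExecutorSketch(2);
        final int[] runs = new int[2];
        for (int p = 0; p < 2; p++) {
            final int partitionId = p;
            executor.execute(new PartitionTask() {
                public int getPartitionId() { return partitionId; }
                public void run() { runs[partitionId]++; }
            });
        }
        executor.shutdown();
        System.out.println(runs[0] + "," + runs[1]); // prints 1,1
    }
}
```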
You're right. Thanks!
}

protected void onMerge(String dataStructureName) {
    // override to take action on prepare of merging data structures
Minor: this is executed on merge. You can fix this in a separate PR.
}

void asyncDestroyStores(final Collection<Store> stores, final int partitionID) {
    operationExecutor.execute(new PartitionSpecificRunnable() {
You're right. Thanks!
Force-pushed f3bc2e6 to 1eb4c10
Force-pushed 1eb4c10 to 8d23020
The job failed (two runs, same error). Log summary:

[ERROR] COMPILATION ERROR :
[ERROR] error: Source option 6 is no longer supported. Use 7 or later.
[ERROR] error: Target option 6 is no longer supported. Use 7 or later.
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hazelcast: Compilation failure
[ERROR] After correcting the problems, you can resume the build with the command mvn -rf :hazelcast
The job failed. Log summary:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project hazelcast: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test failed: java.lang.NoClassDefFoundError: java/sql/Timestamp -> [Help 1]
[ERROR] After correcting the problems, you can resume the build with the command mvn -rf :hazelcast
ee counterpart: https://github.com/hazelcast/hazelcast-enterprise/pull/2021
closes https://github.com/hazelcast/hazelcast-mono/issues/1427
closes #12405