Offload replica sync to async executor #20813
run-ee-tests
The job log:
--------------------------
---------SUMMARY----------
--------------------------
[ERROR] Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=2048m; support was removed in 8.0
--------------------------
-------TEST FAILURE-------
--------------------------
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] SerializedObjectsCompatibilityTest.testObjectsAreDeserializedInCurrentVersion_whenEESerializationService:122->assertObjectsAreDeserialized:157 Failed to deserialize classes:
com.hazelcast.config.MapConfig
com.hazelcast.internal.partition.operation.PartitionReplicaSyncResponse
com.hazelcast.internal.partition.operation.MigrationOperation
com.hazelcast.jet.impl.execution.init.ExecutionPlan
com.hazelcast.internal.partition.operation.MigrationRequestOperation
com.hazelcast.jet.core.DAG
(1) https://github.com/hazelcast/hazelcast-enterprise/pull/4815 includes a test for the priority thread blocking issue. Sample stack trace of failing migration due to persistence engine being shut down:
.../java/com/hazelcast/internal/partition/operation/PartitionReplicaSyncRequestOffloadable.java
PartitionReplicaSyncRequestOffloadable would block the priority generic op thread while waiting for merkle tree comparison to occur, leading to deadlocks.
NodeExtension#shutdown should be called after graceful-shutdown-aware services have already been shut down. Otherwise persistence is shut down before the data services, resulting in exceptions during migrations.
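The ordering problem described in this commit message can be illustrated with a minimal sketch. The `Persistence` and `DataService` classes below are hypothetical stand-ins, not Hazelcast classes; the assumption is only that a data service may still need to write to persistence while it shuts down (for example, while a migration is finishing).

```java
class ShutdownOrderSketch {
    // Hypothetical persistence engine: rejects writes once shut down.
    static class Persistence {
        private boolean open = true;

        void write(String record) {
            if (!open) {
                throw new IllegalStateException("persistence is shut down");
            }
        }

        void shutdown() {
            open = false;
        }
    }

    // Hypothetical data service that flushes state to persistence on shutdown.
    static class DataService {
        private final Persistence persistence;

        DataService(Persistence persistence) {
            this.persistence = persistence;
        }

        void shutdown() {
            persistence.write("final migration state");
        }
    }

    // Returns true if the whole shutdown sequence completes without an exception.
    static boolean shutdownSucceeds(boolean persistenceLast) {
        Persistence persistence = new Persistence();
        DataService service = new DataService(persistence);
        try {
            if (persistenceLast) {
                service.shutdown();      // data services first
                persistence.shutdown();  // persistence afterwards
            } else {
                persistence.shutdown();  // wrong order: persistence first
                service.shutdown();      // the service can no longer write
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("persistence last:  " + shutdownSucceeds(true));
        System.out.println("persistence first: " + shutdownSucceeds(false));
    }
}
```

In this toy model only the sequence that stops persistence last completes cleanly, which mirrors the ordering this commit asks for.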
41afa65 to a7e983d
Looks good to me. Thanks for the fix.
Kudos @arodionov for the finding and reproducer.
PartitionReplicaSyncRequestOffloadable would block the priority generic op thread while waiting for the merkle tree comparison to complete, leading to deadlocks.
Also restores the node shutdown sequence of the persistence engine to what it was prior to c284b61.
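For readers unfamiliar with the pattern named in the PR title, here is a minimal sketch of offloading a blocking step to a separate executor so the submitting thread is not held up. It uses only `java.util.concurrent`; `OffloadSketch`, `slowComparison`, and `offload` are hypothetical names chosen for illustration and are not the Hazelcast implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class OffloadSketch {
    // Hypothetical stand-in for a slow step such as a merkle tree comparison.
    // Returns the name of the thread it ran on.
    static String slowComparison() {
        try {
            TimeUnit.MILLISECONDS.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Thread.currentThread().getName();
    }

    // Submits the blocking step to asyncExecutor and returns at once,
    // so the calling thread stays free for other operations.
    static <T> CompletableFuture<T> offload(ExecutorService asyncExecutor, Supplier<T> blockingStep) {
        return CompletableFuture.supplyAsync(blockingStep, asyncExecutor);
    }

    // Returns true if the slow step ran on a thread other than the caller's.
    // The join() here is for the demo only; a real caller would attach a callback instead.
    static boolean ranOffCallerThread() {
        ExecutorService asyncExecutor = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> result = offload(asyncExecutor, OffloadSketch::slowComparison);
            String workerThread = result.join();
            return !workerThread.equals(Thread.currentThread().getName());
        } finally {
            asyncExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("offloaded: " + ranOffCallerThread());
    }
}
```

The design point is that the thread receiving the request only schedules the work; any waiting happens on the executor's own threads, so a small pool of priority threads cannot be exhausted by slow comparisons.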