Fix hanging cluster safe query from common pool #21208

Merged

Conversation

vbekiaris
Collaborator

When all ForkJoinPool#commonPool threads are busy querying isClusterSafe
and partition assignments are not in sync (e.g. during initial
partition arrangement), the callback that must run after
PartitionBackupReplicaAntiEntropyOperation completes never gets a
thread. As a result, neither partition replica sync nor the
cluster-safe query can make any progress.
The fix is to run the callback that processes the replica anti-entropy
operation result on the Hazelcast internal async executor instead of
the common pool.
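
The starvation and the fix can be illustrated with a minimal, standalone sketch. It does not use Hazelcast code: `ReplicaSyncCallbackSketch`, `callbackRuns`, `antiEntropyResult`, `replicasInSync` and `internalAsyncExecutor` are hypothetical stand-ins, and a single-thread executor is only assumed to play the role of the internal async executor. Fake "isClusterSafe" queries block every common pool worker until the callback has run, and the callback is registered with an explicit executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

// Illustrative only: none of these names come from the Hazelcast code base.
class ReplicaSyncCallbackSketch {

    /**
     * Blocks the common pool with fake "isClusterSafe" queries, then completes a
     * fake anti-entropy operation whose callback runs on callbackExecutor.
     * Returns true if the callback ran within timeoutMillis.
     */
    static boolean callbackRuns(Executor callbackExecutor, long timeoutMillis) {
        // Stand-in for "partition replicas are in sync".
        CountDownLatch replicasInSync = new CountDownLatch(1);
        // Stand-in for the result future of the anti-entropy operation.
        CompletableFuture<Void> antiEntropyResult = new CompletableFuture<>();

        // Only this callback can release the waiting queries below.
        antiEntropyResult.whenCompleteAsync(
                (response, failure) -> replicasInSync.countDown(), callbackExecutor);

        // Occupy every common pool worker with a query that waits for the callback.
        int workers = Math.max(1, ForkJoinPool.getCommonPoolParallelism());
        CountDownLatch queriesStarted = new CountDownLatch(workers);
        for (int i = 0; i < workers; i++) {
            ForkJoinPool.commonPool().execute(() -> {
                queriesStarted.countDown();
                awaitQuietly(replicasInSync, timeoutMillis);
            });
        }
        awaitQuietly(queriesStarted, timeoutMillis);

        // The operation finishes; the callback is handed to callbackExecutor.
        antiEntropyResult.complete(null);
        return awaitQuietly(replicasInSync, timeoutMillis);
    }

    // Waits on the latch with a timeout so the sketch can never hang forever.
    static boolean awaitQuietly(CountDownLatch latch, long timeoutMillis) {
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed stand-in for the Hazelcast internal async executor.
        ExecutorService internalAsyncExecutor = Executors.newSingleThreadExecutor();
        try {
            System.out.println("callback ran: " + callbackRuns(internalAsyncExecutor, 2000));
        } finally {
            internalAsyncExecutor.shutdown();
        }
    }
}
```

Passing `ForkJoinPool.commonPool()` as `callbackExecutor` instead would be expected to reproduce the hang on a typical multi-core machine: the callback is queued behind workers that are all waiting for it, so the call only returns once the timeout expires.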

(cherry picked from commit 434d731)
Backport of #21145 to 5.1.z
