New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry map and cache partition destroy operations #12686
Retry map and cache partition destroy operations #12686
Conversation
Test failed: https://hazelcast-l337.ci.cloudbees.com/job/new-lab-fast-pr/14605 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, left some questions / comments. Also, the new execution utility lacks tests.
* @see InvocationUtil#invokeLocallyWithRetry(NodeEngine, Operation) | ||
*/ | ||
public class LocalRetryableExecution implements Runnable, OperationResponseHandler { | ||
/** Number of times an operation is retried before being logger at WARNING level */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: "logger" -> "logged"
* @see Operation#getOperationResponseHandler() | ||
* @see Operation#validatesTarget() | ||
*/ | ||
public static LocalRetryableExecution invokeLocallyWithRetry(NodeEngine nodeEngine, Operation operation) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe executeLocallyWithRetry
since it's using OperationService.execute
? Also would help avoid confusion with full-blown invocations of OperationService.invokeXXX
.
public int getPartitionId() { | ||
return partitionContainer.getPartitionId(); | ||
public boolean returnsResponse() { | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to override, true
is the default in Operation.returnsResponse()
|
||
@Override | ||
public boolean isUrgent() { | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why turn this into an urgent operation? Could it interfere with cluster operations (eg join ops)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must have copied it over from CacheSegmentDestroyOperation
. I removed it from both operations.
@vbekiaris thanks for the review. Addressed all comments and added test for retry logic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job!
9f97122
to
d9e17bc
Compare
@ahmetmircik can you please finish the review? Is there anything that I need to address? |
@mmedenjak seems ok to me |
When the partition is migrating, the operations might fail with a PartitionMigratingException. We then need to retry the operation as otherwise we are leaking memory. As for other partition operations, we wait for the migration to complete, using the default try count and wait count. Fixes: https://github.com/hazelcast/hazelcast-enterprise/issues/930 Fixes: https://github.com/hazelcast/hazelcast-enterprise/issues/1933
d9e17bc
to
890c299
Compare
Can one of the admins verify this patch? |
2 similar comments
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
When the partition is migrating, the operations might fail with a
PartitionMigratingException. We then need to retry the operation as
otherwise we are leaking memory. As for other partition operations,
we wait for the migration to complete, using the default try count and
wait count.
Fixes:
https://github.com/hazelcast/hazelcast-enterprise/issues/930
https://github.com/hazelcast/hazelcast-mono/issues/1427
EE PR: https://github.com/hazelcast/hazelcast-enterprise/pull/1991