Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry map and cache partition destroy operations #12686

Merged
merged 1 commit into from Mar 27, 2018

Conversation

mmedenjak
Copy link
Contributor

@mmedenjak mmedenjak commented Mar 22, 2018

When the partition is migrating, the operations might fail with a
PartitionMigratingException. We then need to retry the operation as
otherwise we are leaking memory. As for other partition operations,
we wait for the migration to complete, using the default try count and
wait count.

Fixes:
https://github.com/hazelcast/hazelcast-enterprise/issues/930
https://github.com/hazelcast/hazelcast-mono/issues/1427

EE PR: https://github.com/hazelcast/hazelcast-enterprise/pull/1991

@vbekiaris
Copy link
Collaborator

Test failed: ClientRegressionWithMockNetworkTest.testOperationRedo

https://hazelcast-l337.ci.cloudbees.com/job/new-lab-fast-pr/14605

Copy link
Collaborator

@vbekiaris vbekiaris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, left some questions / comments. Also, the new execution utility lacks tests.

* @see InvocationUtil#invokeLocallyWithRetry(NodeEngine, Operation)
*/
public class LocalRetryableExecution implements Runnable, OperationResponseHandler {
/** Number of times an operation is retried before being logger at WARNING level */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "logger" -> "logged"

* @see Operation#getOperationResponseHandler()
* @see Operation#validatesTarget()
*/
public static LocalRetryableExecution invokeLocallyWithRetry(NodeEngine nodeEngine, Operation operation) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe executeLocallyWithRetry since it's using OperationService.execute? Also would help avoid confusion with full-blown invocations of OperationService.invokeXXX.

public int getPartitionId() {
return partitionContainer.getPartitionId();
public boolean returnsResponse() {
return true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to override, true is the default in Operation.returnsResponse()


@Override
public boolean isUrgent() {
return true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why turn this into an urgent operation? Could it interfere with cluster operations (eg join ops)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must have copied it over from CacheSegmentDestroyOperation. I removed it from both operations.

@hazelcast hazelcast deleted a comment from vbekiaris Mar 22, 2018
@mmedenjak
Copy link
Contributor Author

@vbekiaris thanks for the review. Addressed all comments and added test for retry logic.

Copy link
Collaborator

@vbekiaris vbekiaris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job!

@mmedenjak
Copy link
Contributor Author

@ahmetmircik can you please finish the review? Is there anything that I need to address?

@ahmetmircik
Copy link
Member

@mmedenjak seems ok to me

When the partition is migrating, the operations might fail with a
PartitionMigratingException. We then need to retry the operation as
otherwise we are leaking memory. As for other partition operations,
we wait for the migration to complete, using the default try count and
wait count.

Fixes: https://github.com/hazelcast/hazelcast-enterprise/issues/930
Fixes: https://github.com/hazelcast/hazelcast-enterprise/issues/1933
@mmedenjak mmedenjak merged commit 9c10d20 into hazelcast:master Mar 27, 2018
@mmedenjak mmedenjak deleted the HiDensityCacheStatsTest-fix branch March 27, 2018 16:11
@mmedenjak mmedenjak added the Source: Internal PR or issue was opened by an employee label Apr 13, 2020
@devOpsHazelcast
Copy link
Collaborator

Can one of the admins verify this patch?

2 similar comments
@devOpsHazelcast
Copy link
Collaborator

Can one of the admins verify this patch?

@devOpsHazelcast
Copy link
Collaborator

Can one of the admins verify this patch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants