
Restore shutdown sequence & offload replica sync #20883

Merged
merged 1 commit into hazelcast:master Mar 4, 2022

Conversation

vbekiaris
Collaborator

Restore shutdown sequence & offload replica sync

PartitionReplicaSyncRequestOffloadable would block the priority
generic op thread while waiting for merkle tree comparison to occur,
leading to deadlocks.
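
For illustration, a minimal sketch of the offload pattern this fix restores: rather than blocking the priority generic op thread until the merkle tree comparison finishes, the blocking wait moves to a separate executor and the op thread is released immediately. All names below (OFFLOAD_EXECUTOR, compareMerkleTrees, sendResponse) are hypothetical stand-ins, not Hazelcast's actual internals.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ReplicaSyncOffloadSketch {
    // Hypothetical executor standing in for Hazelcast's internal async pool.
    static final ExecutorService OFFLOAD_EXECUTOR = Executors.newCachedThreadPool();

    void handleSyncRequest(int partitionId) {
        // Problematic: a blocking call here stalls the priority generic op thread.
        // boolean inSync = compareMerkleTrees(partitionId); // blocks

        // Offloaded: the blocking comparison runs on a separate thread, and the
        // response is sent once it completes.
        CompletableFuture
                .supplyAsync(() -> compareMerkleTrees(partitionId), OFFLOAD_EXECUTOR)
                .thenAccept(inSync -> sendResponse(partitionId, inSync));
    }

    boolean compareMerkleTrees(int partitionId) { /* blocking comparison */ return true; }

    void sendResponse(int partitionId, boolean inSync) { /* reply to the caller */ }
}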

NodeExtension#shutdown should be called after graceful-shutdown-aware
services have already shut down. Otherwise persistence is shut down
before data services, resulting in exceptions during migrations.
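
A minimal ordering sketch of the second point, with hypothetical names (gracefulServices, shutdownGracefully); the fix ensures the NodeExtension shutdown runs only after the data services have drained:

import java.util.List;

class ShutdownSequenceSketch {
    interface GracefulShutdownAwareService {
        void onShutdown();
    }

    interface NodeExtension {
        void shutdown(); // shuts down persistence, among other things
    }

    List<GracefulShutdownAwareService> gracefulServices;
    NodeExtension nodeExtension;

    void shutdownGracefully() {
        // 1. Drain graceful-shutdown-aware services first, so in-flight
        //    migrations complete while persistence is still available.
        gracefulServices.forEach(GracefulShutdownAwareService::onShutdown);
        // 2. Only then shut down the NodeExtension (and with it, persistence).
        nodeExtension.shutdown();
    }
}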

Forward-port of #20813 to the main branch.

@vbekiaris vbekiaris added this to the 5.2 milestone Mar 4, 2022
@vbekiaris vbekiaris self-assigned this Mar 4, 2022
@ahmetmircik
Member

private static final boolean ALLOW_OFFLOAD =
            Boolean.parseBoolean(System.getProperty(PARTITION_REPLICA_ALLOW_OFFLOAD, "true"));

@vbekiaris When ALLOW_OFFLOAD is false, there must be no usage of merkle trees, otherwise deadlock is inevitable.
Is my understanding correct here? If so, in what ways can we improve the user experience? Can we remove the PARTITION_REPLICA_ALLOW_OFFLOAD property, or can we fail fast when it is false and used together with merkle trees? WDYT?

@vbekiaris
Collaborator Author

vbekiaris commented Mar 4, 2022

Can we remove the PARTITION_REPLICA_ALLOW_OFFLOAD property, or can we fail fast when it is false and used together with merkle trees?

When ALLOW_OFFLOAD is false, we should not use merkle trees for partition replication purposes (migrations & anti-entropy).
But merkle trees can still be useful for WAN sync, so ALLOW_OFFLOAD == false with merkle trees enabled on some IMaps/ICaches is still a valid configuration.
I think the property is useful as a kill-switch in case we find issues and it becomes desirable to switch back to the pre-5.0 replication behaviour.
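
As a usage sketch: since it is an ordinary system property read at class-initialization time, the kill-switch has to be set before the member starts. The key string below is a stand-in for whatever the PARTITION_REPLICA_ALLOW_OFFLOAD constant quoted above resolves to.

public class KillSwitchSketch {
    public static void main(String[] args) {
        // Stand-in key: substitute the actual value of PARTITION_REPLICA_ALLOW_OFFLOAD.
        // Must be set before the member starts, because the ALLOW_OFFLOAD field is
        // initialized statically (equivalently: java -D<property-key>=false ...).
        System.setProperty("<PARTITION_REPLICA_ALLOW_OFFLOAD key>", "false");
        // Hazelcast.newHazelcastInstance(); // then start the member as usual
    }
}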

@ufukyilmaz
Contributor

ufukyilmaz commented Mar 4, 2022

When ALLOW_OFFLOAD is false, there must be no usage of merkle trees, otherwise deadlock is inevitable.

In the non-offloaded case, the merkle tree comparison runs on partition threads, and blocking those may not be as critical as blocking the priority-generic/generic threads. I don't think we can say the deadlock is inevitable in that case based on this issue alone.

In the offload-enabled case, the task that was expected to offload the partition sync was itself running on the priority/generic threads, which are more prone to deadlock if we don't offload the main work. Note that we set this offloaded task's partitionId to -1 here: https://github.com/hazelcast/hazelcast/pull/20883/files#diff-2ccd9b76b1a1019e2129104193e9b88e7cb7b239062a80587b429231b3c520c2R80-R81; this is what was blocking the priority-generic/generic threads.
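
In other words, the routing rule at play is roughly the following (hypothetical sketch; the real dispatch lives in Hazelcast's internal operation executor):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OperationDispatchSketch {
    static final int GENERIC_PARTITION_ID = -1;

    final ExecutorService[] partitionThreads = {
            Executors.newSingleThreadExecutor(), Executors.newSingleThreadExecutor()
    };
    final ExecutorService genericThreads = Executors.newFixedThreadPool(2);

    void execute(int partitionId, Runnable op) {
        if (partitionId == GENERIC_PARTITION_ID) {
            // partitionId == -1 routes to the generic threads; blocking one of
            // these starves all generic operations, hence the deadlock risk.
            genericThreads.execute(op);
        } else {
            // Non-negative ids run on the single thread owning that partition.
            partitionThreads[partitionId % partitionThreads.length].execute(op);
        }
    }
}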

Contributor

@ufukyilmaz ufukyilmaz left a comment


Thanks for the backports/forward-ports.

@ahmetmircik
Member

ahmetmircik commented Mar 4, 2022

When ALLOW_OFFLOAD is false, we should not use merkle trees for partition replication purposes (migrations & anti-entropy).

What about checking ALLOW_OFFLOAD before creating migration operations, to decide which kind of migration mechanism we follow? This would keep us from falling into this known issue unexpectedly later. So if ALLOW_OFFLOAD is false, the service will not use merkle tree comparison for migrations.

@vbekiaris
Collaborator Author

run-lab-run

@vbekiaris
Collaborator Author

What about checking ALLOW_OFFLOAD before creating migration operations, to decide which kind of migration mechanism we follow? This would keep us from falling into this known issue unexpectedly later.

We already use this property to control whether we instantiate the offloadable partition replica sync request for the anti-entropy mechanism:

PartitionReplicaSyncRequest syncRequest = shouldOffload()
        ? new PartitionReplicaSyncRequestOffloadable(partitionId, namespaces, replicaIndex)
        : new PartitionReplicaSyncRequest(partitionId, namespaces, replicaIndex);

Or maybe I misunderstood your suggestion?

@ahmetmircik
Member

ahmetmircik commented Mar 4, 2022

I meant adding a new if here: https://github.com/hazelcast/hazelcast-enterprise/blob/463f8919c379882fcd3e6578041bcd9fce1bf34e/hazelcast-enterprise/src/main/java/com/hazelcast/map/impl/EnterpriseMapMigrationAwareService.java#L85

So when ALLOW_OFFLOAD is false, we directly call super, maybe with a log message that we don't use merkle trees.
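
i.e. a hypothetical guard along these lines (a sketch of the suggestion, not the actual enterprise code):

// inside EnterpriseMapMigrationAwareService (hypothetical sketch)
@Override
public Operation prepareReplicationOperation(PartitionReplicationEvent event,
                                             Collection<ServiceNamespace> namespaces) {
    if (!ALLOW_OFFLOAD) {
        // Skip the merkle tree based path entirely and make that visible.
        logger.info("ALLOW_OFFLOAD is false; using plain (non merkle tree) replication");
        return super.prepareReplicationOperation(event, namespaces);
    }
    // ... merkle tree based replication path ...
}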

@vbekiaris
Collaborator Author

I see; this is already covered here, because we call super if prepareReplicationOperation is running on a partition thread. I think this is a stronger guarantee that you won't run into a potential deadlock from merkle tree comparison on partition threads, because:

  • if ALLOW_OFFLOAD is false, then the anti-entropy replication op is already running on a partition thread -> no deadlock is possible
  • even if ALLOW_OFFLOAD is true (the default) and we somehow end up preparing the replication op on a partition thread (probably due to a bug?), we still avoid the deadlock because of the existing check.
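
Conceptually, the combined check amounts to something like this (hypothetical sketch; isRunningOnPartitionThread is a stand-in for Hazelcast's actual thread test):

private boolean shouldOffload() {
    // Offload only when enabled AND we are not already on a partition thread;
    // on a partition thread the comparison can run inline without deadlock risk.
    return ALLOW_OFFLOAD && !isRunningOnPartitionThread();
}

private boolean isRunningOnPartitionThread() {
    // Stand-in for the real check on the current operation thread's type.
    return Thread.currentThread().getName().contains("partition-operation");
}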

@ahmetmircik
Member

@vbekiaris thanks for the explanation, now I see that case is already covered.

@vbekiaris vbekiaris merged commit f934d9b into hazelcast:master Mar 4, 2022
@vbekiaris
Collaborator Author

thanks @ufukyilmaz & @ahmetmircik !
