Unlock operations get timeout exception when member dies #15328

Closed
frapana opened this issue Jul 18, 2019 · 2 comments

frapana commented Jul 18, 2019

Like #13551, when a client tries to release a lock that is held by an unresponsive member, it gets an OperationTimeoutException and the lock is not released.

Example (hazelcast.operation.call.timeout.millis was set to 300000, Hazelcast 3.11):

com.hazelcast.core.OperationTimeoutException: UnlockOperation invocation failed to complete due to operation-heartbeat-timeout. Current time: 2019-07-11 19:02:48.049. Start time: 2019-07-11 18:52:48.046. Total elapsed time: 600004 ms. Last operation heartbeat: never. Last operation heartbeat from member: 2019-07-11 18:52:01.505. Invocation{op=com.hazelcast.concurrent.lock.operations.UnlockOperation{serviceName='hz:impl:lockService', identityHash=2128122754, partitionId=149, replicaIndex=0, callId=-390858, invocationTime=1562871168046 (2019-07-11 18:52:48.046), waitTimeout=-1, callTimeout=300000, namespace=InternalLockNamespace{service='hz:impl:lockService', objectName=triggerAwakeJobEvent}, threadId=1903}, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=300000, firstInvocationTimeMs=1562871168046, firstInvocationTime='2019-07-11 18:52:48.046', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[10.232.4.135]:56935, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=9, /10.232.4.134:56935->/10.232.4.135:54348, endpoint=[10.232.4.135]:56935, alive=true, type=MEMBER]}

@mmedenjak
Contributor

Hi @frapana !

True, if the member running the operation isn't responsive, we can only log that the operation could not be completed. You might want to find out why the member is unresponsive by profiling and monitoring it. Or, if you expect such pauses, you can raise the heartbeat timeout by increasing com.hazelcast.spi.properties.GroupProperty#OPERATION_CALL_TIMEOUT_MILLIS (the hazelcast.operation.call.timeout.millis property).
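
For illustration, here is a minimal sketch of raising that timeout from Java. It assumes the property is supplied as a JVM system property before the Hazelcast member or client is created (the programmatic equivalent of passing `-Dhazelcast.operation.call.timeout.millis=...`). The class name, the helper method, and the value 600000 are hypothetical and only for demonstration. Note that a larger value only makes the caller wait longer before the exception is thrown; it does not make an unresponsive member answer.

```java
public class CallTimeoutSetup {
    // Property name as quoted in the report above.
    static final String CALL_TIMEOUT_PROPERTY = "hazelcast.operation.call.timeout.millis";

    // Hypothetical helper (not part of Hazelcast): stores the timeout as a JVM
    // system property. Assumption: this runs before the Hazelcast instance is
    // created, so the value is visible when Hazelcast reads its properties.
    static String applyCallTimeout(long timeoutMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeout must be positive: " + timeoutMillis);
        }
        String value = Long.toString(timeoutMillis);
        System.setProperty(CALL_TIMEOUT_PROPERTY, value);
        return value;
    }

    public static void main(String[] args) {
        // 600000 ms (10 minutes) is an illustrative value, not a recommendation.
        applyCallTimeout(600_000L);
        System.out.println(CALL_TIMEOUT_PROPERTY + "="
                + System.getProperty(CALL_TIMEOUT_PROPERTY));
    }
}
```

The same property can also be set in the Hazelcast configuration instead of on the JVM; which route fits best depends on how the members are deployed.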

On this note, with Hazelcast 4.0, we have replaced the entire implementation of ILock with the unsafe mode of the CP subsystem (https://docs.hazelcast.org/docs/latest-dev/manual/html-single/#removal-of-deprecated-concurrency-api-implementations). If you don't require strong consistency guarantees, that mode might fit your use case and solve the issue. If you do require strong consistency guarantees, you may want to try the CP subsystem with unsafe mode turned off and the appropriate number of members running. Can you try it out?

@mmedenjak
Contributor

Closing as this issue is related to the discontinued lock implementation. Please try out Hazelcast 4.0 and the new lock implementation as it may solve your issue. In case it doesn't, please reopen this or open a new issue. Happy Hazelcasting!
