Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data race in Mutex.withLock #3250

Closed
shmuelr opened this issue Apr 13, 2022 · 2 comments
Closed

Data race in Mutex.withLock #3250

shmuelr opened this issue Apr 13, 2022 · 2 comments
Labels

Comments

@shmuelr
Copy link
Contributor

shmuelr commented Apr 13, 2022

Hey there,

Similar to #2660, our internal Java TSAN tests in Google picked up a race in Mutex.kt

WARNING: ThreadSanitizer: data race (pid=8554)
  Write of size 4 at 0x0000cd702f28 by thread T36:
    #0 kotlinx.coroutines.sync.MutexImpl.unlock(Ljava/lang/Object;)V Mutex.kt:341
    #1 [redacted]
    #2 [redacted]
    #3 kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(Ljava/lang/Object;)V ContinuationImpl.kt:33 
    #4 kotlinx.coroutines.DispatchedTask.run()V DispatchedTask.kt:106 

Previous read of size 4 at 0x0000cd702f28 by thread T37:
    #0 kotlinx.coroutines.sync.MutexImpl.tryLock(Ljava/lang/Object;)Z Mutex.kt:173 
    #1 kotlinx.coroutines.sync.MutexImpl.lock(Ljava/lang/Object;Lkotlin/coroutines/Continuation;)Ljava/lang/Object; Mutex.kt:184
    #2 [redacted]
    #3 [redacted]
    #4 kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(Ljava/lang/Object;)V ContinuationImpl.kt:33 
    #5 kotlinx.coroutines.DispatchedTask.run()V DispatchedTask.kt:106 

This surfaced in a bit of code that is using mutex.withLock to ensure thread safety in a caching function.

It appears that state.owner is written to at Mutex.kt:341, and read from Mutex.kt:173, potentially from different threads, is this something that we should be concerned about?

Thanks!

@dkhalanskyjb
Copy link
Collaborator

@ndkoval
Copy link
Member

ndkoval commented Apr 18, 2022

@dkhalanskyjb is correct if the memory model is sequentially consistent. Otherwise, a tryAcquire or holdLock, invoked concurrently with unlock, can observe (line 173 or 316) the updated owner (in line 341) followed by reading the old one on the next attempt (before the update in line 341), as the corresponding field in the LockedQueue class is not volatile. @shmuelr, thank you for detecting the race!

The behavior never occurs on x86, and the race affects the implementation only when the Mutex is used incorrectly, without breaking its internal structure. The issue will be fixed soon with a new Mutex implementation in #3020.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants