Data race in Mutex.withLock #3250

shmuelr · 2022-04-13T13:58:56Z

Hey there,

Similar to #2660, our internal Java TSAN tests at Google picked up a race in Mutex.kt:

WARNING: ThreadSanitizer: data race (pid=8554)
  Write of size 4 at 0x0000cd702f28 by thread T36:
    #0 kotlinx.coroutines.sync.MutexImpl.unlock(Ljava/lang/Object;)V Mutex.kt:341
    #1 [redacted]
    #2 [redacted]
    #3 kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(Ljava/lang/Object;)V ContinuationImpl.kt:33 
    #4 kotlinx.coroutines.DispatchedTask.run()V DispatchedTask.kt:106 

  Previous read of size 4 at 0x0000cd702f28 by thread T37:
    #0 kotlinx.coroutines.sync.MutexImpl.tryLock(Ljava/lang/Object;)Z Mutex.kt:173 
    #1 kotlinx.coroutines.sync.MutexImpl.lock(Ljava/lang/Object;Lkotlin/coroutines/Continuation;)Ljava/lang/Object; Mutex.kt:184
    #2 [redacted]
    #3 [redacted]
    #4 kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(Ljava/lang/Object;)V ContinuationImpl.kt:33 
    #5 kotlinx.coroutines.DispatchedTask.run()V DispatchedTask.kt:106

This surfaced in a bit of code that is using mutex.withLock to ensure thread safety in a caching function.
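For illustration only, a caching function guarded by mutex.withLock could look roughly like the sketch below. It is not the code that triggered the report (that code is redacted in the trace); `SuspendingCache`, `compute`, and `getOrCompute` are hypothetical names, and kotlinx-coroutines-core is assumed to be on the classpath.

```kotlin
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Hypothetical cache; the names are illustrative and not taken from the report.
class SuspendingCache<K, V>(private val compute: suspend (K) -> V) {
    private val mutex = Mutex()
    private val values = HashMap<K, V>()

    // withLock suspends until the mutex is free and releases it afterwards,
    // so only one coroutine at a time touches the map.
    suspend fun getOrCompute(key: K): V = mutex.withLock {
        values.getOrPut(key) { compute(key) }
    }
}

fun main() = runBlocking {
    var calls = 0
    val cache = SuspendingCache<String, Int> { key -> calls++; key.length }
    println(cache.getOrCompute("mutex"))
    println(cache.getOrCompute("mutex")) // served from the map, compute is not called again
    println("compute calls: $calls")
}
```

Note that withLock is called here without an owner, so the owner token is null on every lock and unlock.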

It appears that state.owner is written at Mutex.kt:341 and read at Mutex.kt:173, potentially from different threads. Is this something we should be concerned about?

Thanks!

dkhalanskyjb · 2022-04-14T08:05:01Z

The check at https://github.com/Kotlin/kotlinx.coroutines/blob/master/kotlinx-coroutines-core/common/src/sync/Mutex.kt#L173 must be there to provide the behavior documented here:
https://github.com/Kotlin/kotlinx.coroutines/blob/master/kotlinx-coroutines-core/common/src/sync/Mutex.kt#L38
So the data race looks benign: if state.owner is the same as the owner passed to tryLock either before or after https://github.com/Kotlin/kotlinx.coroutines/blob/master/kotlinx-coroutines-core/common/src/sync/Mutex.kt#L341 runs, then it is correct for tryLock to throw.
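The owner contract being discussed (locking with a non-null owner that already holds the mutex is reported as an error) can be seen in a small single-threaded sketch. This assumes kotlinx-coroutines-core is on the classpath; the `owner` token is an arbitrary object created for the example, and the sketch goes through `lock` rather than `tryLock` directly.

```kotlin
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex

fun main() = runBlocking {
    val mutex = Mutex()
    val owner = Any() // arbitrary token, compared by identity

    mutex.lock(owner) // the first acquisition succeeds immediately

    // Locking again with the same owner is a usage error and should throw
    // instead of suspending forever.
    val second = runCatching { mutex.lock(owner) }
    println(second.exceptionOrNull() is IllegalStateException)

    // A different owner just fails to take the already locked mutex.
    println(mutex.tryLock(Any()))

    mutex.unlock(owner)
}
```

There is no concurrency here, so the sketch shows only the intended semantics of the owner check, not the race itself.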

ndkoval · 2022-04-18T13:40:13Z

@dkhalanskyjb is correct if the memory model is sequentially consistent. Otherwise, a tryLock or holdsLock invoked concurrently with unlock can observe the updated owner (written in line 341) at line 173 or 316 and then, on the next attempt, read the old value (from before the update in line 341), because the corresponding field in the LockedQueue class is not volatile. @shmuelr, thank you for detecting the race!
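The visibility point can be illustrated outside the library with a minimal sketch. `OwnerCell` is a hypothetical stand-in for the object holding the owner field; it is not the actual LockedQueue class and not the eventual patch. A write to a `@Volatile` field happens-before every later read of that field under the Java Memory Model, so a reader that has seen the new value cannot go back to the old one; a plain field carries no such guarantee.

```kotlin
import kotlin.concurrent.thread

// Hypothetical stand-in for the object that stores the lock owner.
class OwnerCell(initial: Any?) {
    // Without @Volatile, the JMM would allow a racing reader to see stale values.
    @Volatile
    var owner: Any? = initial
}

fun main() {
    val oldOwner = Any()
    val newOwner = Any()
    val cell = OwnerCell(oldOwner)

    val writer = thread { cell.owner = newOwner }
    val reader = thread {
        // Busy-wait until the write is visible; with @Volatile this terminates.
        while (cell.owner !== newOwner) { }
        // After the new value has been seen, a later read cannot return the old one.
        check(cell.owner === newOwner)
    }
    writer.join()
    reader.join()
    println("reader observed the new owner")
}
```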

The behavior never occurs on x86, and the race affects the implementation only when the Mutex is used incorrectly; even then, it does not break the Mutex's internal structure. The issue will be fixed soon with a new Mutex implementation in #3020.

qwwdfsad added the bug label May 16, 2022

qwwdfsad added a commit that referenced this issue May 16, 2022

Fix data-race in Mutex owner when mutex is locked/released inconsiste…

f1c64ff

…ntly Fixes #3250

qwwdfsad mentioned this issue May 16, 2022

Fix data-race in Mutex owner when mutex is locked/released inconsiste… #3286

Merged

qwwdfsad closed this as completed in 0fe8f92 May 27, 2022

pablobaxter pushed a commit to pablobaxter/kotlinx.coroutines that referenced this issue Sep 14, 2022

Fix data-race in Mutex owner when mutex is locked/released inconsiste…

fb726eb

…ntly (Kotlin#3286) Fixes Kotlin#3250
