Deadlock inside scalatest async tests #3848
I am sorry that you are experiencing this hang. I know from experience that hangs inside a large … I am off-line for most of the next two months, but let's see if we can make some progress.

I am guessing that the operating system is OpenBSD? 64bit?

You may have _way_ more experience chasing network and thread problems than I, so let's put our heads together.

At the root of a decision tree, I think there are three primary possibilities. In my guestimate order of probability:

1) `_libc_recvfrom_cancel` is legitimately waiting in the `recvfrom` portion for a message which will never come, i.e. a never-sent or dropped message. SN did the transform from the `read` call to `recvfrom`.

2) `_libc_recvfrom_cancel` has been interrupted, possibly by the indicated or another `_rthread_cond_timedwait`, and is stuck inside the `_cancel` portion of `_libc_recvfrom_cancel`.

3) The `java.lang.impl.PosixThreadD4parkjzuEO` framework is messed up.

I took a quick look at the OpenBSD `$OpenBSD: w_recvfrom.c,v 1.1 2016/05/07 19:05:22` code and, due to macros in the code, could not distinguish cases 1 & 2. If we are indeed talking about OpenBSD, I'd have to trace into the macros.

Something quick to try, worth a 1/2 hour of time, not days.

Focusing on `_rthread_cond_timedwait`:

1) If the `cond` timeout value is small, say a few seconds or less, for debugging try increasing it to minutes, say 5. The wild thrash is to give reads time to complete if they are "ever" going to do so. This trial does not entirely rule out the timer having fired and the signal(?) to `recvfrom` not arriving or causing a hang. A decrease in the apparent failure rate would give evidence that case 1 is happening.

2) If the `cond` timeout value is large, say days or months, try setting it down to, say, 5 times the time in seconds you expect your computation-under-test to take. If case 1 above is happening, this should increase the apparent failure rate and thereby give evidence that case 1 is happening.

Focusing on the `recvfrom` part of `_libc_recvfrom_cancel`:

3) Try using the javalib Socket `setSoTimeout(int timeout)` method to set the OS socket timeout to a few multiples of your expected computation-under-test time. The argument is Java-style milliseconds, so one must be careful with seconds vs milliseconds. This (debug) approach should put any 'timeout' handling down at the OS level and avoid any userspace (quasi-)signal handling and/or C signal blocking and/or C-signal insufficient strength. (I personally generally design to avoid C signals because of their ~~quirks~~ features. I am told that on some operating systems one must enable socket interruptibility-via-signals and/or avoid some other thread adversely manipulating the signal mask.)

Is the intent that this code run on FreeBSD? macOS? Linux?

I am trying to avoid suggesting code changes until we have more evidence pointing to a culprit. IIUC, your interest is in the underlying crypto code and not the socket I/O or test-framework part, and you are kindly reporting what appears to be sand in the gears of SN.

###### Complexity I am trying to avoid

In describing this, I do not mean to offend either you or the code under discussion. Having one thread cancel I/O in another is a well-known problem. Very few people get it right and, I must confess, I am not one of those people.

The usual solution is to have the `read` thread do its own timeout using `kqueue()` `EVFILT_TIMER` on *BSD & macOS and `epoll` with a `timerfd` on Linux (yes, OS-specific code). Setting SO_TIMEOUT on the OS socket, as suggested for debugging above, is a more historical approach. `kqueue()` allows the application to attempt some recovery: more flexibility; more complexity.

`javalib/src/main/scala/java/lang/process/UnixProcessGen2.scala` gives a somewhat close example, for a process instead of a timer. Sorry, the code is ugly/inefficient but effective. A `timedRead` implementation would have to `kqueue` for the OS socket id (fetched from the Java Socket) using `EVFILT_TIMER`.
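To make the `setSoTimeout` suggestion in point 3 concrete, here is a minimal sketch. The endpoint, buffer size, and the 30-second figure are illustrative placeholders, not values taken from this issue:

```scala
import java.net.{Socket, SocketTimeoutException}

object SoTimeoutProbe {
  def main(args: Array[String]): Unit = {
    // Hypothetical endpoint; substitute whatever the test suite actually talks to.
    val socket = new Socket("127.0.0.1", 9000)
    // Java-style milliseconds: 30 s as "a few multiples" of the expected
    // computation-under-test time.
    socket.setSoTimeout(30 * 1000)
    val buf = new Array[Byte](4096)
    try {
      val n = socket.getInputStream.read(buf) // now bounded at the OS level
      println(s"read $n bytes")
    } catch {
      case _: SocketTimeoutException =>
        // The read gave up cleanly instead of hanging inside recvfrom.
        println("read timed out")
    } finally socket.close()
  }
}
```

If runs that previously hung now end in a `SocketTimeoutException`, that points at a read which was never going to complete (case 1) rather than at the cancellation machinery.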
@LeeTibbert, thanks for the reply.
On Sun, 24 Mar 2024 16:04:55 +0100, LeeTibbert wrote:
> I am guessing that the operating system is OpenBSD? 64bit?

Yes, it is 64bit OpenBSD.
> You may have _way_ more experience chasing network and thread problems than I,
> so let's put our heads together.
>
> At the root of a decision tree, I think there are three primary possibilities. In my guestimate
> order of probability:
>
> 1) `_libc_recvfrom_cancel` is legitimately waiting in the `recvfrom` portion for a message which will never come, i.e. a never-sent or dropped message. SN did the transform from the `read` call to `recvfrom`.

agreed
> 2) `_libc_recvfrom_cancel` has been interrupted, possibly by the indicated or another
> `_rthread_cond_timedwait`, and is stuck inside the `_cancel` portion of `_libc_recvfrom_cancel`

I see one other possible reason for that: attaching with gdb.
> 3) The `java.lang.impl.PosixThreadD4parkjzuEO` framework is messed up.

agreed
> I took a quick look at the OpenBSD `$OpenBSD: w_recvfrom.c,v 1.1 2016/05/07 19:05:22` code
> and, due to macros in the code, could not distinguish cases 1 & 2. If we are indeed talking about
> OpenBSD, I'd have to trace into the macros.

The right move is to run the test program via ktrace and inspect the kdump output. It should show
things like network writes and signals, but I'm not sure about thread interrupts.
The simplest way is to run it with sbt, which leads to a huge dump output that
might not be easy to read and investigate.

> Something quick to try, worth a 1/2 hour of time, not days.
>
> Focusing on `_rthread_cond_timedwait`:
>
> 1) If the `cond` timeout value is small, say a few seconds or less, for debugging try increasing it to minutes, say 5. The wild thrash is to give reads time to complete if they are "ever" going to do so. This trial does not entirely rule out the timer having fired and the signal(?) to `recvfrom` not arriving or causing a hang. A decrease in the apparent failure rate would give evidence that case 1 is happening.
>
> 2) If the `cond` timeout value is large, say days or months, try setting it down to, say, 5 times the time in seconds you expect your computation-under-test to take. If case 1 above is happening, this should increase the apparent failure rate and thereby give evidence that case 1 is happening.
> Focusing on the `recvfrom` part of `_libc_recvfrom_cancel`:
>
> 3) Try using the javalib Socket `setSoTimeout(int timeout)` method to set the OS socket timeout to a few multiples of your expected computation-under-test time. The argument is Java-style milliseconds, so one must be careful with seconds vs milliseconds. This (debug) approach should put any 'timeout' handling down at the OS level and avoid any userspace (quasi-)signal handling and/or C signal blocking and/or C-signal insufficient strength. (I personally generally design to avoid C signals because of their ~~quirks~~ features. I am told that on some operating systems one must enable socket interruptibility-via-signals and/or avoid some other thread adversely manipulating the signal mask.)
> Is the intent that this code run on FreeBSD? macOS? Linux?
>
> I am trying to avoid suggesting code changes until we have more evidence pointing to a culprit. IIUC, your interest is in the underlying crypto code and not the socket I/O or test-framework part, and you are kindly reporting what appears to be sand in the gears of SN.

This code uses an in-house libuv wrapper, which leads to a segfault. I'm still
trying to catch it, and until it is stable enough I can't rule out possible
side effects.

On the one hand, I've commented out the only test which uses that part;
probably it isn't used at all.

On the other hand, if you are willing to help dig it out, I may share the code
with a promise not to share it further yet (feel free to contact me via the
email which I use for commits).

> ###### Complexity I am trying to avoid
>
> In describing this, I do not mean to offend either you or the code under discussion. Having one thread cancel I/O in another is a well-known problem. Very few people get it right and, I must confess, I am not one of those people.
>
> The usual solution is to have the `read` thread do its own timeout using `kqueue()` `EVFILT_TIMER` on *BSD & macOS and `epoll` with a `timerfd` on Linux (yes, OS-specific code). Setting SO_TIMEOUT on the OS socket, as suggested for debugging above, is a more historical approach. `kqueue()` allows the application to attempt some recovery: more flexibility; more complexity.
>
> `javalib/src/main/scala/java/lang/process/UnixProcessGen2.scala` gives a somewhat close example, for a process instead of a timer. Sorry, the code is ugly/inefficient but effective. A `timedRead` implementation would have to `kqueue` for the OS socket id (fetched from the Java Socket) using `EVFILT_TIMER`.

Yeah, I saw that code when I was porting SN to OpenBSD; switching to UPG2
led to broken tests.
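For concreteness, here is a rough Scala Native sketch of the kqueue-based timed-read idea quoted above. Everything in it is an assumption rather than an existing SN API: the `struct kevent` layout is the common 64-bit *BSD one, the constants should be checked against the target's `<sys/event.h>`, and for simplicity it waits on `EVFILT_READ` using `kevent`'s timeout argument instead of registering a separate `EVFILT_TIMER` event:

```scala
import scala.scalanative.unsafe._
import scala.scalanative.unsigned._
import scala.scalanative.posix.unistd.close

object KqTypes {
  // Assumed 64-bit *BSD layout of struct kevent (verify against <sys/event.h>):
  //   uintptr_t ident; short filter; u_short flags; u_int fflags;
  //   int64_t data; void *udata;
  type kevent_t = CStruct6[CLongLong, CShort, CUnsignedShort, CUnsignedInt, CLongLong, Ptr[Byte]]
  type timespec = CStruct2[CLong, CLong] // tv_sec, tv_nsec on 64-bit

  final val EVFILT_READ: CShort = -1 // common *BSD value; verify per OS
  final val EV_ADD = 0x0001
  final val EV_ONESHOT = 0x0010
}

@extern
object KqOps {
  def kqueue(): CInt = extern
  def kevent(
      kq: CInt,
      changelist: Ptr[KqTypes.kevent_t],
      nchanges: CInt,
      eventlist: Ptr[KqTypes.kevent_t],
      nevents: CInt,
      timeout: Ptr[KqTypes.timespec]
  ): CInt = extern
}

object TimedRead {
  import KqTypes._, KqOps._

  /** Block until `fd` is readable or `timeoutMs` elapses.
   *  Returns true if readable, false on timeout or error.
   */
  def awaitReadable(fd: Int, timeoutMs: Long): Boolean = {
    val q = kqueue()
    if (q < 0) false
    else
      try {
        val change = stackalloc[kevent_t]()
        change._1 = fd.toLong                      // ident: the OS socket fd
        change._2 = EVFILT_READ                    // filter
        change._3 = (EV_ADD | EV_ONESHOT).toUShort // flags
        change._4 = 0.toUInt                       // fflags
        change._5 = 0L                             // data
        change._6 = null                           // udata

        val ts = stackalloc[timespec]()
        ts._1 = timeoutMs / 1000              // tv_sec
        ts._2 = (timeoutMs % 1000) * 1000000L // tv_nsec

        val event = stackalloc[kevent_t]()
        kevent(q, change, 1, event, 1, ts) > 0 // 0 => timed out, -1 => error
      } finally close(q)
  }
}
```

The point of this design is that the reading thread owns its own timeout, so no other thread ever has to cancel its I/O.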
I've made a trivial shell to dig into it: https://github.com/catap/shell,
which I plan to reuse to add CI to SN.

Frankly speaking, I really think that adding BSD to CI is a really good move now.
--
wbr, Kirill
Have you maybe observed it also on 0.5.0-RC1? In the RC2 we've added #3827, which retries the `read` upon interruption. It was added because attaching to the test-runner was stopping its execution - it was throwing an unhandled SocketException - but I don't see how we might have received an interrupt when running tests.

> The `java.lang.impl.PosixThreadD4parkjzuEO` framework is messed up.

I don't think that might be the case. `ForkJoinPool` WorkerThreads are daemons and can scale down if unused for a longer time. The one thread we've seen in the stack dump is probably just the last alive worker waiting for tasks.

I'd probably bet on a missing message on either the JVM or the test-runner side.

I'm not sure if it might be related, but just right now Windows CI has failed due to a timeout, probably by being stuck in the same way: https://github.com/scala-native/scala-native/actions/runs/8410284069/job/23028532861?pr=3849
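As a quick sanity check of the daemon-worker point, here is a tiny plain-Scala (JVM) sketch; the assumption is that SN's `ForkJoinPool` port keeps the same semantics:

```scala
import java.util.concurrent.ForkJoinPool

object WorkerDaemonCheck {
  def main(args: Array[String]): Unit = {
    ForkJoinPool
      .commonPool()
      .submit(new Runnable {
        def run(): Unit = {
          val t = Thread.currentThread()
          // ForkJoinPool workers are daemon threads, so a lone parked worker
          // cannot keep the process alive by itself.
          println(s"${t.getName}: daemon=${t.isDaemon}")
        }
      })
      .join()
  }
}
```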
On Sun, 24 Mar 2024 17:34:49 +0100, Wojciech Mazur wrote:
> Have you maybe observed it also on 0.5.0-RC1? In the RC2 we've added #3827, which retries the `read` upon interruption. It was added because attaching to the test-runner was stopping its execution - it was throwing an unhandled SocketException - but I don't see how we might have received an interrupt when running tests.

Nope, but I'd like to take your idea of stabilising the scalatest tests further.
It seems the right move, and I'll dig into it this week.

> > The `java.lang.impl.PosixThreadD4parkjzuEO` framework is messed up.
>
> I don't think that might be the case. `ForkJoinPool` WorkerThreads are daemons and can scale down if unused for a longer time. The one thread we've seen in the stack dump is probably just the last alive worker waiting for tasks.
>
> I'd probably bet on a missing message on either the JVM or the test-runner side.

Which seems quite unlikely, but who knows.
--
wbr, Kirill
And I have a piece of strange news: I have reproduced the same deadlock without scalatest:
@WojciechMazur I can reproduce the deadlock using scalanative from a06f46d, but now I have only one thread left:
I've tried reverting #3827 and the deadlock is still here.
Interesting observation with:
the deadlock is quite difficult to reproduce in my small test case, but without this block, each attempt leads to the deadlock.
Extracted a somewhat related issue into a dedicated ticket: #3859
Further investigation. From:

where the 3rd column is the relative time since the previous event. The last signal was from my …
I use scalatest/scalatest#2318 to run my test suite of ~1.5k tests, the majority of which are async.
If I run it as:
it usually passes, with the deadlock reproduced in about one out of ten runs. Anyway, without it, almost every run stopped with this deadlock.
It doesn't consume CPU, and when I attached gdb to the test program, it was stuck like this: