Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A thread can stop execution without dropping its packet #124466

Open
fuzzypixelz opened this issue Apr 28, 2024 · 1 comment
Open

A thread can stop execution without dropping its packet #124466

fuzzypixelz opened this issue Apr 28, 2024 · 1 comment
Assignees
Labels
C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@fuzzypixelz
Copy link
Contributor

fuzzypixelz commented Apr 28, 2024

Context

A thread holds a reference to a Packet object which it uses to send its result back to thread which holds its JoinHandle.

It's possible for the thread to enter a signaled state and stop its execution without reaching the point where is drops its packet (or return a result at all).

On the other hand, the implementation of JoinHandle::join is the following:

// https://github.com/rust-lang/rust/blob/master/library/std/src/thread/mod.rs#L1586
impl<'scope, T> JoinInner<'scope, T> {
    fn join(mut self) -> Result<T> {
        self.native.join();
        Arc::get_mut(&mut self.packet).unwrap().result.get_mut().take().unwrap()
    }
}

The first .unwrap() assumes that the thread stopped execution normally, which leads to cryptic panic errors it eventually fails.

How to reproduce this?

One way of reproducing this is to have a Windows DLL with a libc::atexit handler that joins thread created in the main application. If the application exits before the thread finishes execution, the there is a high change the above .unwrap() fails. Otherwise no panic will be observed:

// lib.rs (cdylib)
pub static HANDLE: Mutex<OnceLock<JoinHandle<()>>> = Mutex::new(OnceLock::new());

#[no_mangle]
pub extern "C" fn init(n: libc::c_int) {
    let handle = thread::spawn(move || thread::sleep(Duration::from_millis(n as u64)));
    HANDLE.lock().unwrap().set(handle).unwrap();

    unsafe { libc::atexit(handler) };
}

extern "C" fn handler() {
    let handle = HANDLE.lock().unwrap().take().unwrap();
    handle.join().unwrap();
}
// test.c
#include <stdlib.h>
#include <Windows.h>

extern void init(int);

int main(const int argc, const char *argv[argc])
{
    init(atoi(argv[1]));
    Sleep(atoi(argv[2]));
    return 0;
}

Compile the above test.c and link to the lib.rs DLL, then run the following:

RUST_BACKTRACE=1 ./test 1000 100 || echo "info: test with 1s sleep in main and 0.1s sleep in thread failed"
./test 100 1000 && echo "info: test with 0.1s sleep in main and 1s sleep in thread succeeded"
Unfold output of the above script
thread '<unnamed>' panicked at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04\library\std\src\thread\mod.rs:1517:40:
called `Option::unwrap()` on a `None` value
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\panicking.rs:647
   1: core::panicking::panic_fmt
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\core\src\panicking.rs:72
   2: core::panicking::panic
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\core\src\panicking.rs:144
   3: core::option::unwrap_failed
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\core\src\option.rs:1978
   4: std::thread::JoinInner<tuple$<> >::join<tuple$<> >
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04\library\std\src\thread\mod.rs:1517
   5: std::thread::JoinHandle<tuple$<> >::join
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04\library\std\src\thread\mod.rs:1650
   6: rust_thread_race_condition::handler
             at .\src\lib.rs:19
   7: execute_onexit_table
   8: execute_onexit_table
   9: execute_onexit_table
  10: dllmain_crt_process_detach
             at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:180
  11: dllmain_dispatch
             at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:293
  12: RtlActivateActivationContextUnsafeFast
  13: LdrShutdownProcess
  14: RtlExitUserProcess
  15: ExitProcess
  16: <unknown>
  17: <unknown>
  18: <unknown>
  19: BaseThreadInitThunk
  20: RtlUserThreadStart
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
info: test with 1s sleep in main and 0.1s sleep in thread failed
info: test with 0.1s sleep in main and 1s sleep in thread succeeded

This example is available in https://github.com/fuzzypixelz/rust-thread-race-condition. There is a simple workflow that can reproduce the above output on windows-2022 runners.

What to do about it?

I think a low hanging fruit here is replacing the .unwrap() with a terse .expect(...) error message.

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Apr 28, 2024
@fuzzypixelz
Copy link
Contributor Author

@rustbot claim

@saethlin saethlin added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 9, 2024
@jieyouxu jieyouxu added C-bug Category: This is a bug. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants