Panic messages sometimes eaten in tests #1066

alexcrichton · 2023-06-27T03:22:40Z

This is a problem I've been encountering in the Wasmtime repository for quite some time now and I've only just now taken the time to minimize it and debug it. The problem I'm having is that often a cargo test run will fail for one reason or another but there's no actual logs to what went wrong. Not in the failures section printed by libtest nor interspersed throughout the output of the tests. The test log is 100% clean with no panic messages, yet one of the threads panicked. The odd behavior is that running a single test prints the output of the test and panic message as expected, so it hasn't been a huge issue overall.

I reduced this down to rayon, however, showcased by this example:

use rayon::prelude::*;

#[test]
fn foo() {
    [1, 2].par_iter().for_each(|_| {
        // ..
    });
}

#[test]
fn bar() {
    [1, 2].par_iter().for_each(|i| {
        if *i % 2 == 0 {
            panic!();
        }
    });
}

Most of the time the output looks like this:

$ cargo test -q
running 2 tests
.F
failures:

---- bar stdout ----
thread '<unnamed>' panicked at 'explicit panic', src/main.rs:14:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    bar

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass `--bin wat`

but sometimes the output looks like this:

$ cargo test -q

running 2 tests
.F
failures:

failures:
    bar

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass `--bin wat`

In this tiny example it's nondeterministic, but in the larger context of Wasmtime at least I've found that it almost never prints panic messages in some circumstances.

My best guess as to what's going on here is that somehow the foo test steals the work of the bar test. This work then panics and rayon transmits the result back to bar, but libtest/libstd's capturing of output means that when foo succeeds it swallows all the output and nothing ends up making its way to bar. I'm not sure if there's a way to confirm that this is happening though (or whether it's something else entirely)

My guess is that rayon probably can't do much about this, but I wanted to open up an issue anyway since it's been quite perplexing to me historically as to why CI runs for example fail but there's nothing in the logs.

The text was updated successfully, but these errors were encountered:

alexcrichton · 2023-06-27T03:30:21Z

Ok scratch my theory, an updated version of the test

use rayon::prelude::*;
use std::io::Write;

#[test]
fn foo() {
    write!(std::io::stdout(), "{:?}", std::thread::current().id()).unwrap();
    [1, 2].par_iter().for_each(|_| {
        // ..
    });
}

#[test]
fn bar() {
    println!("{:?}", std::thread::current().id());
    let obj = std::panic::catch_unwind(|| {
        [1, 2].par_iter().for_each(|i| {
            if *i % 2 == 0 {
                println!("{:?}", std::thread::current().id());
                std::panic::panic_any(std::thread::current().id());
            }
        });
    })
    .unwrap_err();
    panic!("{:?}", obj.downcast_ref::<std::thread::ThreadId>().unwrap());
}

can print on a good day:

running 2 tests
ThreadId(3).F
failures:

---- bar stdout ----
ThreadId(2)
ThreadId(15)
thread '<unnamed>' panicked at 'Box<dyn Any>', src/main.rs:19:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'bar' panicked at 'ThreadId(15)', src/main.rs:24:5


failures:
    bar

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

and on a bad day:

running 2 tests
ThreadId(3).F
failures:

---- bar stdout ----
ThreadId(2)
thread 'bar' panicked at 'ThreadId(12)', src/main.rs:24:5


failures:
    bar

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

(note the lack of second panic message in this "bad" output) showing that the thread is indeed panicking on what is presumably the rayon thread pool, so now I'm not actually sure how the output is being swallowed.

alexcrichton · 2023-06-27T03:57:44Z

Apologies for the live-debugging of this, but it appears I'm three years too late to the punch and it's libstd's inheriting of streams which is probably causing the problems. My guess is the successful test spawned a rayon worker thread which then ended up running the panicking work of the failing test, but because the stdout/stderr capture was hooked up to the successful test nothing got printed.

An example of this is:

use std::sync::{Condvar, Mutex};
use std::thread::JoinHandle;

static LOCK: Mutex<Option<JoinHandle<()>>> = Mutex::new(None);
static COND: Condvar = Condvar::new();

#[test]
fn foo() {
    *LOCK.lock().unwrap() = Some(std::thread::spawn(|| {
        panic!("this message will not be seen");
    }));
    COND.notify_one();
}

#[test]
fn bar() {
    let lock = LOCK.lock().unwrap();
    let mut lock = COND.wait_while(lock, |state| state.is_none()).unwrap();
    lock.take().unwrap().join().unwrap();
}

where the foo test starts a thread which is "used" by the bar test. Not exactly Rayon-like but gets the point across.

cuviper · 2023-06-30T17:27:38Z

My guess is that rayon probably can't do much about this,

Yeah, I don't know anything we could do about it. As a user, testing with --no-capture is probably the best bet to figure out what happened. If you have a #[should_panic] test that you know is going to span stderr, then you probably should give that its own pool to be captured with that test thread, as I think the redirection is inherited.

alexcrichton mentioned this issue Jun 27, 2023

Capturing thread output during tests interacts poorly with global thread pools rust-lang/rust#113080

Open

alexcrichton mentioned this issue Feb 5, 2024

Cranelift: Use a fixpoint loop to compute the best value for each eclass bytecodealliance/wasmtime#7859

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Panic messages sometimes eaten in tests #1066

Panic messages sometimes eaten in tests #1066

alexcrichton commented Jun 27, 2023

alexcrichton commented Jun 27, 2023

alexcrichton commented Jun 27, 2023

cuviper commented Jun 30, 2023

Panic messages sometimes eaten in tests #1066

Panic messages sometimes eaten in tests #1066

Comments

alexcrichton commented Jun 27, 2023

alexcrichton commented Jun 27, 2023

alexcrichton commented Jun 27, 2023

cuviper commented Jun 30, 2023