Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic messages sometimes eaten in tests #1066

Open
alexcrichton opened this issue Jun 27, 2023 · 3 comments
Open

Panic messages sometimes eaten in tests #1066

alexcrichton opened this issue Jun 27, 2023 · 3 comments

Comments

@alexcrichton
Copy link

This is a problem I've been encountering in the Wasmtime repository for quite some time now and I've only just now taken the time to minimize it and debug it. The problem I'm having is that often a cargo test run will fail for one reason or another but there's no actual logs to what went wrong. Not in the failures section printed by libtest nor interspersed throughout the output of the tests. The test log is 100% clean with no panic messages, yet one of the threads panicked. The odd behavior is that running a single test prints the output of the test and panic message as expected, so it hasn't been a huge issue overall.

I reduced this down to rayon, however, showcased by this example:

use rayon::prelude::*;

#[test]
fn foo() {
    [1, 2].par_iter().for_each(|_| {
        // ..
    });
}

#[test]
fn bar() {
    [1, 2].par_iter().for_each(|i| {
        if *i % 2 == 0 {
            panic!();
        }
    });
}

Most of the time the output looks like this:

$ cargo test -q
running 2 tests
.F
failures:

---- bar stdout ----
thread '<unnamed>' panicked at 'explicit panic', src/main.rs:14:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    bar

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass `--bin wat`

but sometimes the output looks like this:

$ cargo test -q

running 2 tests
.F
failures:

failures:
    bar

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass `--bin wat`

In this tiny example it's nondeterministic, but in the larger context of Wasmtime at least I've found that it almost never prints panic messages in some circumstances.

My best guess as to what's going on here is that somehow the foo test steals the work of the bar test. This work then panics and rayon transmits the result back to bar, but libtest/libstd's capturing of output means that when foo succeeds it swallows all the output and nothing ends up making its way to bar. I'm not sure if there's a way to confirm that this is happening though (or whether it's something else entirely)

My guess is that rayon probably can't do much about this, but I wanted to open up an issue anyway since it's been quite perplexing to me historically as to why CI runs for example fail but there's nothing in the logs.

@alexcrichton
Copy link
Author

Ok scratch my theory, an updated version of the test

use rayon::prelude::*;
use std::io::Write;

#[test]
fn foo() {
    write!(std::io::stdout(), "{:?}", std::thread::current().id()).unwrap();
    [1, 2].par_iter().for_each(|_| {
        // ..
    });
}

#[test]
fn bar() {
    println!("{:?}", std::thread::current().id());
    let obj = std::panic::catch_unwind(|| {
        [1, 2].par_iter().for_each(|i| {
            if *i % 2 == 0 {
                println!("{:?}", std::thread::current().id());
                std::panic::panic_any(std::thread::current().id());
            }
        });
    })
    .unwrap_err();
    panic!("{:?}", obj.downcast_ref::<std::thread::ThreadId>().unwrap());
}

can print on a good day:

running 2 tests
ThreadId(3).F
failures:

---- bar stdout ----
ThreadId(2)
ThreadId(15)
thread '<unnamed>' panicked at 'Box<dyn Any>', src/main.rs:19:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'bar' panicked at 'ThreadId(15)', src/main.rs:24:5


failures:
    bar

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

and on a bad day:

running 2 tests
ThreadId(3).F
failures:

---- bar stdout ----
ThreadId(2)
thread 'bar' panicked at 'ThreadId(12)', src/main.rs:24:5


failures:
    bar

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

(note the lack of second panic message in this "bad" output) showing that the thread is indeed panicking on what is presumably the rayon thread pool, so now I'm not actually sure how the output is being swallowed.

@alexcrichton
Copy link
Author

Apologies for the live-debugging of this, but it appears I'm three years too late to the punch and it's libstd's inheriting of streams which is probably causing the problems. My guess is the successful test spawned a rayon worker thread which then ended up running the panicking work of the failing test, but because the stdout/stderr capture was hooked up to the successful test nothing got printed.

An example of this is:

use std::sync::{Condvar, Mutex};
use std::thread::JoinHandle;

static LOCK: Mutex<Option<JoinHandle<()>>> = Mutex::new(None);
static COND: Condvar = Condvar::new();

#[test]
fn foo() {
    *LOCK.lock().unwrap() = Some(std::thread::spawn(|| {
        panic!("this message will not be seen");
    }));
    COND.notify_one();
}

#[test]
fn bar() {
    let lock = LOCK.lock().unwrap();
    let mut lock = COND.wait_while(lock, |state| state.is_none()).unwrap();
    lock.take().unwrap().join().unwrap();
}

where the foo test starts a thread which is "used" by the bar test. Not exactly Rayon-like but gets the point across.

@cuviper
Copy link
Member

cuviper commented Jun 30, 2023

My guess is that rayon probably can't do much about this,

Yeah, I don't know anything we could do about it. As a user, testing with --no-capture is probably the best bet to figure out what happened. If you have a #[should_panic] test that you know is going to span stderr, then you probably should give that its own pool to be captured with that test thread, as I think the redirection is inherited.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants