"blocking waitpid returned pid=0" from `AsyncGroupChild::wait` #21

9999years · 2023-10-27T00:24:15Z

This seems to be a cancel-safety issue: it happens when I call wait() with a timeout and then call wait() again.

This reproducer:

use command_group::AsyncCommandGroup;
use tokio::process::Command;

#[tokio::main]
async fn main() {
    let mut group = Command::new("sh")
        .arg("-c")
        .arg("sleep 30; echo done!")
        .group_spawn()
        .unwrap();
    println!("spawned");

    match tokio::time::timeout(std::time::Duration::from_secs(1), group.wait()).await {
        Ok(res) => {
            println!("command exited or waiting failed: {res:?}");
        }
        Err(_) => {
            println!("command took too long");
        }
    }

    group.kill().unwrap();
    println!("killed");
    group.wait().await.unwrap();
    println!("finished waiting");
}

Prints this on my machine:

spawned
command took too long
killed
thread 'main' panicked at src/main.rs:26:24:
called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: "blocking waitpid returned pid=0" }

passcod · 2023-10-27T01:56:06Z

Sure, to conform to the Tokio API this future should be fused. Under unix you're not really supposed to do that (call wait twice) though.

(Also love your "people i'm not" website page btw)

9999years · 2023-10-27T18:07:58Z

> Under unix you're not really supposed to do that (call wait twice) though.

Ah, that's the context I was missing. I read the waitpid(2) man page but I didn't get that.

I hoped I could work around this by calling wait(), storing the future, then calling kill() and awaiting the stored wait() future, but wait() mutably borrows the group, so kill() can't be called while that future is still alive.

9999years · 2023-10-27T19:29:16Z

This seems to work, although I had to write the killpg call myself because the borrow checker complained about calling AsyncGroupChild::kill while the AsyncGroupChild::wait future was still pending.

use command_group::AsyncCommandGroup;
use nix::sys::signal;
use nix::sys::signal::Signal;
use nix::unistd::Pid;
use tokio::process::Command;

#[tokio::main]
async fn main() {
    let mut group = Command::new("sh")
        .arg("-c")
        .arg("sleep 30; echo done!")
        .group_spawn()
        .unwrap();
    println!("spawned");
    let pgid = Pid::from_raw(group.id().unwrap() as i32);

    let mut wait = std::pin::pin!(group.wait());
    match tokio::time::timeout(std::time::Duration::from_secs(1), &mut wait).await {
        Ok(res) => {
            println!("command exited or waiting failed: {res:?}");
        }
        Err(_) => {
            println!("command took too long");
        }
    }

    signal::killpg(pgid, Signal::SIGKILL).unwrap();
    println!("killed");
    wait.await.unwrap();
    println!("finished waiting");
}

passcod · 2023-10-30T09:57:00Z

I'd be happy with a PR that made this cancel-safe, for the record. Had a look earlier but it wasn't obvious.

passcod · 2023-11-04T10:15:33Z

Right, I had a look at this again and I don't think the issue is really cancel safety. What happens is that when you call wait (or waitpid), that tells the kernel that it can clean up the resources of the process once it's exited. We can't cancel that call, so when you cancel the .wait() in application code all that does is drop the thread that has called wait. If the process stops anyway in between, the kernel will clean it up, and calling wait again will error (at best; if the PID is recycled you might call the second wait on the wrong process).

Indeed the wait() in command-group is fused already: calling it again after it went all the way to process completion will return the same ExitStatus.

I believe Tokio's 'trick' is that it also listens for SIGCHLD so that, in the background, it can handle if a process exits while nothing was "actively" wait()ing it.
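
To illustrate the point above that a blocking wait can't be cancelled once it has started, here is a minimal std-only sketch of one alternative: polling with the non-blocking `try_wait` until a deadline, so that no blocking wait is left running when the timeout fires. This is not how command-group or Tokio implement waiting. `wait_with_timeout` is a hypothetical helper, the 20 ms poll interval is an arbitrary choice, and the example uses plain `std::process` on a single `sleep` child rather than a process group.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of command-group): poll the child with the
/// non-blocking `try_wait` until it exits or `timeout` elapses.
/// Returns `Ok(None)` on timeout, without leaving a blocking wait behind.
fn wait_with_timeout(child: &mut Child, timeout: Duration) -> io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + timeout;
    loop {
        // `try_wait` returns immediately: Some(status) if the child has exited.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            return Ok(None);
        }
        // Arbitrary poll interval, chosen only for this sketch.
        thread::sleep(Duration::from_millis(20));
    }
}

fn main() -> io::Result<()> {
    // A single child process, not a process group (assumes a Unix `sleep`).
    let mut child = Command::new("sleep").arg("30").spawn()?;
    println!("spawned");

    match wait_with_timeout(&mut child, Duration::from_secs(1))? {
        Some(status) => println!("command exited: {status:?}"),
        None => {
            println!("command took too long");
            child.kill()?;
            println!("killed");
            // Only now do we issue a blocking wait, exactly once.
            let status = child.wait()?;
            println!("finished waiting: {status:?}");
        }
    }
    Ok(())
}
```

The trade-off is latency and wakeups from polling. A SIGCHLD-driven approach like the one attributed to Tokio above avoids both but needs a signal handler.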

9999years changed the title ~~"blocking waitpid returned pid=0"~~ "blocking waitpid returned pid=0" from AsyncGroupChild::wait Oct 27, 2023

passcod mentioned this issue Oct 29, 2023: Update deps and add note to wait() #22 (Merged)
