Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"blocking waitpid returned pid=0" from AsyncGroupChild::wait #21

Open
9999years opened this issue Oct 27, 2023 · 5 comments
Open

"blocking waitpid returned pid=0" from AsyncGroupChild::wait #21

9999years opened this issue Oct 27, 2023 · 5 comments

Comments

@9999years
Copy link

9999years commented Oct 27, 2023

Seems to be a cancel safety issue, this happens when i wait() with a timeout and then wait() again:

This reproducer:

use command_group::AsyncCommandGroup;
use tokio::process::Command;

#[tokio::main]
async fn main() {
    let mut group = Command::new("sh")
        .arg("-c")
        .arg("sleep 30; echo done!")
        .group_spawn()
        .unwrap();
    println!("spawned");

    match tokio::time::timeout(std::time::Duration::from_secs(1), group.wait()).await {
        Ok(res) => {
            println!("command exited or waiting failed: {res:?}");
        }
        Err(_) => {
            println!("command took too long");
        }
    }

    group.kill().unwrap();
    println!("killed");
    group.wait().await.unwrap();
    println!("finished waiting");
}

Prints this on my machine:

spawned
command took too long
killed
thread 'main' panicked at src/main.rs:26:24:
called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: "blocking waitpid returned pid=0" }
@9999years 9999years changed the title "blocking waitpid returned pid=0" "blocking waitpid returned pid=0" from AsyncGroupChild::wait Oct 27, 2023
@passcod
Copy link
Member

passcod commented Oct 27, 2023

Sure, to conform to the Tokio API this future should be fused. Under unix you're not really supposed to do that (call wait twice) though.

(Also love your "people i'm not" website page btw)

@9999years
Copy link
Author

Under unix you're not really supposed to do that (call wait twice) though.

Ah, that's the context I was missing. I read the waitpid(2) man page but I didn't get that.

I hoped I could work around this by calling wait(), storing the future, then calling kill() and waiting on the wait() future, but wait() is a mutable borrow so that doesn't work.

@9999years
Copy link
Author

This seems to work, although I had to write the killpg call myself because the borrow checker complained about calling AsyncCommandGroup::kill when I had the AsyncCommandGroup::wait future still pending.

use command_group::AsyncCommandGroup;
use nix::sys::signal;
use nix::sys::signal::Signal;
use nix::unistd::Pid;
use tokio::process::Command;

#[tokio::main]
async fn main() {
    let mut group = Command::new("sh")
        .arg("-c")
        .arg("sleep 30; echo done!")
        .group_spawn()
        .unwrap();
    println!("spawned");
    let pgid = Pid::from_raw(group.id().unwrap() as i32);

    let mut wait = std::pin::pin!(group.wait());
    match tokio::time::timeout(std::time::Duration::from_secs(1), &mut wait).await {
        Ok(res) => {
            println!("command exited or waiting failed: {res:?}");
        }
        Err(_) => {
            println!("command took too long");
        }
    }

    signal::killpg(pgid, Signal::SIGKILL).unwrap();
    println!("killed");
    wait.await.unwrap();
    println!("finished waiting");
}

@passcod
Copy link
Member

passcod commented Oct 30, 2023

I'd be happy with a PR that made this cancel-safe, for the record. Had a look earlier but it wasn't obvious.

@passcod
Copy link
Member

passcod commented Nov 4, 2023

Right, I had a look at this again and I don't think the issue is really cancel safety. What happens is that when you call wait (or waitpid), that tells the kernel that it can clean up the resources of the process once it's exited. We can't cancel that call, so when you cancel the .wait() in application code all that does is drop the thread that has called wait. If the process stops anyway in between, the kernel will clean it up, and calling wait again will error (at best; if the PID is recycled you might call the second wait on the wrong process).

Indeed the wait() in command-group is fused already: calling it again after it went all the way to process completion will return the same ExitStatus.

I believe Tokio's 'trick' is that it also listens for SIGCHLD so that, in the background, it can handle if a process exits while nothing was "actively" wait()ing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants