block_in_place + block_on can hang on runtime shutdown / runtime drop #6463

Open
dpc opened this issue Apr 5, 2024 · 4 comments
Labels
A-tokio (Area: The main tokio crate), C-bug (Category: This is a bug.), M-runtime (Module: tokio/runtime)

Comments

@dpc
Contributor

dpc commented Apr 5, 2024

Version

1.37.0

Platform

Linux

Description

We use the block_in_place + block_on pattern described in #5843 a lot. It has many caveats, but it seems to work OK for us as an async drop workaround.

However, today I'm debugging a hang on shutdown. Basically, the Runtime is being dropped and the whole process hangs. When I attach gdb, I can see that only a handful of worker threads remain, plus a timer thread. All worker threads seem to be inside a block_in_place + block_on section, parked, waiting for something to wake them up, but I don't think there's any thread left to actually poll the event loop.

I don't know how well supported this pattern should be, and I might be wrong about the whole thing altogether, but it seems to me that the whole thing would just work if tokio reserved a single worker for polling events and shut it down last, or somehow avoided having all worker threads inside block_in_place at once, or shut down the event polling thread last (if a dedicated thread is used).
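
To make the suspected failure mode concrete, here is a toy model of the shutdown ordering described above. It uses only std threads and channels and says nothing about tokio's actual internals: the "driver" thread, the `Outcome` type, and `simulate_shutdown` are all hypothetical names invented for this sketch. Each waiter stands in for a worker parked inside block_on, and the driver is the only thing that can wake it. If the driver goes away first, every waiter stays parked.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// What happened to one simulated worker that was blocked waiting for a wakeup.
#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    Woken,
    StillParked,
}

/// Toy model only: `workers` threads block on a channel (standing in for a
/// parked `block_on`), and one "driver" thread is the only source of wakeups.
/// With `driver_exits_first == true` the driver stops without delivering them.
fn simulate_shutdown(workers: usize, driver_exits_first: bool) -> Vec<Outcome> {
    let mut wake_txs = Vec::new();
    let mut waiters = Vec::new();
    for _ in 0..workers {
        let (tx, rx) = mpsc::channel::<()>();
        wake_txs.push(tx);
        waiters.push(thread::spawn(move || {
            // A real parked thread would wait forever; the timeout only lets
            // the model report the hang instead of reproducing it.
            match rx.recv_timeout(Duration::from_millis(200)) {
                Ok(()) => Outcome::Woken,
                Err(_) => Outcome::StillParked,
            }
        }));
    }

    let driver = thread::spawn(move || {
        if !driver_exits_first {
            // Deliver every pending wakeup before the driver goes away.
            for tx in &wake_txs {
                let _ = tx.send(());
            }
        }
        // Return the senders so the channels stay open after the driver exits.
        wake_txs
    });
    let _senders_kept_alive = driver.join().unwrap();

    waiters.into_iter().map(|w| w.join().unwrap()).collect()
}

fn main() {
    println!("driver exits first:       {:?}", simulate_shutdown(3, true));
    println!("driver delivers, then exits: {:?}", simulate_shutdown(3, false));
}
```

In this model, the second ordering corresponds to the "shut down the event polling thread last" idea. Whether tokio's shutdown sequence can actually end up in the first ordering is exactly the open question in this issue.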

dpc added the A-tokio (Area: The main tokio crate) and C-bug (Category: This is a bug.) labels Apr 5, 2024
Darksonn added the M-runtime (Module: tokio/runtime) label Apr 5, 2024
@Darksonn
Contributor

Darksonn commented Apr 5, 2024

I would expect IO resources and timers to return errors once runtime shutdown starts, so it surprises me that it is hanging. Could you give more details?

@dpc
Contributor Author

dpc commented Apr 5, 2024

Hmmm...

Maybe it is specific to what I'm doing.

impl Drop for ProcessHandleInner {
    fn drop(&mut self) {
        let Some(child) = &mut self.child else {
            return;
        };
        let name = self.name.clone();
        block_in_place(move || {
            tokio::runtime::Handle::current().block_on(async move {
                debug!(
                    target: LOG_DEVIMINT,
                    "sending SIGKILL to {name} and waiting for it to exit"
                );
                send_sigkill(child);
                if let Err(e) = child.wait().await {
                    warn!(target: LOG_DEVIMINT, "failed to wait for {name}: {e:?}");
                }
            })
        })
    }
}

@Darksonn the child is tokio::process::Child. Could it be that this one particular case just doesn't get handled?
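
For comparison, here is a minimal sketch of a Drop that does not depend on the runtime at all: it kills the child and then polls for its exit with a deadline, so a missed wakeup cannot block shutdown forever. This is an assumption-laden illustration, not a confirmed fix for this issue. It uses std::process::Child as a stand-in so that it compiles without tokio; tokio::process::Child offers the non-async `start_kill` and `try_wait` methods that could play the same roles, but whether they behave well during runtime shutdown has not been verified here. `ProcessHandle`, `kill_and_wait`, the 2-second limit, and the `sleep` command are all hypothetical choices for the sketch.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the `ProcessHandleInner` shown above.
struct ProcessHandle {
    name: String,
    child: Option<Child>,
}

/// Kill `child`, then poll `try_wait` until it exits or `limit` elapses.
/// Returns `Ok(None)` if the child is still running at the deadline.
fn kill_and_wait(child: &mut Child, limit: Duration) -> io::Result<Option<ExitStatus>> {
    child.kill()?; // on Unix this sends SIGKILL
    let deadline = Instant::now() + limit;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(10));
    }
}

impl Drop for ProcessHandle {
    fn drop(&mut self) {
        let Some(child) = &mut self.child else {
            return;
        };
        // Purely synchronous: no runtime handle, no block_on, bounded wait.
        match kill_and_wait(child, Duration::from_secs(2)) {
            Ok(Some(status)) => eprintln!("{} exited: {status}", self.name),
            Ok(None) => eprintln!("{} did not exit in time", self.name),
            Err(e) => eprintln!("failed to wait for {}: {e:?}", self.name),
        }
    }
}

fn main() -> io::Result<()> {
    // Assumes a Unix-like system with a `sleep` binary on PATH.
    let child = Command::new("sleep").arg("30").spawn()?;
    let handle = ProcessHandle {
        name: "sleep".to_string(),
        child: Some(child),
    };
    drop(handle); // kills the child and waits at most 2 seconds
    Ok(())
}
```

The trade-off is that the dropping thread sleeps in short intervals instead of being woken by the runtime, which is blocking work; on a runtime worker thread it would still need block_in_place or a similar escape hatch.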

@dpc
Contributor Author

dpc commented Apr 5, 2024

Oh, shoot. Now I see it's not actually a send_sigkill, and I'm not 100% sure that the process didn't hang. I don't think it did, because they all get killed reliably on Ctrl+C, but I'll try to verify.

Edit:
Nah, I changed to send_sigkill and I get the same result.

@dpc
Contributor Author

dpc commented Apr 5, 2024

I pasted the relevant part of the gdb session: https://pastebin.com/VzHF0B5T , including the list of threads and a stack trace that is mostly tokio functions, in case it's of any help.
