Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tokio::time::sleep causes hyper.rs http server to get stuck and unresponsive. #4961

Closed
4lisalehi opened this issue Aug 31, 2022 · 4 comments
Closed
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug.

Comments

@4lisalehi
Copy link

4lisalehi commented Aug 31, 2022

Version

tokio v1.20.1
tokio-macros v1.8.0
tokio-utils v0.6.9
hyper v0.14.20

Platform

Linux 5.4.0-80-generic #90-Ubuntu x86_64 x86_64 x86_64 GNU/Linux

Description

In a test module of a Rust project, I am using hyper http server defining its make_service_fn and service_fn handlers to test a functionality. I'm also testing use of tokio::time::sleep function in my testing code.

The problem is, in the testing code block, whenever I use tokio::time::sleep function before the http request to the hyper http server, the hyper server does not send back any response anymore and seems to get stuck. i.e. its service_fn request handler does not run at all anymore.

In this scenario, I'm using clients such as isahc and curl to send http requests. Both of them timeout waiting for the response from the hyper http server and the problem seems not be on the client side.

Also, as I tested, it appears the problem is related to a change introduced in one of tokio v1.16.x releases. All versions starting from tokio v1.16.x (including tokio v1.20.1) seem to have this bug.

If not mistaken, I think there is a potential scheduling bug in the runtime when tokio::time::sleep is called in the code.

I tried this code:

#[tokio::test(flavor = "multi_thread")]
async fn test_sleep() {

    let server = Http::new("127.0.0.1:12345").unwrap(); // a wrapper constructor around hyper.rs server

    tokio::spawn(async move { server.serve().await.unwrap() });

    let _ = tokio::spawn(async move {
        let request_body = "valid data";

        isahc::post(
            "http://127.0.0.1:12345/test/sleep",
            request_body,
        )
        .unwrap(); // runs successfully and the request is delivered to the http server and its handler.

        println!("this line is printed");

        tokio::time::sleep(std::time::Duration::from_secs(1)).await;

        isahc::post("http://127.0.0.1:12345/test/sleep", request_body).unwrap();

        println!("this line is never printed in tokio runtime versions starting from `v1.16.x`, although expected");
    })
    .await;
}

In the above code, doing one of the following resolves the issue and causes the last println! macro to run.

  • removing the tokio::time::sleep function call.
  • using the non-async standard library version of the sleep function std::thread::sleep instead of tokio::time::sleep.
  • using tokio versions below v1.16.x, leaving tokio::time::sleep function call intact.

Additional Information

curl http://127.0.0.1:12345/test/sleep --data "valid data" -v output while the test is running and stuck:

* Trying 127.0.0.1:12345...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 12345 (#0)
> POST /test/sleep HTTP/1.1
> Host: 127.0.0.1:12345
> User-Agent: curl/7.68.0
> Accept: */*
> Content-Length: 10
> Content-Type: application/x-www-form-urlencoded
> 
* upload completely sent off: 10 out of 10 bytes`
...

tcpdump -i any port 12345 output while the test is running and stuck:

16:45:54.465469 IP localhost.51366 > localhost.12110: Flags [F.], seq 179, ack 1, win 512, options [nop,nop,TS val 3523454835 ecr 3523445100], length 0
16:45:54.506717 IP localhost.12110 > localhost.51366: Flags [.], ack 180, win 511, options [nop,nop,TS val 3523454876 ecr 3523454835], length 0
16:45:56.288871 IP localhost.51380 > localhost.12110: Flags [S], seq 1965835364, win 65495, options [mss 65495,sackOK,TS val 3523456658 ecr 0,nop,wscale 7], length 0
16:45:56.288885 IP localhost.12110 > localhost.51380: Flags [S.], seq 2904765061, ack 1965835365, win 65483, options [mss 65495,sackOK,TS val 3523456658 ecr 3523456658,nop,wscale 7], length 0
16:45:56.288896 IP localhost.51380 > localhost.12110: Flags [.], ack 1, win 512, options [nop,nop,TS val 3523456658 ecr 3523456658], length 0
16:45:56.288941 IP localhost.51380 > localhost.12110: Flags [P.], seq 1:179, ack 1, win 512, options [nop,nop,TS val 3523456659 ecr 3523456658], length 178
16:45:56.288946 IP localhost.12110 > localhost.51380: Flags [.], ack 179, win 511, options [nop,nop,TS val 3523456659 ecr 3523456659], length 0
@4lisalehi 4lisalehi added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Aug 31, 2022
@Darksonn
Copy link
Contributor

What you're seeing is #4730. The problem is that you're calling the blocking function isahc in async code, and it has nothing to do with time::sleep. Consider reading this for more info.

@4lisalehi
Copy link
Author

4lisalehi commented Sep 3, 2022

Thank you for the guide and the references.
My question is, why would removing the tokio::time::sleep code from the above function make both blocking isahc::post functions to run normally and the test_sleep function to finish?

Also, using tokio versions below v1.16.x without any change in the code above would resolve the issue as well. Is it because of a modification introduced since v1.16.x releases which has changed the runtime behavior with blocking functions in async code? If so, I'd appreciate if you send me a link to those change notes/commits in the repository so that I can understand the new behavior better.

And sorry in advance if I have misunderstood any parts of the issue.
Thank you.

@Darksonn
Copy link
Contributor

Darksonn commented Sep 3, 2022

As far as I can tell, the sleep causes the issue because it makes your blocking task yield and get migrated to the thread that holds the IO/timer driver. As for v1.16.x, the change in question is #4383.

@4lisalehi 4lisalehi changed the title tokio::time::sleep causes hyper.rs http server to get stuck and unresponsive; potential runtime scheduler bug. tokio::time::sleep causes hyper.rs http server to get stuck and unresponsive. Sep 5, 2022
@4lisalehi
Copy link
Author

OK. Thank you for your guide. I think I can close this issue now. If possible, I will re-open it in case I encountered a relevant problem. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug.
Projects
None yet
Development

No branches or pull requests

2 participants