
Hang with no activity in certain high-concurrency situations #3559

Open
abizjak opened this issue Feb 3, 2024 · 5 comments
Assignees
Labels
A-http2 Area: HTTP/2 specific. C-bug Category: bug. Something is wrong. This is bad!

Comments

@abizjak

abizjak commented Feb 3, 2024

Version

[[package]]
name = "hyper"
version = "0.14.28"

[[package]]
name = "h2"
version = "0.3.24"

[[package]]
name = "tonic"
version = "0.10.2"

Platform

Linux ****** 6.6.13-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 20 18:03:28 UTC 2024 x86_64 GNU/Linux

and also

Linux ****** 6.5.11-300.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov  8 22:37:57 UTC 2023 x86_64 GNU/Linux

Description

When the client issues a large number of concurrent queries and the server limits the number of concurrent queries (but does no load shedding), both the client and the server eventually appear to just hang, with neither of them doing anything (i.e., there is no CPU usage; yet the server, if queried by a different client, responds to new requests from that client).

The exact conditions for reproducing this are hard to nail down, but the amount of data being transferred seems to matter: if I make the transferred messages very small, I cannot reproduce the hang.

During these attempts I also once witnessed this panic:

thread 'tokio-runtime-worker' panicked at 'assertion failed: self.window_size.0 >= sz as i32', /home/abizjak/.cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.24/src/proto/streams/flow_control.rs:181:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

but unfortunately I was not running with RUST_BACKTRACE set, and I have been unable to reproduce it since.


The server code is:

use tonic::{transport::Server, Request, Response, Status};

use hello_world::greeter_server::{Greeter, GreeterServer};
use hello_world::{HelloReply, HelloRequest};

pub mod hello_world {
    tonic::include_proto!("helloworld");
}

#[derive(Default)]
pub struct MyGreeter {}

#[tonic::async_trait]
impl Greeter for MyGreeter {
    async fn say_hello(
        &self,
        request: Request<HelloRequest>,
    ) -> Result<Response<HelloReply>, Status> {
        let reply = hello_world::HelloReply {
            message: format!(
                "{:?}, {}",
                [0; 10000], // I have been unable to reproduce with size 1000 here.
                request.into_inner().name,
            ),
        };
        Ok(Response::new(reply))
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let addr = "[::1]:50051".parse().unwrap();
    let greeter = MyGreeter::default();

    Server::builder()
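        // NB (per the discussion below): 100 is lower than the default HTTP/2
        // max_concurrent_streams, and that gap is what triggers the hang.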
        .concurrency_limit_per_connection(100)
        .add_service(GreeterServer::new(greeter))
        .serve(addr)
        .await?;

    Ok(())
}

And the client code is:

use futures::stream::FuturesUnordered;
use futures::TryStreamExt;
use hello_world::greeter_client::GreeterClient;
use hello_world::HelloRequest;
use tracing_subscriber::filter::LevelFilter;

pub mod hello_world {
    tonic::include_proto!("helloworld");
}

// Message size N vs. observed outcome:
// 300 -> hang
// 150 -> hang
// 100 -> hang
//  50 -> no hang
const N: usize = 80;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Reproduction seems a lot harder with logging enabled; I have been unable
    // to reproduce at trace level at all. So consider commenting this out.
    {
        use tracing_subscriber::prelude::*;
        let log_filter = tracing_subscriber::filter::Targets::new()
            .with_target("h2", LevelFilter::DEBUG)
            .with_target("hyper", LevelFilter::DEBUG);
        tracing_subscriber::registry()
            .with(tracing_subscriber::fmt::layer())
            .with(log_filter)
            .init();
    }

    let client = GreeterClient::connect("http://[::1]:50051").await?;

    let futs = FuturesUnordered::new();
    let message = "1".repeat(N);
    // The goal is to make concurrent requests.
    for i in 0..100_000 {
        let request = tonic::Request::new(HelloRequest { name: message.clone() });
        let mut client = client.clone();
        futs.push(async move {
            client.say_hello(request).await.map(|r| (r, i))
        });
    }
    futs.try_for_each_concurrent(None, move |(r, i)| async move {
        let msg = r.into_inner().message;
        let len = msg.len();
        eprintln!("{len}, {i}");
        Ok(())
    })
    .await?;

    Ok(())
}

I am attaching a complete Rust project with lock files that can be run as follows:

  1. Start the server:
cargo run --release --bin server
  2. Run the client:
cargo run --release --bin client

Omitting --release also seems to reproduce the hang.
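
For anyone recreating the project by hand instead of using the attachment: the helloworld module pulled in by tonic::include_proto! is generated at build time. A build script along these lines should be enough (a minimal sketch, assuming the proto definition is the standard hello-world one from the tonic examples, placed at proto/helloworld.proto):

// build.rs: a minimal sketch; assumes the standard tonic hello-world
// definition lives at proto/helloworld.proto.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Generates the code included via `tonic::include_proto!("helloworld")`.
    tonic_build::compile_protos("proto/helloworld.proto")?;
    Ok(())
}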


@abizjak abizjak added the C-bug Category: bug. Something is wrong. This is bad! label Feb 3, 2024
@abizjak
Author

abizjak commented Feb 3, 2024

FYI, I have reported this to hyper, but I don't really know whether it's a hyper, tonic, or h2 issue. It seems unlikely to be a tonic issue; the only reason I'm using tonic is that the problem originally arose in a gRPC server we were testing.

@seanmonstar seanmonstar added the A-http2 Area: HTTP/2 specific. label Feb 5, 2024
@dswij dswij self-assigned this Feb 13, 2024
@arthurprs
Contributor

This is probably the same as #2312 😬

@abizjak
Author

abizjak commented Feb 22, 2024

If I understand correctly, @dswij (on Discord) narrowed this issue down to the interaction between the concurrency limit and the maximum concurrent streams limit.

@dswij
Member

dswij commented Feb 22, 2024

Yes. Setting a low concurrency_limit together with a high max_concurrent_streams in tonic will result in a deadlock. Here, it happens because the concurrency limit is lower than the default HTTP/2 max concurrent streams.

Setting concurrency_limit_per_connection >= max_concurrent_streams should work around this issue for the moment, i.e.:

    Server::builder()
        .max_concurrent_streams(100)
        .concurrency_limit_per_connection(100)
        .add_service(GreeterServer::new(greeter))
        .serve(addr)
        .await?;

As for the details: this happens because of HTTP/2 flow control and the way the ConcurrencyLimitLayer is implemented using semaphores.
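
Roughly, a limit of this kind holds a semaphore permit for the full lifetime of each request, so streams that HTTP/2 has already accepted can queue up behind the semaphore while in-flight requests wait on flow-control capacity. A minimal sketch of the pattern (not tower's actual source; the names and types are illustrative):

// A minimal sketch of a semaphore-based concurrency limit, in the spirit
// of ConcurrencyLimitLayer (not tower's actual source).
use std::sync::Arc;
use tokio::sync::Semaphore;

// `String` stands in for the real request and response types.
async fn handle_with_limit(semaphore: Arc<Semaphore>, request: String) -> String {
    // Requests beyond the permit count wait here, even though HTTP/2 may
    // have already accepted their streams.
    let _permit = semaphore.acquire().await.expect("semaphore closed");
    format!("handled: {request}")
    // The permit is released when `_permit` drops, admitting the next request.
}

#[tokio::main]
async fn main() {
    // 100 permits, mirroring concurrency_limit_per_connection(100). With
    // max_concurrent_streams above 100, accepted streams can pile up here.
    let semaphore = Arc::new(Semaphore::new(100));
    println!("{}", handle_with_limit(semaphore, "hi".into()).await);
}

The workaround above works because with max_concurrent_streams no higher than the permit count, HTTP/2 never admits a stream that cannot immediately acquire a permit.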

#2132 is probably because of the pool implementation.

@mladedav

mladedav commented Mar 3, 2024

Small correction: the last comment should have referenced #2312 (client hang in hyper).
