A Stream object in a pending state won't be dropped after the client disconnect #1313

Open
masnagam opened this issue Jan 23, 2020 · 52 comments · May be fixed by #2830
Assignees
Labels
A-http project: actix-http C-bug Category: bug

Comments

@masnagam
Contributor

masnagam commented Jan 23, 2020

I'm not sure if it was intended, but a Stream object in a pending state won't be dropped after the client disconnects.

I found this issue in actix-web 2.0.0 and created a small test program like below:
https://github.com/masnagam/rust-case-studies/blob/master/actix-web-streaming-and-tcp-disconnect/src/main.rs

When executing the following command:

timeout 3 curl -v http://localhost:3000/pending

we never see PendingStream: Dropped in the log messages from the test program.

If a stream returns Poll::Ready, it will be dropped shortly. We can see that by executing:

timeout 3 curl -v http://localhost:3000/alternate

Is there any workaround that stops polling the pending stream promptly when the client disconnects?

@masnagam masnagam changed the title A Stream object in a pending state is never dropped even when the client disconnect A Stream object in a pending state is never dropped after the client disconnect Jan 23, 2020
@masnagam masnagam changed the title A Stream object in a pending state is never dropped after the client disconnect A Stream object in a pending state won't be dropped after the client disconnect Jan 23, 2020
@niveau0

niveau0 commented Jan 23, 2020

In my understanding (still new to Rust), if you return Pending, your future/stream will never be scheduled again unless it is woken by some trigger. See:
https://doc.rust-lang.org/std/task/enum.Poll.html#variant.Pending

I think this will also prevent the stream from being shut down. I had a similar issue and now never return Pending from a stream unless I'm waiting on another future/stream.

@robjtede
Member

I've run this code and it does print multiple "PendingStream: Pending" messages after one request, so it seems it is getting woken.

@masnagam
Contributor Author

@niveau0

As @robjtede wrote, poll_next() is called periodically even after the client disconnects. However, the stream won't be dropped until it returns Poll::Ready.

One possible cause is that actix-web calls write_xxx() only when it receives data from the Stream object, and calling write_xxx() is the only way to detect a TCP disconnection in tokio.

your future/stream will never be scheduled again unless it is woken by some trigger.

In my understanding, it depends on how the streaming function is implemented. For example, we might be able to implement it like this:

// Below is pseudo-code

struct StreamingResponse<SO, ST> {
    socket: SO,
    stream: ST,
}

impl<SO, ST> Future for StreamingResponse<SO, ST>
where
    SO: AsyncWrite,
    ST: Stream,
{
    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        // Implement the following sequence with a state machine:
        //
        // (1) poll self.socket.poll_write() with an empty buffer
        //     in order to detect the TCP disconnection
        // (2) poll self.stream.poll_next() in order to receive data
        // (3) poll self.socket.poll_write() in order to write the received data
        //
        // I'm not sure whether (1) is possible...
    }
}

@niveau0

niveau0 commented Jan 23, 2020

@robjtede ok, thanks for clarifying. I haven't had time to test the code; I just had a similar issue, and the documentation says:

Poll::Pending means that this stream's next value is not ready yet. Implementations will ensure that the current task will be notified when the next value may be ready.

What I really do not understand here is why the executor ever calls poll_next() again after it returned Pending (independent of the client's disconnect).
As far as my understanding goes, the notification must be done by some sort of waker (passed within the context). Here is an example:
https://rust-lang.github.io/async-book/02_execution/03_wakeups.html
An extra thread is used to call the waker to wake up the task.
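Condensed from that page, the pattern looks roughly like this (a sketch of the async-book's TimerFuture, not actix code): poll() stores the waker in shared state, and the extra thread calls wake() when the timer elapses, which is what gets the task polled again.

use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};
use std::thread;
use std::time::Duration;

struct SharedState {
    completed: bool,
    waker: Option<Waker>,
}

struct TimerFuture {
    shared_state: Arc<Mutex<SharedState>>,
}

impl Future for TimerFuture {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let mut shared_state = self.shared_state.lock().unwrap();
        if shared_state.completed {
            Poll::Ready(())
        } else {
            // Remember the waker so the timer thread can wake this task later.
            shared_state.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

impl TimerFuture {
    fn new(duration: Duration) -> Self {
        let shared_state = Arc::new(Mutex::new(SharedState { completed: false, waker: None }));
        let thread_shared_state = shared_state.clone();
        // The "extra thread": it sleeps, then calls wake() so the executor
        // polls the future again.
        thread::spawn(move || {
            thread::sleep(duration);
            let mut shared_state = thread_shared_state.lock().unwrap();
            shared_state.completed = true;
            if let Some(waker) = shared_state.waker.take() {
                waker.wake();
            }
        });
        TimerFuture { shared_state }
    }
}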

So I'm out for now, first need to investigate how the executor really works :-)

@masnagam
Contributor Author

Prepared a reproduction environment:
https://github.com/masnagam/rust-case-studies/tree/master/actix-web-streaming-and-tcp-disconnect

It supports VS Code Remote Containers.

@masnagam
Contributor Author

Investigated the periodic PendingStream::poll() calls.

The root cause of the periodic PendingStream::poll() calls is the timer for the keep-alive check, which is enabled by default. See:
https://github.com/actix/actix-web/blob/master/actix-http/src/config.rs#L263

We can disable it by setting keep_alive(0). In that case, the periodic PendingStream::poll() calls stop.
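For reference, a minimal sketch of disabling it (assuming the actix-web 2.x API used at the time of this thread; the pending handler below is just a placeholder for the test program's streaming handler):

use actix_web::{web, App, HttpResponse, HttpServer};

// Placeholder handler; the real test program streams a body that stays pending.
async fn pending() -> HttpResponse {
    HttpResponse::Ok().finish()
}

#[actix_rt::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| App::new().route("/pending", web::get().to(pending)))
        // keep_alive(0) disables the keep-alive timer, so the periodic
        // PendingStream::poll() calls stop as well.
        .keep_alive(0)
        .bind("127.0.0.1:3000")?
        .run()
        .await
}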

@masnagam
Contributor Author

Investigated how actix-web detects client disconnect.

When the client disconnects, poll_write() in InnerDispatcher::poll_flush() returns Poll::Ready(Ok(0)).

When the write_buf is empty, InnerDispatcher::poll_flush() returns Ok(false) immediately. This is why actix-web cannot detect the client disconnect while the stream is pending.

We could detect the client disconnect quickly if we could check it like below:

impl<T, S, B, X, U> Future for Dispatcher<T, S, B, X, U>
{
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        ...
                    loop {
                        ...
                        if inner.check_disconnect() {
                            // client disconnected
                            return Err(...);
                        }

                        if inner.poll_flush(cx)? || !drain {
                            break;
                        }
                    }
    }
}

@masnagam
Contributor Author

masnagam commented Jan 27, 2020

I concluded that it's difficult to detect client disconnects while streams are pending.

tokio's TcpStream seems to be based on Rust's std TcpStream, which has a traditional socket interface. A TcpStream read/write call with an empty buffer returns Ok(0), so it's impossible to distinguish a client disconnect from success.

Issue #29 in the tokio project is a similar topic.

I checked the behavior of tokio::io::PollEvented::poll_write_ready() on Linux. I analyzed my test program with the VS Code debugger and found that this function doesn't detect the client disconnect. tokio::net::TcpStream::poll_write_priv() reached mio::net::TcpStream::write(), which did detect the client disconnect.

A read/write with a non-empty buffer can detect the client disconnect, but that breaks HTTP request pipelining and the HTTP response.

There might be a platform-specific workaround, but there are no useful functions in the crates that actix-web uses at this moment.

Finally, I decided to feed some data at short intervals from my stream implementation in order to detect client disconnects quickly.
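For anyone who wants to do the same, here is a rough sketch of that workaround (my own illustration, not an actix-web API): a wrapper stream that emits a small filler chunk whenever the inner stream has been pending for a heartbeat interval, so the dispatcher performs a write and can notice the dead socket. It assumes tokio 1.x (tokio::time::sleep), and the filler bytes must be something your protocol tolerates (e.g. an SSE comment line).

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::Duration;

use bytes::Bytes;
use futures_core::Stream;
use tokio::time::{sleep, Sleep};

struct Heartbeat<S> {
    inner: S,
    timer: Pin<Box<Sleep>>,
    interval: Duration,
}

impl<S> Heartbeat<S> {
    fn new(inner: S, interval: Duration) -> Self {
        Heartbeat { inner, timer: Box::pin(sleep(interval)), interval }
    }
}

impl<S, E> Stream for Heartbeat<S>
where
    S: Stream<Item = Result<Bytes, E>> + Unpin,
{
    type Item = Result<Bytes, E>;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        // Heartbeat<S> is Unpin because S is Unpin, so get_mut() is fine.
        let this = self.get_mut();

        // Prefer real data from the inner stream.
        if let Poll::Ready(item) = Pin::new(&mut this.inner).poll_next(cx) {
            return Poll::Ready(item);
        }

        // The inner stream is pending: if the heartbeat interval has elapsed,
        // emit a filler chunk so actix-web writes to the socket (and can
        // detect the disconnect), then re-arm the timer.
        if this.timer.as_mut().poll(cx).is_ready() {
            let interval = this.interval;
            this.timer.set(sleep(interval));
            return Poll::Ready(Some(Ok(Bytes::from_static(b":\n\n"))));
        }

        Poll::Pending
    }
}

The body is then wrapped before streaming, e.g. HttpResponse::Ok().streaming(Heartbeat::new(inner, Duration::from_secs(1))).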

@masnagam
Contributor Author

@robjtede You can close this issue if you don't plan any additional investigation.

@robjtede
Member

I haven't investigated further. Great work finding out. It's comforting that the Keep-Alive checks were the cause of the polls.

Are we considering this correct behavior?

@masnagam
Contributor Author

If actix-web writes response data from a stream object for a request whose client has disconnected, I think that's incorrect behavior, because the response data will be garbage for subsequent pipelined requests on the keep-alive socket. I haven't confirmed it, but actix-web might write such garbage response data...

If actix-web wakes up tasks that process I/O on non-keep-alive sockets, that's not correct behavior either. My test program cannot test that, so we need to make another test program for it.

@masnagam
Contributor Author

masnagam commented Feb 9, 2020

In an old version of tokio, it was possible to observe a TCP disconnect by using TcpStream::poll_read_ready(), but that method has since been removed.

The function was deleted in tokio-rs/tokio@6d8cc4e.

The reason for the deletion can probably be found in a discussion on Gitter: tokio-rs/tokio#1392 (comment)

@masnagam
Contributor Author

masnagam commented Feb 9, 2020

Posted a question about this issue:
tokio-rs/tokio/issues/2228

@drogus

drogus commented Aug 5, 2020

Thanks for all the investigation. I've run into this as well, and ideally the solution would be similar to what Go does, i.e. wait for the read side of the connection to close (e.g. using TcpStream::poll_read_ready()). I'm not sure how many people care about it, but right now it's kind of a differentiator between Go's net/http and the Tokio-based ecosystem: if knowledge about a disconnect is needed, as in a presence system, Tokio-based libraries (including actix-web) are not the best choice.

@masnagam
Contributor Author

masnagam commented Aug 7, 2020

I've created tokio-rs/tokio#2743 which restores TcpStream::{poll_read_ready, poll_write_ready}.

@robjtede robjtede added C-bug Category: bug and removed needs-investigation labels Aug 8, 2020
@robjtede
Member

robjtede commented Aug 8, 2020

It's encouraging that they have acknowledged the issue and PR, but I'd rather not wait for a Tokio change since this is the last issue on the v3 blocker list.

Can we say that, for now, perpetually pending streams are not allowed, and remove this as a v3 blocker? It can be fixed a little later. What do you think, @masnagam?

@masnagam
Contributor Author

masnagam commented Aug 8, 2020

@robjtede Yes, you can. That's acceptable at least to me.

@robjtede
Member

Note that the relevant Tokio PR has been merged and the methods are now available.

https://docs.rs/tokio/1.0.1/tokio/net/struct.TcpStream.html#method.poll_read_ready

@robjtede robjtede added the A-http project: actix-http label Dec 31, 2020
@robjtede robjtede self-assigned this Dec 31, 2020
@robjtede robjtede added this to the actix-web v4 milestone Jan 2, 2021
@fakeshadow
Contributor

fakeshadow commented Jan 29, 2021

The h1 dispatcher polls your stream and gives you the waker.
It's your job to store that waker and properly wake up your pending stream; that is the right way to implement a stream. If you don't do any sort of bookkeeping and expect the dispatcher to wake itself up, then you are using Rust's async wrong.

But anyway, what you want to look into is PollResponse::DoNothing. A perpetually pending stream with no waker bookkeeping would trigger this and could leave the dispatcher itself in a pending state. But beware: changing this could be a breaking change, as there are people who implement streams the right way by storing their waker and waking the stream up properly.
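To illustrate the kind of bookkeeping meant here (a generic sketch, not actix code): the stream clones the waker it receives in poll_next, and whoever pushes new data calls wake(), so the dispatcher polls the stream again promptly.

use std::collections::VecDeque;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};

use bytes::Bytes;
use futures_core::Stream;

#[derive(Default)]
struct Shared {
    queue: VecDeque<Bytes>,
    waker: Option<Waker>,
    closed: bool,
}

#[derive(Clone, Default)]
struct DataStream(Arc<Mutex<Shared>>);

impl DataStream {
    // Producer side: push a chunk and wake the stream if it is parked.
    fn push(&self, chunk: Bytes) {
        let mut shared = self.0.lock().unwrap();
        shared.queue.push_back(chunk);
        if let Some(waker) = shared.waker.take() {
            waker.wake();
        }
    }

    // Producer side: signal end-of-stream and wake the consumer.
    fn close(&self) {
        let mut shared = self.0.lock().unwrap();
        shared.closed = true;
        if let Some(waker) = shared.waker.take() {
            waker.wake();
        }
    }
}

impl Stream for DataStream {
    type Item = Result<Bytes, std::convert::Infallible>;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        let mut shared = self.0.lock().unwrap();
        if let Some(chunk) = shared.queue.pop_front() {
            return Poll::Ready(Some(Ok(chunk)));
        }
        if shared.closed {
            return Poll::Ready(None);
        }
        // No data yet: store the current waker so the producer can wake us,
        // then return Pending.
        shared.waker = Some(cx.waker().clone());
        Poll::Pending
    }
}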

@masnagam
Contributor Author

@fakeshadow I've confirmed that, too. Thank you!

@robjtede robjtede modified the milestones: actix-web v4, actix-web post-v4 Apr 22, 2021
@ghost

ghost commented Sep 2, 2021

As a caution to anyone still using Actix v3, this problem may come up if you are using Websockets. I had a very nasty bug where my server has a cap on the number of connections for a particular endpoint, and the connections were being exhausted. I checked my netstat output and couldn't find living connections, but the behavior of the server was clear: the connection cap was being reached (I have it set up to return HTTP 503 if the connection cap is reached).

I eventually traced the problem to the web::Payload stream not terminating if a client closes a TCP connection abnormally, e.g. in the event of a program crash:

#[get("/the-endpoint")]
async fn endpoint(
    connection_tracker: web::Data<ConnectionTracker>,
    http_request: HttpRequest,
    mut http_stream: web::Payload,
)
    -> Result<HttpResponse, actix_web::Error>
{
    /* Websocket and connection setup boilerplate */

    // THE INTERESTING BIT
    tokio::task::spawn_local(async move {
        debug!("Stream reader opening");

        let mut codec = actix_http::ws::Codec::new();
        let mut buf = BytesMut::with_capacity(65536);

        while let Some(chunk_or_err) = http_stream.next().await {
            let chunk = match chunk_or_err {
                Ok(c) => c,
                Err(_) => break,
            };
            buf.put(chunk);
        
            match codec.decode(&mut buf) {
                Ok(maybe_frame) => {
                    if let Some(frame) = maybe_frame {
                        if let Err(_) = incoming_mesgs_tx.send(Ok(frame)).await {
                            break;
                        }
                    }
                },
                Err(protocol_err) => {
                    if let Err(_) = incoming_mesgs_tx.send(Err(protocol_err)).await {
                        break;
                    }
                },
            }
        }

        // I noticed these were not showing up consistently in the logs
        // Ergo the stream was never returning None
        debug!("Stream reader closing");
    });

    /* More response and setup boilerplate */
}

Caveats

I am using Websockets to bridge into a Tokio application, so I'm not using the actix actor framework and the associated actix-web-actors system for implementing Websockets. However, I see no reason why the actix-web-actors implementation wouldn't also be affected. After reviewing the web actors code, I saw that it simply wraps web::Payload and iterates on it just like my Tokio code. Since the problem comes directly from the web::Payload stream, actor-based Websockets would probably also be afflicted.

Workaround

With Websockets, I found this is fairly easy to work around, albeit imperfectly. After reading this thread, I figured I just need to get the server to trigger a socket read or write semi-regularly in case a client disconnects abnormally.

I set up the server to emit Websocket pings once a second for each client and this has eliminated the problem. Probably just good practice to do this anyways 😄
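For reference, the ping loop is roughly this shape (illustrative names only, assuming a tokio 1.x mpsc channel; outgoing_tx stands for whatever channel feeds the response stream, which then encodes the messages with actix_http::ws::Codec):

use std::time::Duration;

use actix_http::ws::Message;
use bytes::Bytes;
use tokio::sync::mpsc::Sender;

// Pushes a Websocket Ping into the outgoing channel once a second. The
// periodic write forces the server to touch the socket, so an abnormal
// client disconnect is noticed and the connection gets cleaned up.
async fn ping_loop(outgoing_tx: Sender<Message>) {
    let mut interval = tokio::time::interval(Duration::from_secs(1));
    loop {
        interval.tick().await;
        if outgoing_tx.send(Message::Ping(Bytes::new())).await.is_err() {
            // Receiver dropped: the connection is already shutting down.
            break;
        }
    }
}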

@robjtede robjtede removed this from the actix-web post-v4 milestone Mar 1, 2022
simorex80 added a commit to simorex80/actix-web that referenced this issue Jul 23, 2022
simorex80 pushed a commit to simorex80/actix-web that referenced this issue Jul 23, 2022
@simorex80 simorex80 mentioned this issue Jul 26, 2022
@simorex80 simorex80 linked a pull request Aug 7, 2022 that will close this issue
simorex80 added a commit to simorex80/actix-web that referenced this issue Aug 7, 2022
simorex80 added a commit to simorex80/actix-web that referenced this issue Aug 8, 2022
simorex80 added a commit to simorex80/actix-web that referenced this issue Aug 8, 2022
simorex80 added a commit to simorex80/actix-web that referenced this issue Sep 15, 2022
simorex80 pushed a commit to simorex80/actix-web that referenced this issue Sep 16, 2022
simorex80 added a commit to simorex80/actix-web that referenced this issue Sep 16, 2022
simorex80 added a commit to simorex80/actix-web that referenced this issue Jan 24, 2023
@ryzhyk

ryzhyk commented Jul 15, 2023

@masnagam , @fakeshadow , many thanks for the detailed investigation. Finding this thread in 2023, I am at least able to understand the problem. However, it looks like this is still an issue in the latest version of actix-web and there is no good workaround available, short of making sure the stream produces some new data periodically.

My application creates an HTTP response with a streaming body that may not produce any data for a long time; however I still want to be able to drop the stream when the client disconnects.

It appears that @fakeshadow had a fix, but the PR was never merged.

@masnagam, did you manage to solve this or perhaps you have any ideas?

@fakeshadow
Contributor

fakeshadow commented Jul 15, 2023

It appears that @fakeshadow had a fix, but the PR was never merged.

Sorry, I don't work on actix anymore. I would be thankful not to be pinged in future conversations.
That said, here are some final thoughts on this issue: a fix could be introduced either as a high-level mandatory stream timeout or as a rework of the low-level I/O cancellation.

@masnagam
Contributor Author

@masnagam, did you manage to solve this or perhaps you have any ideas?

@ryzhyk There might be some workaround but we simply switched to axum. mirakc/mirakc#223 may help you.

ryzhyk pushed a commit to feldera/feldera that referenced this issue Jul 19, 2023
There is a bug in actix (actix/actix-web#1313)
that prevents it from dropping HTTP connections on client disconnect
unless the endpoint periodically sends some data.  As a result, all
API-based data transfers eventually start to fail in the UI.  As a
workaround, we generate an empty output chunk if there is no real
payload to send for more than 3 seconds.

Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
ryzhyk pushed a commit to feldera/feldera that referenced this issue Jul 19, 2023
@ryzhyk

ryzhyk commented Jul 19, 2023

This seems like a pretty serious issue in actix-web, making it hard to use in scenarios that require streaming data continuously over HTTP (an increasingly popular thing to do nowadays).

@robjtede , based on this thread, when this issue was originally created three years ago there really was not a good way to fix it. Do you think things have changed since? Do you see a solution to this?

I notice there is another PR by @simorex80 #2830 that claims to fix this bug. @simorex80 and @robjtede , do you believe the PR solves the issue? Is it just a matter of adding a test for it? If so, I can look into writing such a test.

Again, this seems like a pretty significant gap in the otherwise excellent framework. I'd love to see it fixed, so that users like myself don't need to resort to ugly workarounds.

@robjtede
Member

As it happens, I'm looking to clean up the remaining issues for the v4.4 release, including this one, in the coming few days. I'll be assessing the open PRs that claim to solve this issue. Hopefully they include good test cases 🤞🏻.

@ryzhyk

ryzhyk commented Jul 19, 2023

Awesome, really looking forward to it. And happy to help with testing.

@simorex80

simorex80 commented Jul 20, 2023

This seems like a pretty serious issue in actix-web, making it hard to use in scenarios that require streaming data continuously over HTTP (an increasingly popular thing to do nowadays).

@robjtede , based on this thread, when this issue was originally created three years ago there really was not a good way to fix it. Do you think things have changed since? Do you see a solution to this?

I notice there is another PR by @simorex80 #2830 that claims to fix this bug. @simorex80 and @robjtede , do you believe the PR solves the issue? Is it just a matter of adding a test for it? If so, I can look into writing such a test.

Again, this seems like a pretty significant gap in the otherwise excellent framework. I'd love to see it fixed, so that users like myself don't need to resort to ugly workarounds.

Hi @ryzhyk, I can confirm that we have been using the fix from #2830 in production for more than one year (overriding the cargo source definition).
So far we haven't found any issue, either for normal HTTP requests or for SSE requests (SSE was the reason for this fix).

How you can reproduce the issue:

  1. create an actix_web handler for an SSE request (server-sent events; see the sketch below)
  2. connect to the specific URL using Chrome (the same happens with Edge)
  3. after a while, close the browser
  • Without the fix, the connection on the server still exists and will never be dropped.
  • With the fix, after a while the connection on the server is dropped and the memory is freed.

This is the scenario we faced and fixed with that commit.
Given the complexity of replicating the scenario, it would be great if you could help us create the test case!
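A minimal sketch of step 1, in case it helps whoever writes the test (assuming the current actix-web 4 API, not our production code): an SSE-style endpoint whose body stream stays pending forever and deliberately does no waker bookkeeping.

use std::convert::Infallible;
use std::pin::Pin;
use std::task::{Context, Poll};

use actix_web::{get, App, HttpResponse, HttpServer};
use bytes::Bytes;
use futures_core::Stream;

struct Forever;

impl Stream for Forever {
    type Item = Result<Bytes, Infallible>;

    fn poll_next(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        // Never yields and never arranges a wake-up: the connection only goes
        // away if actix-web drops the stream on client disconnect.
        Poll::Pending
    }
}

#[get("/events")]
async fn events() -> HttpResponse {
    HttpResponse::Ok()
        .content_type("text/event-stream")
        .streaming(Forever)
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| App::new().service(events))
        .bind(("127.0.0.1", 3000))?
        .run()
        .await
}

Open http://localhost:3000/events in Chrome, close the browser after a while, and watch whether the server ever drops the connection.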

@ryzhyk

ryzhyk commented Jul 20, 2023

Thanks for the detailed summary, @simorex80 !

@ryzhyk

ryzhyk commented Aug 25, 2023

@robjtede , just wondering if there's been any progress on this. Thanks!

simorex80 added a commit to simorex80/actix-web that referenced this issue Aug 30, 2023