
Memory usage streaming large responses #1012

Closed
via opened this issue Jul 27, 2020 · 8 comments · Fixed by #1018

Labels: bug (Something isn't working)

Comments

via commented Jul 27, 2020

We've been running into memory issues when providing very large async generators to a streaming response. These generators produce responses larger than available memory, in a way that allows us to keep only small chunks in memory at a time. However, it looks like the BaseHTTPMiddleware implementation uses an asyncio queue to store the individual chunks:

https://github.com/encode/starlette/blob/master/starlette/middleware/base.py#L30

This prevents any network backpressure handling -- if the client receiving the streaming response is on a slow connection, the queue will happily grow without bound and consume all memory, triggering the kernel's out-of-memory killer, when the ideal behaviour would be for send to block (yield) until the client catches up. I believe this would happen naturally if there were no queue here at all, so I am wondering why it needs to be here?
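
To make the failure mode concrete, here is a small self-contained sketch of the general pattern (illustration only, not the actual Starlette source): an unbounded asyncio.Queue never blocks the producer, so a slow consumer lets buffered chunks pile up in memory.

import asyncio

async def produce(queue: asyncio.Queue) -> None:
    # Chunk count and size are arbitrary; the point is that put() never blocks
    # on an unbounded queue, so every pending chunk stays resident in memory.
    for _ in range(1000):
        await queue.put(b"x" * 65536)
    await queue.put(None)  # sentinel to end the stream

async def consume(queue: asyncio.Queue) -> None:
    while (chunk := await queue.get()) is not None:
        await asyncio.sleep(0.01)  # simulate a slow client connection

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()  # no maxsize, hence no backpressure
    await asyncio.gather(produce(queue), consume(queue))

asyncio.run(main())

With a maxsize set, the producer's put() would suspend instead of buffering, which is the behaviour discussed later in this thread.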

Would a PR to remove the queueing be accepted?

If not, what is the appropriate way to override this so it doesn't use a queue? We can write our own middleware, but the use of BaseHTTPMiddleware is hardcoded:

self.add_middleware(BaseHTTPMiddleware, dispatch=func)

which leaves only some fairly hacky approaches to prevent this queueing.
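
For context, registering a dispatch function this way is what the @app.middleware("http") decorator does under the hood, so it hits the same hardcoded path. A minimal sketch of that usage (the header name is made up purely for illustration):

from starlette.applications import Starlette

app = Starlette()

# This decorator ends up calling add_middleware(BaseHTTPMiddleware, dispatch=add_header).
@app.middleware("http")
async def add_header(request, call_next):
    response = await call_next(request)  # the body is buffered through the middleware's queue
    response.headers["x-example"] = "1"  # hypothetical header, for illustration only
    return response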

euri10 (Member) commented Jul 27, 2020 via email

via (Author) commented Jul 27, 2020

I did see that ticket, but I think it's important to distinguish between the high-level backpressure logic that ticket is mostly about (managing concurrent request limits, etc.) and network backpressure. I do not believe that ticket is correct when it suggests that uvicorn/gunicorn manages the low-level network socket backpressure: the asyncio.Queue in question lives entirely within Starlette, and will happily absorb the entire response payload until the process dies.

florimondmanca (Member) commented Jul 31, 2020

I think queueing is fine, but the queue is unbounded, which results in losing the backpressure properties provided by the server. (As you said, it's allowed to grow forever even if the other side isn't consuming chunks fast enough.)

In my mind, setting a maxsize of e.g. 10 would resolve the memory issues you're seeing, because if the client isn't consuming chunks fast enough, the app will block on send() until there's room in the queue.

Edit: actually, as discussed by @erewok in the chat, a maxsize of 1 should be enough. This way we stick as closely as possible to the backpressure behavior of the server, and I don't think there's a risk of deadlock. In any case, any fix would be worth validating on a real-life big upload too. :-)
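
A rough sketch of that behaviour (illustrative only, not the Starlette internals): with maxsize=1 the producer's put() suspends until the consumer has taken the previous chunk, so at most one chunk sits in the queue at any time.

import asyncio

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue(maxsize=1)  # bounded: this is the backpressure

    async def producer() -> None:
        for i in range(5):
            await queue.put(i)  # suspends here until the consumer frees the slot
        await queue.put(None)  # sentinel to end the stream

    async def consumer() -> None:
        while (item := await queue.get()) is not None:
            await asyncio.sleep(0.1)  # slow consumer; the producer is paced to match
            print(item)

    await asyncio.gather(producer(), consumer())

asyncio.run(main())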

erewok (Contributor) commented Aug 1, 2020

For a bit of background: BaseHTTPMiddleware offers a way for users to write middleware that takes a request and builds a response. This is a bit closer to the request-handling logic than other middleware, but in order to assemble a response, BaseHTTPMiddleware needs some generic way to handle any potential response the handler might produce, which is the reason for the queue and streaming.
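
For readers unfamiliar with that interface, a minimal sketch of such a middleware (the class name and header are purely illustrative): dispatch() receives the Request and a call_next callable and must hand back a Response, and call_next has to work for any response the handler produces, which is where the queue and streaming come in.

from starlette.middleware.base import BaseHTTPMiddleware

class ExampleMiddleware(BaseHTTPMiddleware):  # hypothetical name
    async def dispatch(self, request, call_next):
        response = await call_next(request)  # assembled generically, whatever the handler returned
        response.headers["x-handled"] = "1"  # illustrative header
        return response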

Also, as @florimondmanca mentioned above: @via, if you have an easy opportunity to do so, you might consider experimentally running the branch from PR #1017 (branch: stream_response_bg) and seeing whether your memory issues with large responses are resolved. If you do find an opportunity to try it, please report back.

erewok (Contributor) commented Aug 5, 2020

@via Starlette version 0.13.7 has been released to address this issue.

via (Author) commented Aug 5, 2020

@erewok thank you for the information!

We have been using a similar change (a queue size of 3) and were going to try the PR you referenced. We will just update and report back. Thank you for the speedy responses.

erewok (Contributor) commented Aug 13, 2020

Due to issues with unevaluated responses, we have elected to revert the maxsize fix (see PR #1028), so I want to leave this issue open for now, hopefully as guidance for anyone who encounters the same problem. A documentation update with this information will follow soon as well.

As stated in the related issue on streaming with BaseHTTPMiddleware (#919):

  • We would like to discourage the use of BaseHTTPMiddleware in favor of raw ASGI middleware, which implements a method async def __call__(self, scope, receive, send) (see the sketch after this list), and

  • If people still want to use this middleware class, it should never be used with StreamingResponse/FileResponse endpoints.
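
For reference, a minimal sketch of such a raw ASGI middleware (the class name is illustrative): messages are forwarded to send() as soon as the application produces them, so no body chunks are queued inside the middleware.

class PassThroughMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_wrapper(message):
            # Each http.response.start / http.response.body message passes straight
            # through; there is no buffer between the application and the server.
            await send(message)

        await self.app(scope, receive, send_wrapper)

It is installed the usual way, e.g. app.add_middleware(PassThroughMiddleware).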

Unfortunately, while the BaseHTTPMiddleware class offers an easier interface because it provides a request argument and promises something like a response, it also encourages users to think about the ASGI app's functionality in a "complete" or "finished" way. This means the class will either load the entirety of streaming responses into memory (this issue) and run background tasks before returning the response (#919), or, if we fix those problems, it will then encourage users to leave resources in a pending or open state, an arguably worse result. In short, it's problematic.

Again, these problems should be absent if you avoid subclassing BaseHTTPMiddleware.

mvdbeek added a commit to mvdbeek/galaxy that referenced this issue Feb 20, 2021
this is still not ideal, as the entire response will be loaded into
memory. This is a problem with streaming responses.
encode/starlette#1012 (comment)
is actually enlightening here:

> "This means this class will either load the entirety of streaming requests into memory (this issue) and run the background before returning the response (galaxyproject#919 ), or if we fix those problems, that it will then encourage users to leave resources in a pending or open state, an arguably worse result. In short, it's problematic."

Which is exactly what we'd be doing with a middleware: keeping resources open for longer than necessary, which we need to avoid if we ever want to run background / async tasks. I think the solution here is what we already had: path operation dependencies.
JayH5 (Member) commented Jun 23, 2021

I believe this should be fixed by #1157. We now use an AnyIO stream in BaseHTTPMiddleware that is not unbounded:

send_stream, recv_stream = anyio.create_memory_object_stream()

We've also avoided the problem described in #1022 (context).
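
For anyone reading along, anyio's memory object streams default to max_buffer_size=0, meaning each send() suspends until the receiver is ready. A small self-contained sketch of that behaviour (not the Starlette internals; uses the anyio 3.x API):

import anyio

async def main() -> None:
    # Default max_buffer_size is 0: every send() waits for a matching receive().
    send_stream, recv_stream = anyio.create_memory_object_stream()

    async def producer() -> None:
        async with send_stream:
            for chunk in (b"first", b"second", b"third"):
                await send_stream.send(chunk)  # suspends until the consumer takes it

    async def consumer() -> None:
        async with recv_stream:
            async for chunk in recv_stream:
                print(chunk)

    async with anyio.create_task_group() as tg:
        tg.start_soon(producer)
        tg.start_soon(consumer)

anyio.run(main)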

JayH5 closed this as completed Jun 23, 2021
JayH5 added a commit that referenced this issue Jun 23, 2021
* Prepare version 0.15.0

* Remember to add a note about websocket_connect

* Add date and blurb to release notes

* Bump version to 0.15.0

* Add note about fixing #1012
sirex added a commit to atviriduomenys/spinta that referenced this issue Feb 28, 2022
After a lot of debugging, I found that streaming wasn't working because of BaseHTTPMiddleware, which it seems collects the whole stream into memory and then returns it all at once with the response.

That means that if you stream a lot of data, the request will not return any answer until all the data has been collected into memory. This can take time and can result in a read timeout, or can simply run out of memory.

Streaming was the main reason why I chose Starlette, and the single most important thing didn't work. Not good. But at least I found out how to fix it.

Related issues:

encode/starlette#1012
encode/starlette#472