Streaming with BaseHTTPMiddleware: force background to run after response completes #1017

erewok · 2020-08-01T13:38:02Z

There is one issue ~~are currently two issues~~ with the call_next method inside BaseHTTPMiddleware:

~~The queue is unbounded, which means that handlers returning StreamingResponses will have their responses loaded fully into the queue.~~ (Fixed in Fix high memory usage when using BaseHTTPMiddleware middleware classes and streaming responses #1018)
The StreamingResponse that BaseHTTPMiddleware creates to wrap the handler's response is waiting on the completion from the handler to stop streaming, which requires that any background tasks must be run before the response can be completed.

This PR addresses the latter. Here's a gist which provides an example of running various tests against this branch.

Important Notes

~~This change limits queue maxsize to 1 (thanks for the suggestion, @florimondmanca)~~
The body_stream function now explicitly checks whether we are wrapping a StreamingResponse and iterates if so or completes the response if not.
In the straightforward case, the None added to the queue is only needed in case of exception, so we opt to complete the response and leave the None on the queue (can be seen in queue._unfinished_tasks attribute).
Check task.result() only for raised exceptions.

Other Notes

~~Added task_done calls so that the queue would increment or decrement _unfinished_tasks as a sanity check. It's not required for this to work.~~
Added a streaming and a broken-streaming test for tests/middleware/test_base.py.

Closes #919, ~~#1012~~

Also separate streaming response in base http middleware from background

starlette/middleware/base.py

erewok · 2020-08-01T13:42:11Z

I wrote this for Python3.8. I'll amend for 3.6 and 3.7.

starlette/middleware/base.py

erewok · 2020-08-02T03:44:26Z

Thanks, @JayH5 for the review!

florimondmanca

Great digging, thanks! The solutions you've found make sense to me. Just nagging a bit in favor of clearly separating in two dedicated PRs that we could review, merge and keep in the history independently…

starlette/middleware/base.py

florimondmanca

Good for me on the queue fix. Again if submitted separately we could merge it right away, but I'll leave that to your appreciation. :)

As for background tasks, one last item I'm not 100% sure about is that it's not guaranteed by the middleware anymore that the app task will be awaited. Previously we always called task.result() (which AFAIK essentially equals to doing await task). But now we don't always do it.

So for example, I'm wondering if asyncio wouldn't raise a "task was not awaited" warning in case of a streaming response w/ a background task. Do you have thoughts? I don't know if we should then pass an additional background task to the constructed StreamingResponse that calls await task, to ensure that the task is properly awaited...

Resolved

erewok · 2020-08-03T04:15:39Z

As for background tasks, one last item I'm not 100% sure about is that it's not guaranteed by the middleware anymore that the app task will be awaited. Previously we always called task.result() (which AFAIK essentially equals to doing await task). But now we don't always do it.

So for example, I'm wondering if asyncio wouldn't raise a "task was not awaited" warning in case of a streaming response w/ a background task. Do you have thoughts? I don't know if we should then pass an additional background task to the constructed StreamingResponse that calls await task, to ensure that the task is properly awaited...

This is the part I spent the most time thinking about in looking at this function. One point is that task.result() won't actually await the result of the task: if it's called before the task has completed, an InvalidStateError will be thrown and if it's called after a task has been cancelled, it will raise a CancelledErrror. The return value wasn't being used before, so it was really present in the code to reraise any exceptions thrown, but the existing function was relying on the fact that the task must have completed already because the loop inside body_stream needed a None val to stop, and this would only be present after the app had been fully awaited.

In this new version, of course, we're relying on the fact that if there is background present, then the app must not have completed because we want to get through the body_stream first and then hand off control. We're also limiting the queue size, which prevents the await self.app from making progress until the body has been dealt with completely. I guess this is why I tend to connect these two issues, queue size and background; it seems like they complicate each other.

I did test a scenario of streaming with a background task (if you look at the gist, you can see the examples I stole from the docs and the issue reported on this), but I never had it throw "task not awaited." I was also looking for cancelled coroutines. It still seems odd to me, though, to schedule a task and then never await it.

As for dealing more explicitly with the background, I think it would be pretty involved. The Response classes are calling await on the background tasks after they call send, so there would need to be some way to listen for or interrupt those background task await calls and send them to the wrapped response instead, but all the solutions I can think of to do that are pretty invasive. There isn't an easy way to capture those background tasks in something like a queue in the same way that the middleware is doing for the response bodies by swapping out the send function.

Because the changes here are small and because my own tests were successful, I got over my discomfort with not explicitly awaiting the task, but I sympathize with that perspective.

I will keep trying to break this code.

erewok · 2020-08-03T04:27:14Z

@florimondmanca I think you're right about separating these: it would be good to continue testing the proposed changes included here around fixing the background but the other would be good to address immediately. I created a new PR with only maxsize kwarg added: #1018

erewok · 2020-08-04T15:37:40Z

I have tested this code in a variety of scenarios and inspected the tasks remaining after constructing responses using asyncio.Task.all_tasks() and I believe this PR is ready.

florimondmanca

It's a fiddly area which is why I'm personally still a bit wary of merging...

Is there any way we can test that this resolves the "don't wait for background task to send the response" case?

tests/middleware/test_base.py

erewok · 2020-08-06T15:23:56Z

@florimondmanca that's a fair request. I had intended to write a test that would demonstrate this behavior but I had overlooked it.

I have added a test now that will fail in master with the following:

        filepath = tmp_path / "background_test.txt"
        filepath.write_text("Test Start")
        response = client.get("/background-after-streaming?filepath={}".format(filepath))
        assert response.headers["Custom-Header"] == "Example"
        assert response.text == "1\n2\n3\n"
        with filepath.open() as fl:
>           assert fl.read() == "handler first"
E           AssertionError: assert 'background last' == 'handler first'
E             - handler first
E             + background last

tests/middleware/test_base.py:228: AssertionError

tests/middleware/test_base.py

tomchristie · 2020-08-06T19:25:13Z

I'll also try to take a look at this tomorrow. Sounds like it's been a fiddly one so would be good to get another pair of eyes on it before it goes in.

erewok · 2020-08-11T04:01:41Z

I think this PR is going in the same direction as the fix we had to revert in version 0.13.8 (see issue #1022, so I think it's probably a non-starter at this point.

In general, I am beginning to wonder if we shouldn't discourage usage of this particular middleware class while we try to come up with a new means for users to write middleware that works with request and response objects. Indeed, if users don't need a request or response in their middleware and if they can write middleware that handles the scope, receive, and send args, then I believe they'll probably have more success writing middleware similar to other examples in the codebase.

Limit queue size in base http middleware for backpressure

f178000

Also separate streaming response in base http middleware from background

erewok commented Aug 1, 2020

View reviewed changes

starlette/middleware/base.py Outdated Show resolved Hide resolved

erewok added 2 commits August 1, 2020 06:56

Fix import exception in python3.6 and python3.7

8684534

Handle case where first message is None

2150e01

erewok mentioned this pull request Aug 1, 2020

Memory usage streaming large responses #1012

Closed

JayH5 reviewed Aug 1, 2020

View reviewed changes

Incorporate PR review feedback.

e86aecd

florimondmanca previously requested changes Aug 2, 2020

View reviewed changes

starlette/middleware/base.py Outdated Show resolved Hide resolved

florimondmanca reviewed Aug 2, 2020

View reviewed changes

erewok mentioned this pull request Aug 3, 2020

Fix high memory usage when using BaseHTTPMiddleware middleware classes and streaming responses #1018

Merged

Pull queue size fix from master

f521491

erewok changed the title ~~Limit queue size in BaseHTTPMiddleware for backpressure; run Background last~~ Fix Streaming with BaseHTTPMiddleware forces background to run after response completes Aug 5, 2020

erewok changed the title ~~Fix Streaming with BaseHTTPMiddleware forces background to run after response completes~~ Streaming with BaseHTTPMiddleware: force background to run after response completes Aug 5, 2020

Merge branch 'master' into stream_response_bg

d61ba23

florimondmanca reviewed Aug 6, 2020

View reviewed changes

tests/middleware/test_base.py Outdated Show resolved Hide resolved

erewok added 2 commits August 6, 2020 08:17

Add test which demonstrates background runs afterward

67440c9

Fix import ordering

3253d56

florimondmanca reviewed Aug 6, 2020

View reviewed changes

tests/middleware/test_base.py Show resolved Hide resolved

erewok added 3 commits August 6, 2020 13:13

Reorganize and separate base middleware streaming tests

2ff0324

Merge branch 'master' into stream_response_bg

e7c3821

New version which should address issue 1022

8fa655d

erewok closed this Aug 11, 2020

erewok mentioned this pull request Aug 12, 2020

Background tasks don't work with middleware that subclasses BaseHTTPMiddleware #919

Closed

Kludex deleted the stream_response_bg branch August 11, 2022 20:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Streaming with BaseHTTPMiddleware: force background to run after response completes #1017

Streaming with BaseHTTPMiddleware: force background to run after response completes #1017

erewok commented Aug 1, 2020 •

edited

erewok commented Aug 1, 2020

erewok commented Aug 2, 2020

florimondmanca left a comment

florimondmanca left a comment

erewok commented Aug 3, 2020

erewok commented Aug 3, 2020

erewok commented Aug 4, 2020

florimondmanca left a comment

erewok commented Aug 6, 2020

tomchristie commented Aug 6, 2020

erewok commented Aug 11, 2020

Streaming with BaseHTTPMiddleware: force background to run after response completes #1017

Streaming with BaseHTTPMiddleware: force background to run after response completes #1017

Conversation

erewok commented Aug 1, 2020 • edited

Important Notes

Other Notes

erewok commented Aug 1, 2020

erewok commented Aug 2, 2020

florimondmanca left a comment

Choose a reason for hiding this comment

florimondmanca left a comment

Choose a reason for hiding this comment

erewok commented Aug 3, 2020

erewok commented Aug 3, 2020

erewok commented Aug 4, 2020

florimondmanca left a comment

Choose a reason for hiding this comment

erewok commented Aug 6, 2020

tomchristie commented Aug 6, 2020

erewok commented Aug 11, 2020

erewok commented Aug 1, 2020 •

edited