Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming with BaseHTTPMiddleware: force background to run after response completes #1017

Closed
wants to merge 11 commits into from

Conversation

erewok
Copy link
Contributor

@erewok erewok commented Aug 1, 2020

There is one issue are currently two issues with the call_next method inside BaseHTTPMiddleware:

This PR addresses the latter. Here's a gist which provides an example of running various tests against this branch.

Important Notes

  • This change limits queue maxsize to 1 (thanks for the suggestion, @florimondmanca)
  • The body_stream function now explicitly checks whether we are wrapping a StreamingResponse and iterates if so or completes the response if not.
  • In the straightforward case, the None added to the queue is only needed in case of exception, so we opt to complete the response and leave the None on the queue (can be seen in queue._unfinished_tasks attribute).
  • Check task.result() only for raised exceptions.

Other Notes

  • Added task_done calls so that the queue would increment or decrement _unfinished_tasks as a sanity check. It's not required for this to work.

  • Added a streaming and a broken-streaming test for tests/middleware/test_base.py.

Closes #919, #1012

Also separate streaming response in base http middleware from background
starlette/middleware/base.py Outdated Show resolved Hide resolved
@erewok
Copy link
Contributor Author

erewok commented Aug 1, 2020

I wrote this for Python3.8. I'll amend for 3.6 and 3.7.

starlette/middleware/base.py Outdated Show resolved Hide resolved
starlette/middleware/base.py Outdated Show resolved Hide resolved
starlette/middleware/base.py Outdated Show resolved Hide resolved
starlette/middleware/base.py Outdated Show resolved Hide resolved
starlette/middleware/base.py Outdated Show resolved Hide resolved
@erewok
Copy link
Contributor Author

erewok commented Aug 2, 2020

Thanks, @JayH5 for the review!

Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great digging, thanks! The solutions you've found make sense to me. Just nagging a bit in favor of clearly separating in two dedicated PRs that we could review, merge and keep in the history independently…

starlette/middleware/base.py Outdated Show resolved Hide resolved
Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good for me on the queue fix. Again if submitted separately we could merge it right away, but I'll leave that to your appreciation. :)

As for background tasks, one last item I'm not 100% sure about is that it's not guaranteed by the middleware anymore that the app task will be awaited. Previously we always called task.result() (which AFAIK essentially equals to doing await task). But now we don't always do it.

So for example, I'm wondering if asyncio wouldn't raise a "task was not awaited" warning in case of a streaming response w/ a background task. Do you have thoughts? I don't know if we should then pass an additional background task to the constructed StreamingResponse that calls await task, to ensure that the task is properly awaited...

@erewok
Copy link
Contributor Author

erewok commented Aug 3, 2020

As for background tasks, one last item I'm not 100% sure about is that it's not guaranteed by the middleware anymore that the app task will be awaited. Previously we always called task.result() (which AFAIK essentially equals to doing await task). But now we don't always do it.

So for example, I'm wondering if asyncio wouldn't raise a "task was not awaited" warning in case of a streaming response w/ a background task. Do you have thoughts? I don't know if we should then pass an additional background task to the constructed StreamingResponse that calls await task, to ensure that the task is properly awaited...

This is the part I spent the most time thinking about in looking at this function. One point is that task.result() won't actually await the result of the task: if it's called before the task has completed, an InvalidStateError will be thrown and if it's called after a task has been cancelled, it will raise a CancelledErrror. The return value wasn't being used before, so it was really present in the code to reraise any exceptions thrown, but the existing function was relying on the fact that the task must have completed already because the loop inside body_stream needed a None val to stop, and this would only be present after the app had been fully awaited.

In this new version, of course, we're relying on the fact that if there is background present, then the app must not have completed because we want to get through the body_stream first and then hand off control. We're also limiting the queue size, which prevents the await self.app from making progress until the body has been dealt with completely. I guess this is why I tend to connect these two issues, queue size and background; it seems like they complicate each other.

I did test a scenario of streaming with a background task (if you look at the gist, you can see the examples I stole from the docs and the issue reported on this), but I never had it throw "task not awaited." I was also looking for cancelled coroutines. It still seems odd to me, though, to schedule a task and then never await it.

As for dealing more explicitly with the background, I think it would be pretty involved. The Response classes are calling await on the background tasks after they call send, so there would need to be some way to listen for or interrupt those background task await calls and send them to the wrapped response instead, but all the solutions I can think of to do that are pretty invasive. There isn't an easy way to capture those background tasks in something like a queue in the same way that the middleware is doing for the response bodies by swapping out the send function.

Because the changes here are small and because my own tests were successful, I got over my discomfort with not explicitly awaiting the task, but I sympathize with that perspective.

I will keep trying to break this code.

@erewok
Copy link
Contributor Author

erewok commented Aug 3, 2020

@florimondmanca I think you're right about separating these: it would be good to continue testing the proposed changes included here around fixing the background but the other would be good to address immediately. I created a new PR with only maxsize kwarg added: #1018

@erewok
Copy link
Contributor Author

erewok commented Aug 4, 2020

I have tested this code in a variety of scenarios and inspected the tasks remaining after constructing responses using asyncio.Task.all_tasks() and I believe this PR is ready.

@erewok erewok changed the title Limit queue size in BaseHTTPMiddleware for backpressure; run Background last Fix Streaming with BaseHTTPMiddleware forces background to run after response completes Aug 5, 2020
@erewok erewok changed the title Fix Streaming with BaseHTTPMiddleware forces background to run after response completes Streaming with BaseHTTPMiddleware: force background to run after response completes Aug 5, 2020
Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fiddly area which is why I'm personally still a bit wary of merging...

Is there any way we can test that this resolves the "don't wait for background task to send the response" case?

tests/middleware/test_base.py Outdated Show resolved Hide resolved
@erewok
Copy link
Contributor Author

erewok commented Aug 6, 2020

@florimondmanca that's a fair request. I had intended to write a test that would demonstrate this behavior but I had overlooked it.

I have added a test now that will fail in master with the following:

        filepath = tmp_path / "background_test.txt"
        filepath.write_text("Test Start")
        response = client.get("/background-after-streaming?filepath={}".format(filepath))
        assert response.headers["Custom-Header"] == "Example"
        assert response.text == "1\n2\n3\n"
        with filepath.open() as fl:
>           assert fl.read() == "handler first"
E           AssertionError: assert 'background last' == 'handler first'
E             - handler first
E             + background last

tests/middleware/test_base.py:228: AssertionError

@tomchristie
Copy link
Member

I'll also try to take a look at this tomorrow. Sounds like it's been a fiddly one so would be good to get another pair of eyes on it before it goes in.

@erewok
Copy link
Contributor Author

erewok commented Aug 11, 2020

I think this PR is going in the same direction as the fix we had to revert in version 0.13.8 (see issue #1022, so I think it's probably a non-starter at this point.

In general, I am beginning to wonder if we shouldn't discourage usage of this particular middleware class while we try to come up with a new means for users to write middleware that works with request and response objects. Indeed, if users don't need a request or response in their middleware and if they can write middleware that handles the scope, receive, and send args, then I believe they'll probably have more success writing middleware similar to other examples in the codebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Background tasks don't work with middleware that subclasses BaseHTTPMiddleware
4 participants