
Allow background tasks to run with custom BaseHTTPMiddleware's #1441

Closed · wants to merge 9 commits

Conversation


@Kludex Kludex commented Jan 28, 2022

This is an old bug. There are several issues that talk about it.

I was debugging #1438, so I'm going to start with that one. Let me show you an example of this issue, taken from the mentioned issue:

import traceback

import anyio
from starlette.applications import Starlette
from starlette.background import BackgroundTasks
from starlette.middleware import Middleware
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import Response
from starlette.routing import Route


async def passthrough(request, call_next):
    return await call_next(request)


async def _sleep(identifier, delay):
    print(identifier, "started")
    try:
        await anyio.sleep(delay)
        print(identifier, "completed")
    except BaseException:
        print(identifier, "error")
        traceback.print_exc()
        raise


async def response_with_sleeps(request):
    background_tasks = BackgroundTasks()
    background_tasks.add_task(_sleep, "background task 1", 2)
    return Response(background=background_tasks)


app = Starlette(
    middleware=[
        Middleware(BaseHTTPMiddleware, dispatch=passthrough),
    ],
    routes=[
        Route("/", response_with_sleeps),
    ],
)

Running the above code and making a request, you'll see:

Traceback (most recent call last):
  File "/home/marcelo/Development/./main.py", line 18, in _sleep
    await anyio.sleep(delay)
  File "/home/marcelo/anaconda3/envs/classify/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 69, in sleep
    return await get_asynclib().sleep(delay)
  File "/home/marcelo/anaconda3/envs/starlette3.8/lib/python3.8/asyncio/tasks.py", line 659, in sleep
    return await future
asyncio.exceptions.CancelledError

That traceback led me to the culprit:

task_group.cancel_scope.cancel()

I've removed that cancellation, and then I understood why it was there. We have a test that handles this case:

def test_fully_evaluated_response(test_client_factory):
    # Test for https://github.com/encode/starlette/issues/1022
    class CustomMiddleware(BaseHTTPMiddleware):
        async def dispatch(self, request, call_next):
            await call_next(request)
            return PlainTextResponse("Custom")

    app = Starlette()
    app.add_middleware(CustomMiddleware)
    client = test_client_factory(app)
    response = client.get("/does_not_exist")
    assert response.text == "Custom"

The test basically ignores the StreamingResponse object returned by call_next. The problem is that our application writes to send_stream (as you can see below), but no one receives from that stream, so it blocks. The cancellation above was meant to prevent that block.

send_stream, recv_stream = anyio.create_memory_object_stream()

async def coro() -> None:
    nonlocal app_exc
    async with send_stream:
        try:
            await self.app(scope, request.receive, send_stream.send)
        except Exception as exc:
            app_exc = exc

task_group.start_soon(coro)

OK, we solved one issue, but it led to another. My thinking was:

  • As we're discarding a Response object, I'm going to consume the stream from that object.

And that is what you see on this PR.

Some references:

A question so I can test this:

  • How do I simulate the client disconnecting itself after receiving the response using the TestClient?

Comment on lines 68 to 71
if call_next_response and response is not call_next_response:
    async with recv_stream:
        async for _ in recv_stream:
            ...
@Kludex (author)

I'm just discarding the messages coro has sent here.

@Kludex Kludex marked this pull request as draft January 28, 2022 14:46
Comment on lines +69 to +71
async with recv_stream:
    async for _ in recv_stream:
        ...  # pragma: no cover
@Kludex (author)

I can create a function that performs this logic, if wanted.

@Kludex Kludex marked this pull request as ready for review January 28, 2022 16:45
Comment on lines +68 to 72
if call_next_response and response is not call_next_response:
    async with recv_stream:
        async for _ in recv_stream:
            ...  # pragma: no cover
await response(scope, receive, send)
@jhominal commented Jan 30, 2022

It seems to me that we will wait for app to make all of its calls to send_stream.send before actually processing the response returned by dispatch_func. If that is the case, and given that all the messages from app are discarded, I think it would be preferable either to call await response(scope, receive, send) before draining recv_stream, or to put the draining of recv_stream in another task of task_group.

Of course, I do not know the relevant technologies (ASGI/the server implementations/Starlette) as well as you do, so there could very well be a reason that either my analysis is wrong, or that my suggestions are unworkable.

Suggested change
if call_next_response and response is not call_next_response:
    async with recv_stream:
        async for _ in recv_stream:
            ...  # pragma: no cover
await response(scope, receive, send)

becomes:

if call_next_response and response is not call_next_response:
    async def drain_stream():
        async with recv_stream:
            async for _ in recv_stream:
                ...  # pragma: no cover
    task_group.start_soon(drain_stream)
await response(scope, receive, send)

@adriangb adriangb added the bug Something isn't working label Feb 2, 2022

Kludex commented May 19, 2022

Unfortunately, it looks like this PR introduces a regression.

I'll need to check what. For now, I'm going to close it while I investigate...


Kludex commented Jun 21, 2022

> Unfortunately, it looks like this PR introduces a regression.
>
> I'll need to check what. For now, I'm going to close it while I investigate...

I don't remember what the supposed regression was... 😞

@Kludex Kludex reopened this Jun 21, 2022
@jhominal

I suspect that the change, as written, would fail in the case where someone returns a Response object that wraps the streaming content from call_next_response, e.g. if I wrap the body_iterator for debug purposes:

import logging
from typing import AsyncIterator

from starlette.responses import StreamingResponse

logger = logging.getLogger(__name__)


async def debug_streaming_middleware(request, call_next):
    call_next_response = await call_next(request)
    return StreamingResponse(
        status_code=call_next_response.status_code,
        headers=call_next_response.headers,
        content=log_content_length(call_next_response.body_iterator),
    )


async def log_content_length(bytes_iterator: AsyncIterator[bytes]):
    total_length = 0
    async for chunk in bytes_iterator:
        total_length += len(chunk)
        yield chunk
    logger.info(f"Body length: {total_length}")

In such a case, call_next_response.body_iterator would be consumed both by the new response, and by drain_stream.

I think that, in order to fix issues related to background tasks (#919 and #1438), it would be better to think about shielding background tasks from cancellation in order to ensure that they are run even if a client closes the connection.


adriangb commented Jun 21, 2022

> it would be better to think about shielding background tasks from cancellation in order to ensure that they are run even if a client closes the connection.

This has been proposed in #1654. It's not great to try to fix BaseHTTPMiddleware by adding code somewhere else, and shielding from any cancellation seems like it could open up the door for other issues down the road.

I think a similar but better solution would be to run background tasks outside of the request/response cycle, e.g. by creating an anyio.TaskGroup on application startup, passing that task group down into the request/response cycle, and scheduling the BackgroundTasks in that TaskGroup. Of course, this is something users could implement themselves; maybe a write-up of how to do it somewhere would help?
