
Background tasks don't work with middleware that subclasses BaseHTTPMiddleware #919

Closed
retnikt opened this issue Apr 28, 2020 · 42 comments · Fixed by #1715


@retnikt
Contributor

retnikt commented Apr 28, 2020

When using background tasks with middleware, requests are not processed until the background task has finished.

  1. Use the example below
  2. Make several requests in a row - works as expected
  3. Uncomment app.add_middleware(TransparentMiddleware) and re-run
  4. Make several requests in a row - subsequent ones are not processed until the 10 second sleep has finished (the first request returns before then though).

The same behaviour occurs with both asyncio.sleep (async) and time.sleep (sync, run in a threadpool).

import asyncio
import uvicorn
from starlette.applications import Starlette
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.background import BackgroundTask
from starlette.responses import JSONResponse


class TransparentMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        # simple middleware that does absolutely nothing
        response = await call_next(request)
        return response


app = Starlette(debug=True)
# uncomment to break
# app.add_middleware(TransparentMiddleware)


@app.route("/")
async def test(_):
    task = BackgroundTask(asyncio.sleep, 10)
    return JSONResponse("hello", background=task)


if __name__ == '__main__':
    uvicorn.run(app)
@JayH5
Member

JayH5 commented Apr 29, 2020

What versions of Python, Starlette, and Uvicorn are you using? I'm not able to recreate with Python 3.8.2, Starlette 0.13.3, and Uvicorn 0.11.5.

@retnikt
Contributor Author

retnikt commented Apr 30, 2020

Python 3.8.2, Uvicorn 0.11.4, Starlette 0.13.2. I'll try updating to see if that makes any difference.

@retnikt
Contributor Author

retnikt commented Apr 30, 2020

Nope, now on Starlette 0.13.3 and Uvicorn 0.11.5, but I'm still getting the same behaviour. What OS are you on? I'm using Arch Linux

@ariefprabowo

I have the same issue (Windows 10, Python 3.7, Starlette 0.13.4, Uvicorn 0.11.5). Is there any workaround?

@retnikt
Contributor Author

retnikt commented Jun 16, 2020

@ariefprabowo You can use asyncio.create_task(coro)
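For illustration, here is a minimal sketch of that workaround applied to the example from the original post (the do_work helper and _pending set are placeholder names, not from this thread); asyncio.create_task detaches the coroutine from the response, so nothing waits on it:

import asyncio
from starlette.responses import JSONResponse

_pending = set()  # keep references so tasks aren't garbage-collected early


async def do_work():
    await asyncio.sleep(10)


@app.route("/")
async def test(_):
    task = asyncio.create_task(do_work())
    _pending.add(task)
    task.add_done_callback(_pending.discard)
    return JSONResponse("hello")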

@ariefprabowo

Thanks for the hint @retnikt, will try that.

@schmidtc

Having the same issue. My background task is blocking and needs to run in the thread_pool. Everything works as expected without middleware, but adding middleware causes some requests to block.

@erewok
Contributor

erewok commented Jul 29, 2020

If I run this sample and make a number of requests with wrk using five connections, I see the first five return immediately, and then the remaining requests all hang until wrk stops, at which point I see the following exceptions (one for each connection), all stating that the handler is closed:

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/miniconda3/envs/starlette/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 385, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/miniconda3/envs/starlette/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "./starlette/applications.py", line 111, in __call__
    await self.middleware_stack(scope, receive, send)
  File "./starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "./starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "./starlette/middleware/base.py", line 26, in __call__
    await response(scope, receive, send)
  File "./starlette/responses.py", line 228, in __call__
    await run_until_first_complete(
  File "./starlette/concurrency.py", line 18, in run_until_first_complete
    [task.result() for task in done]
  File "./starlette/concurrency.py", line 18, in <listcomp>
    [task.result() for task in done]
  File "./starlette/responses.py", line 225, in stream_response
    await send({"type": "http.response.body", "body": b"", "more_body": False})
  File "./starlette/middleware/errors.py", line 156, in _send
    await send(message)
  File "/miniconda3/envs/starlette/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 510, in send
    self.transport.write(body)
  File "uvloop/handles/stream.pyx", line 673, in uvloop.loop.UVStream.write
  File "uvloop/handles/handle.pyx", line 159, in uvloop.loop.UVHandle._ensure_alive
RuntimeError: unable to perform operation on <TCPTransport closed=True reading=False 0x7fdb3008d990>; the handler is closed

Only five requests are successfully handled in this case.

However, if I modify the TransparentMiddleware class given here to have a simple __call__ instead of dispatch, there are no exceptions and it will return 2,500 requests per second without dropping any:

class TransparentMiddleware(BaseHTTPMiddleware):
    async def __call__(self, scope, receive, send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        await self.app(scope, receive, send)

More interestingly, if I add some logging calls, we can see when things are firing. First, in the exceptions case:

class TransparentMiddleware(BaseHTTPMiddleware):
    # async def __call__(self, scope, receive, send) -> None:
    #     if scope["type"] != "http":
    #         await self.app(scope, receive, send)
    #         return

    #     log.info("__calling__ the middlewares!")
    #     await self.app(scope, receive, send)

    async def dispatch(self, request, call_next):
        # simple middleware that does absolutely nothing
        log.info("dispatching!")
        response = await call_next(request)
        return response

async def some_sleeper():
    log.info("sleeping!")
    await asyncio.sleep(10)
    log.info("waking up now!")

@app.route("/")
async def test(_):
    task = BackgroundTask(some_sleeper)
    return JSONResponse("hello", background=task)


if __name__ == '__main__':
    uvicorn.run(app, host="0.0.0.0", port=8000)

Results in the following:

❯ python -m starlette_middleware_background
INFO:     Started server process [53410]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     dispatching!
INFO:     dispatching!
INFO:     sleeping!
INFO:     sleeping!
INFO:     dispatching!
INFO:     dispatching!
INFO:     dispatching!
INFO:     sleeping!
INFO:     sleeping!
INFO:     sleeping!
INFO:     127.0.0.1:59302 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:59303 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:59304 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:59305 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:59306 - "GET / HTTP/1.1" 200 OK
INFO:     waking up now!
INFO:     waking up now!
INFO:     waking up now!
INFO:     waking up now!
INFO:     waking up now!
ERROR:    Exception in ASGI application
Traceback (most recent call last):
...[EXCEPTIONS: one per connection]...

In the no-exceptions case things are interleaved in a manner closer to what I would have expected:

INFO:     __calling__ the middlewares!
INFO:     127.0.0.1:59324 - "GET / HTTP/1.1" 200 OK
INFO:     sleeping!
INFO:     __calling__ the middlewares!
INFO:     127.0.0.1:59321 - "GET / HTTP/1.1" 200 OK
INFO:     sleeping!
INFO:     __calling__ the middlewares!
INFO:     127.0.0.1:59325 - "GET / HTTP/1.1" 200 OK
INFO:     sleeping!
INFO:     waking up now!
INFO:     __calling__ the middlewares!
INFO:     127.0.0.1:59325 - "GET / HTTP/1.1" 200 OK
INFO:     sleeping!
INFO:     __calling__ the middlewares!
INFO:     127.0.0.1:59325 - "GET / HTTP/1.1" 200 OK
INFO:     sleeping!
...
INFO:     waking up now!
INFO:     waking up now!
INFO:     waking up now!
INFO:     waking up now!
INFO:     waking up now!
INFO:     waking up now!
...

The difference in behavior comes down to this call_next function.

@schmidtc

As written, the call_next function starts a StreamingResponse but does not exit until the wrapped app has completed, and that includes background tasks. Some clients, such as curl, will close the connection as soon as they receive Content-Length bytes, which makes it appear as if the first few responses are returning quickly. Using curl --ignore-content-length and disabling Keep-Alive can help when debugging this.

I'm not an expert in asyncio or ASGI, so I'm not sure if this fix has unintended consequences....

Changing:

        async def body_stream() -> typing.AsyncGenerator[bytes, None]:
            while True:
                message = await queue.get()
                if message is None:
                    break
                assert message["type"] == "http.response.body"
                yield message.get("body", b"")
            task.result()

To:

        async def body_stream() -> typing.AsyncGenerator[bytes, None]:
            while True:
                message = await queue.get()
                if message is None:
                    break
                assert message["type"] == "http.response.body"
                yield message.get("body", b"")
                if not message.get("more_body"):
                    break
            # task.result()

allows the StreamingResponse to finish sending data to the client while the task is run in the background.

@erewok
Contributor

erewok commented Jul 29, 2020

The notable thing to me is that dispatch in BaseHTTPMiddleware uses a StreamingResponse and builds this tasks-and-queues machinery, whereas none of the other middleware provided by Starlette (authentication, error handling, etc.) uses this pattern. Why is that? Those other classes are all simple classes with a __call__ method defined: no StreamingResponse included.

In my own projects I've defined a handful of middleware classes, but I haven't implemented dispatch or call_next and I haven't inherited from BaseHTTPMiddleware: I have always naively built mine like the other provided middleware, by writing a __call__ method.

@retnikt can you try to implement your middleware with a __call__ method and see if it "fixes" the problem (not a real fix, just a validation that we're seeing the same issue)?

I think at the very least that the order of execution is inconsistent between these two implementations. I'm also not sure why the BaseHTTPMiddleware is the only one to offer streaming by default.

@retnikt
Contributor Author

retnikt commented Jul 29, 2020

@erewok yes, this issue doesn't happen with __call__. BTW I think I mentioned that in the OP.


@AndreyKis

AndreyKis commented Aug 5, 2020

I have a similar issue in the project I'm currently working on.

We are using BackgroundTask for one long-running operation. Recently I tried to add a middleware inherited from BaseHTTPMiddleware that opens a SQLAlchemy session, injects it into the request, and closes it once the request has been processed.

After doing so, the BackgroundTask functionality broke and the frontend waited for the task to finish before seeing that the request was done.

I saw a reference to the current issue in this PR, and it looked like it could solve the problem.

Unfortunately, until it is merged I had to come up with a workaround: I tried using the new version of BaseHTTPMiddleware from that PR, adding my session logic to its dispatch function.
And that solved the issue for me. The background task runs in the background and the middleware does what it is expected to do.

So for anyone waiting for this issue to be solved, that should happen once the PR gets reviewed and merged :)

Huge thanks to @erewok, as this issue turned out to be quite a blocker; it would have been hard for me to use a __call__-based middleware because I couldn't modify the request.state value there.

@dmig-alarstudios

Somehow hypercorn is a workaround for this issue.

@erewok
Contributor

erewok commented Aug 12, 2020

Hi all, I had a PR to fix this behavior (#1017 ) but it relied on forcing the response to remain unevaluated, which further confused those users who were relying on having a fully-evaluated response inside their middleware functions (#1022 ). In other words, there are contradictory goals here.

After looking at this BaseHTTPMiddleware class over the last week, I have come to the following conclusions:

  • We should discourage the use of BaseHTTPMiddleware in favor of raw ASGI middleware which includes a method def __call__(self, scope, receive, send):, and

  • If people still want to use this middleware class, it should never be used with StreamingResponse/FileResponse endpoints.

Unfortunately, while the BaseHTTPMiddleware class offers an easier interface because it provides a request argument and promises something like a response, it also encourages users to think about the ASGI app's functionality as "complete" or "finished". This means the class will either load the entirety of streaming requests into memory (#1012 ) and run background tasks before returning the response (this issue), or, if we fix those problems, it will encourage users to leave resources in a pending or open state, an arguably worse result. In short, it's problematic.

Again, these problems should be absent if you avoid subclassing BaseHTTPMiddleware.
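
To make that concrete, a raw ASGI middleware is just a class with a __call__ method that delegates to the wrapped app; here is a minimal sketch (illustrative only, the class name is a placeholder):

from starlette.types import ASGIApp, Message, Receive, Scope, Send


class RawASGIMiddleware:
    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_wrapper(message: Message) -> None:
            # inspect or modify outgoing ASGI messages here if needed
            await send(message)

        await self.app(scope, receive, send_wrapper)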

Lastly, I think it's worth leaving this issue open so that other users who have the same problem can see it and see that it remains an open issue.

@dmig-alarstudios

dmig-alarstudios commented Aug 12, 2020

@erewok thanks for your efforts!

But how raw ASGI middleware may solve the issue?

@erewok
Contributor

erewok commented Aug 12, 2020

@dmig-alarstudios see @retnikt's comment above to that effect:

@erewok yes, this issue doesn't happen with call. BTW I think I mentioned that in the OP

@retnikt
Contributor Author

retnikt commented Aug 12, 2020

IMO this isn't really (just) an issue with BaseHTTPMiddleware, but rather with how background tasks are implemented; this should be fixed by making them either:

  • detached from a Response by using asyncio.create_task or similar, so they actually run in the background
  • or, renamed to something more appropriate to reflect what they're actually for (i.e. for cleaning up after a request is handled)

@erewok
Contributor

erewok commented Aug 12, 2020

Yes, it is a broader issue with background tasks but that also implies that something should be done differently with how concurrency is being managed in the app. @retnikt see this comment for further discussion.

@dmig-alarstudios

Shame for me: never thought about using asyncio.create_task.

But I'm still wondering: running the same testcase under hypercorn solves this issue, but why? Is it a hypercorn bug?

@AndreyKis

@erewok Thank you for your work on this.

After your suggestion to use raw ASGI middleware, the link to AuthenticationMiddleware as an example, and a bit of code digging, I think I found a way to do what I intended with raw ASGI middleware.

Basically, for anyone trying to inject something into the request and perform some cleanup on that injected instance after the request has finished: you can do this by placing your instance inside the scope and accessing it through request["your_instance_name"]. This works because scope is a MutableMapping and HTTPConnection has a __getitem__ method.

I only hope that wrapping the call to the app in a try/finally inside the middleware doesn't break anything.
(Screenshot of the middleware implementation omitted.)
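A sketch of what that approach can look like (the screenshot itself is not reproduced here; DBSessionMiddleware, make_session, and db_session are placeholder names, not from the original comment):

from starlette.types import ASGIApp, Receive, Scope, Send


class _StubSession:
    # stand-in for a real SQLAlchemy session
    def close(self) -> None:
        pass


def make_session() -> _StubSession:
    # placeholder for a configured session factory, e.g. sessionmaker()
    return _StubSession()


class DBSessionMiddleware:
    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        session = make_session()
        scope["db_session"] = session  # scope is a MutableMapping
        try:
            await self.app(scope, receive, send)
        finally:
            session.close()  # cleanup once the request has been handled

# Inside an endpoint the injected object is then available via the request
# mapping: session = request["db_session"]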

erewok changed the title from "Background tasks don't work with simple dispatched middleware" to "Background tasks don't work with middleware that subclasses BaseHTTPMiddleware" on Aug 13, 2020
@jacopofar

Yes, I can reproduce it with the latest version 0.70.0.

If I enable the Gzip middleware:

from fastapi.middleware.gzip import GZipMiddleware
app.add_middleware(GZipMiddleware, minimum_size=1000)

then running background tasks blocks the responses until the background task finishes. If I remove the middleware, the background tasks work as expected and HTTP responses are returned immediately without waiting for them.

@kigawas

kigawas commented Dec 9, 2021

Finally found a workaround by exploiting run_in_executor:

from asyncio import get_event_loop
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def run_sync_code(task, *args, **kwargs):
    # Schedule the blocking callable on a thread pool without awaiting the
    # future, so the response is not held up while it runs.
    executor = ThreadPoolExecutor()
    loop = get_event_loop()
    loop.run_in_executor(executor, partial(task, *args, **kwargs))

Then in your favorite FastAPI endpoint:

import time  # needed for the blocking sleep in the example task

@app.post("/debug/background_task", tags=["debug"])
async def __debug_background_task():
    def sleep_task(*args, **kwargs):
        print("start sleep....", args, kwargs)
        time.sleep(5)
        print("end sleep....")

    run_sync_code(sleep_task, 1, 2, a="a", b="b")

@zoola969

I got this bug after updating fastapi 0.67.0 -> 0.70.0

This code worked in my project with a few middlewares, but after migrating to the newer version, "im alive" is no longer printed in my logs:

async def task():
    logger.info("im a task")
    await sleep(1)
    logger.info("im alive")


@app.get("/")
async def root(background_tasks: BackgroundTasks):
    background_tasks.add_task(task)
    return {"status": "ok"}

It happens even if I use only a single dummy middleware:

class DM(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        return await call_next(request)

After removing all middlewares, this code works as expected.

I tried to reproduce this in a new, clean project, but I could not.

@mohdnr

mohdnr commented Dec 22, 2021

If anyone finds this and is trying to add headers to responses, this is how I ended up doing it using the information from the discussion above.

from starlette.types import ASGIApp, Message, Receive, Scope, Send
from starlette.datastructures import MutableHeaders

# Custom class required to add headers in-order to not block background tasks
# https://github.com/encode/starlette/issues/919


class SecurityHeadersMiddleware:
    def __init__(
        self,
        app: ASGIApp,
    ) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        async def send_wrapper(message: Message) -> None:
            if message["type"] == "http.response.start":
                headers = MutableHeaders(scope=message)
                headers[
                    "Strict-Transport-Security"
                ] = "max-age=63072000; includeSubDomains; preload"
            await send(message)

        await self.app(scope, receive, send_wrapper)
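
Registering it is the usual add_middleware call (a usage sketch, assuming the class above):

from starlette.applications import Starlette

app = Starlette()
app.add_middleware(SecurityHeadersMiddleware)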

@maksimt

maksimt commented May 25, 2022

A related issue is that BackgroundTasks will get cancelled if you have a middleware and the caller sets the header Connection: close.

# Python 3.8.13
import asyncio
import fastapi  # fastapi==0.75.2
import time
import uvicorn  # uvicorn==0.17.6

import logging

logging.basicConfig(format='%(asctime)s.%(msecs)03dZ [%(levelname)s] %(message)s')
log = logging.getLogger()

app = fastapi.FastAPI()


@app.middleware("http")
async def empty_middleware(request: fastapi.Request, call_next):
    response = await call_next(request)
    return response


@app.get("/")
async def the_route(background_tasks: fastapi.BackgroundTasks):
    background_tasks.add_task(a_little_work)
    # background_tasks.add_task(shielded_work) <-- this one completes

    return {'status': 'ok'}


async def a_little_work(prefix='unshielded'):
    start = time.monotonic()
    log.warning(f"{prefix} background task starts!")
    await asyncio.sleep(0.5)
    end = time.monotonic()
    # this statement never gets printed if the Connection header is set to close:
    log.warning(f"{prefix} background task done, took {end-start}s")


async def shielded_work():
    import asyncio
    await asyncio.shield(a_little_work('shielded'))


if __name__ == '__main__':
    uvicorn.run(app, host="0.0.0.0", port=8000)

@vanishingESCkey

I discovered the same issue when trying to upgrade my app from FastAPI 0.68 to 0.70. Interestingly, background tasks work fine on my Ubuntu 20.04 workstation but fail on a CentOS 7 server (same code, Python 3.9, and module versions).

It looks like the best solution is to deprecate BaseHTTPMiddleware, as mentioned in #1678.

@jhominal
Member

jhominal commented Jul 7, 2022

I propose that this issue be closed - as far as I can tell, the issues reported here are stale, invalid, or duplicates of other issues:

Here is how I think we should proceed on this issue:

jhominal closed this as not planned on Jul 7, 2022
@Kludex
Sponsor Member

Kludex commented Jul 8, 2022

The GZipMiddleware issue was related to BaseHTTPMiddleware, but it has already been fixed. 🙏

The other, as you said. 🙌

sam-causalens added a commit to causalens/dara that referenced this issue Feb 6, 2024
* The usage of BackgroundTasks in the new execute_action code was found to block new requests from being picked up by the event loop. I think this may be a recurrence of encode/starlette#919, which they claim is resolved but still doesn't appear to be working for us. It may be due to one of our middlewares behaving in a particular way that causes this, but the fix applied here appears to be the simplest fix for now.
* In this instance we have replaced the usage of background_tasks.add_task with asyncio.create_task, which will run the task in the loop without worrying about its result.
krzysztof-causalens pushed a commit to causalens/dara that referenced this issue Feb 6, 2024
* Fix: DOS-625 Address action execution blocking new requests


* Fix linting