
Investigate performance issues introduced by 9db3e05 #235

Closed
Robin5605 opened this issue Apr 3, 2024 · 2 comments · Fixed by #246
Robin5605 commented Apr 3, 2024

This script runs successfully prior to 9db3e05:

import asyncio

import httpx


async def do_job(client: httpx.AsyncClient):
    # get a job
    res = await client.post("/jobs")
    j = res.json()[0]
    name, version = j["name"], j["version"]

    # submit results
    payload = {"name": name, "version": version, "reason": "Package too large"}
    res = await client.put("/package", json=payload)


async def main():
    # fire 100 concurrent requests against the local mainframe instance
    async with httpx.AsyncClient(base_url="http://localhost:8000") as client:
        async with asyncio.TaskGroup() as tg:  # requires Python 3.11+
            for _ in range(100):
                tg.create_task(do_job(client))


asyncio.run(main())

However, running it with 9db3e05 checked out results in SQLAlchemy throwing a "QueuePool limit of size 5 overflow 17 reached, connection timed out, timeout 30.00" error.

jonathan-d-zhang commented Apr 3, 2024

We get a deadlock caused by FastAPI scheduling the dependency cleanup in the threadpool shared by endpoint functions. The reason we deadlock is that we have a limited number of database connections, c, and a limited number of workers, w, with w > c. When w requests come in, all workers start working on them; c of those workers run queries, while the rest block waiting for connections. When the first c requests finish, they have to wait for their dependencies to be cleaned up, which adds another c tasks to the queue just for closing the connections. However, every task in the queue is already sleeping, either waiting for a connection to be closed or waiting for a connection of its own. Therefore, deadlock.
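
To make the shape of the problem concrete, here is a minimal sketch of the yield-based dependency pattern that triggers it; the names (SessionLocal, get_db, the database URL) are illustrative assumptions, not the actual dragonfly-mainframe code.

from fastapi import Depends, FastAPI
from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session, sessionmaker

app = FastAPI()
# pool_size=5 and max_overflow=17 match the limits mentioned in the error above
engine = create_engine(
    "postgresql+psycopg2://user:password@localhost/dragonfly",
    pool_size=5,
    max_overflow=17,
)
SessionLocal = sessionmaker(bind=engine)


def get_db():
    session = SessionLocal()
    try:
        yield session  # the endpoint runs while this connection is checked out
    finally:
        session.close()  # FastAPI runs this cleanup as another task on the shared threadpool


@app.post("/jobs")
def get_jobs(session: Session = Depends(get_db)):
    # sync endpoints also run on the same threadpool as the dependency cleanup
    return {"ok": session.execute(text("SELECT 1")).scalar()}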

This was discussed in tiangolo/fastapi#6628, which led to a PR, tiangolo/fastapi#5122, released in version 0.82.0 (https://fastapi.tiangolo.com/release-notes/#0820). However, we are already on version 0.110.0, so that fix evidently does not cover our case.

As a stop-gap solution, we can simply close sessions manually ourselves, either by using them as context managers or by calling .close() on them explicitly.
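
For illustration, a minimal sketch of what that stop-gap could look like, reusing the illustrative SessionLocal setup from the sketch above rather than the real dragonfly-mainframe code: the dependency returns a session instead of yielding one, so FastAPI has no teardown to schedule, and the endpoint closes the session itself.

def get_db():
    # plain return, no yield: FastAPI schedules no cleanup task for this dependency
    return SessionLocal()


@app.put("/package")
def update_package(session: Session = Depends(get_db)):
    try:
        result = session.execute(text("SELECT 1"))  # real queries would go here
        session.commit()
        return {"ok": result.scalar()}
    finally:
        session.close()  # closed on the same worker, before the request finishes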

Robin5605 commented
Further investigation in the Discord server yields a peculiar issue relating to deadlocks. There are a handful of discussions, issues, and PRs that are related to this:

These being merged and closed implies that this issue was fixed; however, this does not seem to be the case (at least for us). The issue, quite intuitive in retrospect, boils down to the following: some number of requests come in (let's say 50), which is greater than the number of internal threadpool workers (let's say 40 worker threads). All 40 worker threads start working on their requests, but only some of them can actually run queries (let's say we have 5 database connections); the remaining 35 worker threads block until a connection becomes available. Once the first 5 requests are done with their queries, they schedule 5 tasks to close their sessions. However, since every worker in the threadpool is currently busy (either waiting for its cleanup task to run or waiting for a database connection), the cleanup tasks can never run, and therefore we deadlock.

The most straightforward solution is to not rely on FastAPI's dependency injection system for database connections, and instead create sessions ourselves when we need them and close them at the end of the path operation. This ensures we never schedule a separate task just to close a session, which is what can deadlock us.
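
A minimal sketch of that shape, again with illustrative names (reusing the hypothetical SessionLocal factory from above, not the real dragonfly-mainframe session setup): the path operation opens its own session in a with block, so it is opened and closed on the same worker with no separate cleanup task.

@app.post("/jobs")
def get_jobs():
    # no Depends: the session lives and dies entirely inside this function
    with SessionLocal() as session:
        value = session.execute(text("SELECT 1")).scalar()  # real queries would go here
    return {"ok": value}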
