refactor: check endpoint handler is async only once #2536

sk- · 2024-03-07T20:35:03Z

Summary

We improve the dispatch in the routing module to only check once whether the handler is async. This gives an improvement of 2.5% (sync), 1.82% (async) in the number of requests/s. The average latency decreased 1.6% (sync) and 1.5% (async).

Note that we had to use a cast in the helper function, as the typeguard does not work for the negative case. In the main branch the code is working without a cast, because the typeguard return type is in practice AwaitableCallable[Any], which end up swallowing the other types in the union.

Benchmarking

We use a simple json app, with both a sync and async endpoint, and the wrk tool to get the measurements. WE also show the profile shown by the snakeviz tool.

The measurements were done on a Macbook Pro with M1 chip, 16GB of memory and macOS 12.3. The Python version used for the tests is Python 3.12.2, and the uvicorn version is 0.27.1

Before

$ wrk http://localhost:8000/sync
Running 10s test @ http://localhost:8000/sync
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   733.77us   55.57us   3.21ms   78.35%
    Req/Sec     6.84k   147.06     7.15k    87.13%
  137474 requests in 10.10s, 18.35MB read
Requests/sec:  13610.69
Transfer/sec:      1.82MB

$ wrk http://localhost:8000/async
Running 10s test @ http://localhost:8000/async
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   717.14us   49.05us   1.83ms   71.11%
    Req/Sec     7.00k   112.14     7.36k    76.24%
  140613 requests in 10.10s, 18.77MB read
Requests/sec:  13922.97
Transfer/sec:      1.86MB

After

$ wrk http://localhost:8000/sync
Running 10s test @ http://localhost:8000/sync
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   721.34us  202.40us  11.13ms   99.32%
    Req/Sec     7.01k   230.04     7.62k    94.00%
  139558 requests in 10.00s, 18.63MB read
Requests/sec:  13956.14
Transfer/sec:      1.86MB

$ wrk http://localhost:8000/async
Running 10s test @ http://localhost:8000/async
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   706.04us  109.90us   7.46ms   98.30%
    Req/Sec     7.12k   136.09     7.39k    90.59%
  143188 requests in 10.10s, 19.12MB read
Requests/sec:  14176.95
Transfer/sec:      1.89MB

Test App

The app used for the test is as follows

from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route
import uvicorn

async def async_page(request):
    return JSONResponse({'status': 'ok'})

async def sync_page(request):
    return JSONResponse({'status': 'ok'})

app = Starlette(routes=[
    Route('/async', async_page),
    Route('/sync', sync_page),
])

if __name__ == "__main__":
    uvicorn.run("app:app", port=8000, log_level="critical")

Checklist

I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
I've updated the documentation accordingly.

The two points above are not checked as this is just refactoring code, and it does not require any new tests or docs. If required I could add tests for the added helper function _to_async.

We improve the dispatch in the routing module to only check once whether the handler is async. This gives an improvement of 2.5% (sync), 1.82% (async) in the number of requests/s. The average latency decreased 1.6% (sync) and 1.5% (async). Note that we had to use a cast in the helper function, as the typeguard does not work for the negative case. In the main branch the code is working without a cast, because the typeguard return type is in practice `AwaitableCAllable[Any]`, which end up swallowing the other types in the union. Benchmarking We use a simple json app, with both a sync and async endpoint, and the wrk tool to get the measurements. The measuerements were done on a Macbook Pro with M1 chip, 16GB of memory and macOS 12.3. The Python version used for the tests is Python 3.12.2, and the uvicorn version is 0.27.1 Before ``` $ wrk http://localhost:8000/sync Running 10s test @ http://localhost:8000/sync 2 threads and 10 connections Thread Stats Avg Stdev Max +/- Stdev Latency 733.77us 55.57us 3.21ms 78.35% Req/Sec 6.84k 147.06 7.15k 87.13% 137474 requests in 10.10s, 18.35MB read Requests/sec: 13610.69 Transfer/sec: 1.82MB $ wrk http://localhost:8000/async Running 10s test @ http://localhost:8000/async 2 threads and 10 connections Thread Stats Avg Stdev Max +/- Stdev Latency 717.14us 49.05us 1.83ms 71.11% Req/Sec 7.00k 112.14 7.36k 76.24% 140613 requests in 10.10s, 18.77MB read Requests/sec: 13922.97 Transfer/sec: 1.86MB ```` After ``` $ wrk http://localhost:8000/sync Running 10s test @ http://localhost:8000/sync 2 threads and 10 connections Thread Stats Avg Stdev Max +/- Stdev Latency 721.34us 202.40us 11.13ms 99.32% Req/Sec 7.01k 230.04 7.62k 94.00% 139558 requests in 10.00s, 18.63MB read Requests/sec: 13956.14 Transfer/sec: 1.86MB $ wrk http://localhost:8000/async Running 10s test @ http://localhost:8000/async 2 threads and 10 connections Thread Stats Avg Stdev Max +/- Stdev Latency 706.04us 109.90us 7.46ms 98.30% Req/Sec 7.12k 136.09 7.39k 90.59% 143188 requests in 10.10s, 19.12MB read Requests/sec: 14176.95 Transfer/sec: 1.89MB ``` The app used for the test is as follows ```python from starlette.applications import Starlette from starlette.responses import JSONResponse from starlette.routing import Route import uvicorn async def async_page(request): return JSONResponse({'status': 'ok'}) async def sync_page(request): return JSONResponse({'status': 'ok'}) app = Starlette(routes=[ Route('/async', async_page), Route('/sync', sync_page), ]) if __name__ == "__main__": uvicorn.run("app:app", port=8000, log_level="critical") ```

Kludex

I'm not sure if those changes are worth it, but...

starlette/routing.py

sk- · 2024-03-16T13:11:00Z

I'm not sure if those changes are worth it, but...

Could you share why you think the changes are not worth it?

Any improvements on removing blocking code would help all applications, as by removing unneeded operations we reduce the amount of blocking code, which affects (tail) latencies. This is a no brainer if the code in question is in fact unneeded because it's doing the same thing over an over again.

Kludex requested changes Mar 16, 2024

View reviewed changes

starlette/routing.py Outdated Show resolved Hide resolved

sk- and others added 2 commits March 21, 2024 16:30

Apply PR suggestion

ab53a16

Merge branch 'master' into refactor/check_async_once

cd37ae2

Kludex approved these changes Apr 20, 2024

View reviewed changes

Kludex enabled auto-merge (squash) April 20, 2024 08:24

Kludex merged commit 96c90f2 into encode:master Apr 20, 2024
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: check endpoint handler is async only once #2536

refactor: check endpoint handler is async only once #2536

sk- commented Mar 7, 2024 •

edited

Kludex left a comment

sk- commented Mar 16, 2024

refactor: check endpoint handler is async only once #2536

refactor: check endpoint handler is async only once #2536

Conversation

sk- commented Mar 7, 2024 • edited

Summary

Benchmarking

Before

After

Test App

Checklist

Kludex left a comment

Choose a reason for hiding this comment

sk- commented Mar 16, 2024

sk- commented Mar 7, 2024 •

edited