
Investigate strawberry performance using async + PyPy #104

Closed
cirospaciari opened this issue Feb 24, 2023 · 21 comments
Labels: enhancement (New feature or request)

@cirospaciari
Owner

cirospaciari commented Feb 24, 2023

@wedobetter reported some odd cases. We need to investigate not only async but also sync, and not only CPython but other popular libraries too. For this issue, investigating strawberry async + PyPy is the priority.

In recent raw async coroutine benchmarks, PyPy is faster than vibora, and CPython is faster than Japronto, with really good results. The last version removed a lot of overhead from async, so this is probably some unusual use of async by the library itself.

Recent TFB preliminary tests show that version v0.0.16 (the same one as the strawberry problem) is really fast in ASGI and async mode on both CPython and PyPy, so raw async is probably not the problem:

https://www.techempower.com/benchmarks/#section=test&runid=328ef78f-93c0-4e5d-a5c1-69b5ceb95064&test=plaintext&l=hra0hr-35r

Screenshot provided by @wedobetter:

[screenshot: 2023-02-24 14:45]

PyPy can be slower than CPython when calling C extensions (Cython-built ones are a common case; this will change with HPy), so maybe that is what is happening here.

@wedobetter

wedobetter commented Feb 24, 2023

Thank you @cirospaciari, I should have raised a separate issue, my bad!

Here's how to reproduce it:

from socketify import App
import strawberry
from strawberry.types import Info
from strawberry.utils.graphiql import get_graphiql_html

# this is the GQL helper as defined in the socketify docs
from graphiql import graphiql_from


@strawberry.type
class User:
    name: str


@strawberry.type
class Query:
    @strawberry.field
    async def user(self, info: Info) -> User | None:
        return User(name="Hello")


def run(app: App):
    app.get("/", lambda res, req: res.end(get_graphiql_html()))
    app.post("/", graphiql_from(Query))
    # you can also pass a Mutation as second parameter
    # app.post("/", graphiql_from(Query, Mutation))
    app.listen(
        8000,
        lambda config: print(f"Listening on port http://localhost:{config.port} now\n"),
    )
    app.run()


run(App())

If you remove that async from the Query, you get awesome performance again ;)

I have never experienced this perf degradation with fastapi + strawberry, so I doubt it's strawberry itself.
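The sync-vs-async gap is easy to reproduce in miniature with the stdlib alone. This crude sketch (names invented here, not from the repro above) drives each coroutine through its own asyncio.run(), which exaggerates the per-call event-loop overhead, but the direction matches the behavior reported above:

```python
import asyncio
import timeit

def sync_user():
    # stand-in for a sync resolver: just build the payload
    return {"name": "Hello"}

async def async_user():
    # same payload, but behind a coroutine that must be scheduled
    return {"name": "Hello"}

N = 1000
sync_s = timeit.timeit(sync_user, number=N)
# asyncio.run() spins up a fresh loop per call, so this is a worst case,
# but it illustrates that each async hop has a real scheduling cost
async_s = timeit.timeit(lambda: asyncio.run(async_user()), number=N)
```

A fairer comparison would run all coroutines inside one long-lived loop, but even then the async path cannot be cheaper than the plain call.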

@cirospaciari
Owner Author

> Here's how to reproduce it: […]
>
> If you remove that async from the Query, you get awesome performance again ;)

No problem, thanks for the feedback, this is really helpful to me and to the project.
Maybe this is something with the task factory; can you try running it with --task-factory-maxitems 0 at the end of the CLI command?

@wedobetter

Unfortunately, that doesn't make a difference

@cirospaciari
Owner Author

cirospaciari commented Feb 24, 2023

Both CPython and PyPy have weird fluctuations here:

CPython:
[screenshot]

PyPy:
[screenshot]

Without async both are way faster!

By the way, for socketify the code path is always async:

def graphiql_from(Query, Mutation=None):
    if Mutation:
        schema = strawberry.Schema(query=Query, mutation=Mutation)
    else:
        schema = strawberry.Schema(Query)

    async def post(res, req):
        # we can pass whatever we want into the context: query, headers, params, cookies, etc.
        context_value = req.preserve()

        # get all incoming data and parse it as JSON
        body = await res.get_json()

        query = body["query"]
        variables = body.get("variables", None)
        root_value = body.get("root_value", None)
        operation_name = body.get("operation_name", None)

        data = await schema.execute(
            query,
            variables,
            context_value,
            root_value,
            operation_name,
        )

        res.cork_end(
            {
                "data": (data.data),
                **({"errors": data.errors} if data.errors else {}),
                **({"extensions": data.extensions} if data.extensions else {}),
            }
        )

    return post
    

In the socketify code path, get_json and req.preserve can be optimized (and we already have issues/tasks in progress for that), but outside of that, only strawberry code runs internally; I will need time to investigate strawberry's source code.

@cirospaciari
Owner Author

Maybe this is related:
strawberry-graphql/strawberry#1886

@cirospaciari
Owner Author

cirospaciari commented Feb 24, 2023

Timing only the schema.execute part:

With and without async:
[screenshot]

start_time = time.monotonic()
data = await schema.execute(
    query,
    variables,
    context_value,
    root_value,
    operation_name,
)
end_time = time.monotonic()
print(timedelta(seconds=end_time - start_time))

Using uvloop instead of asyncio doesn't make much difference:

[screenshot]

In this case, just put this at the top:

import uvloop
uvloop.install()
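The inline monotonic() timing above can be factored into a small reusable context manager; this is a stdlib-only sketch, nothing socketify- or strawberry-specific, and `timed` is a name invented here:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, sink=print):
    # measure wall time of the enclosed block and hand the result to sink
    start = time.monotonic()
    try:
        yield
    finally:
        sink(f"{label}: {time.monotonic() - start:.6f}s")

# usage: collect samples instead of printing, so the measurement itself
# does not add print() overhead to the hot path
durations = []
with timed("execute", sink=durations.append):
    time.sleep(0.01)  # stands in for the awaited schema.execute call
```

Passing a list's append as the sink keeps the hot path free of I/O; the samples can be summarized after the run.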

@cirospaciari
Owner Author

cirospaciari commented Feb 24, 2023

If this is something with our async processing, this may or may not be related:
#102

@cirospaciari
Owner Author

cirospaciari commented Feb 24, 2023

When sync, strawberry itself is faster on PyPy, with 3.7k req/s vs 1k req/s on CPython.
When async, PyPy does only 1.2k req/s vs 400 req/s on CPython. It's clear this is something in strawberry, not in socketify, but we need to investigate further by testing with Falcon and other frameworks.
PyPy at first seems slower, at about 200 req/s, but after the JIT kicks in the performance is a lot better.
Still, we can get async on PyPy faster than sync on CPython.
Caching strategy seems to be really important here, since 1k or 4k req/s is really slow; socketify can deliver 1M req/s in the same test without GraphQL.
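On the caching point: one place a GraphQL server typically burns time per request is re-parsing the same query string. A hedged stdlib sketch of the idea, with `parse_document` as a stand-in for real parsing (a real server would cache the output of graphql-core's parse, or use persisted queries):

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def parse_document(query: str):
    # stand-in for graphql.parse(query); real parsing is much costlier,
    # which is exactly why caching it by query string pays off
    return ("DocumentNode", query)

q = "query MyQuery { user { name } }"
parse_document(q)
parse_document(q)          # second call is served from the cache
info = parse_document.cache_info()
```

lru_cache keys on the exact query string, so clients sending byte-identical queries (the common case with generated clients) hit the cache every time.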

@cirospaciari
Owner Author

@wedobetter I will continue this next week, I have a lot of work to do today, but maybe my comments shed some light here: under stress and after warmup, PyPy gets faster than CPython.

@wedobetter

wedobetter commented Feb 24, 2023

> Maybe this is related:
> strawberry-graphql/strawberry#1886

I had never tried fastapi on PyPy; I have just tried PyPy + fastapi (async) + strawberry with the same code, and I am getting ~15 ms responses.

@dacevedo12

Maybe related graphql-python/graphql-core#190

@wedobetter

@cirospaciari if I decide to go sync, does your server implementation act like a Go web server? i.e. does it handle each request in a system thread (concurrently), or does it block on each request?

@cirospaciari
Owner Author

cirospaciari commented Mar 10, 2023

> @cirospaciari if I decide to go sync, does your server implementation act like a Go web server? i.e. does it handle each request in a system thread (concurrently), or does it block on each request?

Python does not have true parallel threads: threads block on CPU-bound work and are only viable for IO. socketify does not use threads for IO; we encourage async instead, but you can do it with a simple middleware or helper. Take a look at Python's ThreadPoolExecutor.

The socketify way is to go async only when you actually need it (IO). You can always use res.run_async(coroutine), which will be faster than a ThreadPoolExecutor for IO.
For CPU-bound work, a JIT (PyPy) plus workers (forks) is the way to go.

Python 3.12 or later will start trying to make the GIL optional, and then threads will become viable.

In practice, just benchmark your own workload. strawberry is not the best performer on PyPy at the moment and needs time to warm up; short tests will not show the true performance, as mentioned earlier in this issue.

In my tests, after warming up, I can get PyPy async faster than CPython sync with strawberry. The JIT just takes its time; run wrk or some stress tool to warm up your application. It may take more than one run to warm up.

But in general I will keep this issue open, because I want to investigate why strawberry performs this way and offer a better experience or an alternative.

Remember: with socketify, Strawberry will ALWAYS run async (see the helper implementation), so the problem is something in strawberry's async implementation when running on the socketify loop.

Also, you need to check whether strawberry handles a thread pool executor for its non-async methods for you; everything I said about middleware for IO applies only to socketify, not strawberry.
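The ThreadPoolExecutor route mentioned above can be sketched with nothing but the stdlib; `blocking_io` is a hypothetical stand-in for a sync, IO-bound handler, not socketify API:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_io():
    time.sleep(0.01)  # stands in for a blocking database/file/network call
    return "done"

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # run_in_executor hands the blocking call to a worker thread;
        # time.sleep (like real IO waits) releases the GIL, so the four
        # calls overlap instead of running one after another
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_io) for _ in range(4))
        )
    return results

results = asyncio.run(main())
```

This helps only for IO-bound sync code; CPU-bound work still serializes on the GIL, which is why the comment above points to PyPy's JIT and forked workers for that case.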

@wedobetter

Right, thanks for your reply.
I know the threading deal with Python; I thought requests were handled by a multithreaded C library. I have just realised I asked a silly question, because the main thread is Python anyway.

As you said, I need to experiment and see what works best. By the way, I have dropped PyPy for now because I can't build the gRPC libs with it, an old, annoying problem.

Thank you again!

@cirospaciari
Owner Author

cirospaciari commented Mar 10, 2023

The HPy project is bringing Cython compatibility to PyPy, so this may be solved by itself this year or next. I would not recommend PyPy for now if you depend on Cython libraries, unless your benchmarks say it is worth it.

Socketify + CPython is faster than japronto at this stage and will be even faster by the end of this year.

I hope I can solve the issues you have with PyPy when I have some time.

@cirospaciari
Owner Author

cirospaciari commented Mar 10, 2023

Some numbers:

Under pressure, i.e.:

wrk --latency -d 5 -c 4096 --timeout 8 -t 8 http://localhost:8000 -s post.lua

post.lua:

wrk.method = "POST"
wrk.body = "{\"query\":\"query MyQuery {\\n  user {\\n    name\\n  }\\n}\",\"operationName\":\"MyQuery\"}"
wrk.headers["Content-Type"] = "application/json"

strawberry_test.py:

from socketify import App
from strawberry.types import Info
from strawberry.utils.graphiql import get_graphiql_html

import strawberry
import strawberry.utils.graphiql

import time
from datetime import timedelta

@strawberry.type
class User:
    name: str


@strawberry.type
class Query:
    @strawberry.field
    async def user(self, info: Info) -> User:
        return User(name="Hello")





def run(app: App):
    schema = strawberry.Schema(Query)
    router = app.router()

    @router.get("/")
    def home(res, req): 
        res.end(get_graphiql_html())


    @router.post("/")
    async def post(res, req):
        # we can pass whatever we want to context, query, headers or params, cookies etc
        context_value = req.preserve()

        # get all incoming data and parses as json
        body = await res.get_json()
        
        query = body["query"]
        
        variables = body.get("variables", None)
        root_value = body.get("root_value", None)
        operation_name = body.get("operation_name", None)

        start_time = time.monotonic()

        data = await schema.execute(
            query,
            variables,
            context_value,
            root_value,
            operation_name,
        )
        end_time = time.monotonic()
        print(timedelta(seconds=end_time - start_time))

        res.cork_end(
            {
                "data": (data.data),
                **({"errors": data.errors} if data.errors else {}),
                **({"extensions": data.extensions} if data.extensions else {}),
            }
        )

Run with:

pypy3 -m socketify strawberry_test:run

Results:

0:00:00.053364
0:00:00.053202
0:00:00.052997
0:00:00.052786
0:00:00.052469
0:00:00.051713
0:00:00.051311
0:00:00.051040
0:00:00.050842
0:00:00.050672
0:00:00.050468
0:00:00.050253
0:00:00.050038
0:00:00.049820
0:00:00.049266
0:00:00.049026
0:00:00.048820
0:00:00.048654
0:00:00.048442
0:00:00.048273
0:00:00.048122
0:00:00.047974
0:00:00.047827
0:00:00.047505
0:00:00.047271
0:00:00.047092
0:00:00.046937
0:00:00.046789
0:00:00.046641
0:00:00.046495
0:00:00.046351
0:00:00.046136
0:00:00.045977
0:00:00.045660
0:00:00.045498
0:00:00.045343
0:00:00.045012
0:00:00.044752
0:00:00.044495
0:00:00.044233
0:00:00.043987
0:00:00.043718
0:00:00.043223
0:00:00.043045
0:00:00.042884
0:00:00.042730
0:00:00.042576
0:00:00.042426
0:00:00.042277
0:00:00.042128
0:00:00.041979
0:00:00.041604
0:00:00.041417
0:00:00.041252
0:00:00.041096
0:00:00.040912
0:00:00.040749
0:00:00.040599
0:00:00.040450
0:00:00.040302
0:00:00.040011
0:00:00.039838
0:00:00.039647
0:00:00.039504
0:00:00.039356
0:00:00.039208
0:00:00.039059
0:00:00.038913
0:00:00.038769
0:00:00.038623
0:00:00.038314
0:00:00.038153
0:00:00.037998
0:00:00.037848
0:00:00.037698
0:00:00.037551
0:00:00.037406
0:00:00.037262
0:00:00.037793
0:00:00.037475
0:00:00.037311
0:00:00.037157
0:00:00.037010
0:00:00.036864
0:00:00.036680
0:00:00.036519
0:00:00.036374
0:00:00.036230
0:00:00.035902
0:00:00.035733
0:00:00.035576
0:00:00.035424
0:00:00.036180
0:00:00.035951
0:00:00.035792
0:00:00.035641
0:00:00.035496
0:00:00.035203
0:00:00.035032
0:00:00.034872
0:00:00.034720
0:00:00.034569

@cirospaciari
Owner Author

cirospaciari commented Mar 10, 2023

With no pressure same code:
Results:

0:00:01.558609
0:00:01.405271
0:00:01.529131
0:00:01.549445
0:00:01.733283
0:00:00.119836
0:00:01.210686
0:00:01.441472
0:00:01.571545

With no warmup and no pressure, strawberry does not perform well (~1.5 s); with warmup and pressure we get much better performance, around 35 ms. This is with print() used for logging, which is a bad way to measure, so it is probably much lower without the print statements.

This only measures the schema.execute time, because that is where the problem lives.
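Since print()-per-request skews the numbers, a hedged alternative is to collect the samples in memory and summarize once at the end; `summarize` and the sample values below are invented for illustration:

```python
import statistics

samples = []  # append (end_time - start_time) here per request in the handler

def summarize(samples):
    # one pass over the collected samples after the benchmark, instead of
    # printing (and paying for stdout) inside the hot path
    return {
        "n": len(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "p95": statistics.quantiles(samples, n=20)[-1],  # 95th percentile
    }

summary = summarize([0.053, 0.051, 0.049, 0.047, 0.045,
                     0.043, 0.041, 0.039, 0.037, 0.035])
```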

@cirospaciari
Owner Author

cirospaciari commented Mar 10, 2023

Traced it down to:
https://github.com/strawberry-graphql/strawberry/blob/main/strawberry/schema/execute.py#L128-L142

which is:

from graphql import execute as original_execute

The await that is taking too long in this case is on the graphql.execute result, which is hanging for too long for some reason; I will investigate further in the graphql package.
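For context, graphql-core's execute() may return either a plain result or an awaitable one (depending on whether any resolver is async), so callers check with isawaitable() before awaiting. A stdlib-only sketch of that pattern, with execute_sync/execute_async as hypothetical stand-ins for the real engine:

```python
import asyncio
import inspect

def execute_sync(query):
    # stand-in for fully-sync resolution: returns the result directly
    return {"data": {"user": {"name": "Hello"}}}

async def execute_async(query):
    # stand-in for resolution with async resolvers: returns a coroutine
    return {"data": {"user": {"name": "Hello"}}}

async def run_query(execute, query):
    result = execute(query)
    if inspect.isawaitable(result):  # only await when the engine went async
        result = await result
    return result

sync_result = asyncio.run(run_query(execute_sync, "query { user { name } }"))
async_result = asyncio.run(run_query(execute_async, "query { user { name } }"))
```

The slow path reported above is the awaited branch, which is why the sync resolver variant of the repro does not show the latency.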

@cirospaciari
Owner Author

cirospaciari commented Mar 10, 2023

After a quick look: when using futures (via get_data/get_json etc.), graphql is impacted; when not using them, graphql is not affected. This is really odd behavior; using futures or asyncio.Queue gives the same result.

get_data/get_json themselves perform really well, and the waiting time is all in graphql.execute, not in socketify.

The new helper avoids get_data/get_json and is 3x faster when not under pressure (only when not under pressure can we feel the latency), so this might be something in socketify's loop.py, and #102 may or may not be related. I need to investigate more, so I will take some time to look at it this weekend.

It may be something in asyncio itself that affects things down the road. Today socketify runs the asyncio loop and the libuv loop on the same thread.
The first hypothesis is that asyncio is not calling the libuv loop's run_once (calling run_once would not block the GIL) and is instead doing some odd work dispatching futures/queues etc.

The second hypothesis is that asyncio is calling libuv's run_once too often and getting blocked by the libuv loop itself.

New Helper:
https://github.com/cirospaciari/socketify.py/blob/main/examples/helpers/graphiql.py
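Those hypotheses can be illustrated with a stdlib-only sketch of driving a second, foreign event loop from inside asyncio using a strictly non-blocking poll, so the foreign loop can never stall asyncio by sleeping. ForeignLoop, run_nowait, and pump are names invented here, not socketify's actual code:

```python
import asyncio

class ForeignLoop:
    """Stand-in for a libuv-style loop with a non-blocking poll."""
    def __init__(self):
        self.polls = 0

    def run_nowait(self):
        # process whatever is ready and return immediately, never sleeping;
        # a blocking run_once here could park the whole thread instead
        self.polls += 1

async def pump(foreign, stop):
    # poll the foreign loop once per asyncio iteration, then yield
    while not stop.is_set():
        foreign.run_nowait()
        await asyncio.sleep(0)  # give asyncio tasks their turn

async def main():
    foreign = ForeignLoop()
    stop = asyncio.Event()
    pump_task = asyncio.create_task(pump(foreign, stop))
    await asyncio.sleep(0.01)  # asyncio work proceeds while we keep pumping
    stop.set()
    await pump_task
    return foreign.polls

polls = asyncio.run(main())
```

The trade-off is busy-polling: the non-blocking variant burns CPU when idle, but it guarantees neither loop can block the other on the shared thread.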

@cirospaciari
Owner Author

The second hypothesis was the right answer.
I will change run_once to run_nowait and this will be fixed. I need time to run tests and release a new version, so @wedobetter I will commit the fix to main, and by the end of the day or tomorrow morning I will publish the new version.

The new helper will have a little less overhead, but in practice it will not make much difference.

@cirospaciari
Owner Author

cirospaciari commented Mar 10, 2023

Closed on https://pypi.org/project/socketify/0.0.17/

Now latency is 3 to 6 ms.
