Significant performance degradation because of usage of gather instead of serial await #40

Closed
kevinvalk opened this issue Apr 6, 2023 · 2 comments

@kevinvalk

Scenario: Python 3.11 GraphQL gateway using Ariadne with lots of nested data

During development I found a significant performance degradation. I raised this in GraphQL core (graphql-python/graphql-core#190). After some more research I found that using gather on CPU-bound tasks causes significant overhead (graphql-python/graphql-core#190 (comment)). For CPU-bound async tasks it is better to use a sequential await.
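
To illustrate the overhead (this sketch is mine, not from the issue): gather wraps every awaitable in a Task and round-trips through the event loop, while a serial await of a coroutine that completes immediately never yields at all. Coroutines that complete immediately are exactly what cache hits look like:

import asyncio
import time

async def cached_value(i):
    # Returns without ever yielding to the event loop, like a cache hit.
    return i

async def main():
    n = 100_000

    start = time.perf_counter()
    await asyncio.gather(*[cached_value(i) for i in range(n)])
    print("gather:", time.perf_counter() - start)

    start = time.perf_counter()
    for coro in [cached_value(i) for i in range(n)]:
        await coro
    print("serial:", time.perf_counter() - start)

asyncio.run(main())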

So I monkey-patched gather into a serial await in GraphQL core, but I still had very slow responses. Today I finally dove into this problem again and saw that there was another gather in aiodataloader!

[screenshot: aiodataloader with gather]

As far as I understand, the goal of the dataloader (when used with a cache) is to perform only a few IO-bound lookups and serve all other loads directly from the cache. That means gather is mostly being applied to CPU-bound work. I monkey-patched the aiodataloader gather into a serial await and my requests went from 3s to 500ms.

[screenshot: aiodataloader with serial await]

I am not sure this holds in every case (for example when not using the cache), but as long as you want caching you really need a serial await. Maybe I am missing something (please let me know), but I would suggest adding a serial await to load_many when the cache is used. Here is the monkey patch I used:


from asyncio import Future
from importlib import import_module
from typing import Any, Awaitable, Iterable, List, TypeVar

ReturnT = TypeVar("ReturnT")


async def serial_gather(*futures: Awaitable[Any]) -> List[Any]:
    # Await the futures one after another instead of scheduling them
    # concurrently with asyncio.gather.
    return [await future for future in futures]


aiodataloader = import_module("aiodataloader")


def load_many(self, keys: Iterable[Any]) -> "Future[List[ReturnT]]":
    """
    Loads multiple keys, returning a list of values

    >>> a, b = await my_loader.load_many(['a', 'b'])

    This is equivalent to the more verbose:

    >>> a, b = await gather(
    ...     my_loader.load('a'),
    ...     my_loader.load('b'),
    ... )
    """
    if not isinstance(keys, Iterable):
        raise TypeError(
            "The loader.load_many() function must be called with "
            "Iterable<key> but got: {}.".format(keys)
        )

    # Serial await instead of the upstream asyncio.gather.
    return serial_gather(*[self.load(key) for key in keys])


# Monkey-patch the upstream class so every DataLoader picks this up.
aiodataloader.DataLoader.load_many = load_many
@markedwards
Collaborator

Doing this would lead to very poor performance when retrieving keys that are not in the cache. It essentially defeats batching, which is the entire point of the DataLoader pattern.

What’s the use case that leads to calling load_many() on hundreds of thousands of keys where the results are already cached? Are you using aiodataloader as an application-level cache?
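
A minimal sketch (mine, not from the thread) of why the serial await defeats batching; the EchoLoader subclass and keys are illustrative. With gather, both loads are enqueued before the event loop runs the dispatch, so batch_load_fn sees one batch; with a serial await, each key is resolved in its own batch:

import asyncio
from aiodataloader import DataLoader

class EchoLoader(DataLoader):
    async def batch_load_fn(self, keys):
        print("batch:", keys)
        return keys

async def main():
    loader = EchoLoader()
    # Concurrent loads coalesce: prints  batch: ['a', 'b']
    await asyncio.gather(loader.load("a"), loader.load("b"))

    loader = EchoLoader()
    # Serial awaits dispatch per key: prints  batch: ['a']  then  batch: ['b']
    await loader.load("a")
    await loader.load("b")

asyncio.run(main())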

@kevinvalk
Author

Found the "problem" (which ended up being a user error). If you are running Python in debug mode the asyncio loop is also set to debug. This ensures that on each context switch the full stack trace is kept. This is quite expensive so when using Gather (depending on your workload) the tasks may end up switching A LOT which completely kills performance. In production this is not a problem because the asyncio loop is not set to debug.

TL;DR: if you want the actual performance, disable debug on your asyncio loop:

import asyncio

loop = asyncio.get_running_loop()  # must be called inside a coroutine
loop.set_debug(False)
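
For context (an addition, not from the original comment), asyncio debug mode can also be switched on implicitly, which is easy to miss:

# Any of these enable asyncio debug mode:
#   PYTHONASYNCIODEBUG=1 python app.py
#   python -X dev app.py
#   asyncio.run(main(), debug=True)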
