Performance issues due to breadth-first execution of GraphQL queries with async resolvers during call bursts #200

Open
alessandrolulli opened this issue May 12, 2023 · 4 comments

Comments

@alessandrolulli

When we receive a burst of calls, we find that the calls wait for each other (i.e. the first call of the burst waits for the last one).
This degrades performance in both execution time and memory consumption on the server, because many calls have to be kept in flight at once.

This is particularly evident in large GraphQL queries where users request many fields and the query is several levels deep, with many async fields at each level.

TL;DR: after looking at the implementations of graphql and asyncio, we understood that this is due to the following:

  • graphql's breadth-first way of scheduling and resolving fields
  • asyncio's internal FIFO queue of tasks to be executed
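
The second point can be illustrated with plain asyncio, without graphql at all. The sketch below is only a stand-in: `worker` and `interleaving` are hypothetical names, and `asyncio.sleep(0)` is assumed here as the simplest way to hand control back to the event loop. Tasks are resumed in the order they were queued, so two tasks started together advance in lockstep instead of one finishing before the other.

```python
import asyncio


async def worker(name, steps, log):
    # Each loop iteration stands for one async resolver of one query.
    for step in range(steps):
        log.append(f"{name}:{step}")
        # Yield to the event loop; the task is re-queued behind the others.
        await asyncio.sleep(0)


def interleaving(steps=3):
    log = []

    async def main():
        await asyncio.gather(worker("A", steps, log), worker("B", steps, log))

    asyncio.run(main())
    return log


if __name__ == "__main__":
    print(interleaving())
```

With two workers the log alternates between A and B at every step, which is the same pattern the resolvers of concurrent queries follow.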

As an example, consider queries like the following, where data might be beer vendors and for each vendor we request many fields that describe it (a1...a100, b1...b100, ...):

query {
  data {
    a1 {
      b1 {
        c1
        ...
        c100
      }
      ...
      b100 {
        c1
        ...
        c100
      }
    }
    ...
    a100 { ... }
  }
}

If n of these calls arrive in a burst, then by the time we reach the depth of the c fields there are a great many tasks scheduled in the asyncio queue.

If we check the order of execution, we see that the first query, at each level, waits for the other queries, because every query schedules a lot of tasks.
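
A rough way to quantify this waiting, again with plain asyncio rather than graphql itself: the sketch below models each query as `depth` levels of `width` async fields and counts how many resolvers of other queries have already started by the time query 0 finishes. All names (`simulate_burst`, `foreign_starts_before_end`) are hypothetical, and the level-by-level `asyncio.gather` is only an assumed approximation of how sibling fields are awaited together.

```python
import asyncio


def simulate_burst(n_queries, depth, width):
    """Log resolver starts and query ends for a burst of nested fake queries."""
    events = []

    async def field(query, level):
        events.append((query, "start"))
        await asyncio.sleep(0)  # stand-in for real async work
        if level + 1 < depth:
            # Child fields of one parent are scheduled together.
            await asyncio.gather(*(field(query, level + 1) for _ in range(width)))

    async def run_query(query):
        await asyncio.gather(*(field(query, 0) for _ in range(width)))
        events.append((query, "end"))

    async def main():
        await asyncio.gather(*(run_query(q) for q in range(n_queries)))

    asyncio.run(main())
    return events


def foreign_starts_before_end(events, query=0):
    """Count resolver starts of other queries logged before `query` ends."""
    end_index = events.index((query, "end"))
    return sum(1 for q, kind in events[:end_index] if q != query and kind == "start")
```

In this simplified model, a query run alone has no foreign resolvers ahead of it, while in a burst every resolver of every other query has started before query 0 ends.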

In the proof of concept at the end of this post you can verify the order of execution of the resolvers.

It would be very nice to have some sort of priority in the ordering, so that the first query does not have to wait for all the other queries to be scheduled and resolved before it can finish.
I understand that this sits somewhere between graphql and asyncio, but I think it could affect the use of graphql in environments that receive many calls.
Fixes, help, and hints on how to improve this would be much appreciated.
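
One possible direction, offered only as an assumption and not as something graphql or graphene provides: cap the number of queries executing at once with an `asyncio.Semaphore`, so that early queries finish before later ones start scheduling tasks. In the sketch below `fake_query` is a hypothetical stand-in for `schema.execute_async`, and `limited`, `run_limited_burst`, and `max_concurrency` are illustrative names. This trades overall concurrency for lower latency of the earliest queries, so whether it helps depends on the workload.

```python
import asyncio


async def fake_query(query, width, events):
    """Hypothetical stand-in for schema.execute_async: one level of async fields."""

    async def field():
        await asyncio.sleep(0)

    events.append((query, "start"))
    await asyncio.gather(*(field() for _ in range(width)))
    events.append((query, "end"))


async def limited(semaphore, query, width, events):
    # At most `max_in_flight` queries run at once; the others wait here.
    async with semaphore:
        await fake_query(query, width, events)


def run_limited_burst(n_queries, max_in_flight, width=4):
    events = []

    async def main():
        semaphore = asyncio.Semaphore(max_in_flight)
        await asyncio.gather(
            *(limited(semaphore, q, width, events) for q in range(n_queries))
        )

    asyncio.run(main())
    return events


def max_concurrency(events):
    """Largest number of queries that were between start and end at once."""
    running = peak = 0
    for _, kind in events:
        running += 1 if kind == "start" else -1
        peak = max(peak, running)
    return peak
```

With `max_in_flight=1` the first query runs to completion before any other query starts; larger values allow a bounded amount of interleaving.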

import asyncio

from graphene import ObjectType, Schema, String, Field

FIELD_NUMBER = 2
CONCURRENT_QUERIES = 10


def make_resolver(i, j=None):
    async def resolver(self, info):
        print(f"START query {info.context['query_number']} | a{i} | b{j}")
        await asyncio.sleep(0.001)
        print(f"END query {info.context['query_number']} | a{i} | b{j}")
        return i

    return resolver


def create_fields():
    fields = {}
    for i in range(FIELD_NUMBER):
        inner_fields = {}
        for j in range(FIELD_NUMBER):
            inner_fields[f"b{j}"] = String()
            inner_fields[f"resolve_b{j}"] = make_resolver(i, j)

        MyType = type(
            f"MyType{i}",  # GraphQL type names must be unique within a schema
            (ObjectType,),
            inner_fields,
        )

        fields[f"a{i}"] = Field(MyType)
        fields[f"resolve_a{i}"] = make_resolver(i)

    return fields


async def make_query(schema, query_number):
    inner_query_values = [f"b{i}" for i in range(FIELD_NUMBER)]
    query_values = [
        "a%s {%s}" % (i, " ".join(inner_query_values)) for i in range(FIELD_NUMBER)
    ]
    query_string = "{ %s }" % (" ".join(query_values),)

    await schema.execute_async(
        query_string, context_value=dict(query_number=query_number)
    )


async def main():
    Query = type("Query", (ObjectType,), create_fields())
    schema = Schema(query=Query)

    await asyncio.gather(*[make_query(schema, i) for i in range(CONCURRENT_QUERIES)])


asyncio.run(main())

@Cito
Member

Cito commented May 29, 2023

Hi @alessandrolulli. Can you check whether the @defer/@stream functionality of GraphQL.js 17.0.0a2 which is already contained in the main branch of GraphQL-core would solve your problem?

@Cito
Member

Cito commented Jun 4, 2023

Note that the @defer/@stream functionality of GraphQL.js 17.0.0a2 is now included in GraphQL-core 3.3.0a3.

@alessandrolulli
Author

Hi, sorry for not replying sooner.

Unfortunately, our architecture cannot support defer/stream. We need to load all the data at once.

Moreover, the problem is still present. Further analysis shows that the garbage collector is triggered very frequently, but more analysis is required.
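
For anyone who wants to check the garbage collector observation on their own workload, the standard library's `gc.callbacks` hook can count collections per generation. This is a minimal sketch and not the analysis referred to above: `count_gc_runs` is a hypothetical helper, and the burst of trivial `noop` tasks is only an assumed stand-in for the tasks created by real resolvers.

```python
import asyncio
import gc
from collections import Counter


def count_gc_runs(n_tasks):
    """Run a burst of trivial tasks and count GC passes per generation."""
    runs = Counter()

    def on_gc(phase, info):
        # gc calls each callback with phase "start" or "stop" and an info dict.
        if phase == "start":
            runs[info["generation"]] += 1

    async def noop():
        await asyncio.sleep(0)

    async def main():
        await asyncio.gather(*(noop() for _ in range(n_tasks)))

    gc.callbacks.append(on_gc)
    try:
        asyncio.run(main())
    finally:
        # Always unregister so later code is not affected.
        gc.callbacks.remove(on_gc)
    return dict(runs)


if __name__ == "__main__":
    print(count_gc_runs(50_000))
```

Comparing the counts for a single query against a burst of queries would show whether the number of collections grows with the number of tasks in flight.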

We still experience very slow query response times with large nested schemas, and this is particularly evident during bursts of query calls.

@alessandrolulli
Author

Just to give you an idea of our application, the following are the most expensive functions in our app that are attributable to graphql:

code | call_count | total_time | inline_time
--- | --- | --- | ---
/usr/local/lib/python3.8/asyncio/base_events.py:1784(_run_once) | 8837 | 158.9260885 | 0.679319755
/usr/local/lib/python3.8/asyncio/events.py:79(_run) | 206937 | 154.2804386 | 0.426225768
<method 'run' of 'Context' objects> | 206937 | 150.819749 | 0.874805984
<method 'switch' of 'greenlet.greenlet' objects> | 5850 | 38.3539513 | 0.040258509
/usr/local/lib/python3.8/selectors.py:451(select) | 8837 | 33.90701106 | 0.115947484
<method 'poll' of 'select.epoll' objects> | 8837 | 33.7278848 | 33.7278848
graphql/execution/execute.py:525(await_result) | 57063 | 31.60689303 | 0.281550806
graphql/execution/execute.py:575(complete_value) | 371926 | 16.89343411 | 1.606329719
graphql/execution/execute.py:660(complete_list_value) | 9696 | 16.08275478 | 0.347416812
graphql/execution/execute.py:413(execute_fields) | 65469 | 15.92233617 | 3.4270316
graphql/execution/execute.py:893(complete_object_value) | 65350 | 15.90468676 | 0.248284859
graphql/execution/execute.py:485(execute_field) | 264162 | 15.38942806 | 2.087991986
graphene/types/schema.py:494(execute_async) | 238 | 14.13009327 | 0.00221666
graphql/graphql.py:19(graphql) | 238 | 14.12762597 | 0.023681461
graphql/graphql.py:152(graphql_impl) | 119 | 14.05983234 | 0.007804555
graphql/validation/validate.py:19(validate) | 119 | 9.729721089 | 0.004884528
graphql/language/visitor.py:170(visit) | 686 | 9.634308354 | 1.888684852
