[WIP] Add lazy execution prototype #155

jkimbo · 2022-01-17T08:53:28Z

This PR is a proof of concept of adding lazy evaluation support to
graphql-core.

It follows a similar API to graphql-ruby
and it allows the developer to define a "lazy" type to enable batching
without using asyncio. The tests illustrate how this can be used to enable
a dataloader pattern.

N.B. The DeferredValue object is very similar in functionality to the
Promise library from graphql-core v2. I decided to reimplement a
subset rather than use it directly though because it's scope is bigger
than what I needed. It's a purely internal implementation detail though
and can be replaced in future.

Related issue: #148

This PR is a proof of concept of adding lazy evaluation support to graphql-core. It follows a similar API to [graphql-ruby](https://graphql-ruby.org/schema/lazy_execution.html) and it allows the developer to define "lazy" type to enable batching without using asyncio. The tests illustrate how this can be used to enable a dataloader pattern. N.B. The DeferredValue object is very similar in functionality to the Promise library from graphql-core v2. I decided to reimplement a subset rather than use it directly though because it's scope is bigger than what I needed. It's a purely internal implementation detail though and can be replaced in future.

jkimbo · 2022-01-22T10:24:50Z

@Cito any thoughts on this approach? Do you think it's a viable way of solving this problem?

Cito · 2022-01-22T14:55:13Z

@jkimbo In principle, yes. Will provide some feedback soon (currently working on graphql-relay-py).

Cito · 2022-01-23T15:26:27Z

Hi @jkimbo. Thanks again for getting the ball rolling on this one.

Let me recap what the problem is: Our use case is described in #71 and #148. The main problem is supporting synchronous batch data loaders, the additional problem is how to do this multi-threaded environments, where maybe each data loader would run in a different thread. Let’s start with the first problem.

Your solution in this PR is to make the ExecutionContext support LazyValues as results from resolvers, and internally using DeferredValues to resolve these. The DataLoader would then convert the load calls to LazyValues, so that these are not immediately resolved.

In principle, this works. But I feel not very good about this approach because of the following:

the additional calls of is_lazy() and isinstance(..., DeferredValue) degrade performance a bit
the additional handling of these cases makes the ExecutionContext logic even more complicated
we now have two additional concepts to think about and to distinguish, lazy and deferred values
Deferreds are considered an antipattern
This would still not support loader functions returning Promises
The ExecutionContext needs to manage the deferred_values task queue and processes it sequentially. I think dealing with such queues should better happen outside in the data loader and by an asynchronous executor of any kind, in order to allow running tasks concurrently.
The solution currently does not allow resolvers with lazy values and awaitables to coexist. For instance, in execute_fields if some results are awaitable, and some are deferred, then only the resolution via deferred_dict would take place, and the awaitables would not be gathered. Either the ExecutionContext should allow both to exist - I think this should be possible by refining your solution accordingly - or there should be a separate specialized ExecutionContext that would generally ignore awaitables.

Can we do something else instead?

I thought maybe in addition to Awaitables, we could also support Promises. We could make this universal by passing a Promise class to the ExecutionContext. This would allow to check if something is a promise. We would only require that a Promise has a then method in order to properly return ExecutionResult and GraphQLErrors. And maybe we should require the Promise class to have an all method as replacement for gather. Then we could support resolvers returning promises and a DataLoader could use Promises to implement deferred values. However - this would probably mean to duplicate all of the code in execute and the graphql function since it assumes that an Awaitable is returned, not a Promise. I do not think this is a viable solution.

So currently I think - why do not use what we have - support for asyncio? Instead of the DeferedValues, we could simply use asyncio.Futures. This way we would not need to introduce additional classes for lazy and deferred values and the ExecutionContext could stay as it is.

To demonstrate this, I have rewritten your DataLoader in test_lazy_execution to use Futures instead of LazyValues and DeferredValues, see test_lazy_execution_with_futures here. Note that this version uses the same synchronous batch loader, but it does not need the additional classes or changes in the ExecutionContext. The only minor difference/drawback is that you can't use graphql_sync , but you need graphql. But I don't think this is a big issue - you can use asyncio.run() to run graphql from synchronous code.

What do you think about this?

jkimbo · 2022-01-23T19:13:09Z

Thanks for the detailed feedback @Cito and I agree with your summary of the problem, though specifically I'm trying to solve the case where you need batching in a purely synchronous context (and one that can't be async i.e. Django).

Regarding your concerns:

the additional calls of is_lazy() and isinstance(..., DeferredValue) degrade performance a bit

That is true, though it's only the case if you implement the is_lazy check. I would imagine that consumers of graphql-core would only implement the check if operating in a synchronous environment and would have controls to disable it if not needed.

the additional handling of these cases makes the ExecutionContext logic even more complicated

Agreed but I don't see a way around it

we now have two additional concepts to think about and to distinguish, lazy and deferred values

Deferred values is meant to only be an internal concept to graphql-core. It shouldn't affect consumers of the library. It's there only to represent an object that can be resolved in the future.

Deferreds are considered an antipattern

I don't think that article is relevant here because it's arguing that using deferreds in JS is an antipattern because it has promises (which I agree with). The DeferredValue implementation is basically equivalent to the Promise implementation (https://github.com/syrusakbary/promise) just without the threading. I was going to call it Promise as well but I didn't want to confuse things. Anyway it's purely a mechanism to represent a value that can be resolved in the future. I could also call it Future but again I didn't want to confuse things with asyncio Futures.

This would still not support loader functions returning Promises

Correct though you could implement a wrapper that would allow it. I think that is best done in userland.

The ExecutionContext needs to manage the deferred_values task queue and processes it sequentially. I think dealing with such queues should better happen outside in the data loader and by an asynchronous executor of any kind, in order to allow running tasks concurrently.

That is possible though you then need a mechanism to "register" the dataloader batch functions with graphql-core. I'm not sure exactly what an API for that would look like though it might warrant further investigation. Note: the "lazy" value implementation can use threads when resolving batches so that it runs concurrently.

The solution currently does not allow resolvers with lazy values and awaitables to coexist. For instance, in execute_fields if some results are awaitable, and some are deferred, then only the resolution via deferred_dict would take place, and the awaitables would not be gathered. Either the ExecutionContext should allow both to exist - I think this should be possible by refining your solution accordingly - or there should be a separate specialized ExecutionContext that would generally ignore awaitables.

Yes that is a limitation, though I am trying to solve the problem of batching in a purely synchronous context so there wouldn't be any lazy values anyway.

What you've implemented with Futures is basically how dataloaders work in asyncio (https://github.com/strawberry-graphql/strawberry/blob/main/strawberry/dataloader.py) and it works great when you can work in an asyncio context. However Django (and maybe other frameworks) currently don't support asyncio at all when interacting with the ORM and will raise an exception if you try and use it while there is a running asyncio loop (https://github.com/django/django/blob/37d9ea5d5c010d54a416417399344c39f4e9f93e/django/utils/asyncio.py#L17-L23). Unfortunately what you've suggested is not possible for any GraphQL server running on Django which is why I'm trying to come up with a solution that works only in a synchronous context.

Cito · 2022-01-23T21:22:52Z

Django (and maybe other frameworks) currently don't support asyncio at all when interacting with the ORM and will raise an exception if you try and use it while there is a running asyncio loop

Seems I didn't understand the depth of the problem since I'm not using Django. Maybe we should then implement something like your solution for those who have this problem, but try to mitigate the issues I mentioned above as far as possble.

Particularly, I think we shouldn't try to amend the existing ExecutionContext by adding the logic for lazy/deferred values, which currently can't coexist anyway, but instead create a subclass that is synchronously only, without any of the checks for awaitables and special handling of them. Instead, it should only check for deferred/lazy values.

This context could then be used in the case where we can't have an async loop. It should also be configurable to allow different kinds of lazy values. Currently, the checks for lazy values are configurable via the is_lazy method, but the resolution of lazy values by using the get method seems to be hardcoded. Maybe the is_lazy function could be passed as a a function like is_awaitable, or is_awaitable could be reused with the meaning of is_lazy in the sync context, plus we need another function for resolving the lazy objects.

Will try to experiment a bit to find the most clean and simple solution, but please have some patience, since I already know the next two weeks will be busy for me. But feel free to work on this in parallel as well. We will also need more tests, particularly for the error handling and the DeferredValue class.

jkimbo · 2022-01-23T21:42:58Z

Sounds good @Cito , having a specific ExecutionContext class sounds like a good idea to me, I just wasn't sure how to implement it without refactoring the implementation which I know follows the js one closely.

Cito · 2022-02-03T20:22:36Z

Hi @jkimbo. I have now implemented a specific DeferredExecutionContext as a subclass of the normal ExecutionContext.

Instead of using DeferredValue and LazyValue classes, I have used a simple self-made synchronous Future class with a similar API as the existing Future classes in Python. I think this is a bit simpler to understand despite the callback hell.

I have also added a test for the error handling, but we need more tests before releasing this.

Let me know if this looks good to you and we should continue to follow this approach.

Not sure whether we should release this as part of graphql-core or as a separate package.

jkimbo · 2022-02-07T19:30:40Z

Sorry for not getting back to you sooner @Cito . Your code looks great and I think it's a much better approach.

Regarding if we should split it out: I would prefer keeping it in graphql-core because it's going to be quite tied to the execution context class and so easily broken with new releases. If it's part of the library then at least it's obvious if the implement breaks.

syrusakbary · 2022-02-07T19:39:25Z

I think graphql-core should add support for DeferredValues, but from an async perspective.

Wait... what?

I've been trying to have @defer support on a extensive GraphQL service in my company.
After researching a bit I realized that this directive it's already supported and used in Facebook and Apollo Server.
https://relay.dev/docs/glossary/#defer
https://www.apollographql.com/blog/community/backend/introducing-defer-in-apollo-server/

Basically, if you have a query like the following:

query NewsFeed {
  newsFeed {
    stories {
      text
      comments @defer {
        text
      }
    }
  }
}

The HTTP request (one request that resolves sequentially - it flushes per result) will resolve in:

{
  "data": {
    "newsFeed": {
      "stories": [{
        "text": "blabla",
        "comments": null
      }]
    }
  }
}
// wait a few seconds
{
  "path": ["newsFeed", "stories", 0, "comments"],
  "data": [{
    "text": "The first comment in the first story"
  },
  {
    "text": "Second comment in the first story"
  }]
}
// wait a few seconds
{
  "path": ["newsFeed", "stories", 1, "comments"],
  "data": [{
    "text": "The first comment in the second story"
  }]
}

All within the same HTTP request.
You can actually see an implementation of this when going to Facebook and inspecting their GraphQL responses.

Implementation Details

async def resolve_comments(self, info):
    asyncio.sleep(2)
    return ...

Basically, the GraphQL query should return a DeferredExecutionContext with the main query resolved (but not blocked in the comments)
And then with a list of coroutines and it's paths (that when resolved will return more GraphQL responses similar to subscriptions).

Conclusion

I think it's better to start from this async-first approach and then "perhaps" add a way to support in sync context.
But starting from async-first will make it more robust I believe! :)

Cito · 2022-02-07T22:05:42Z

Hi @syrusakbary, thanks for this input, great ideas.

As far as I understand, these features are currently on the way of getting into the spec. GraphQL.js already has an experimental "defer-stream" branch as well, and this may get merged in the upcoming v17. As far as I see they simply return an iterator that yields the deferred values if there are any, instead of returning a single result. I will incorporate this solution into GraphQL-core, too. Maybe we can also have an experimental branch or subpackage supporting this if this does not get merged soon.

However, as @jkimbo mentioned, the problem with Django it does not tolerate a running event loop. So this case still needs some special support like the one in this PR. The good thing is that the solution I proposed above is just a subclass of the existing execution context, and nothing needs to be changed there - it will not affect the normal async usage at all.

fahimalizain · 2022-08-12T12:57:36Z

src/graphql/utilities/deferred_value.py

+                values.append(value)
+            ret.resolve(values)
+
+    for p in l:


Suggested change

for p in l:

for p in deferred_values.values():

Because the List l will have plain_values too ?

The work presented in this PR is really good 👍🏼
We were able to increase our GQL performance by at least 5X using dataloaders.

jkimbo · 2022-09-23T13:46:48Z

I've published a package for people to try this out: https://github.com/jkimbo/graphql-sync-dataloaders

@Cito I still think this should live inside graphql-core for maintainability reasons but at least this way people can try it out in their projects.

Cito · 2022-09-23T17:07:35Z

Thanks @jkimbo. I will definitely look into this again. But also like to get feedback whether this is useful or could be improved.

ericls · 2022-10-06T16:55:46Z

Just to make sure I understand where this is going, we are going to have our own Promise/Future-like class that is thenable and require all all resolvers/data-loaders to return an instance of that class, and support for handling these instances would be done in a new ExecutionContext?

jkimbo requested a review from Cito January 17, 2022 08:53

Cito self-assigned this Jan 17, 2022

Test lazy execution with Futures

34dfc6c

Test lazy execution with synchronous Futures

f967836

syrusakbary mentioned this pull request Feb 7, 2022

Support/help with promise-based resolvers #148

Open

Cito mentioned this pull request Mar 16, 2022

Support for @defer directive graphql-python/graphql-core-legacy#213

Open

fahimalizain mentioned this pull request Aug 12, 2022

feat: Remove Default Resolver + GQL Dataloaders leam-tech/frappe_graphql#64

Closed

fahimalizain reviewed Aug 12, 2022

View reviewed changes

jkimbo mentioned this pull request Sep 23, 2022

Consider supporting promise-based dataloaders in v3 graphql-python/graphene#1391

Open

Cito force-pushed the main branch from 71b18cb to 1e5be9c Compare January 8, 2023 13:22

Cito force-pushed the main branch from 5c63d22 to 0402948 Compare April 7, 2024 19:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Add lazy execution prototype #155

[WIP] Add lazy execution prototype #155

jkimbo commented Jan 17, 2022 •

edited

jkimbo commented Jan 22, 2022

Cito commented Jan 22, 2022

Cito commented Jan 23, 2022 •

edited

jkimbo commented Jan 23, 2022 •

edited

Cito commented Jan 23, 2022 •

edited

jkimbo commented Jan 23, 2022

Cito commented Feb 3, 2022 •

edited

jkimbo commented Feb 7, 2022

syrusakbary commented Feb 7, 2022 •

edited

Cito commented Feb 7, 2022

fahimalizain Aug 12, 2022

fahimalizain Aug 12, 2022

jkimbo commented Sep 23, 2022

Cito commented Sep 23, 2022

ericls commented Oct 6, 2022

[WIP] Add lazy execution prototype #155

Are you sure you want to change the base?

[WIP] Add lazy execution prototype #155

Conversation

jkimbo commented Jan 17, 2022 • edited

jkimbo commented Jan 22, 2022

Cito commented Jan 22, 2022

Cito commented Jan 23, 2022 • edited

jkimbo commented Jan 23, 2022 • edited

Cito commented Jan 23, 2022 • edited

jkimbo commented Jan 23, 2022

Cito commented Feb 3, 2022 • edited

jkimbo commented Feb 7, 2022

syrusakbary commented Feb 7, 2022 • edited

Wait... what?

Implementation Details

Conclusion

Cito commented Feb 7, 2022

fahimalizain Aug 12, 2022

Choose a reason for hiding this comment

fahimalizain Aug 12, 2022

Choose a reason for hiding this comment

jkimbo commented Sep 23, 2022

Cito commented Sep 23, 2022

ericls commented Oct 6, 2022

jkimbo commented Jan 17, 2022 •

edited

Cito commented Jan 23, 2022 •

edited

jkimbo commented Jan 23, 2022 •

edited

Cito commented Jan 23, 2022 •

edited

Cito commented Feb 3, 2022 •

edited

syrusakbary commented Feb 7, 2022 •

edited