
Improve performance for I/O bound list resolvers by using parallel execution #457

Closed
garyd203 opened this issue Nov 3, 2020 · 15 comments · Fixed by #458

Comments

@garyd203
Contributor

garyd203 commented Nov 3, 2020

At the moment, when an output list contains objects that themselves have an I/O bound resolver (eg. a secondary network RPC call to add more fields to the list items), then the list item resolvers are executed sequentially (see discussion yesterday, and also the original PR that introduced sequential resolution).

The concern here is that the elapsed wall-clock time to resolve the entire query is unnecessarily slow, because we have to wait for a sequence of network calls, which could potentially be executed in parallel instead. I'd like to start a discussion about how we could restore parallel execution for this scenario, whilst still addressing the concerns that led us to use sequential execution.

I am not fully aware of the issues. I am guessing they include:

  1. Reduce the overhead of an asyncio task for trivial resolvers
  2. Reduce the overhead of an asyncio task for non-output coercion
  3. Reduce the cost of starting an asyncio task for every member of a lengthy list
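To make the wall-clock concern concrete, here is a self-contained toy sketch (plain asyncio, not tartiflette internals; `resolve_item` is a made-up stand-in for a secondary per-item RPC call) comparing sequential awaiting with `asyncio.gather`:

```python
import asyncio
import time

async def resolve_item(item_id: int) -> dict:
    # Stand-in for an I/O bound per-item resolver (e.g. a network call).
    await asyncio.sleep(0.05)
    return {"id": item_id, "extra": f"detail-{item_id}"}

async def resolve_sequentially(ids):
    # Current behaviour: each item resolver is awaited one after the other.
    return [await resolve_item(i) for i in ids]

async def resolve_concurrently(ids):
    # Proposed behaviour: all item resolvers run as concurrent tasks.
    return list(await asyncio.gather(*(resolve_item(i) for i in ids)))

async def main():
    ids = list(range(10))

    t0 = time.perf_counter()
    seq = await resolve_sequentially(ids)
    t_seq = time.perf_counter() - t0

    t0 = time.perf_counter()
    conc = await resolve_concurrently(ids)
    t_conc = time.perf_counter() - t0

    assert seq == conc  # same results, only the scheduling differs
    print(f"sequential: {t_seq:.2f}s, concurrent: {t_conc:.2f}s")
    return seq, conc, t_seq, t_conc

asyncio.run(main())
```

With 10 items at ~50ms each, the sequential version takes roughly 0.5s of wall-clock time while the concurrent one takes roughly the duration of a single call.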

As a conversation starter, here are some ideas for the type of behaviour we might want (either standalone, or potentially combined) to address the I/O bound list resolver scenario outlined above:

  1. Only apply this functionality to the list_coercer for outputs (since they are most likely to involve I/O)
  2. Create asyncio tasks in chunks (eg. maximum 10 at a time)
  3. Only create asyncio tasks if there is at least 1 resolver (I don't think I am expressing this correctly; I mean, if there is a @resolver-decorated async Python function that needs to be executed in order to get the output value for a requested field)
  4. Provide a hook on the resolver for the field that produces the list, allowing it to specify an alternate list_coercer
  5. Provide a hook on the engine, allowing developers to specify a custom list_coercer
  6. Or something else... ;-)
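Idea 2 above (creating tasks in chunks) can be sketched with plain asyncio. `gather_in_chunks` and the harness below are hypothetical, just to show that chunking bounds the number of in-flight tasks while preserving result order:

```python
import asyncio

async def gather_in_chunks(coros, chunk_size=10):
    # Idea 2: run at most `chunk_size` item resolvers at a time, so a very
    # long list never has thousands of pending asyncio tasks at once.
    results = []
    for start in range(0, len(coros), chunk_size):
        results.extend(await asyncio.gather(*coros[start:start + chunk_size]))
    return results

# Small harness that records the peak number of in-flight resolvers.
in_flight = 0
peak = 0

async def resolve_item(i):
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # stand-in for the per-item I/O call
    in_flight -= 1
    return i * 2

results = asyncio.run(
    gather_in_chunks([resolve_item(i) for i in range(25)], chunk_size=10)
)
print(f"peak concurrency: {peak}")
```

A trade-off to note: a chunk only finishes when its slowest item does, so one slow call can stall the next chunk; a semaphore-based approach would avoid that at the cost of creating all tasks up front.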

Can you please comment? Thanks

@abusi
Contributor

abusi commented Nov 3, 2020

Yep, I'm quite in agreement with you, we need to be able to do it concurrently when it's necessary.
I'm quite fond of the idea of doing a @resolver("Type.field", gather=True). I think it is quite self-explanatory.
@Maximilien-R, what do you think of this idea?

@garyd203
Contributor Author

garyd203 commented Nov 3, 2020

That sounds like a reasonable approach.

Should we make gather be off by default (as it is at the moment) or on by default? If it is on by default for list fields, then that would be consistent with behaviour for non-list fields. My personal opinion is that I would expect it to be on by default. However, I am not aware of the issues around lists being gathered by default, so maybe this is not the right approach.

@Maximilien-R
Member

Hi @abusi @garyd203,

Adding a new argument to the @Resolver & @Subscription decorators could probably do the job. Maybe we should think of a clearer name than just gather. A name that better conveys the fact that it will only impact the coercion of the output's "children" (which applies to both objects and lists; this way we could control not only whether lists are coerced synchronously, but also object fields).

Also, I agree with @garyd203: we should set it to True by default. Maybe we missed this point in our previous PR.

The issue I see with this is the default resolver (for fields which don't implement a dedicated resolver), which will then always be gathered, and that could impact performance.

FYI @garyd203, we decided to allow synchronous coercion on lists (for arguments/inputs and outputs) because we noticed that complex queries with nested objects, lists, and arguments could suffer a performance hit due to the creation of many async tasks and asyncio.gather calls that weren't necessary, since most arguments don't do any I/O (take multiple Relay edge connections with their arguments, for example).

@abusi
Contributor

abusi commented Nov 4, 2020

@Maximilien-R @garyd203 I also think it should default to False.
Should we summarize this as:

When you decorate a method which is not a resolver of a list/object, the gather parameter will have no effect.
When you know the decorated list/object resolver will return a list of things with a sub-resolution performing I/O, you specifically set gather to True.

Also @Maximilien-R, I'm open to naming suggestions :)

@garyd203
Contributor Author

garyd203 commented Nov 4, 2020

Naming suggestion instead of gather: sequentially/sequential or concurrently/concurrent (i.e. describe how the resolver for this collection-type field does sub-resolution on the items in the field).

@garyd203
Contributor Author

garyd203 commented Nov 9, 2020

Thanks for the discussion @Maximilien-R and @abusi, it sounds like we have a good idea of how to implement this feature. What's the way forward for doing the implementation work? Is it something one of you would want to do? Otherwise I expect I will have some time in the next month or two, depending on my other commitments.

@Maximilien-R
Member

I'll take a look at it on Wednesday, at least to check whether it's easily feasible or will need some significant refactoring.

However, it seems that @garyd203 and I were in favour of setting gather to True by default, while @abusi seems to prefer setting it to False.

To summarize the things to do:

  • add a parameter to the @Resolver & @Subscription decorators to determine whether list/object fields will be resolved concurrently or sequentially
  • add a global parameter to the create_engine function to define a default behavior (concurrent or sequential) for fields without a dedicated resolver

In any case, the new parameter should impact only list/object resolutions.
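A minimal stand-in for how those two knobs could interact (the names `resolver_flag` and `engine_default` here are placeholders for illustration, not tartiflette's actual API): the per-resolver setting, when given, overrides the engine-wide default, which in turn covers fields without a dedicated resolver.

```python
import asyncio

async def coerce_list(item_coros, *, resolver_flag=None, engine_default=False):
    # Per-resolver flag wins when explicitly set; otherwise fall back to
    # the engine-wide default passed at create_engine time.
    concurrently = engine_default if resolver_flag is None else resolver_flag
    if concurrently:
        return list(await asyncio.gather(*item_coros))
    return [await coro for coro in item_coros]

async def item(i):
    await asyncio.sleep(0)  # stand-in for optional per-item I/O
    return i

# Field whose dedicated resolver opted in to concurrent coercion:
print(asyncio.run(coerce_list([item(i) for i in range(3)], resolver_flag=True)))
# Field without a dedicated resolver: the engine default (sequential) applies:
print(asyncio.run(coerce_list([item(i) for i in range(3)], engine_default=False)))
```

Both calls return `[0, 1, 2]`; only the scheduling of the item coroutines differs.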

@abusi
Contributor

abusi commented Nov 10, 2020

@Maximilien-R Hum, I'm not really hard-set on defaulting to True; if you think it is better to default to False, so be it. In the end, it's okay for me. The important thing is that it is explained in the documentation.

@garyd203
Contributor Author

To summarize the things to do:

  • add a parameter to the @Resolver & @Subscription decorators to determine whether list/object fields will be resolved concurrently or sequentially
  • add a global parameter to the create_engine function to define a default behavior (concurrent or sequential) for fields without a dedicated resolver

This sounds good to me. Let us know how your investigation goes.

I was wondering whether we want to implement both of these features now, or just one? AFAICT the first feature would solve all use cases. However, it may sometimes be inconvenient for users to update all relevant resolvers, and additionally our differing opinions on default behaviour suggest that it would be very useful to have the global engine-specific config as well. So the second feature would be good too, but perhaps not strictly necessary.

@garyd203
Contributor Author

garyd203 commented Dec 8, 2020

Thanks @abusi and @Maximilien-R

How does the release process work? When should I expect this to be in a release? (not rushing you, just looking to set my expectations)

@Maximilien-R
Member

We have to update the documentation and the changelogs but I guess we could release a new version before the end of the week.

What do you think @abusi?

@abusi
Contributor

abusi commented Dec 8, 2020

Yeah, totally. @Maximilien-R, will you have time to edit the doc/changelog? If not, I may be able to do it.

@Maximilien-R
Member

Hm... I'll try to do it this evening.

@Maximilien-R
Member

@garyd203, version 1.3.0 is out 👍

@garyd203
Contributor Author

garyd203 commented Dec 8, 2020

Thank you @Maximilien-R :-)
