
Improve performance for I/O bound list resolvers by using parallel execution #457

Closed
garyd203 opened this issue Nov 3, 2020 · 15 comments · Fixed by #458

Comments

@garyd203
Contributor

garyd203 commented Nov 3, 2020

At the moment, when an output list contains objects that themselves have an I/O bound resolver (eg. a secondary network RPC call to add more fields to the list items), then the list item resolvers are executed sequentially (see discussion yesterday, and also the original PR that introduced sequential resolution).

The concern here is that the elapsed wall-clock time to resolve the entire query is unnecessarily slow, because we have to wait for a sequence of network calls, which could potentially be executed in parallel instead. I'd like to start a discussion about how we could restore parallel execution for this scenario, whilst still addressing the concerns that led us to use sequential execution.

I am not fully aware of the issues. I am guessing they include:

  1. Reduce the overhead of an asyncio task for trivial resolvers
  2. Reduce the overhead of an asyncio task for non-output coercion
  3. Reduce the cost of starting an asyncio task for every member of a lengthy list
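To make the wall-clock concern concrete, here is a self-contained toy sketch (plain asyncio, not tartiflette internals; `resolve_item` is a made-up stand-in for a secondary per-item RPC call) comparing sequential awaiting with `asyncio.gather`:

```python
import asyncio
import time

async def resolve_item(item_id: int) -> dict:
    # Stand-in for an I/O bound per-item resolver (e.g. a network call).
    await asyncio.sleep(0.05)
    return {"id": item_id, "extra": f"detail-{item_id}"}

async def resolve_sequentially(ids):
    # Current behaviour: each item resolver is awaited one after the other.
    return [await resolve_item(i) for i in ids]

async def resolve_concurrently(ids):
    # Proposed behaviour: all item resolvers run as concurrent tasks.
    return list(await asyncio.gather(*(resolve_item(i) for i in ids)))

async def main():
    ids = list(range(10))

    t0 = time.perf_counter()
    seq = await resolve_sequentially(ids)
    t_seq = time.perf_counter() - t0

    t0 = time.perf_counter()
    conc = await resolve_concurrently(ids)
    t_conc = time.perf_counter() - t0

    assert seq == conc  # same results, only the scheduling differs
    print(f"sequential: {t_seq:.2f}s, concurrent: {t_conc:.2f}s")
    return seq, conc, t_seq, t_conc

asyncio.run(main())
```

With 10 items at ~50ms each, the sequential version takes roughly 0.5s of wall-clock time while the concurrent one takes roughly the duration of a single call.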

As a conversation starter, here are some ideas for the type of behaviour we might want (either standalone, or potentially combined) to address the I/O bound list resolver scenario outlined above:

  1. Only apply this functionality to the list_coercer for outputs (since they are most likely to involve I/O)
  2. Create asyncio tasks in chunks (eg. maximum 10 at a time)
  3. Only create asyncio tasks if there is at least 1 resolver (I don't think I am expressing this correctly; I mean, if there is a @resolver-decorated async Python function that needs to be executed in order to get the output value for a requested field)
  4. Provide a hook on the resolver for the field that produces the list, allowing it to specify an alternate list_coercer
  5. Provide a hook on the engine, allowing developers to specify a custom list_coercer
  6. Or something else... ;-)
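Idea 2 above (creating tasks in chunks) can be sketched with plain asyncio. `gather_in_chunks` and the harness below are hypothetical, just to show that chunking bounds the number of in-flight tasks while preserving result order:

```python
import asyncio

async def gather_in_chunks(coros, chunk_size=10):
    # Idea 2: run at most `chunk_size` item resolvers at a time, so a very
    # long list never has thousands of pending asyncio tasks at once.
    results = []
    for start in range(0, len(coros), chunk_size):
        results.extend(await asyncio.gather(*coros[start:start + chunk_size]))
    return results

# Small harness that records the peak number of in-flight resolvers.
in_flight = 0
peak = 0

async def resolve_item(i):
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # stand-in for the per-item I/O call
    in_flight -= 1
    return i * 2

results = asyncio.run(
    gather_in_chunks([resolve_item(i) for i in range(25)], chunk_size=10)
)
print(f"peak concurrency: {peak}")
```

A trade-off to note: a chunk only finishes when its slowest item does, so one slow call can stall the next chunk; a semaphore-based approach would avoid that at the cost of creating all tasks up front.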

Can you please comment? Thanks

@abusi
Contributor

abusi commented Nov 3, 2020

Yep, I'm quite in agreement with you, we need to be able to do it concurrently when it's necessary.
I'm quite fond of the idea of doing a @resolver("Type.field", gather=True). I think it is quite self-explanatory.
@Maximilien-R, what do you think of this idea?

@garyd203
Contributor Author

garyd203 commented Nov 3, 2020

That sounds like a reasonable approach.

Should we make gather be off by default (as it is at the moment) or on by default? If it is on by default for list fields, then that would be consistent with behaviour for non-list fields. My personal opinion is that I would expect it to be on by default. However, I am not aware of the issues around lists being gathered by default, so maybe this is not the right approach.

@Maximilien-R
Member

Hi @abusi @garyd203,

Adding a new argument to the @Resolver & @Subscription decorators could probably do the job. Maybe we should think of a clearer name than just gather. A name that better conveys the fact that it will only impact the coercion of the output's "children" (which applies to both objects and lists; this way we could control not only whether lists are coerced synchronously, but also object fields).

Also, I agree with @garyd203: we should set it to True by default. Maybe we missed this point in our previous PR.

The issue I see with this is the default resolver (for fields which don't implement a dedicated resolver), which will then always be gathered, and that could impact performance.

FYI @garyd203, we decided to allow synchronous coercion on lists (for arguments/inputs and outputs) because we noticed that complex queries with nested objects, lists, and arguments could suffer a performance hit due to the creation of many async tasks and asyncio.gather calls that weren't necessary, since most arguments don't do any I/O (take multiple Relay edge connections with their arguments, for example).

@abusi
Contributor

abusi commented Nov 4, 2020

@Maximilien-R @garyd203 I also think it should default to False.
Should we summarize this as:

When you decorate a method which is not a resolver of a list/object, the gather parameter will have no effect.
When you know the decorated list/object resolver will return a list of things with a sub-resolution performing I/O, you specifically set gather to True.

Also @Maximilien-R, I'm open to naming suggestions :)

@garyd203
Contributor Author

garyd203 commented Nov 4, 2020

Naming suggestion instead of gather: sequentially/sequential or concurrently/concurrent (i.e. describe how the resolver for this collection-type field does sub-resolution on the items in the field).

@garyd203
Contributor Author

garyd203 commented Nov 9, 2020

Thanks for the discussion @Maximilien-R and @abusi, it sounds like we have a good idea of how to implement this feature. What's the way forward for doing the implementation work? Is it something one of you would want to do? Otherwise I expect I will have some time in the next month or two, depending on my other commitments.

@Maximilien-R
Member

I'll take a look at it on Wednesday, at least to check whether it's easily feasible or will need some significant refactoring.

However, it seems that @garyd203 and I were in favour of setting gather to True by default, while @abusi seems to prefer setting it to False.

To summarize the things to do:

  • add a parameter to the @Resolver & @Subscription decorators to determine whether list/object fields will be resolved concurrently or sequentially
  • add a global parameter to the create_engine function to define a default behavior (concurrent or sequential) for fields without a dedicated resolver

In any case, the new parameter should impact only list/object resolutions.
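A minimal stand-in for how those two knobs could interact (the names `resolver_flag` and `engine_default` here are placeholders for illustration, not tartiflette's actual API): the per-resolver setting, when given, overrides the engine-wide default, which in turn covers fields without a dedicated resolver.

```python
import asyncio

async def coerce_list(item_coros, *, resolver_flag=None, engine_default=False):
    # Per-resolver flag wins when explicitly set; otherwise fall back to
    # the engine-wide default passed at create_engine time.
    concurrently = engine_default if resolver_flag is None else resolver_flag
    if concurrently:
        return list(await asyncio.gather(*item_coros))
    return [await coro for coro in item_coros]

async def item(i):
    await asyncio.sleep(0)  # stand-in for optional per-item I/O
    return i

# Field whose dedicated resolver opted in to concurrent coercion:
print(asyncio.run(coerce_list([item(i) for i in range(3)], resolver_flag=True)))
# Field without a dedicated resolver: the engine default (sequential) applies:
print(asyncio.run(coerce_list([item(i) for i in range(3)], engine_default=False)))
```

Both calls return `[0, 1, 2]`; only the scheduling of the item coroutines differs.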

@abusi
Contributor

abusi commented Nov 10, 2020

@Maximilien-R Hum, I'm not really hard-set on defaulting to True; if you think it is better to default to False, so be it. In the end, it's okay for me. The important thing is that it is explained in the documentation.

@garyd203
Contributor Author

To summarize the things to do:

  • add a parameter to the @Resolver & @Subscription decorators to determine whether list/object fields will be resolved concurrently or sequentially
  • add a global parameter to the create_engine function to define a default behavior (concurrent or sequential) for fields without a dedicated resolver

This sounds good to me. Let us know how your investigation goes.

I was wondering whether we want to implement both of these features now, or just one? AFAICT the first feature would solve all use cases. However, it may sometimes be inconvenient for users to update all relevant resolvers, and additionally our differing opinions on default behaviour suggest that it would be very useful to have the global engine-specific config as well. So the second feature would be good too, but perhaps not strictly necessary.

@garyd203
Contributor Author

garyd203 commented Dec 8, 2020

Thanks @abusi and @Maximilien-R

How does the release process work? When should I expect this to be in a release? (not rushing you, just looking to set my expectations)

@Maximilien-R
Member

We have to update the documentation and the changelogs but I guess we could release a new version before the end of the week.

What do you think @abusi?

@abusi
Contributor

abusi commented Dec 8, 2020

Yeah, totally. @Maximilien-R, will you have time to edit the doc/changelog? If not, I may be able to do it.

@Maximilien-R
Member

Hm... I'll try to do it this evening.

@Maximilien-R
Member

@garyd203, version 1.3.0 is out 👍

@garyd203
Contributor Author

garyd203 commented Dec 8, 2020

Thank you @Maximilien-R :-)
