Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Best practices for combinators with their own task pools (a la FuturesUnordered) #2053

Open
cramertj opened this issue Jan 27, 2020 · 5 comments

Comments

@cramertj
Copy link
Member

See #2047. When a combinator keeps its own task pool rather than deferring directly to the parent executor, cooperatively yielding (cx.waker.wake_by_ref(); return Poll::Pending) works "less well" because the task pool will remain active and thus be polled again. #2049 introduced a mitigation for this by limiting the number of times a FuturesUnordered is polled, but it'd be good to develop general guidance for these types of combinators, and perhaps a more principled solution for FuturesUnordered.

cc @jonhoo @carllerche

@jonhoo
Copy link
Contributor

jonhoo commented Jan 27, 2020

I think the most appealing solution to this is for FuturesUnordered to also cooperatively yield using the same mechanism that the underlying tasks are. #2049 is sort of a blunt solution since it just blindly yields every so often, and that has some sad implications (such as the multiplicative property @cramertj points out in #2047). If a mechanism like tokio-rs/tokio#2160 was available throughout the ecosystem, then FuturesUnordered to simply use poll_cooperate (in the lingo of that PR) each time it detects that it has looped for example. I think the mechanism from that PR can be de-coupled from the executor by just providing hooks for the executor to bind into — see tokio::league::ceded — but it will certainly require some careful thought to be general enough.

@cramertj
Copy link
Member Author

Yes, if there was some sort of global mechanism for setting budget and decrementing / resetting where appropriate, I could see FuturesUnordered making use of that.

@jonhoo
Copy link
Contributor

jonhoo commented Jan 28, 2020

One other option as briefly mentioned in #2047 (comment) is to make FuturesUnordered only ever iterate over its futures at most once. That is, if it ever detects that it is about to poll a given future for the second time in the same call to poll_next, then it returns Poll::Pending. How to do this efficiently given the current implementation is not entirely clear though.

@udoprog
Copy link
Contributor

udoprog commented Jan 28, 2020

One other option as briefly mentioned in #2047 (comment) is to make FuturesUnordered only ever iterate over its futures at most once. That is, if it ever detects that it is about to poll a given future for the second time in the same call to poll_next, then it returns Poll::Pending. How to do this efficiently given the current implementation is not entirely clear though.

This is something I've been experimenting with for the last week in a project called unicycle. The architecture is described in more detail in the README, but briefly: It maintains two bitsets (active and alternate) and child tasks store wakeup interest in the active bitset. Once polled it swaps these bitsets and uses the alternate to determine what to poll while draining it. Once it is empty, it forcibly yields and starts over again.

This guarantees that each child task is only polled at most once per cycle. And child tasks which aggressively signal interest in waking up are not given any special preference.

What I've been able to determine so far:

With Jon's help I've been comparing it to FuturesUnordered in Noria's benchmarks. But all I can say so far is that it appears to provide the same throughput with a similar latency profile. I am however interested to know if anyone has any heavy, real world use of FuturesUnordered I can benchmark against. So please hit me up if you do!

@dekellum
Copy link
Contributor

The benchmarks detailed here with links make heavy use of FuturesUnordered as a means to simulate concurrency and are throughput oriented. I suspect the latter body-image client benchmarks are potentially more interesting as they include net I/O.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants