Budgeting with a conditional check against an always ready stream blocks the condition #2542

Nemo157 · 2020-05-17T17:04:09Z

Version

0.2.20

Platform

playground

Description

Taking the first correct example from the tokio::select! docs and running it in the playground results in an infinite hang (playground with println commented out).

The tested code includes an additional time::delay_for(Duration::from_millis(10)).await; bit of work in the some_async_work which avoids this issue.

This has the same underlying problem as rust-lang/futures-rs#2157, the budgeting is only applied to the conditional and blocks it from ever becoming true, while the actual work continues running because it is not subject to the budgeting.

The text was updated successfully, but these errors were encountered:

Darksonn · 2020-05-17T17:10:10Z

I guess select! should increment the budget to solve this?

Nemo157 · 2020-05-17T17:18:19Z

That doesn't fix any generic combinators from outside the Tokio ecosystem.

Darksonn · 2020-05-17T17:30:33Z

If you put an infinite loop into the async fn, there is not really much Tokio can do? If you block the executor, the executor is blocked.

This is exactly the issue that coop was introduced to help with.

Darksonn · 2020-05-17T17:44:37Z

Hmm, I guess coop does interact here, but still, this is broken for the same reason that you should not use block_on inside futures. The issues about FuturesUnordered and Shared were not like this one, and would have a different fix from this one.

carllerche · 2020-05-17T17:46:34Z

I'm not 100% sure what the best strategy is here. Leaving select! out of budgeting was an explicit decision. The reasoning is the sub calls will increment the budget. The example here is running a no-op. This could be representative of some arbitrary CPU computation.

That doesn't fix any generic combinators from outside the Tokio ecosystem.

At this point, resources external from Tokio are expected to provide their own budgeting strategies.

carllerche · 2020-05-17T17:47:52Z

For reference, the example is:

use tokio::time::{self, Duration};

async fn some_async_work() {
    // do work
}

#[tokio::main]
async fn main() {
    let mut delay = time::delay_for(Duration::from_millis(50));

    while !delay.is_elapsed() {
        tokio::select! {
            _ = &mut delay, if !delay.is_elapsed() => {
                println!("operation timed out");
            }
            _ = some_async_work() => {
                println!("operation completed");
            }
        }
    }
}

carllerche · 2020-05-17T17:49:27Z

One option that I can think of is combinators could check to ensure that if Ready is returned, the budget has been decremented.

jonhoo · 2020-05-17T18:22:42Z

I wonder how much of a problem this is in practice? How many futures running on a tokio runtime contain no calls into tokio resources? I think I agree with the analysis above that this is really a non-future-friendly busy loop in disguise, and is a disjoint problem from the one coop was added to solve.

That said, of course, it would be nice to be able to do the right thing in as many cases as we can, and maybe coop can be used to help with this particular case. It's a weird one though; the user is selecting in a loop over a future that is always ready and polls no resource. To me, this seems like something that wouldn't happen in real code. I feel like either the future would do something (in which case the problem goes away) or they wouldn't be using select! (in which case adding budgeting to select wouldn't fix the problem).

Or, phrased differently, do we think this particular construct (select! with a non-polling future) actually happens, and that fixing select! for that particular instance adds meaningful value?

Darksonn · 2020-05-17T18:35:01Z

Well, if you want a more realistic case where this would happen, consider this:

use tokio::time::{self, Duration};
use futures::channel::mpsc::channel;
use futures::stream::StreamExt;

#[tokio::main]
async fn main() {
    let mut delay = time::delay_for(Duration::from_millis(50));

    let (send, recv) = channel::<()>(10);

    drop(send);
    let mut recv = recv.fuse();

    loop {
        tokio::select! {
            _ = &mut delay => {
                println!("operation timed out");
                break;
            }
            msg = recv.next() => {
                if let Some(()) = msg {
                    println!("operation completed");
                }
            }
        }
    }
}

playground

I just had someone make a similar mistake over on Discord, albeit that was with futures' select! macro, and the next() was on an empty FuturesUnordered.

jonhoo · 2020-05-17T18:41:24Z

Thanks, that example is helpful. I feel like it only goes to prove my point though. If the user uses futures::select (not tokio::select), then there's nothing we can do about this. If the user uses a tokio channel, then the problem already goes away because of coop. So the question here is about whether we specifically want to implement a fix for when people use tokio::select with a future that never yields and which uses no tokio resources.

Nemo157 · 2020-05-17T19:00:45Z

So, the context in which I noticed this was while trying to write an example of what would happen with some utilities during starvation for rust-lang/futures-rs#2135. That resulted in me noticing that this would hang:

let stream = futures::stream::select_all(vec![
    futures::stream::repeat(0).boxed(),
    tokio::time::interval(Duration::from_millis(10)).map(|_| 1).boxed(),
    tokio::time::interval(Duration::from_millis(100)).map(|_| 2).boxed(),
]);
let mut counts = [0, 0, 0];
stream
    .take_until(tokio::time::delay_for(Duration::from_secs(1)))
    .for_each(|val| { counts[val] += 1; async {} }).await;

Which then lead to rust-lang/futures-rs#2157, and eventually this as I was trying to find an example using only Tokio utilities (take_until is basically the equivalent of a looped select! which breaks when the future completes).

Matthias247 · 2020-05-17T19:36:12Z

These would all run endlessly with and without budgeting, right? The idea is that budgeting improve fairness in some situations, but it can not fix all of these.

In the synchronous world we would call those things deadlocks (or livelocks), and there is no automagical mitigation against those.

The examples could all be "fixed" by running the timer off another thread than the current executor thread. But I don't think that's desirable, since it will decrease performance. async/await is about reaching highest performance and thereby explicitly opting into a harder cooperative programming model. If you want that - you have to live within the constraints of the model.

I actually don't think the described issues are too bad - you will immediately observe them during a debug run. What would be far more painful is if an app would run into an endless blocking situation somewhere later during normal operation. Do we have any examples where this could happen? Obviously we can easily build a contrived example, but that isn't helpful to determine how serious the issue might be.

I also think there might be some potential for new runtime diagnostics in tokio, which could e.g. detect blocked executor threads and tasks. That could at least help to point out these issues in a live application.

Darksonn · 2020-05-17T20:05:37Z

These would all run endlessly with and without budgeting, right? The idea is that budgeting improve fairness in some situations, but it can not fix all of these.

Without budgeting, polling the timer after the timeout has expired returns Poll::Ready, so in that case it would only block the thread until the timeout. It would still block the thread until then of course.

I guess the idea is that budgeting would make it more comparable to a

while now() < timeout {
    tokio::task::yield_now().await;
}

so at least other tasks could run, whenever it yields.

Matthias247 · 2020-05-17T20:10:24Z

Without budgeting, polling the timer after the timeout has expired returns Poll::Ready

That will depend on how the timer is implemented. It might only change it's state if the timer handler is running inside the eventloop - which won't happen if the code never yields to the loop.

It might be currently implemented to also check the time on each poll - but I would not be on it being always that way.

Darksonn · 2020-05-17T20:17:38Z

Yes of course, and in some sense that is what we are experiencing here. If you run it with pre-coop Tokio, it does complete, because it happened to have such an implementation back then.

jonhoo · 2020-05-18T13:44:25Z

I'm definitely conflicted here. I'm not opposed to finding a way to have select! avoid infinite loops, but I also feel like it's the "wrong" solution. It would only work in very specific circumstances; for example, it would not fix @Nemo157's code in #2542 (comment). In that code, I think the issue actually lies in take_until, which (presumably) keeps polling the stream while it's available before checking the until future?

Nemo157 · 2020-05-18T13:46:56Z

No, take_until will check the future before each poll of the underlying stream, but there's no way for Delay to indicate to take_until the difference between its initial "I'm not ready yet" Pendings and the later "you need to yield to the executor before I can become ready" Pendings.

jonhoo · 2020-05-18T13:54:10Z

Ah, sorry, yes, you're right. I think the issue here then is actually with for_each. It is basically demonstrating a variant of the issue that FuturesUnordered had — if neither the stream nor the closure's future ever yields Pending, then it never yields to the executor, which is not okay. I think the fix here is the same as rust-lang/futures-rs#2049 — to change for_each so that it forces periodic yields if it hasn't yielded for a while. This, to me, is an indicator that rust-lang/futures-rs#2053 isn't framed quite right — it's not just about sub-executors, it's about any future that may never yield Pending.

I think it's important to stress here that I don't think we should place the onus on implementors of Stream to yield Pending if they're continuously yielding Ready. Either of those yield control back to the caller. It seems most reasonable to me that the top-level future is the one that has to ensure that it occasionally yields, even if all of its dependencies are continuously providing more data.

carllerche · 2020-11-12T21:38:15Z

Is there any action to take here?

jonhoo · 2021-03-21T00:16:19Z

Honestly, I think the action here is to standardize coop and have every resource call into it 😅 But barring that, I think the fix is to at least ensure that any combinators that we have control over yield periodically.

Arnavion · 2021-06-02T01:32:58Z

I disagree that forcing periodic yields is the way this should be solved. Periodically yielding means the entire call frame has to unwind all the way to the executor and back every time.

Currently tokio::time::Sleep is lying in its Future impl - Future::poll is supposed to be the way the future makes forward progress, but Sleep doesn't actually do this and relies on another task to mark it as completed.

I hit this problem when I was using a Sleep to enforce my own equivalent of budgeting:

fn poll(self, cx) {
    loop {
        if let Ready(()) = self.timeout.poll(cx) { // self.timeout is a Sleep
            self.timeout.reset(...);
            yield_now().poll(cx);
            return Poll::Pending;
        }

        let work = ready!(self.stream_of_work.poll_next(cx));
        process(work);
    }
}

This was in a single-threaded executor, and under program load stream_of_work would have sufficient time to become ready again by the time it was poll()ed again.

I would like to have a way of being in control of when the cooperative-yielding happens rather than letting tokio's budget system handle it. Making the budget system public doesn't help either - my code can encode budgeting strategies that suit my needs rather than just a single number that tokio's system uses.

Alternatively, if every timer checking for elapsed-ness on every poll is a perf concern, it would be nice to have some way the user can drive the timer wheel forward from within the user task. Eg given some fn tokio::time::make_progress() -> (), the code above could call that every N iterations.

Darksonn · 2021-06-02T07:17:24Z

@Arnavion It seems like your use-case would be better served by storing a std::time::Instant as the timeout and comparing it to Instant::now() in each iteration. The Sleep type is intended for the use-case where you want to receive a wakeup at some point in the future, but you are using it exclusively for checking whether a duration has elapsed — you don't even need the wakeups that the runtime is sending you when the timeout elapses since the thing you're actually waiting for is stream_of_work.poll_next.

Arnavion · 2021-06-02T07:56:56Z

but you are using it exclusively for checking whether a duration has elapsed

No. I simplified the example a bit too much perhaps, but there's more work being done with the timeout elapses than just yielding and resetting the timer. So there's a reason to have the timer register a waker when stream_of_work also returns Pending.

Darksonn added A-tokio Area: The main tokio crate C-bug Category: This is a bug. I-hang Program never terminates, resulting from infinite loops, deadlock, livelock, etc. M-coop Module: tokio/coop labels May 17, 2020

carllerche closed this as completed May 17, 2020

carllerche reopened this May 17, 2020

carllerche assigned jonhoo May 17, 2020

taiki-e mentioned this issue Mar 20, 2021

stream: avoid yielding in AllFuture and AnyFuture #3625

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Budgeting with a conditional check against an always ready stream blocks the condition #2542

Budgeting with a conditional check against an always ready stream blocks the condition #2542

Nemo157 commented May 17, 2020

Darksonn commented May 17, 2020

Nemo157 commented May 17, 2020

Darksonn commented May 17, 2020 •

edited

Darksonn commented May 17, 2020

carllerche commented May 17, 2020

carllerche commented May 17, 2020

carllerche commented May 17, 2020

jonhoo commented May 17, 2020

Darksonn commented May 17, 2020 •

edited

jonhoo commented May 17, 2020

Nemo157 commented May 17, 2020

Matthias247 commented May 17, 2020

Darksonn commented May 17, 2020

Matthias247 commented May 17, 2020

Darksonn commented May 17, 2020

jonhoo commented May 18, 2020

Nemo157 commented May 18, 2020

jonhoo commented May 18, 2020

carllerche commented Nov 12, 2020

jonhoo commented Mar 21, 2021

Arnavion commented Jun 2, 2021

Darksonn commented Jun 2, 2021

Arnavion commented Jun 2, 2021

Budgeting with a conditional check against an always ready stream blocks the condition #2542

Budgeting with a conditional check against an always ready stream blocks the condition #2542

Comments

Nemo157 commented May 17, 2020

Version

Platform

Description

Darksonn commented May 17, 2020

Nemo157 commented May 17, 2020

Darksonn commented May 17, 2020 • edited

Darksonn commented May 17, 2020

carllerche commented May 17, 2020

carllerche commented May 17, 2020

carllerche commented May 17, 2020

jonhoo commented May 17, 2020

Darksonn commented May 17, 2020 • edited

jonhoo commented May 17, 2020

Nemo157 commented May 17, 2020

Matthias247 commented May 17, 2020

Darksonn commented May 17, 2020

Matthias247 commented May 17, 2020

Darksonn commented May 17, 2020

jonhoo commented May 18, 2020

Nemo157 commented May 18, 2020

jonhoo commented May 18, 2020

carllerche commented Nov 12, 2020

jonhoo commented Mar 21, 2021

Arnavion commented Jun 2, 2021

Darksonn commented Jun 2, 2021

Arnavion commented Jun 2, 2021

Darksonn commented May 17, 2020 •

edited

Darksonn commented May 17, 2020 •

edited