
deadlock in tokio-1.0 #3493

Closed · tancehao opened this issue Feb 1, 2021 · 23 comments

Labels: A-tokio (Area: The main tokio crate), C-bug (Category: This is a bug.)

Comments

@tancehao commented Feb 1, 2021

Version

│ │ ├── tokio v1.0.2
│ │ │ └── tokio-macros v1.0.0
│ │ ├── tokio-util v0.6.1
│ │ │ ├── tokio v1.0.2 ()
│ │ │ └── tokio-stream v0.1.2
│ │ │ └── tokio v1.0.2 ()
│ ├── tokio v1.0.2 ()
├── tokio v1.0.2 ()
├── tokio-compat-03 v0.0.0
├── tokio-stream v0.1.2 ()
├── tokio-util v0.6.1 ()

Platform

linux 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3 (2019-02-02) x86_64 GNU/Linux

Description
My program pulls data from a redis server and sends it to another one.
There are many futures: some pull data from the source, some push to the target, and some print stats to stdout periodically (every 5 seconds). While pulling and pushing data, the program modifies some metrics managed by prometheus-0.1.0.
The program worked well for months under tokio-0.3.0, but after I upgraded to tokio-1.0, a deadlock happens. When it is blocked, the program no longer prints stats, meaning that some of the futures are not being polled by the tokio runtime (multi-threaded mode).

I executed perf top, and here is the output:

[image: perf top output]

and here is what it showed when I selected the first line and viewed its annotation:

[image: annotated assembly of the hot function]

These numbers barely changed over a long period of time.

So what happened in my process? What's wrong with the atomic operations?

@tancehao added the A-tokio (Area: The main tokio crate) and C-bug (Category: This is a bug.) labels on Feb 1, 2021
@Darksonn (Contributor) commented Feb 1, 2021

Are you able to get some sort of backtrace or line number for the snippet of assembly?

@tancehao (Author) commented Feb 2, 2021

Are you able to get some sort of backtrace or line number for the snippet of assembly?

I don't know how...

@Darksonn (Contributor) commented Feb 2, 2021

Can you try out doing this? If the program is running in release mode, you might need to add this to your Cargo.toml to include debugging information such as line numbers in the executable.
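The exact snippet from the original comment is not shown here; as an illustrative sketch, a minimal Cargo.toml addition that keeps debug information (and therefore line numbers) in release builds is:

# Cargo.toml
[profile.release]
debug = true   # keep debug info so perf and backtraces can resolve line numbers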

@pronvis commented Feb 3, 2021

My problem is not related to a deadlock (I think), but it is still worth mentioning: #3500

@Flakebi commented Feb 3, 2021

I think I hit the same bug. Upgrading from tokio 0.2 to 1.0 went relatively smoothly; only in one part of the application, when performing a certain slightly more complex request, did it start to hang with 100% CPU on one core and never return.
I was able to pinpoint it to one place in my code: a print; tokio::sleep.await; print reached the first print but often not the second (sometimes it worked). The future was constantly being polled, spending most of its time in atomic operations as mentioned in the issue, but it never got past the await.

Today, I finally found the cause in tokio::coop, the cooperative scheduling used to reduce tail latencies. Replacing Budget::initial() with Budget::unconstrained() (disabling the early returns) fixed this bug for me.

tokio/tokio/src/coop.rs

Lines 97 to 99 in cc97fb8

pub(crate) fn budget<R>(f: impl FnOnce() -> R) -> R {
    with_budget(Budget::initial(), f)
}

This bug can only be found in sufficiently large projects that have deeply nested futures and exhaust the initial budget (a backtrace I captured had 218 stackframes).

The module description mentions that “voluntary yield points should be placed after at least some work has been done”. However, that is not always obeyed, leading to tasks being polled and doing nothing for eternity.

One of the places I found is tokio::sleep:

let coop = ready!(crate::coop::poll_proceed(cx));
me.entry.poll_elapsed(cx).map(move |r| {
    coop.made_progress();
    r
})

My original code did not contain a sleep, so there must be other places too.

@Darksonn (Contributor) commented Feb 3, 2021

Thank you @Flakebi. The coop system consuming permits without doing any work certainly sounds like a problem that can cause a deadlock. Thanks for pointing it out!

I have opened a new issue for tracking this specifically, a link to which you can find right above this reply.

@tancehao Can you check whether your deadlock is caused by the same issue by replacing Budget::initial with Budget::unconstrained in Tokio? You can replace your Tokio dependency with a custom modified one using a patch section in your Cargo.toml.
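For reference, such a patch section could look like the following sketch (the path is hypothetical and should point at a local checkout of the Tokio repository with the change applied):

# Cargo.toml of your application
[patch.crates-io]
tokio = { path = "../tokio/tokio" }   # hypothetical path to the modified tokio crate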

@jonhoo (Contributor) commented Feb 4, 2021

I replied over in #3502, but the indicated code in sleep doesn't look wrong to me. If coop is dropped without the call to made_progress (that is, if poll_elapsed returns Pending), then the budget is restored to what it was prior to the call to poll_proceed.

I suspect the root cause here is instead that there is some code in the application in question that will not return Pending as long as some child future returns Ready, which is the same problem an old version of FuturesUnordered had. I'm not sure why that would have changed in the move from 0.2 to 1.0 though.

Could it be that some dependency in your stack still uses tokio 0.2? That might explain this too. If you run cargo tree -d -p tokio, do you see any versions other than 1.0 show up?

@Flakebi commented Feb 4, 2021

Thanks for the fast answers!
There is a tokio 0.2 in my dependency graph, but it's in a crate that doesn't actually use it (meilisearch-error); everything else uses 1.1.1 :)
Interesting bug; there are a lot of FuturesOrdered in my call stack. The application/juniper code in that case does not contain any manual implementation of Future or Stream (and this part changed little when updating tokio), apart from message passing with actix and the tokio sleeps inserted for debugging. Also, a bug there shouldn't change behavior when I switch off coop, I guess?

The behavior I see is:

  1. coop disabled (as described in my comment above): Request finishes instantly (few milliseconds)
  2. coop enabled (unchanged tokio): Request does not finish within 30 s. The future that should make progress is constantly polled but it does not make any progress.
  3. coop enabled and print; tokio::sleep.await; print in code: The first print is reached, so it started polling the future. The second print is never reached.

 

I guess I don't understand how coop works. In the case of sleep, it seems to return early when the budget is used up? (coop::poll_proceed returns Pending, thus ready! returns.)
In the case where there is budget left and the sleep has finished, it calls made_progress, which resets the budget, so sleep does not use any budget.
That means that if I poll enough sleeps with a short enough sleep time that they are always ready, I never use up the budget?

@jonhoo (Contributor) commented Feb 4, 2021

So, FuturesOrdered should work just fine as long as you're on a recent version of the futures- crates. I don't think that's buggy any more.

I guess I don’t understand how coop is working. In the case of sleep, it seems to return early when the budget is used up? (coop::poll_proceed returns Pending, thus ready! returns)

Yes, that's right, though that should result in Pending being propagated all the way up to the executor, which then resets the budget and immediately polls again (if there are no other futures to poll). So the question is really why the coop budget is always exhausted by the time you get to polling the sleep. The budget is only decremented when you poll a "leaf future" (like sleep or Receiver::recv) and it returns Ready, so this suggests that somewhere in your code a large number of leaf futures are polled any time the top-level task is polled. Are you calling sleep in a loop or something like that?

In the case that budget is left and the sleep finished, it calls made_progress, which resets the budget, so sleep does not use any budget.

Ah, no, quite the opposite. The default behavior is that dropping coop resets the budget; only if made_progress is called is the budget actually decremented. The intuition here is that we don't want to "pay" out of the budget if the future did not make progress. So, if poll_elapsed returns Pending, then made_progress is not called, and therefore the budget is reset, which means that the call to poll_elapsed did not spend any of the budget.

You can see the relevant code here:

tokio/tokio/src/coop.rs

Lines 156 to 173 in 3e5a0a7

impl RestoreOnPending {
    pub(crate) fn made_progress(&self) {
        self.0.set(Budget::unconstrained());
    }
}

impl Drop for RestoreOnPending {
    fn drop(&mut self) {
        // Don't reset if budget was unconstrained or if we made progress.
        // They are both represented as the remembered budget being unconstrained.
        let budget = self.0.get();
        if !budget.is_unconstrained() {
            CURRENT.with(|cell| {
                cell.set(budget);
            });
        }
    }
}

That means, if I poll enough sleeps with a short enough sleep time that they are always ready, I never use up the budget?

It's the other way around. If you poll sleeps that are always ready, every time they're ready your budget gets decremented. If, on the other hand, you create a very long sleep that is basically never ready, then polling it does not consume any of your budget.
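(For illustration only, not code from this thread: a minimal sketch of how that budget accounting plays out with an always-ready leaf future, here an mpsc recv on a channel that already has messages queued. The message count is arbitrary.)

use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // Queue up more messages than a task's coop budget.
    let (tx, mut rx) = mpsc::unbounded_channel();
    for i in 0..1_000 {
        tx.send(i).unwrap();
    }
    drop(tx);

    // Each recv() that returns Ready decrements the budget. When the budget
    // is exhausted, recv() returns Pending even though messages are queued,
    // the task yields to the runtime, the budget is reset, and the loop
    // continues. Without coop this loop would monopolize the worker thread.
    let mut received = 0;
    while rx.recv().await.is_some() {
        received += 1;
    }
    assert_eq!(received, 1_000);
}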

@Flakebi commented Feb 4, 2021

It's the other way around.

Ah, right, I knew I missed something. Thanks for the explanation!

It seems like the Pending never gets to the executor; the budget is stuck at 0. I'll keep debugging.

@Flakebi commented Feb 6, 2021

I got a little further: in my case it is not actually a deadlock; it just takes a minute to finish, and I had never waited that long.
What I can observe:

  1. Disabled coop: 132 ms
  2. Enabled coop: 54286 ms (411x)

Reading the futures issue @jonhoo linked above, I think I can infer the reason. If the budget is exhausted, tokio's leaf futures return Pending but instantly wake themselves again. FuturesUnordered keeps polling futures if they are woken, even if they just returned Pending. Having a leaf future inside FuturesUnordered would create a deadlock when the budget is exhausted, were it not for the fact that FuturesUnordered bails out after 32 unsuccessful polls.

According to the stacktrace I captured, there are 6 nested layers of FuturesUnordered. Every layer will poll 32 times before returning, which makes 32^6 = 1,073,741,824 polls that do nothing.

@Darksonn (Contributor) commented Feb 6, 2021

I see. Well, FuturesUnordered probably should not be doing that.

That said, I have considered whether the coop system should turn itself off automatically if the task keeps polling the future. Thoughts?

@jonhoo (Contributor) commented Feb 6, 2021

Ooof, yeah, that seems like a likely culprit @Flakebi!

The longer-term solution here is that every "sub-executor" (like FuturesUnordered) needs to participate in coop. Rather than bailing out after an arbitrary 32 polls, it should check whether the budget is exhausted after it gets Pending, and if it is, just return Pending up the stack immediately. There's also some further discussion in rust-lang/futures-rs#2053.

Shorter-term, I think @Darksonn is right that we need some intermediate solution. Unfortunately, I think the proposed solution of "disable budgets if a future is polled after yielding due to coop" won't quite work, since that will be a very common case in practice. Consider the FuturesUnordered case — it will poll the future 32 times if it's the only one in the set (and likely >1 if there are more), so we'd probably get to a place where budgeting is almost universally disabled. Not a good place to be in.

One thought I've had in the past is that instead of coop making a future return Pending and immediately wake itself, we could stick the Waker in a place that the executor knows about, and only call wake on them all after control has been yielded back to the executor. This should work since tokio leaf futures need to be driven by the tokio executor at the moment anyway, and should be generalizable to other executors (like the rest of coop) if we make the "stash the Waker" operation customizable. I think this approach would mitigate the problem, since the leaf future should not be polled again by any sub-executor when it returns due to coop.

@carllerche (Member) commented:
IMO FuturesUnordered should not poll the same future twice per iteration. I consider that a bug with FuturesUnordered. I don't think we need to overcompensate for it now. Maybe we (tokio) can provide our own version of a FuturesUnordered-like utility. The longer-term solution is to upstream the coop strategy to std and make sure the ecosystem is in consensus on behaviors.

@taiki-e (Member) commented Feb 7, 2021

IMO FuturesUnordered should not poll the same future twice per iteration.

I haven't read all of this thread and related threads, but I tend to agree with this.

@jonhoo @Flakebi Is a patch like this enough, or am I missing something?

I don't have much time this week and next, but if the above patch works or someone can write a correct fix, I would accept a PR to fix this on the futures-rs side.

@Flakebi commented Feb 7, 2021

@taiki-e I tested your patch and it works. Also looks quite elegant to me :)

@jonhoo (Contributor) commented Feb 7, 2021

IMO FuturesUnordered should not poll the same future twice per iteration.

I haven't read all of this thread and related threads, but I tend to agree with this.

Yeah, me too.

@jonhoo @Flakebi Is a patch like this enough? or am I missing something?

So, that will only kind of work. It is definitely an improvement (we'll no longer always re-poll 32 times), but the more fundamental "we keep re-polling" issue is still present. Also, depending on the ordering of the futures in the list, that might cause you to re-poll one future while not polling another future that's ready, I think?

@taiki-e (Member) commented Feb 7, 2021

the more fundamental "we keep re-polling" issue is still present.

Yeah, I agree that the patch does not solve the fundamental problem. (I think fixing the fundamental problem requires the longer-term solution that @carllerche mentioned.)

Also, dependning on the ordering of the futures in the list, that might cause you to re-poll future while not polling another future that's ready I think?

Currently, the patch uses cmp::min(self.len(), 32), so if there are more than 32 futures, it will behave like that. Perhaps it makes more sense to just use self.len().

@taiki-e (Member) commented Feb 7, 2021

Filed rust-lang/futures-rs#2333

@tancehao (Author) commented:

@Darksonn I'll give it a try.

@taiki-e (Member) commented Feb 26, 2021

The problem reported by @Flakebi (#3493 (comment)) has been fixed in futures 0.3.13 (rust-lang/futures-rs#2333).

@tancehao: Could you check whether the problem you encountered also happens with the latest futures?

@tancehao (Author) commented:

@Darksonn I've upgraded my tokio library to 1.5, but surprisingly I found the problem still exists.
Thankfully I applied the debugging method you taught me, and I finally found the place where the busy loop happens.
[image: perf output]
[image: the loop in question]
(the channel function is imported from tokio::sync::watch)
It seems that one of the tasks executing the 'acquire' async function was not suspended when it reached w.changed().await, or maybe I'm not using the atomic operations correctly.
This problem does not occur every time; it only happens occasionally. Could you give me some advice on how to fix this bug?

@Darksonn (Contributor) commented:

If the watch receiver in self.watcher has not seen the latest value in the channel, then your clone of self.watcher is also considered to not yet have seen that value. This will cause the call to w.changed() to complete immediately, making it go around the loop again, but since you only took a clone, self.watcher is still not considered to have seen the newest value when it goes around the loop again. This means that unless your loop returns on the a > 0 condition, the loop turns into a busy loop.
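As a standalone illustration of that clone behavior (this is not the code from the screenshots above, just a minimal sketch):

use tokio::sync::watch;

#[tokio::main]
async fn main() {
    let (tx, rx) = watch::channel(0);
    tx.send(1).unwrap(); // `rx` has not yet seen this value.

    // A clone inherits the "seen" state of the receiver it was cloned from,
    // so changed() on the clone completes immediately...
    let mut c1 = rx.clone();
    c1.changed().await.unwrap();

    // ...and only the clone marked the value as seen. Cloning `rx` again
    // yields another receiver with an unseen value, so this also completes
    // immediately. Doing this in a loop is a busy loop, not a wait.
    let mut c2 = rx.clone();
    c2.changed().await.unwrap();

    println!("both clones observed the change without waiting");
}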

So in conclusion, if self.current is negative or zero when acquire is called, then the call to acquire will busy loop.

In general, it seems like you should be using a tokio::sync::Semaphore here instead of trying to make your own by combining an atomic integer and a channel.
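A minimal sketch of that approach (the names and the permit count here are made up for illustration):

use std::sync::Arc;
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    // A single Semaphore replaces the atomic counter + watch channel combination.
    let limiter = Arc::new(Semaphore::new(4));

    let mut handles = Vec::new();
    for i in 0..16 {
        let limiter = limiter.clone();
        handles.push(tokio::spawn(async move {
            // acquire() suspends the task until a permit is free,
            // so there is no busy loop while waiting.
            let _permit = limiter.acquire().await.expect("semaphore closed");
            println!("task {} holds a permit", i);
            tokio::time::sleep(std::time::Duration::from_millis(10)).await;
            // The permit is returned when `_permit` is dropped.
        }));
    }
    for handle in handles {
        handle.await.unwrap();
    }
}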
