Deadlock on workers reaching memory.pause threshold #5235

gerrymanoim · 2021-08-19T15:54:31Z

Apologies: I'm not sure exactly how to provide a code example that reproduces this condition, but here's what I observed:

Sometimes, at the end of long runs with thousands of tasks, I've found straggler tasks that seem to be stuck on workers. These workers appear to be at the memory.pause mark (0.8), and the number of stuck tasks equals the number of threads available to the Dask worker. The workers are heartbeating just fine but don't seem to actually be doing anything with the tasks they're processing (the call stacks for each task are blank). Other workers aren't stealing these tasks. When I kill the workers, the scheduler reassigns those tasks and everything completes as normal.
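To make the reported symptom concrete, here is a minimal toy model, assuming the simplified rule that a worker stops starting new tasks once its process memory exceeds the pause fraction of its memory limit. `ToyWorker`, `next_to_start`, and `PAUSE_FRACTION` are hypothetical names for illustration only; this is not distributed's actual worker code. In a real deployment the fraction comes from the `distributed.worker.memory.pause` configuration key (0.8 by default).

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical toy model for illustration only; this is NOT distributed's implementation.
PAUSE_FRACTION = 0.8  # the memory.pause mark described in the report


@dataclass
class ToyWorker:
    nthreads: int
    memory_limit: int          # bytes the worker is allowed to use
    process_memory: int = 0    # bytes currently used by the worker process
    assigned: List[str] = field(default_factory=list)  # task keys sent by the scheduler

    @property
    def paused(self) -> bool:
        # Assumed rule: pause once process memory exceeds the pause fraction of the limit
        return self.process_memory > PAUSE_FRACTION * self.memory_limit

    def next_to_start(self) -> List[str]:
        # A paused worker starts no new tasks; otherwise it fills every free thread
        if self.paused:
            return []
        return self.assigned[: self.nthreads]


# Usage: 4 threads, 10 GiB limit, 8.5 GiB in use, so the worker is paused (8.5 > 8.0)
w = ToyWorker(nthreads=4, memory_limit=10 * 2**30, process_memory=int(8.5 * 2**30),
              assigned=["t0", "t1", "t2", "t3"])
print(w.paused, w.next_to_start())  # True []
```

In this sketch, if nothing frees memory and the scheduler never moves the assigned tasks elsewhere, those tasks never start, which resembles the hang described above.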


jrbourbeau · 2021-10-14T14:36:46Z

@fjetter @crusaderky, by chance, have either of you run into this scenario before?

crusaderky · 2021-10-14T15:01:43Z

Yes. See #3761. It will be fixed within the next few weeks.

gerrymanoim · 2021-10-14T15:29:34Z

Thanks! That's via #5381?

crusaderky · 2021-10-14T15:46:23Z

It will likely be a separate PR.

jrbourbeau · 2021-10-14T21:42:46Z

Thanks @crusaderky! Closing as a duplicate of #3761

ncclementi added the "needs info" label (Needs further information from the user) on Sep 17, 2021

jrbourbeau closed this as completed Oct 14, 2021
