Apologies - I'm not sure exactly how to provide a code example that triggers this condition, but here's what I observed:
Sometimes at the end of long runs with thousands of tasks, a few straggler tasks appear to be stuck on workers. These workers seem to be sitting at the `memory.pause` threshold (0.8), and the number of stuck tasks on each equals the number of threads available to that dask worker. The workers are heartbeating just fine, but they don't seem to be doing anything with the tasks they're processing (the call stack for each task is blank). Other workers aren't stealing these tasks. When I kill the workers, the scheduler reassigns those tasks and everything completes as normal.
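
This is not a reproducer, but here is a minimal sketch of how the symptom could be checked from the client side. It assumes `client` is a `distributed.Client`, that `client.processing()` maps worker addresses to the task keys they hold, and that `client.scheduler_info()["workers"]` exposes `nthreads`, `memory_limit`, and `metrics["memory"]` for each worker. The helper name `find_paused_stragglers` and the `pause_fraction` parameter are hypothetical, and the 0.8 default simply mirrors the `memory.pause` value mentioned above.

```python
def find_paused_stragglers(client, pause_fraction=0.8):
    """Return {worker_address: [task keys]} for workers that look stalled.

    A worker is flagged when its reported memory is at or above
    pause_fraction of its memory limit AND the number of tasks it is
    processing equals its thread count (the pattern described above).
    This is a heuristic, not a definitive diagnosis.
    """
    # Assumed shape: {address: iterable of task keys}
    processing = client.processing()
    # Assumed shape: {address: {"nthreads", "memory_limit", "metrics": {"memory"}}}
    workers = client.scheduler_info()["workers"]

    suspects = {}
    for addr, keys in processing.items():
        info = workers.get(addr)
        if info is None or not keys:
            continue  # unknown worker, or nothing assigned to it
        limit = info.get("memory_limit") or 0
        used = info.get("metrics", {}).get("memory", 0)
        at_pause = limit > 0 and used / limit >= pause_fraction
        threads_full = len(keys) == info.get("nthreads")
        if at_pause and threads_full:
            suspects[addr] = list(keys)
    return suspects
```

For any worker this returns, passing its keys to `client.call_stack(keys=...)` should show whether the call stacks are blank, as they were in the runs described here.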