#5431 changed Scheduler.decide_worker to stop it from assigning new tasks to workers with paused or closing_gracefully status.
However, those workers are still stealing tasks, which effectively negates the benefits of the PR.
This is reflected by the flakiness of test_avoid_paused_workers; the test frequently hangs in CI on the lines
    while (len(w1.tasks), len(w2.tasks), len(w3.tasks)) != (4, 0, 4):
        await asyncio.sleep(0.01)
Above, w2 is paused. However, the tuple ends up looking like (3, 1, 4) instead. If you add await wait(futures), the test will start hanging deterministically since a task is always stolen from one of the running workers to the paused one, and there it sits forever since nothing steals it back.
Adding config={"distributed.scheduler.work-stealing": False} as an extra argument to the gen_cluster decorator makes the issue disappear.
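To illustrate the failure mode, here is a toy, standard-library-only model of a single stealing round. It is not the distributed scheduler's code: ToyWorker, Status and steal_once are hypothetical names made up for this sketch, and the "steal when the gap is at least 2 tasks" rule is an assumption chosen only to keep the example small. It shows how a thief selection that ignores worker status reproduces the (3, 1, 4) tuple, while one restricted to running workers leaves the counts at (4, 0, 4).

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical stand-in for the worker statuses named in this issue
    running = "running"
    paused = "paused"
    closing_gracefully = "closing_gracefully"


@dataclass
class ToyWorker:
    name: str
    status: Status = Status.running
    tasks: list = field(default_factory=list)


def steal_once(workers, only_running):
    """Move one task from the busiest worker to the least busy thief.

    If only_running is True, workers that are not running are excluded
    from the thief candidates (they can still be victims).
    Returns True if a task was moved.
    """
    thieves = [
        w for w in workers
        if not only_running or w.status is Status.running
    ]
    victim = max(workers, key=lambda w: len(w.tasks))
    thief = min(thieves, key=lambda w: len(w.tasks))
    # Assumed toy rule: only steal when the imbalance is at least 2 tasks
    if len(victim.tasks) - len(thief.tasks) < 2:
        return False
    thief.tasks.append(victim.tasks.pop())
    return True


def make_workers():
    # Same starting point as the test: w2 is paused and holds no tasks
    return [
        ToyWorker("w1", tasks=list(range(4))),
        ToyWorker("w2", status=Status.paused),
        ToyWorker("w3", tasks=list(range(4, 8))),
    ]


def counts_after_one_round(only_running):
    workers = make_workers()
    steal_once(workers, only_running)
    return tuple(len(w.tasks) for w in workers)


naive = counts_after_one_round(only_running=False)
guarded = counts_after_one_round(only_running=True)
print(naive)    # (3, 1, 4): the paused worker stole a task
print(guarded)  # (4, 0, 4): nothing moves to the paused worker
```

In this model the task that lands on w2 never runs, because w2 is paused and no later round moves it back, which mirrors the deterministic hang described above.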