Unresponsive workers should be flagged on the dashboard #8546

Open
crusaderky opened this issue Feb 29, 2024 · 1 comment

@crusaderky
Collaborator

Let's hamstring a worker:

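async def kill_event_loop():
    # Busy-loop without ever awaiting, so the worker's event loop never regains control
    while True:
        pass

# c is an already-connected distributed.Client
fut = c.submit(kill_event_loop)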

This makes the worker completely unresponsive. After 5 minutes (distributed.scheduler.worker-ttl) without a single heartbeat coming through, the scheduler will disconnect it forcefully.
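
To illustrate why a coroutine like this starves heartbeats, here is a minimal standalone sketch using only `asyncio`. It does not use distributed at all: `heartbeat`, `block_event_loop` and `measure_max_gap` are hypothetical names made up for this example, and the busy loop is bounded to 0.2 seconds so that the script terminates.

```python
import asyncio
import time


async def heartbeat(stamps, interval=0.01):
    """Record a timestamp every `interval` seconds, like a periodic heartbeat."""
    while True:
        stamps.append(time.monotonic())
        await asyncio.sleep(interval)


async def block_event_loop(duration):
    """Busy-wait without ever awaiting, so no other coroutine can run."""
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        pass


async def measure_max_gap(block_for=0.2):
    """Return the longest gap (in seconds) observed between two heartbeats."""
    stamps = []
    hb = asyncio.create_task(heartbeat(stamps))
    await asyncio.sleep(0.05)          # heartbeats flow normally
    await block_event_loop(block_for)  # heartbeats stall here
    await asyncio.sleep(0.05)          # heartbeats resume
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return max(b - a for a, b in zip(stamps, stamps[1:]))


max_gap = asyncio.run(measure_max_gap())
print(f"longest gap between heartbeats: {max_gap:.3f}s")
```

The longest gap is at least as long as the blocking period, because the heartbeat coroutine cannot be scheduled until the busy loop returns. With an unbounded `while True: pass` the gap never ends.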

Until that happens, this is what I'm seeing on the dashboard:
[Image: Screenshot from 2024-02-29 17-26-34]

Desired behaviour

There should be an indication on the dashboard of how many seconds have passed since the last heartbeat.
There should also be visual cues - e.g. the whole line turning red - to indicate when the time since the last heartbeat crosses a significant threshold - e.g. max(4x the expected heartbeat interval, 25% of the worker-ttl)
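
As a rough sketch of the proposed threshold, the snippet below is plain Python and is not taken from the dashboard code. The function names are hypothetical, the heartbeat term is assumed to mean the expected interval between heartbeats in seconds, and the 1-second interval in the example is an assumed value. Only the 5-minute worker-ttl comes from the description above.

```python
def heartbeat_alert_threshold(heartbeat_interval, worker_ttl):
    """Seconds without a heartbeat after which a worker would be flagged.

    Hypothetical helper: max(4x the heartbeat interval, 25% of worker-ttl).
    """
    return max(4 * heartbeat_interval, 0.25 * worker_ttl)


def heartbeat_status(seconds_since_heartbeat, heartbeat_interval, worker_ttl):
    """Classify a worker as 'ok' or 'unresponsive' (hypothetical labels)."""
    threshold = heartbeat_alert_threshold(heartbeat_interval, worker_ttl)
    return "unresponsive" if seconds_since_heartbeat >= threshold else "ok"


# Assumed values: 1 s heartbeat interval, 300 s (5 minutes) worker-ttl.
threshold = heartbeat_alert_threshold(heartbeat_interval=1.0, worker_ttl=300.0)
print(threshold)                            # 75.0
print(heartbeat_status(10.0, 1.0, 300.0))   # ok
print(heartbeat_status(80.0, 1.0, 300.0))   # unresponsive
```

With these values the 25% term dominates, so a row would turn red after 75 seconds, well before the scheduler drops the worker at 300 seconds.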

@jrbourbeau
Member

> There should be an indication on the dashboard of how many seconds have passed since the last heartbeat.
> There should also be visual cues - e.g. the whole line turning red - to indicate when you get to a significant threshold since the last heartbeat

I believe this is on the "Info" page
