Describe the issue:
Running a Dask worker with more than one process (i.e. --nworkers [N] with N>1) leads to wrong data in both the worker dashboard and, especially, the Prometheus metrics at /metrics.
Minimal Complete Verifiable Example:
Running a Dask worker in k8s with this config and fully loading all worker processes (i.e. 15 running tasks per worker) leads to these worker dashboard numbers.
So it reports that it is executing only 1 task, while it is actually churning through 15 tasks. The Prometheus metrics show the same thing:
# HELP dask_worker_tasks Number of tasks at worker.
# TYPE dask_worker_tasks gauge
dask_worker_tasks{state="memory"} 3.0
dask_worker_tasks{state="executing"} 1.0
# HELP dask_worker_concurrent_fetch_requests Deprecated: This metric has been renamed to transfer_incoming_count.\nNumber of open fetch requests to other workers
# TYPE dask_worker_concurrent_fetch_requests gauge
dask_worker_concurrent_fetch_requests 0.0
# HELP dask_worker_threads Number of worker threads
# TYPE dask_worker_threads gauge
dask_worker_threads 1.0
Aggregating these metrics with Grafana gives wildly wrong numbers (off by a factor of 15 in my case).
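To illustrate the size of the discrepancy, here is a minimal Python sketch of what a per-process aggregation would report. It assumes, without confirmation from the Dask source, that the scraped /metrics endpoint reflects only one of the 15 worker processes and that every process would report the same gauges if scraped individually. The helper names (parse_samples, sum_across_processes) are hypothetical and are not part of Dask or any Prometheus client library.

```python
import re
from collections import defaultdict

# Matches a simple Prometheus text-format sample line: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            yield match.group("name"), match.group("labels") or "", float(match.group("value"))


def sum_across_processes(texts):
    """Sum each (name, labels) gauge over the metrics text of every process."""
    totals = defaultdict(float)
    for text in texts:
        for name, labels, value in parse_samples(text):
            totals[(name, labels)] += value
    return dict(totals)


# Metrics as reported in this issue, assumed here to describe ONE process.
ONE_PROCESS = """\
# HELP dask_worker_tasks Number of tasks at worker.
# TYPE dask_worker_tasks gauge
dask_worker_tasks{state="memory"} 3.0
dask_worker_tasks{state="executing"} 1.0
# HELP dask_worker_threads Number of worker threads
# TYPE dask_worker_threads gauge
dask_worker_threads 1.0
"""

# Assumption: 15 identical worker processes, each executing one task.
totals = sum_across_processes([ONE_PROCESS] * 15)
print(totals[("dask_worker_tasks", 'state="executing"')])  # 15.0
```

Under these assumptions the summed gauge is 15.0, which matches the actual load, whereas the single endpoint reports 1.0.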
Environment:
Dask version: 2024.2.1
Python version: 3.9.18
Operating System: Linux
Install method (conda, pip, source): poetry