Update dashboard docs for queuing #9660

merged 7 commits into from Nov 15, 2022
1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -333,6 +333,7 @@
"https://asyncssh.readthedocs.io/en/latest/",
"https://asyncssh.readthedocs.io/en/latest/objects.inv",
),
"distributed": ("https://distributed.dask.org/en/latest", None),
"pyarrow": ("https://arrow.apache.org/docs/", None),
"zarr": (
"https://zarr.readthedocs.io/en/latest/",
44 changes: 44 additions & 0 deletions docs/source/dashboard-progress-script.py
@@ -0,0 +1,44 @@
"""
This script was run to produce some of the screenshots on https://docs.dask.org/en/stable/dashboard.html
"""
import time

from dask import delayed
from dask.distributed import Client, wait


@delayed
def inc(x):
    time.sleep(0.1)
    return x + 1


@delayed
def double(x):
    time.sleep(0.1)
    return 2 * x


@delayed
def add(x, y):
    time.sleep(0.1)
    return x + y


if __name__ == "__main__":
    with Client(n_workers=4, threads_per_worker=2, memory_limit="4 GiB") as client:
        while True:
            data = list(range(1000))
            output = []
            for x in data:
                a = inc(x)
                b = double(x)
                c = add(a, b)
                output.append(c)

            total = delayed(sum)(output)
            total = total.persist()
            wait(total)
            time.sleep(5)
            del total
            time.sleep(2)
106 changes: 55 additions & 51 deletions docs/source/dashboard.rst
@@ -34,7 +34,7 @@ In a Jupyter Notebook or JupyterLab session displaying the client object will sh

You can also query the address from ``client.dashboard_link`` (or for older versions of distributed, ``client.scheduler_info()['services']``).

By default, when starting a scheduler on your local machine the dashboard will be served at ``http://localhost:8787/status``. You can type this address into your browser to access the dashboard, but may be directed
elsewhere if port 8787 is taken. You can also configure the address using the ``dashboard_address``
parameter (see :class:`LocalCluster <distributed.deploy.local.LocalCluster>`).
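
As a minimal sketch of the two points above, the snippet below starts a small in-process cluster, asks for the dashboard on a non-default port, and reads the link back from the client. The port ``:8790`` and the single-worker settings are arbitrary choices made for illustration, not values taken from these docs; if the port is already in use, Dask may serve the dashboard elsewhere, so it is safer to read the link than to assume it.

```python
from dask.distributed import Client

# Assumption for illustration: processes=False keeps the scheduler and the
# worker inside this Python process, and ":8790" is an arbitrary example port.
with Client(
    processes=False,
    n_workers=1,
    threads_per_worker=1,
    dashboard_address=":8790",
) as client:
    # Read the address back instead of hard-coding it, because the
    # dashboard may be served elsewhere if the requested port is taken.
    link = client.dashboard_link
    print(link)
```

Reading ``client.dashboard_link`` works the same way when the client is connected to a remote scheduler.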

@@ -49,14 +49,14 @@ of the most commonly used plots shown on the entry point for the dashboard:
.. figure:: images/dashboard_status.png
:alt: Main dashboard with five panes arranged into two columns. In the left column there are three bar charts. The top two show total bytes stored and bytes per worker. The bottom has three tabs to toggle between task processing, CPU utilization, and occupancy. In the right column, there are two bar charts with corresponding colors showing task activity over time, referred to as task stream and progress.

.. _dashboard.memory:

Bytes Stored and Bytes per Worker
---------------------------------
These two plots show a summary of the overall memory usage on the cluster (Bytes Stored),
as well as the individual usage on each worker (Bytes per Worker). The colors on these plots
indicate the following.

.. raw:: html

<table>
@@ -81,24 +81,29 @@ indicate the following.
</table>

.. figure:: images/dashboard_memory.png
:alt: Two bar charts on memory usage. The top chart shows the total cluster memory in a single bar, with mostly under-target memory in blue and a small part spilled to disk in grey. The bottom chart displays the memory usage per worker, with a separate bar for each of the 16 workers. The first four bars are orange because those workers' memory is close to the spill-to-disk target, and the first worker stands out with a grey portion that corresponds to the amount spilled to disk. The remaining workers are all under target, showing blue bars.

The different levels of transparency on these plots are related to the type of memory
(Managed, Unmanaged and Unmanaged recent), and you can find a detailed explanation of them in the
`Worker Memory management documentation <https://distributed.dask.org/en/latest/worker.html#memory-management>`_
:doc:`Worker Memory management documentation <worker-memory>`


.. _dashboard.proc-cpu-occ:

Task Processing/CPU Utilization/Occupancy
-----------------------------------------

**Task Processing**

The *Processing* tab in the figure shows the number of tasks that have been assigned to each worker. Not all of these
tasks are necessarily *executing* at the moment: a worker only executes as many tasks at once as it has threads. Any
extra tasks assigned to the worker will wait to run, depending on their :doc:`priority <priority>` and whether their
dependencies are in memory on the worker.

The *Processing* tab in the figure shows the number of tasks being processed by each worker with the blue bar. The scheduler will
try to ensure that the workers are processing the same number of tasks. If one of the bars is completely white it means that
worker has no tasks and its waiting for them. This usually happens when the computations are close to finished (nothing
to worry about), but it can also mean that the distribution of the task across workers is not optimized.
The scheduler will try to ensure that the workers are processing about the same number of tasks. If one of the bars is
completely white, that worker has no tasks and is waiting for them. This usually happens when the computations
are close to finished (nothing to worry about), but it can also mean that the distribution of tasks across workers is
not optimal.
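
The distinction between tasks *assigned* to a worker and tasks *executing* on it can be sketched with a little arithmetic. The helper below is hypothetical (it is not part of Dask) and gives only an upper bound on the executing count: it ignores priorities and assumes every dependency is already in memory on the worker.

```python
def split_assigned_tasks(n_assigned: int, n_threads: int) -> tuple[int, int]:
    """Return (max_executing, waiting) for one worker.

    Hypothetical helper for illustration only. A worker runs at most one
    task per thread, so anything beyond that has to wait its turn.
    """
    if n_assigned < 0 or n_threads < 1:
        raise ValueError("need n_assigned >= 0 and n_threads >= 1")
    max_executing = min(n_assigned, n_threads)
    waiting = n_assigned - max_executing
    return max_executing, waiting


# A worker with 2 threads that shows 5 tasks in the Processing tab
# runs at most 2 of them at once; the other 3 wait.
print(split_assigned_tasks(5, 2))
```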

There are three different colors that can appear in this plot:

@@ -132,23 +137,24 @@ In this plot on the dashboard we have two extra tabs with the following informat

**CPU Utilization**

The *CPU* tab shows the cpu usage per-worker as reported by ``psutils`` metrics.
The *CPU* tab shows the CPU usage per worker as reported by ``psutil`` metrics.

**Occupancy**

The *Occupancy* tab shows the occupancy, in time, per worker. The total occupancy for a worker is the total expected runtime
for all tasks currently on a worker. For example, an occupancy of 10s means an occupancy of 10s means that the worker
estimates it will take 10s to execute all the tasks it has currently been assigned.
The *Occupancy* tab shows the occupancy, in time, per worker. The total occupancy for a worker is the amount of time Dask expects it would take
to run all the tasks, and transfer any of their dependencies from other workers, *if the execution and transfers happened one-by-one*.
For example, if a worker has an occupancy of 10s, and it has 2 threads, you can expect it to take about 5s of wall-clock time for the worker
to complete all its tasks.
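
The arithmetic in that example can be written out as a small sketch. The function name is hypothetical, and the estimate assumes the work divides evenly across the worker's threads, which real workloads only approximate.

```python
def estimated_wall_clock_seconds(occupancy_seconds: float, n_threads: int) -> float:
    """Rough wall-clock estimate for a worker to finish its assigned tasks.

    Hypothetical helper for illustration only. Occupancy counts execution
    and transfer time as if everything happened one-by-one, so dividing by
    the thread count approximates the time when threads run in parallel.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return occupancy_seconds / n_threads


# An occupancy of 10s on a worker with 2 threads: about 5s of wall-clock time.
print(estimated_wall_clock_seconds(10.0, 2))
```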

.. _dashboard.task-stream:

Task Stream
-----------

The task stream is a view of all the tasks across worker-threads. Each row represents a thread and each rectangle represents
an individual tasks. The color for each rectangle corresponds to the task-prefix of the task being performed and it matches the color
of the *Progress* plot (see Progress section). This means that all the individual tasks part of the `inc` task-prefix for example, will have
the same randomly assigned color from the viridis color map.
The task stream is a view of all the tasks across worker-threads. Each row represents a thread and each rectangle represents
an individual task. The color of each rectangle corresponds to the task-prefix of the task being performed, and it matches the color
in the *Progress* plot (see Progress section). This means, for example, that all the individual tasks that are part of the ``inc`` task-prefix will have
the same randomly assigned color from the viridis color map.

Certain colors are reserved for specific kinds of tasks:

@@ -197,45 +203,43 @@ is idle. Having too much white and red is an indication of not optimal use of re
Progress
--------

The progress bars plot shows the progress of each individual task-prefix. The color of the of each bar matches the color of the
individual tasks on the task stream that correspond to the same task-prefix. Each horizontal bar has three different components:
The progress bars plot shows the progress of each individual task-prefix. The color of each bar matches the color of the
individual tasks on the task stream from the same task-prefix. Each horizontal bar has four different components, from left to right:

.. raw:: html

<table>
<tr>
<td>
<div role="img" aria-label="light grey square" style="color:rgba(128,128,128, 0.4); font-size: 25px ">&#9632;</div>
</td>
<td>Tasks that are ready to run.</td>
</tr>
<tr>
<td>
<div role="img" aria-label="teal square" style="color:rgba(30,151,138, 1); font-size: 25px ">&#9632;</div>
</td>
<td> Tasks that have been completed and are in memory.</td>
</tr>
<tr>
<td>
<div role="img" aria-label="light teal square" style="color:rgba(30,151,138, 0.6); font-size: 25px ">&#9632;</div>
</td>
<td>Tasks that have been completed, been in memory and have been released.</td>
</tr>
</table>
<ul style="list-style-type: none">
<li>
<span role="img" aria-label="light teal square" style="background:rgba(30,151,138, 0.6); width: 0.6em; height: 0.6em; border: 1px solid rgba(30,151,138, 0.6); display: inline-block"></span>
<span>Tasks that have completed, are not needed anymore, and now have been released from memory.</span>
</li>
<li>
<span role="img" aria-label="teal square" style="background:rgba(30,151,138, 1); width: 0.6em; height: 0.6em; border: 1px solid rgba(30,151,138, 1); display: inline-block"></span>
<span> Tasks that have completed and are in memory.</span>
</li>
<li>
<span role="img" aria-label="light grey square" style="background:rgba(128,128,128, 0.4); width: 0.6em; height: 0.6em; border: 1px solid rgba(128,128,128, 0.4); display: inline-block"></span>
<span>Tasks that are ready to run.</span>
</li>
<li>
<span role="img" aria-label="hashed light grey square" style="background-image: linear-gradient(135deg, rgba(128,128,128, 0.4) 25%, #ffffff 25%, #ffffff 50%, rgba(128,128,128, 0.4) 50%, rgba(128,128,128, 0.4) 75%, #ffffff 75%, #ffffff 100%); width: 0.6em; height: 0.6em; border: 1px solid rgba(128,128,128, 0.4); display: inline-block"></span>
<span>Tasks that are <a href="https://distributed.dask.org/en/stable/scheduling-policies.html#queuing">queued</a>. They are ready to run, but not assigned to workers yet, so higher-priority tasks can run first.</span>
</li>
</ul>

.. figure:: images/dashboard_progress.png
:alt: Progress bar chart with one bar for each task-prefix matching with the names "add", "double", "inc", and "sum". The "double", "inc" and "add" bars have a progress of approximately one third of the total tasks, displayed in their individual color with different transparency levels. The "double" and "inc" bars have a grey background, and the "sum" bar is empty.
:alt: Progress bar chart with one bar for each task-prefix matching with the names "add", "double", "inc", and "sum". The "double", "inc" and "add" bars have a progress of approximately one third of the total tasks, displayed in their individual color with different transparency levels. The "double" and "inc" bars have a striped grey background, and the "sum" bar is empty.


Dask JupyterLab Extension
--------------------------

The `JupyterLab Dask extension <https://github.com/dask/dask-labextension#dask-jupyterlab-extension>`__
allows you to embed Dask's dashboard plots directly into JupyterLab panes.

Once the JupyterLab Dask extension is installed you can choose any of the individual plots available and
integrate it as a pane in your JupyterLab session. For example, in the figure below we selected the *Task Stream*,
*Progress*, *Workers Memory*, and *Graph* plots.

.. figure:: images/dashboard_jupyterlab.png
:alt: Dask JupyterLab extension showing an arrangement of four panes selected from a display of plot options. The panes displayed are the Task stream, Bytes per worker, Progress and the Task Graph.
9 changes: 4 additions & 5 deletions docs/source/deploying-hpc.rst
@@ -190,11 +190,10 @@ following :doc:`configuration value <../../configuration>`:
temporary-directory: /path/to/local/storage

However, not all HPC systems have local storage. If this is the case then you
may want to turn off Dask's ability to spill to disk altogether. See `this
page <https://distributed.dask.org/en/latest/worker.html#memory-management>`_
for more information on Dask's memory policies. Consider changing the
following values in your ``~/.config/dask/distributed.yaml`` file to disable
spilling data to disk:
may want to turn off Dask's ability to spill to disk altogether.
See :doc:`this page <worker-memory>` for more information on Dask's memory policies.
Consider changing the following values in your ``~/.config/dask/distributed.yaml`` file
to disable spilling data to disk:

.. code-block:: yaml

Binary file modified docs/source/images/dashboard_progress.png
Binary file modified docs/source/images/dashboard_status.png