Commit 8a666d2: doc tweaks
crusaderky committed Oct 26, 2021 (1 parent: 125334d)
Showing 1 changed file, docs/source/active_memory_manager.rst, with 12 additions and 7 deletions.

Active Memory Manager
=====================
The Active Memory Manager, or *AMM*, is an experimental daemon that optimizes memory
usage of workers across the Dask cluster. It is disabled by default.

Memory imbalance and duplication
--------------------------------
Whenever a Dask task returns data, it is stored on the worker that executed the task for
as long as it's a dependency of other tasks, is referenced by a
:class:`~distributed.Client` through a :class:`~distributed.Future`, or is part of a
:doc:`published dataset <publish>`.

Dask assigns tasks to workers following criteria of CPU occupancy, :doc:`resources`, and
locality. In the trivial use case of tasks that are not connected to each other, take

Enabling the Active Memory Manager
----------------------------------
The AMM can be enabled through the :doc:`Dask configuration file <configuration>`:
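
(The YAML block itself was collapsed in this view; the sketch below uses the standard
AMM keys from the Dask configuration reference.)

.. code-block:: yaml

   distributed:
     scheduler:
       active-memory-manager:
         start: true
         interval: 2s
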
The above is the recommended setup and will run all enabled *AMM policies* (see below)
every two seconds. Alternatively, you can manually start/stop the AMM from the
:class:`~distributed.Client` or trigger a one-off iteration:

.. code-block:: python
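
   # This block was collapsed in this view; reconstructed assuming the current
   # distributed API, where the scheduler-side AMM is controlled via client.amm.
   >>> client.amm.start()  # Start running every 2 seconds
   >>> client.amm.stop()  # Stop running periodically
   >>> client.amm.run_once()  # Trigger a one-off iteration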

The AMM by itself doesn't do anything. The user must enable *policies* which suggest
actions regarding Dask data. The AMM runs the policies and enacts their suggestions, as
long as they don't harm data integrity. These suggestions can be of two types:

- Replicate the data of an in-memory task from one worker to another.
- Delete one or more replicas of an in-memory task. The AMM will never delete the last
replica of a task, even if a policy asks to.

There are no "move" operations. A move is performed in two passes: first a policy
creates a copy; in the next AMM iteration, the same or another policy deletes the
original (if the copy succeeded).

Unless a policy puts constraints on which workers should be impacted, the AMM will
automatically create replicas on workers with the lowest memory usage first and delete
them from workers with the highest memory usage first.

Built-in policies
-----------------

ReduceReplicas
++++++++++++++
class
   :class:`distributed.active_memory_manager.ReduceReplicas`
parameters
   None
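
For reference, this is a sketch of how such a policy appears in the AMM section of the
configuration (``ReduceReplicas`` ships in the default ``distributed.yaml``, so the
recommended setup above already runs it):

.. code-block:: yaml

   distributed:
     scheduler:
       active-memory-manager:
         policies:
           - class: distributed.active_memory_manager.ReduceReplicas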

Custom policies
+++++++++++++++
Power users can write their own policies by subclassing
:class:`~distributed.active_memory_manager.ActiveMemoryManagerPolicy` and must
define two methods:

``yield "replicate", <TaskState>, {<WorkerState>, <WorkerState>, ...}``
   Create one replica of the target task on the worker with the lowest memory among
   the listed candidates.
``yield "drop", <TaskState>, None``
   Delete one replica of the target task on the worker with the highest memory
   usage across the whole cluster.
``yield "drop", <TaskState>, {<WorkerState>, <WorkerState>, ...}``
   Delete one replica of the target task on the worker with the highest memory
   among the listed candidates.
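
For instance, a minimal ``run`` method relying on the "drop" suggestion could look like
this sketch (illustrative, not a built-in policy; it assumes ``self.manager.scheduler``
gives access to the scheduler state, as in the example below):

.. code-block:: python

   def run(self):
       # Suggest dropping one replica of every task that is held in memory by
       # more than one worker; candidates=None lets the AMM pick the replica
       # on the worker with the highest memory usage.
       for ts in self.manager.scheduler.tasks.values():
           if len(ts.who_has) > 1:
               yield "drop", ts, None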

Example
+++++++
The following custom policy ensures that keys "foo" and "bar" are replicated on all
workers at all times. New workers will receive a replica soon after connecting to the
scheduler. The policy will do nothing if the target keys are not in memory somewhere or
if all workers already hold a replica. Note that this example is incompatible with the
:class:`~distributed.active_memory_manager.ReduceReplicas` built-in policy.

In mymodule.py (it must be accessible by the scheduler):
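
(The code block itself was collapsed in this view; the sketch below follows the yield
API described above. The class name ``EnsureBroadcast`` and its exact structure are
illustrative.)

.. code-block:: python

   from distributed.active_memory_manager import ActiveMemoryManagerPolicy


   class EnsureBroadcast(ActiveMemoryManagerPolicy):
       """Ensure that the given key is replicated on all workers at all times."""

       def __init__(self, key):
           self.key = key

       def run(self):
           ts = self.manager.scheduler.tasks.get(self.key)
           if ts is None:
               return  # The key does not exist (anymore); nothing to do
           # Request one extra replica for each worker that lacks one; the AMM
           # picks the least loaded workers without a replica first.
           for _ in range(len(self.manager.scheduler.workers) - len(ts.who_has)):
               yield "replicate", ts, None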
