Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMM high level documentation #5456

Merged
merged 3 commits into from Oct 22, 2021
Merged

AMM high level documentation #5456

merged 3 commits into from Oct 22, 2021

Conversation

crusaderky
Copy link
Collaborator

No description provided.

@crusaderky crusaderky self-assigned this Oct 22, 2021
@crusaderky crusaderky added the documentation Improve or add to documentation label Oct 22, 2021
distributed:
scheduler:
active-memory-manager:
start: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit - is it too late to change these config fields? I'd find enable: true to be a clearer config name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO I find enable to be ambiguous. You can call run_once while it's not started.


.. note::
This policy is incompatible with :meth:`~distributed.Client.replicate` and with the
``broadcast`` parameter of :meth:`~distributed.Client.scatter`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment more on what this incompatibility entails? What happens if AMM is enabled and scatter(..., broadcast=True) is called in user code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

- Replicate a task that is not yet in memory
- Create more replicas of a task than there are workers
- Create replicas of a task on workers that already hold them
- Create replicas on paused or retiring workers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to enable debug logs for issues like these? When developing a new policy, should you try to avoid generating unacceptable suggestions (in which case you'd want to be able to debug these issues), or is generating unacceptable suggestions (and relying on the AMM to ignore them) in a policy fine and intended if it makes the policy code simpler?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter. Policies should be simple and readable. The subtle edge case management is left to the AMM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note to this section clarifying that policies shouldn't worry about these cases?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@jcrist jcrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good to me.

ipython
prometheus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these rows deleted?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those pages don't exist anywhere

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch -- these are over in the dask/dask docs now (e.g. https://docs.dask.org/en/stable/setup/prometheus.html)

@crusaderky crusaderky merged commit cdc68cc into dask:main Oct 22, 2021
@crusaderky crusaderky deleted the AMM/docs branch October 22, 2021 16:15
zanieb pushed a commit to zanieb/distributed that referenced this pull request Oct 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improve or add to documentation memory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants