Control automatic resubmission on SLURM #10601
LGTM!
Thanks for adding this!
Thank you for implementing this!
@awaelchli Do you know which release this is expected to be a part of? I tried installing the latest 1.5.3 release (from 2 hours ago) and it doesn't seem to include the new SLURM
@marksibrahim Lightning 1.6, so a bit far away. You will have to install directly from the GitHub repository in the meantime.
What does this PR do?
Fixes #6389
Adds the ability to turn on or off the automatic resubmission that Lightning does when a job gets interrupted by the SLURM controller (via signal handling). Users who prefer to let libraries handle the resubmission (e.g. submitit) can now set
Idea from @ananthsub.
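To illustrate the mechanism described above, here is a minimal, standard-library-only sketch of signal-based resubmission with an on/off switch. It is not Lightning's implementation: the class `RequeueOnSignal`, its `run_command` parameter, and the choice of `SIGUSR1` are assumptions made for illustration. Only the `auto_requeue` flag name comes from this PR, and `scontrol requeue <job_id>` is the standard SLURM command for requeueing a job.

```python
import os
import signal
import subprocess


class RequeueOnSignal:
    """Hypothetical helper (not part of Lightning) that requeues a SLURM job on a signal."""

    def __init__(self, auto_requeue=True, run_command=subprocess.call):
        # auto_requeue mirrors the flag discussed in this PR.
        # run_command is injectable so the sketch can be exercised without SLURM.
        self.auto_requeue = auto_requeue
        self.run_command = run_command
        self.requeued = False

    def register(self, signum=signal.SIGUSR1):
        """Install the handler only when automatic resubmission is enabled."""
        if not self.auto_requeue:
            # Leave signal handling to an external tool (e.g. submitit).
            return False
        signal.signal(signum, self.handle)
        return True

    def handle(self, signum, frame):
        """Ask the SLURM controller to requeue the current job."""
        job_id = os.environ.get("SLURM_JOB_ID")
        if job_id is None:
            # Not running under SLURM, so there is nothing to requeue.
            return
        exit_code = self.run_command(["scontrol", "requeue", job_id])
        self.requeued = exit_code == 0
```

In a real job one would presumably save a checkpoint before requeueing and call `register()` once at startup. With `auto_requeue=False`, no handler is installed and resubmission is left entirely to the external library.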
Whether the default for auto_requeue is True or False is currently being debated on the linked issue. For now, the PR implements the behavior we have already (auto_requeue=True by default).
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
I made sure I had fun coding 🙃
Part of #1 (it's a lie, this is just here to avoid noisy GitHub bot)
cc @Borda @awaelchli