Add Bagua Strategy #11146

Merged
merged 89 commits into from Feb 4, 2022

Conversation

wangraying
Contributor

@wangraying wangraying commented Dec 18, 2021

What does this PR do?

Fixes #10455
Fixes BaguaSys/bagua#304

Suggested follow ups:

  • Have Bagua provide Enums for the option strings
  • Add registry values for the different configurations
  • Update their bagua.distributed.run to bagua.distributed.launch to match torch
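The first follow-up (enums instead of raw option strings) could look roughly like the sketch below. `BaguaAlgorithm` and `parse_algorithm` are hypothetical names introduced only for illustration, and the member values are example option strings rather than a confirmed list from Bagua.

```python
from enum import Enum


class BaguaAlgorithm(str, Enum):
    """Hypothetical enum; the values are illustrative option strings."""

    GRADIENT_ALLREDUCE = "gradient_allreduce"
    BYTEGRAD = "bytegrad"
    DECENTRALIZED = "decentralized"


def parse_algorithm(name: str) -> BaguaAlgorithm:
    """Convert a user-supplied string into an enum member, case-insensitively."""
    try:
        return BaguaAlgorithm(name.lower())
    except ValueError:
        valid = ", ".join(a.value for a in BaguaAlgorithm)
        raise ValueError(
            f"Unknown algorithm {name!r}; expected one of: {valid}"
        ) from None
```

Because the enum mixes in `str`, members still compare equal to the plain strings, so existing call sites that pass strings would keep working while typos fail early with a readable error.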

Does your PR introduce any breaking changes? If yes, please list them.

None

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@awaelchli awaelchli self-assigned this Dec 18, 2021
@awaelchli awaelchli added this to the 1.6 milestone Dec 18, 2021
@awaelchli awaelchli added the "feature" (Is an improvement or enhancement) label Dec 18, 2021
@wangraying wangraying changed the title from "[draft] feat: add bagua plugin" to "Add Bagua Training Plugin" Dec 18, 2021
@wangraying wangraying marked this pull request as draft December 18, 2021 12:33
@awaelchli awaelchli changed the title from "Add Bagua Training Plugin" to "Add Bagua Strategy" Dec 23, 2021
@mergify mergify bot added the has conflicts label Feb 3, 2022
@mergify mergify bot removed the has conflicts label Feb 4, 2022
Member

@awaelchli awaelchli left a comment

Awesome!

Contributor

@tchaton tchaton left a comment

LGTM!

@carmocca carmocca enabled auto-merge (squash) February 4, 2022 16:36
@carmocca carmocca merged commit 8c07d8b into Lightning-AI:master Feb 4, 2022
@awaelchli
Member

🎉 🎉 🎉 🎉 🎉

@tchaton
Contributor

tchaton commented Feb 4, 2022

Congrats! Awesome work!

@@ -807,7 +816,7 @@ def select_cluster_environment(self) -> ClusterEnvironment:
 rank_zero_info("Multiprocessing is handled by SLURM.")
 return SLURMEnvironment()

-for env_type in (TorchElasticEnvironment, KubeflowEnvironment, LSFEnvironment):
+for env_type in (BaguaEnvironment, TorchElasticEnvironment, KubeflowEnvironment, LSFEnvironment):
Contributor

Why does Bagua need to be the first environment to check?
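As background for this question, here is a minimal sketch of why position in such a tuple can matter in a first-match loop. It is not Lightning's implementation: `FakeEnv`, `EnvA`, `EnvB`, `select_environment`, and the variable `A_SERVICE_PORT` are all hypothetical stand-ins, and the sketch does not claim that overlapping detection is the actual reason for the ordering in this PR.

```python
from typing import Mapping, Optional, Sequence, Type


class FakeEnv:
    """Stand-in for a cluster environment; not a Lightning class."""

    required_vars: Sequence[str] = ()

    @classmethod
    def detect(cls, environ: Mapping[str, str]) -> bool:
        # An environment "matches" when all of its variables are set.
        return all(var in environ for var in cls.required_vars)


class EnvA(FakeEnv):
    # Hypothetical: a launcher that sets one extra variable on top of EnvB's.
    required_vars = ("RANK", "WORLD_SIZE", "A_SERVICE_PORT")


class EnvB(FakeEnv):
    required_vars = ("RANK", "WORLD_SIZE")


def select_environment(
    candidates: Sequence[Type[FakeEnv]], environ: Mapping[str, str]
) -> Optional[Type[FakeEnv]]:
    """Return the first candidate whose detect() succeeds, else None."""
    for env_type in candidates:
        if env_type.detect(environ):
            return env_type
    return None
```

With `RANK`, `WORLD_SIZE`, and `A_SERVICE_PORT` all set, both stand-ins match, so `(EnvA, EnvB)` selects `EnvA` while `(EnvB, EnvA)` selects `EnvB`. If two real environments had overlapping detection conditions, the more specific one would need to come first.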

)

@classmethod
def register_plugins(cls, plugin_registry: Dict) -> None:
Contributor

This method has to be named register_strategies; otherwise it won't be called:
https://github.com/PyTorchLightning/pytorch-lightning/blob/1203094a201bd38f0b8b77d93bc39fc95f06d8ae/pytorch_lightning/strategies/strategy_registry.py#L137

No test covers strategy="bagua", so this issue wasn't caught.
I will fix it in #11448, or feel free to open a separate PR to fix it.
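A simplified sketch of the failure mode described here, assuming the registry discovers hooks by method name. `call_register_hooks`, `MisnamedStrategy`, and `CorrectStrategy` are hypothetical stand-ins, not the code behind the link above; the point is only that a hook with the wrong name is skipped silently rather than raising an error.

```python
from typing import Dict, Iterable


def call_register_hooks(classes: Iterable[type], registry: Dict[str, type]) -> None:
    """Call register_strategies on each class that defines it (stand-in logic)."""
    for cls in classes:
        hook = getattr(cls, "register_strategies", None)
        if callable(hook):
            hook(registry)


class MisnamedStrategy:
    @classmethod
    def register_plugins(cls, registry: Dict[str, type]) -> None:
        # Never invoked: the loop above only looks up "register_strategies".
        registry["misnamed"] = cls


class CorrectStrategy:
    @classmethod
    def register_strategies(cls, registry: Dict[str, type]) -> None:
        registry["correct"] = cls


registry: Dict[str, type] = {}
call_register_hooks([MisnamedStrategy, CorrectStrategy], registry)
```

After this runs, only `"correct"` is present in `registry`. A test that looks the strategy up by its registered name would expose the missing entry, which is the kind of coverage the comment says was absent.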

Labels
feature (Is an improvement or enhancement); priority: 0 (High priority task); ready (PRs ready to be merged); strategy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Bagua Training Plugin
Support Bagua in PyTorch Lightning
9 participants