
Rethink/Refactor Horovod Testing #11975

Closed
krshrimali opened this issue Feb 18, 2022 · 2 comments · Fixed by #16141

krshrimali (Contributor) commented Feb 18, 2022

Proposed refactor

The tests written for the Horovod strategy might be outdated, as most of them were written ~2 years ago.

  • Many tests only check that the Horovod run finished without errors; the behavior the test was written for may not actually be verified.
  • accelerator="auto" can be used wherever possible to avoid maintaining separate tests for CPU and GPU devices.
  • Some tests don't need arguments like default_root_dir and weights_save_path; they should pass only the argument relevant to that test (like gradient_clip_algorithm). See the sketch after this list.
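
As a rough sketch of the second and third points (a sketch only, not a definitive implementation; BoringModel and its import path are assumptions based on the repo's existing test helpers):

```python
import pytest

from pytorch_lightning import Trainer
from tests.helpers.boring_model import BoringModel  # assumed helper path


@pytest.mark.parametrize("clip_algorithm", ["norm", "value"])
def test_horovod_grad_clip(clip_algorithm):
    # accelerator="auto" picks CPU or GPU depending on availability,
    # so one parametrized test replaces the separate cpu/gpu variants.
    trainer = Trainer(
        strategy="horovod",
        accelerator="auto",
        devices=1,
        fast_dev_run=True,
        # pass only the arguments this test actually exercises:
        gradient_clip_val=1e-4,
        gradient_clip_algorithm=clip_algorithm,
    )
    trainer.fit(BoringModel())
```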

Motivation

While working on #11911, @carmocca explained how these tests could be refactored; creating an issue to rethink the strategy for testing Horovod seemed like a good idea.

Pitch

Comments and discussion are welcome on this one.

An example:

  • test_horovod_cpu_clip_grad_by_value only tests that the Horovod run finished without errors; it doesn't check that gradient_clip_val was actually applied. We could avoid launching the process and instead verify directly that gradient_clip_val served its purpose (see the sketch below).
  • Minor changes were made to the tests here as well: https://github.com/PyTorchLightning/pytorch-lightning/pull/11911/files.
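
One possible way to check this without a Horovod run (again a sketch; it assumes the configure_gradient_clipping/clip_gradients hooks available since PL 1.5 and the same assumed BoringModel helper as above):

```python
from pytorch_lightning import Trainer
from tests.helpers.boring_model import BoringModel  # assumed helper path


class ClipRecordingModel(BoringModel):
    """Records the clipping arguments the Trainer passes to the hook."""

    def configure_gradient_clipping(
        self, optimizer, optimizer_idx, gradient_clip_val=None, gradient_clip_algorithm=None
    ):
        self.seen = (gradient_clip_val, gradient_clip_algorithm)
        # delegate to the default clipping behavior
        self.clip_gradients(
            optimizer,
            gradient_clip_val=gradient_clip_val,
            gradient_clip_algorithm=gradient_clip_algorithm,
        )


def test_gradient_clip_by_value_is_applied():
    model = ClipRecordingModel()
    trainer = Trainer(fast_dev_run=True, gradient_clip_val=1e-4, gradient_clip_algorithm="value")
    trainer.fit(model)
    # assert the Trainer arguments actually reached the clipping logic
    # (gradient_clip_algorithm arrives as a str-based enum, so == "value" holds)
    assert model.seen == (1e-4, "value")
```

The same pattern would let the Horovod-specific tests assert the behavior under test rather than mere completion.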


cc @carmocca @ananthsub @justusschock @awaelchli @rohitgr7 @Borda @akihironitta @kaushikb11

carmocca (Member) commented Feb 22, 2022

carmocca (Member) commented Apr 6, 2022

Related to this is the idea of upstreaming the Horovod strategy, which would mean removing all of these tests from our CI.

@carmocca carmocca added this to the future milestone Apr 6, 2022
@carmocca carmocca assigned Borda and unassigned Borda Apr 6, 2022
@Borda Borda self-assigned this Apr 6, 2022
@carmocca carmocca removed this from the future milestone Dec 20, 2022