
Engine: kwargs #1156

Merged · 10 commits merged into master from engine-kwargs, Apr 9, 2021
Conversation

@ditwoo (Contributor) commented Apr 3, 2021

Before submitting (checklist)

  • Was this discussed/approved via a GitHub issue? (not needed for typos and docs improvements)
  • Did you read the contribution guide?
  • Did you check the code style? catalyst-make-codestyle && catalyst-check-codestyle (pip install -U catalyst-codestyle).
  • Did you make sure to update the docs? We use Google format for all the methods and classes.
  • Did you check the docs with make check-docs?
  • Did you write any new necessary tests?
  • Did you check that your code passes the unit tests (pytest .)?
  • Did you add your new functionality to the docs?
  • Did you update the CHANGELOG?
  • Did you run colab minimal CI/CD with latest and minimal requirements?

Description

Related Issue

Type of Change

  • Examples / docs / tutorials / contributors update
  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change which improves an existing feature)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

PS

  • I know that I could join Slack for pull request discussion.

model = ApexDistributedDataParallel(model, delay_allreduce=self.delay_all_reduce)
model, optimizer = amp.initialize(model, optimizer, **self.apex_kwargs)
# TODO: kwargs for Apex DDP ?
model = ApexDistributedDataParallel(model) # , delay_allreduce=self.delay_all_reduce)
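For context, the hunk above forwards the stored apex_kwargs to apex.amp.initialize while the Apex DDP wrapper is still called without kwargs (hence the TODO). A minimal sketch of that behaviour as a standalone helper (the helper name initialize_apex is an assumption for illustration, not Catalyst API):

from apex import amp
from apex.parallel import DistributedDataParallel as ApexDistributedDataParallel


def initialize_apex(model, optimizer, apex_kwargs: dict):
    """Sketch of the hunk above: amp receives the stored kwargs
    (e.g. {"opt_level": "O1"}), while the Apex DDP wrapper gets none yet."""
    model, optimizer = amp.initialize(model, optimizer, **apex_kwargs)
    model = ApexDistributedDataParallel(model)
    return model, optimizer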
Member:
could we also add ddp_kwargs and pass them here?

Member:
in this case, we have to remove ** from the init and make apex_kwargs and ddp_kwargs dict storages
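A minimal sketch of what such dict storages could look like in the engine's __init__ (the class name DistributedEngineSketch and its defaults are assumptions for illustration, not the final Catalyst code):

import copy


class DistributedEngineSketch:
    """Illustrative only: kwargs are stored as plain dicts, not unpacked with **."""

    def __init__(self, apex_kwargs: dict = None, ddp_kwargs: dict = None):
        # deepcopy so that later defaults (e.g. device_ids) do not mutate the caller's dicts
        self.apex_kwargs = copy.deepcopy(apex_kwargs) if apex_kwargs is not None else {}
        self.ddp_kwargs = copy.deepcopy(ddp_kwargs) if ddp_kwargs is not None else {}

A caller could then pass, for example, apex_kwargs={"opt_level": "O1"} and ddp_kwargs={"delay_allreduce": True} explicitly.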

@Scitator (Member) left a comment:

the PR looks amazing; nevertheless, could we please make a few extra changes:

  • rename ddp_kwargs to dist_kwargs or process_kwargs, since they are used for torch.distributed.init_process_group
  • add true ddp_kwargs and use them for the ApexDistributedDataParallel and DistributedDataParallel wrappers
  • add an extra exception for such cases - I mean, we should raise an error if we could not wrap the model correctly (a sketch of this split follows after this comment)

Huge thanks in advance!
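A rough sketch of the requested split, assuming hypothetical names process_group_kwargs and ddp_kwargs and a plain helper function rather than the actual engine method:

import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel


def wrap_model(model, process_group_kwargs: dict, ddp_kwargs: dict):
    """Illustrative split: process-group kwargs go to init_process_group,
    DDP kwargs go to the wrapper; raise instead of silently returning an unwrapped model."""
    dist.init_process_group(**process_group_kwargs)
    try:
        return DistributedDataParallel(model, **ddp_kwargs)
    except Exception as exc:
        raise RuntimeError(f"Could not wrap the model with DistributedDataParallel: {exc}") from exc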

Comment on lines 334 to 335
if "device_ids" not in self.ddp_kwargs:
self.ddp_kwargs["device_ids"] = [self.device]
Member:

we should move this to the end of def setup_process(self, rank: int = -1, world_size: int = 1):,
because at __init__ time self.device is still None (a combined sketch follows the two suggested changes below).

os.environ["MASTER_ADDR"] = str(self.address)
os.environ["MASTER_PORT"] = str(self.port)
dist.init_process_group(self.backend, rank=self.rank, world_size=self.world_size)
dist.init_process_group(**self.process_group_kwargs)
torch.cuda.set_device(int(self._rank))
self.device = f"cuda:{int(self._rank)}"
Member:
Suggested change
- self.device = f"cuda:{int(self._rank)}"
+ self.device = f"cuda:{int(self._rank)}"
+ if "device_ids" not in self.ddp_kwargs:
+     self.ddp_kwargs["device_ids"] = [self.device]

Comment on lines 333 to 335
self.ddp_kwargs = copy.deepcopy(ddp_kwargs)
if "device_ids" not in self.ddp_kwargs:
    self.ddp_kwargs["device_ids"] = [self.device]
Member:
Suggested change
- self.ddp_kwargs = copy.deepcopy(ddp_kwargs)
- if "device_ids" not in self.ddp_kwargs:
-     self.ddp_kwargs["device_ids"] = [self.device]
+ self.ddp_kwargs = copy.deepcopy(ddp_kwargs)
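Putting the two suggested changes together, the net effect could look roughly like this sketch (the surrounding attributes such as address and port are assumptions for illustration; only the placement of the device_ids default reflects the review):

import copy
import os

import torch
import torch.distributed as dist


class DistributedEngineSketch:
    """Continues the illustrative engine sketched above; names remain assumptions."""

    def __init__(self, address="localhost", port=12345,
                 process_group_kwargs: dict = None, ddp_kwargs: dict = None):
        self.address, self.port = address, port
        self.process_group_kwargs = copy.deepcopy(process_group_kwargs or {})
        self.ddp_kwargs = copy.deepcopy(ddp_kwargs or {})
        self.device = None  # not known yet, so no device_ids default here
        self._rank, self._world_size = -1, 1

    def setup_process(self, rank: int = -1, world_size: int = 1):
        self._rank, self._world_size = rank, world_size
        os.environ["MASTER_ADDR"] = str(self.address)
        os.environ["MASTER_PORT"] = str(self.port)
        dist.init_process_group(**self.process_group_kwargs)
        torch.cuda.set_device(int(self._rank))
        self.device = f"cuda:{int(self._rank)}"
        # moved here from __init__: self.device is finally known
        if "device_ids" not in self.ddp_kwargs:
            self.ddp_kwargs["device_ids"] = [self.device]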

@Scitator merged commit 8940ef0 into master on Apr 9, 2021.
The mergify bot deleted the engine-kwargs branch on April 9, 2021 at 18:52.