Effective batch size in DDP #13165
-
I have a max batch size of 4 on a single GPU. If 2 GPUs are used, should I increase the batch size to 8 so that each GPU gets a batch of 4? Or do I just keep it as 4, and PL will load a batch of 4 onto each of the 2 GPUs?
I think it is the second case? I also have another problem related to DDP training, which is posted at the link below; I post it here for convenience. I am incorporating a PyTorch-based model into the PL framework for DDP training:

    class ZfoldLightning(pl.LightningModule):
        def __init__(self, hparams):
            ...
            self.model = XFold(MODEL_PARAM)

which initializes the […]. One solution is to refactor those to-device calls and use the recommended usage.
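Not from the original post, but a minimal sketch of the refactor hinted at above, assuming the problem is `.cuda()`/`.to(device)` calls happening inside `__init__`: under DDP, Lightning moves the whole `LightningModule` to the right GPU per process, so submodules can be built on CPU and any on-the-fly tensors can use `self.device`. `XFold`, `MODEL_PARAM`, and the import path are placeholders taken from the post, not real APIs.

```python
import pytorch_lightning as pl
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the model class and config from the post;
# this import is an assumption, not part of the original code.
from my_project.model import XFold, MODEL_PARAM


class ZfoldLightning(pl.LightningModule):
    def __init__(self, hparams):
        super().__init__()
        self.save_hyperparameters(hparams)
        # Build submodules on CPU; do NOT call .cuda()/.to(device) here.
        # Under DDP, Lightning moves the whole LightningModule to the
        # correct GPU for each process before training starts.
        self.model = XFold(MODEL_PARAM)

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        # Tensors created on the fly should use self.device rather than a
        # hard-coded "cuda:0", so each DDP process stays on its own GPU.
        noise = 0.01 * torch.randn(x.shape, device=self.device)
        loss = F.mse_loss(self(x + noise), y)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```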
Replies: 1 comment 2 replies
-
I have solved my problem and found out that the answer is: each GPU gets `batch_size` samples. If you have a `batch_size` of 2 and 2 GPUs are utilized, each GPU gets a batch of 2, and 4 samples in total are fed through a forward pass.
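As a quick illustration of that answer (not from the original thread): with DDP, the `batch_size` passed to the `DataLoader` is the per-GPU batch size, so the effective batch size per optimizer step is `batch_size * num_devices`. A minimal sketch with a toy dataset:

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset purely for illustration (not from the thread).
dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))

# batch_size is the PER-PROCESS (per-GPU) batch size under DDP.
loader = DataLoader(dataset, batch_size=2, shuffle=True)

trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp")
# Lightning injects a DistributedSampler, so each of the 2 processes
# draws its own batch of 2; one optimizer step therefore sees
# 2 GPUs * 2 samples = 4 samples in total (the "effective" batch size).
# trainer.fit(model, loader)  # `model` would be a LightningModule such as ZfoldLightning
```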