CacheDataset with DDP and Multi-GPUs #11763
-
We use MONAI's CacheDataset to speed up data loading. However, when combining the LightningModule's standard training code with the DDP strategy in a multi-GPU environment, the cached dataset does not behave as expected: if the CacheDataset is given the full dataset, the initial epoch takes forever to load because each GPU tries to read in and cache ALL of the data, which is unnecessary since under DDP each GPU only uses a portion of it.

A workaround is mentioned in this MONAI issue, which suggests partitioning the data before feeding it into the CacheDataset. However, if I do the partitioning in the setup() function, the trainer trains on total_data_length // num_gpus samples each epoch instead of total_data_length. And if I build the CacheDataset with the full data in prepare_data(), the subprocesses can't access the dataset instance (it would have to be saved in self.x, which is not recommended).

So what's the best practical way to handle this? My gut feeling is that I should use the partitioned dataset on each GPU and let the loader iterate over the full length of that partition instead of only part of it. Any suggestions?
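Concretely, what I have in mind looks roughly like this (a sketch only, assuming MONAI's partition_dataset helper and a LightningDataModule; the names and the sampler flag at the end are illustrative, and whether disabling Lightning's sampler is the right way to do it is exactly my question):

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from monai.data import CacheDataset, partition_dataset


class PartitionedCacheDataModule(pl.LightningDataModule):
    def __init__(self, data_dicts, transforms, batch_size=2):
        super().__init__()
        self.data_dicts = data_dicts    # full list of data dicts
        self.transforms = transforms
        self.batch_size = batch_size

    def setup(self, stage=None):
        # setup() runs once per rank, so each GPU only reads and caches its own shard
        shards = partition_dataset(
            data=self.data_dicts,
            num_partitions=self.trainer.world_size,
            shuffle=True,
            seed=42,            # same seed on every rank so the shards don't overlap
            even_divisible=True,
        )
        self.train_ds = CacheDataset(
            data=shards[self.trainer.global_rank],
            transform=self.transforms,
            cache_rate=1.0,
        )

    def train_dataloader(self):
        # plain loader: the data is already sharded per rank, so no DistributedSampler
        return DataLoader(self.train_ds, batch_size=self.batch_size, shuffle=True)


# Presumably Lightning must then be told not to wrap the loader in another
# DistributedSampler, otherwise each rank only sees 1/world_size of its shard:
# trainer = pl.Trainer(accelerator="gpu", devices=4, strategy="ddp",
#                      replace_sampler_ddp=False)  # use_distributed_sampler=False in PL >= 2.0
```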
-
hey @bill-yc-chen
since DDP executes scripts independently across devices, maybe try DDP_Spawn instead?
https://pytorch-lightning.readthedocs.io/en/latest/advanced/training_tricks.html#sharing-datasets-across-process-boundaries
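Roughly, that would look like the sketch below (untested; data_dicts, transforms, and LitModel are placeholders defined elsewhere). The idea is that the CacheDataset is built once in the launching process, and with ddp_spawn the dataloader is pickled over to the spawned workers, so the heavy load-and-transform pass doesn't have to be repeated by independent script launches on every device:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from monai.data import CacheDataset

if __name__ == "__main__":
    # data_dicts, transforms, and LitModel are placeholders defined elsewhere
    train_ds = CacheDataset(data=data_dicts, transform=transforms, cache_rate=1.0)
    train_loader = DataLoader(train_ds, batch_size=2, shuffle=True)

    trainer = pl.Trainer(
        accelerator="gpu",
        devices=4,
        strategy="ddp_spawn",  # workers are spawned from this process, not new script launches
    )
    trainer.fit(LitModel(), train_dataloaders=train_loader)
```

Note that each spawned worker still ends up with its own in-memory copy of the cached data, so this mainly saves the repeated disk reads and deterministic transforms rather than memory.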