Gradient checkpointing with DDP in a loop #10479
Replies: 4 comments 3 replies
-
Dear @shivammehta007, I also got this error. Has it been solved?
-
This does not appear to be a Lightning issue, but rather a limitation of DistributedDataParallel from torch.distributed, which does not support gradient checkpointing.
-
I have now solved this problem. The cause is that the model has parameters that were not used in producing the loss. Apply the following two settings, and you will find the names of the unused parameters.
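The two settings themselves were not preserved in this extract of the comment. A common pair for surfacing unused-parameter names (an assumption about what was meant here, not confirmed by the thread) is setting `TORCH_DISTRIBUTED_DEBUG=DETAIL` and constructing DDP with `find_unused_parameters=True`. A minimal single-process sketch:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Setting 1 (assumed): make torch.distributed report the names of
# parameters that never received a gradient.
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"

# Single-process "gloo" process group, just so DDP can be constructed.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.used = torch.nn.Linear(4, 4)
        self.unused = torch.nn.Linear(4, 4)  # never touched in forward

    def forward(self, x):
        return self.used(x)

# Setting 2 (assumed): tell DDP to tolerate (and report) parameters
# that do not contribute to the loss.
model = DDP(Net(), find_unused_parameters=True)

loss = model(torch.randn(2, 4)).sum()
loss.backward()  # without find_unused_parameters=True, DDP raises here
dist.destroy_process_group()
```

In PyTorch Lightning the `find_unused_parameters` flag is passed through the DDP strategy/plugin rather than to DDP directly; the exact spelling depends on the Lightning version.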
-
Hi @kuixu, we can find the parameter names, but what is the actual solution? Do we need to remove those parameters, or something else? How does that resolve the issue?
-
Since my method is an autoregressive algorithm, it builds a huge gradient tape, so I am trying to do something like this:
It works fine on a single GPU, but on DDP it throws this error.
I am running it with:
Is there any workaround for this?
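One workaround not spelled out in the thread, but a known interaction between DDP and `torch.utils.checkpoint`, is to use non-reentrant checkpointing (`use_reentrant=False`, available in recent PyTorch versions). Reentrant checkpointing re-runs the forward pass inside backward in a way that confuses DDP's gradient hooks; the non-reentrant mode uses saved-tensor hooks instead and is compatible with DDP. A minimal sketch:

```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    """A stand-in for one step of the autoregressive model."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        return torch.relu(self.linear(x))

block = Block()
x = torch.randn(2, 8, requires_grad=True)

# Activations inside `block` are not stored; they are recomputed during
# backward. use_reentrant=False is the DDP-friendly variant.
out = checkpoint(block, x, use_reentrant=False)
out.sum().backward()
```

If non-reentrant mode is not an option on your PyTorch version, another commonly cited workaround is constructing DDP with `static_graph=True` (exposed through Lightning's DDP strategy in newer releases), which lets DDP cooperate with reentrant checkpointing when the graph does not change between iterations.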