Fix gradient clipping for Sharded DDP #9168

sgugger · 2020-12-17T14:25:41Z

What does this PR do?

As mentioned in the discussion of #9156, Trainer does not clip gradients correctly when using a sharded optimizer. This PR fixes that, and also lets Trainer skip gradient clipping entirely when `None` or `0` is passed to the corresponding argument (`max_grad_norm`).
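The following is a minimal pure-Python sketch, not the PR's actual code. It assumes each rank holds the gradients for only its own shard of the parameters, and it simulates those ranks as plain lists of floats. The names `global_grad_norm`, `clip_sharded_grads` and `clip_each_shard_locally` are hypothetical. The sketch shows two things: why clipping against a per-rank norm is not equivalent to clipping against the global norm, and how a `None` or `0` threshold can disable clipping. It uses neither PyTorch nor fairscale.

```python
import math
from typing import List, Optional

# Hypothetical: each "rank" owns one shard of the flattened gradients,
# mimicking how a sharded optimizer partitions parameters across processes.
Shard = List[float]


def global_grad_norm(shards: List[Shard]) -> float:
    # Each rank computes the squared L2 norm of its local shard...
    local_sq = [sum(g * g for g in shard) for shard in shards]
    # ...then the squared norms are summed across ranks (an all-reduce in a
    # real distributed run) before taking the square root.
    return math.sqrt(sum(local_sq))


def clip_sharded_grads(shards: List[Shard], max_grad_norm: Optional[float]) -> List[Shard]:
    # None or 0 disables clipping entirely.
    if max_grad_norm is None or max_grad_norm <= 0:
        return shards
    total_norm = global_grad_norm(shards)
    clip_coef = max_grad_norm / (total_norm + 1e-6)
    if clip_coef >= 1.0:
        return shards  # already within the limit, nothing to scale
    # Every shard is scaled by the SAME coefficient derived from the global norm.
    return [[g * clip_coef for g in shard] for shard in shards]


def clip_each_shard_locally(shards: List[Shard], max_grad_norm: float) -> List[Shard]:
    # Incorrect for sharded training: each rank clips against its own norm only.
    out = []
    for shard in shards:
        local_norm = math.sqrt(sum(g * g for g in shard))
        coef = min(max_grad_norm / (local_norm + 1e-6), 1.0)
        out.append([g * coef for g in shard])
    return out
```

Consider shards `[[3.0], [4.0]]` with `max_grad_norm=1.0`. The global norm is 5, so `clip_sharded_grads` scales the shards to roughly `[0.6]` and `[0.8]`, giving a total norm of about 1. By contrast, `clip_each_shard_locally` yields roughly `[1.0]` and `[1.0]`, whose combined norm is about 1.41 and therefore still exceeds the threshold.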

LysandreJik

Looks good to me! Thanks for adding comments

* Fix gradient clipping for Sharded DDP
* Fix typos in comments

Fix gradient clipping for Sharded DDP

1be753b

sgugger requested a review from LysandreJik December 17, 2020 14:25

Fix typos in comments

eebbd4b

LysandreJik approved these changes Dec 17, 2020


LysandreJik merged commit 77d6941 into master Dec 17, 2020

LysandreJik deleted the sharded_ddp_clip branch December 17, 2020 14:44

stas00 mentioned this pull request Dec 17, 2020

Sharded DDP training fails with seq2seq models #9156

Closed


guyrosin pushed a commit to guyrosin/transformers that referenced this pull request Jan 15, 2021

Fix gradient clipping for Sharded DDP (huggingface#9168)

6eb9104

* Fix gradient clipping for Sharded DDP
* Fix typos in comments

statelesshz mentioned this pull request Oct 7, 2023

remove the obsolete code related to fairscale FSDP #26651

Merged

