📚 Documentation

The doc of manual optimization gives an example of gradient clipping (added by #16023); a rough sketch of that pattern follows below. However, it seems that this example does not handle gradient unscaling properly: when using mixed precision training, the gradients should be unscaled before calling self.clip_gradients.
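A minimal sketch of the documented pattern, for context (the model, the loss computation, and the `lightning.pytorch` import path for Lightning >= 2.0 are assumptions here, not the verbatim docs snippet):

```python
import torch
from lightning.pytorch import LightningModule


class ManualOptModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # manual optimization
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        opt.zero_grad()

        loss = self.layer(batch).sum()  # placeholder loss
        self.manual_backward(loss)

        # Clip as shown in the docs. With 16-mixed precision the gradients
        # are still scaled at this point, which is the problem raised here.
        self.clip_gradients(opt, gradient_clip_val=0.5, gradient_clip_algorithm="norm")

        opt.step()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```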
However, for manual optimization, the calling order is:

1. epoch_loop.manual_optimization.run()
2. model.training_step()
3. Inside the training step, the user manually backwards the loss (with gradient scaling) and calls optimizer.step().
4. In optimizer.step(), the gradients are unscaled, but gradient clipping of the unscaled gradients is disabled under manual optimization (in _after_closure -> _clip_gradients).
Above all, there seems to be no way for the user to insert gradient unscaling in training_step, since the gradients are always unscaled inside optimizer.step(). On the other hand, the user is also unable to clip gradients after they have been unscaled but before the optimizer's actual step.
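For comparison, the plain-PyTorch AMP recipe below (independent of Lightning; the model, loss, and random batches are placeholders, and a CUDA device is assumed) makes the intended ordering explicit: unscale first, then clip, then step. This is exactly the sequencing that manual optimization currently keeps out of the user's hands.

```python
import torch

model = torch.nn.Linear(32, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):  # stand-in for a real dataloader
    batch = torch.randn(8, 32, device="cuda")
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(batch).sum()  # placeholder loss

    scaler.scale(loss).backward()

    # 1. Unscale the gradients in place so that clipping sees their true magnitudes.
    scaler.unscale_(optimizer)
    # 2. Clip the now-unscaled gradients.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
    # 3. scaler.step() detects that unscale_ was already called and does not unscale again.
    scaler.step(optimizer)
    scaler.update()
```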
So here comes a question: why not also allow automatic gradient clipping for manual optimization? If users are supposed to take care of gradient clipping themselves, most of the time they would simply call self.clip_gradients on unscaled gradients, just like in automatic optimization; if they want to do something extra, they can do it via configure_gradient_clipping.
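For reference, a hedged sketch of such an override (the signature matches the Lightning 2.x hook; the norm logging is purely illustrative, and today this hook is only invoked under automatic optimization):

```python
import torch
from lightning.pytorch import LightningModule


class MyModel(LightningModule):
    def configure_gradient_clipping(self, optimizer, gradient_clip_val=None, gradient_clip_algorithm=None):
        # "Extra stuff": measure and log the pre-clip gradient norm
        # (max_norm=inf only computes the norm, it never actually clips)...
        total_norm = torch.nn.utils.clip_grad_norm_(self.parameters(), max_norm=float("inf"))
        self.log("grad_norm/pre_clip", total_norm)
        # ...then defer to the built-in utility, which operates on unscaled gradients.
        self.clip_gradients(
            optimizer,
            gradient_clip_val=gradient_clip_val,
            gradient_clip_algorithm=gradient_clip_algorithm,
        )
```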
cc @carmocca @justusschock @awaelchli @Borda