📚 Documentation

The doc of manual optimization gives an example of gradient clipping (added by #16023); a rough sketch of that pattern follows below. However, it seems that this example does not handle gradient unscaling properly: when using mixed precision training, the gradients should be unscaled before calling self.clip_gradients.
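A minimal sketch of the documented pattern, for context (the model, the loss computation, and the `lightning.pytorch` import path for Lightning >= 2.0 are assumptions here, not the verbatim docs snippet):

```python
import torch
from lightning.pytorch import LightningModule


class ManualOptModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # manual optimization
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        opt.zero_grad()

        loss = self.layer(batch).sum()  # placeholder loss
        self.manual_backward(loss)

        # Clip as shown in the docs. With 16-mixed precision the gradients
        # are still scaled at this point, which is the problem raised here.
        self.clip_gradients(opt, gradient_clip_val=0.5, gradient_clip_algorithm="norm")

        opt.step()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```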
However, for manual optimization, the calling order is:

1. epoch_loop.manual_optimization.run()
2. model.training_step()
3. Inside the training step, the user manually backwards the loss (with gradient scaling) and calls optimizer.step().
4. In optimizer.step(), the gradients are unscaled, but gradient clipping of the unscaled gradients is disabled under manual optimization (in _after_closure -> _clip_gradients).
Above all, there seems to be no way for the user to insert gradient unscaling in training_step, since the gradients are always unscaled inside optimizer.step(). On the other hand, the user is also unable to clip gradients after they have been unscaled but before the optimizer's actual step.
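For comparison, the plain-PyTorch AMP recipe below (independent of Lightning; the model, loss, and random batches are placeholders, and a CUDA device is assumed) makes the intended ordering explicit: unscale first, then clip, then step. This is exactly the sequencing that manual optimization currently keeps out of the user's hands.

```python
import torch

model = torch.nn.Linear(32, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):  # stand-in for a real dataloader
    batch = torch.randn(8, 32, device="cuda")
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(batch).sum()  # placeholder loss

    scaler.scale(loss).backward()

    # 1. Unscale the gradients in place so that clipping sees their true magnitudes.
    scaler.unscale_(optimizer)
    # 2. Clip the now-unscaled gradients.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
    # 3. scaler.step() detects that unscale_ was already called and does not unscale again.
    scaler.step(optimizer)
    scaler.update()
```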
So here comes a question: why not also allow automatic gradient clipping for manual optimization? If users are supposed to take care of gradient clipping themselves, most of the time they would simply call self.clip_gradients on unscaled gradients, just like in automatic optimization; if they want to do something extra, they can do it via configure_gradient_clipping.
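For reference, a hedged sketch of such an override (the signature matches the Lightning 2.x hook; the norm logging is purely illustrative, and today this hook is only invoked under automatic optimization):

```python
import torch
from lightning.pytorch import LightningModule


class MyModel(LightningModule):
    def configure_gradient_clipping(self, optimizer, gradient_clip_val=None, gradient_clip_algorithm=None):
        # "Extra stuff": measure and log the pre-clip gradient norm
        # (max_norm=inf only computes the norm, it never actually clips)...
        total_norm = torch.nn.utils.clip_grad_norm_(self.parameters(), max_norm=float("inf"))
        self.log("grad_norm/pre_clip", total_norm)
        # ...then defer to the built-in utility, which operates on unscaled gradients.
        self.clip_gradients(
            optimizer,
            gradient_clip_val=gradient_clip_val,
            gradient_clip_algorithm=gradient_clip_algorithm,
        )
```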
cc @carmocca @justusschock @awaelchli @Borda