[MRG] Improve stability of SGDClassifier / SGDRegressor with gradient clipping #3883
The `squared_hinge` loss of `SGDClassifier` (and potentially the `squared` loss of `SGDRegressor`) tends to trigger numerical overflows, even on normalized data, for some hyperparameter combinations. This PR fixes that issue by clipping `dloss` to `1e12`. All existing tests still pass.

I also had to prevent strong L2 regularization combined with large learning rates from producing negative weight scales, which are meaningless and can also cause numerical divergence when lower than -1. Instead, the weights are set to zero in that case. A new non-regression test highlights this case as well.
Both non-regression tests were inspired by #3040. They both fail at epochs 2 and 3 on the iris data with the `sgd_fast.pyx` implementation from master.
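As a hedged sketch, a check in the spirit of those non-regression tests could look like the following; the hyperparameter values here are illustrative guesses, not the ones used in the PR's actual tests:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import scale

# Normalized data: overflows could still occur before the fix.
X, y = load_iris(return_X_y=True)
X = scale(X)

# A steep loss combined with a large constant learning rate used to
# overflow within a few epochs; these values are illustrative.
clf = SGDClassifier(loss="squared_hinge", learning_rate="constant",
                    eta0=10.0, alpha=0.01, max_iter=5, tol=None)
clf.fit(X, y)

# With gradient clipping the coefficients stay finite.
assert np.all(np.isfinite(clf.coef_))
```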