
[Changed behavior] n_iter_no_change should be attached with early_stopping, not model #19743

Closed
IchiruTake opened this issue Mar 22, 2021 · 4 comments


@IchiruTake

I used a neural network to validate this behavior. Surprisingly, n_iter_no_change acts on the model directly even when early_stopping is disabled; I followed the documentation but was confused by this hyper-parameter. The data is a 4-input AND gate. This happens on versions 0.22 through 0.24.
Proposed solution: change n_iter_no_change so that this hyper-parameter only takes effect when early_stopping=True.

import numpy as np
from time import time

X = np.array([[0, 0, 0, 0], [0, 0, 0, 1], 
              [0, 0, 1, 0], [0, 0, 1, 1],
              [0, 1, 0, 0], [0, 1, 0, 1],
              [0, 1, 1, 0], [0, 1, 1, 1],
              [1, 0, 0, 0], [1, 0, 0, 1],
              [1, 0, 1, 0], [1, 0, 1, 1],
              [1, 1, 0, 0], [1, 1, 0, 1],
              [1, 1, 1, 0], [1, 1, 1, 1],], dtype=np.uint8)
Y = np.array([[0]] * 15 + [[1]], dtype=np.uint8)

CASE #1:

import sklearn
from sklearn.neural_network import MLPRegressor, MLPClassifier

print(sklearn.__version__)
start = time()
model = MLPRegressor(hidden_layer_sizes=(16, 4, ), activation='logistic', solver='adam', max_iter=7500, 
                     shuffle=False, n_iter_no_change=10, learning_rate_init=0.001, nesterovs_momentum=True)

model.fit(X, Y.ravel())
pred = model.predict(X)
print(pred)
print("Executing Time: {:.6f}s".format(time() - start))

CASE #2:

import sklearn
from sklearn.neural_network import MLPRegressor, MLPClassifier

print(sklearn.__version__)
start = time()
model = MLPRegressor(hidden_layer_sizes=(16, 4, ), activation='logistic', solver='adam', max_iter=7500, 
                     shuffle=False, n_iter_no_change=7500, learning_rate_init=0.001, nesterovs_momentum=True)

model.fit(X, Y.ravel())
pred = model.predict(X)
print(pred)
print("Executing Time: {:.6f}s".format(time() - start))

Result for Case #1:

pred = [0.14804349 0.13967124 0.14385876 0.13574606 0.1296776  0.12198994
 0.12610933 0.11865247 0.13068479 0.12283299 0.12688654 0.11929599
 0.11350414 0.10649617 0.11028053 0.10350576]

Result for Case #2:

pred = [ 1.55467171e-02  5.70094411e-04  3.03478798e-03 -6.76589593e-03
  1.42867468e-03 -7.64987634e-03 -7.58148435e-03  5.68008762e-03
 -7.20951138e-03 -3.52291628e-03 -3.47743642e-03  5.96549015e-03
 -5.02314026e-03  7.11164102e-03  7.02257007e-03  9.94591169e-01]

Note: the training speed on this network in 0.24 was slightly better than in 0.22:
Version 0.22.post1: Executing Time: 2.344778s
Version 0.24.1: Executing Time: 2.290114s (2.38% better)

@glemaitre
Member

There is no bug here. Basically, if early_stopping is activated, we use the validation score to stop the neural network; otherwise, the training loss is used. I think this is the missing piece of information that should be added to the documentation. Indeed, by passing verbose=True it becomes obvious. Case #1 would have printed the following:

Training loss did not improve more than tol=0.000100 for 10 consecutive epochs. Stopping.

While with early_stopping activated, it would have triggered:

Validation score did not improve more than tol=0.000100 for 10 consecutive epochs. Stopping.

@IchiruTake, did I miss something? In that case, do not hesitate to correct me.
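
For illustration, a minimal sketch (not from the original thread) that makes both messages visible by toggling early_stopping while keeping verbose=True, reusing the AND-gate data from the report:

import numpy as np
from sklearn.neural_network import MLPRegressor

# AND-gate data, equivalent to the arrays in the original report
X = np.array([[int(b) for b in format(i, "04b")] for i in range(16)], dtype=np.uint8)
Y = np.array([0] * 15 + [1], dtype=np.uint8)

# Without early_stopping, the n_iter_no_change counter tracks the training loss;
# verbose=True should print the "Training loss did not improve ..." message.
loss_based = MLPRegressor(hidden_layer_sizes=(16, 4), activation='logistic',
                          solver='adam', max_iter=7500, shuffle=False,
                          n_iter_no_change=10, verbose=True)
loss_based.fit(X, Y)

# With early_stopping=True, a held-out validation split (validation_fraction, 0.1 by
# default) is scored instead; verbose=True should print the
# "Validation score did not improve ..." message.
score_based = MLPRegressor(hidden_layer_sizes=(16, 4), activation='logistic',
                           solver='adam', max_iter=7500, shuffle=False,
                           n_iter_no_change=10, early_stopping=True, verbose=True)
score_based.fit(X, Y)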

@glemaitre
Member

If I did not miss anything (and there is no regression), @IchiruTake, do you want to submit a PR to improve the documentation?

@IchiruTake
Author

(quoting @glemaitre's explanation and PR suggestion above)

This will thus make the definition easier to understand.

I am just a beginner developer, so I am not good at fixing other people's code yet, but I am trying to improve myself.
Regarding early_stopping, I would suggest the following (as sketched below):

  • If early_stopping is False, no early stop is made at all, so the meaning of early_stopping is not violated: the model keeps training until the maximum number of epochs.
  • If early_stopping is True:
      ◦ validation_fraction = 0: the model stops based on the training set
      ◦ validation_fraction > 0: the model stops based on the validation set

This would be clearer. Moreover, could you add Nesterov momentum + Adam as an option?
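
To make the proposal concrete, here is a hypothetical helper (invented names, not scikit-learn code) expressing the suggested rule:

def should_stop(early_stopping, validation_fraction, n_iter_no_change,
                epochs_without_loss_improvement, epochs_without_score_improvement):
    """Illustration of the proposed semantics only; not how scikit-learn is implemented."""
    if not early_stopping:
        # Proposed: without early_stopping, keep training until max_iter.
        return False
    if validation_fraction == 0:
        # Proposed: stop when the training loss has stopped improving.
        return epochs_without_loss_improvement > n_iter_no_change
    # Proposed: stop when the held-out validation score has stopped improving.
    return epochs_without_score_improvement > n_iter_no_change

# e.g. should_stop(True, 0.1, 10, 3, 12) -> True (validation score stalled for 12 > 10 epochs)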

@jeremiedbb
Member

Fixed in #19818
