Replace boston in ensemble test_forest #16927
Conversation
I think the failures are due to the use of the california dataset; this is a message from one of the failures:
But other tests use the california dataset as well, so I don't understand the cause of the failure.
Hi @lucyleeow, the failing test seems to be related to pytest-dev/pytest#6925, as only CI jobs with pytest 5.4.1 are failing.
It might be because I didn't add
While reviewing your PRs, this dataset looks to have pretty poor results on test sets compared to the training set, which means the default parameters are overfitting:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

X, y = load_diabetes(return_X_y=True)
results = cross_validate(RandomForestRegressor(random_state=0), X, y,
                         return_train_score=True)
print("train score", results['train_score'].mean())
print("test score", results['test_score'].mean())
# train score 0.9210138461843774
# test score 0.4230661480472566
```
Oh wow, that's a big difference. What kind of tests should I be careful about? I might not be good at assessing this.
Edit: should I tune parameters and use the tuned parameters for all the tests?
I would try to use
Hum, maybe diabetes is too hard a dataset to expect good generalization accuracy from the
Or maybe this is good enough for such tests. It's still significantly better than random.
Or feel free to use
I tried using
Tried
Is this a problem with the way the dataset is generated? Since
Gives:
Regardless, happy to keep diabetes as well.
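For context, switching the tests to a generated dataset can be sketched as below. This is only a sketch: the `make_regression` parameters here are illustrative assumptions, not the ones actually used in the PR.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

# Hypothetical generator settings; the PR's actual parameters may differ.
X, y = make_regression(n_samples=500, n_features=10, n_informative=5,
                       noise=10.0, random_state=0)
results = cross_validate(RandomForestRegressor(random_state=0), X, y,
                         return_train_score=True)
print("train score", results["train_score"].mean())
print("test score", results["test_score"].mean())
```

A synthetic dataset like this lets the test control how learnable the target is, avoiding the large train/test gap seen with diabetes.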
I think that one expects the OOB score to be really close to the score that you will obtain on the test set. So for this test, I would take the diff between the OOB score and the test score and check that it is smaller than
I am really surprised by the diabetes results indeed.
@@ -389,7 +389,7 @@ def check_oob_score(name, X, y, n_estimators=20):
assert abs(test_score - est.oob_score_) < 0.1
Indeed, we are already doing this for classification. I think it makes sense to do the same for regression.
We would only require a comment mentioning that in the first case this is a diff between accuracies, and in the second a diff between R2 scores.
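The suggested check can be sketched as follows. This is illustrative only: the dataset parameters are assumptions, not the PR's actual test setup, and the real check lives in `check_oob_score` in `test_forest.py`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Assumed synthetic data; the PR's generated dataset may use other settings.
X, y = make_regression(n_samples=1000, n_features=10, noise=5.0,
                       random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# oob_score=True computes an out-of-bag R2 estimate on the training data.
est = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
est.fit(X_train, y_train)

test_score = est.score(X_test, y_test)  # R2 on held-out data
print("test R2:", test_score)
print("OOB R2:", est.oob_score_)
print("diff:", abs(test_score - est.oob_score_))
```

The OOB estimate is computed only from trees that did not see a given training sample, so it should track the held-out R2 closely; the test then asserts the diff is below a small tolerance.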
I read the code wrong and thought we were fitting and testing on the same data, which is why I thought oob_score would always be worse. It makes sense now. Will fix, thanks @glemaitre
Thanks @glemaitre. I amended all tests to use the generated regression dataset.
Thanks @lucyleeow
Reference Issues/PRs
Towards #16155
What does this implement/fix? Explain your changes.
Replaces boston dataset with ~~subset of california housing~~ diabetes dataset in `sklearn/ensemble/tests/test_forest.py`
Any other comments?
~~Did not use diabetes dataset due to poor R2 score and oob score in `test_oob_score_regressors` (as picked by @adrinjalali in prev PR).~~ Poor R2 score in `test_oob_score_regressors` with diabetes dataset. Happy to change to California/another dataset if this is a problem.