Remove unneeded warning from Random Forest Classifier / Regressor #5233
base: branch-23.04
I have moved this warning to the documentation. IMHO, a user who is not concerned about reproducibility may not want to read this warning every time they train a model.
Update API documentation for 'n_streams' parameter. Removed unused 'warnings' import.
Thanks a lot for the contribution! I agree that the warnings are better suited to be placed in the API documentation.
I have made two minor suggestions that, IMO, slightly improve the wording.
Finally, I think we would also need to update the corresponding documentation for the respective dask classes:
- python/cuml/dask/ensemble/randomforestclassifier.py
- python/cuml/dask/ensemble/randomforestregressor.py
That said, I am not sure whether they would ever be (almost) reproducible in a multi-GPU context.
Number of parallel streams used for forest building.
For almost reproducible results, n_streams = 1 is recommended. If
n_streams is > 1, results may vary due to stream/thread timing
differences, even when random_state is set.
Suggested change:
Number of parallel streams used for forest building.
For almost reproducible results, it is recommended to set n_streams = 1.
If n_streams is set to a value greater than 1, there could be variations
in the results due to unpredictable differences in stream/thread timing,
even if random_state is specified.
As a user I might be confused by the term "almost reproducible results". Is there a better way to describe or quantify this? Or should we just not promise reproducibility at all? I understand that this question might be outside of the scope of this PR.
Hi Carl, I couldn't say. The "almost reproducible results" statement was part of the original warning.
Hi @cjnolet, could you please help us with the above? Thanks!
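One possible way to make "almost reproducible" concrete, sketched here under assumptions that are not part of this PR, is to train twice with the same `random_state` and report how closely the two sets of predictions agree. The helper `reproducibility_report` below is hypothetical: it takes two prediction sequences and returns the fraction of exactly matching entries (useful for classifier labels) and the maximum absolute difference (useful for regressor outputs).

```python
def reproducibility_report(preds_a, preds_b):
    """Compare two runs' predictions (hypothetical helper, not part of cuML).

    Returns a tuple (match_rate, max_abs_diff):
    - match_rate: fraction of positions where the predictions are identical
    - max_abs_diff: largest absolute difference between paired predictions
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction sequences must have the same length")
    if len(preds_a) == 0:
        # Two empty runs trivially agree.
        return 1.0, 0.0
    matches = sum(1 for a, b in zip(preds_a, preds_b) if a == b)
    max_abs_diff = max(abs(a - b) for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a), max_abs_diff


# Example: two hypothetical regressor runs that differ in one prediction.
run_1 = [1.0, 2.5, 3.0, 4.0]
run_2 = [1.0, 2.5, 3.25, 4.0]
print(reproducibility_report(run_1, run_2))  # (0.75, 0.25)
```

Reporting both numbers distinguishes exact reproducibility (a match rate of 1.0) from results that differ only slightly.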
Co-authored-by: Carl Simon Adorf <sadorf@nvidia.com>
@miguelusque Can you update this branch, please?
Fixes #5224. Users are shown a warning about reproducibility when using Random Forest. IMO, that information belongs in the API documentation rather than in a warning.
I have updated the API documentation and also removed the warning.
Hope it helps!
Miguel