Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unneeded warning from Random Forest Classifier / Regressor #5233

Open
wants to merge 5 commits into
base: branch-23.04
Choose a base branch
from

Conversation

miguelusque
Copy link
Member

Fixes #5224 . The user is presented a warning related with reproducible results when using Random Forest. IMO, that information should be in the API, and not presented as a warning.

I have updated the API documentation and also removed the warning.

Hope it helps!
Miguel

I have moved this warning to the documentation.IMHO,  A user who is not thinking on reproducibility might be not interested in reading this warning each time they train their models.
Update API documentation for 'n_streams' parameter. Removed unused 'warnings' import.
Update API documentation for 'n_streams' parameter. Removed unused 'warnings' import.
@miguelusque miguelusque requested a review from a team as a code owner February 14, 2023 08:50
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Feb 14, 2023
Copy link
Contributor

@csadorf csadorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the contribution! I agree that the warnings are better suited to be placed in the API documentation.

I have made two minor suggestions for a IMO slight improvement in wording.

Finally, I think we we would also need to update the corresponding documentation for the respective dask classes:

  • python/cuml/dask/ensemble/randomforestclassifier.py
  • python/cuml/dask/ensemble/randomforestregressor.py

Although I am not sure whether they would ever be (almost) reproducible in a multi-GPU context.

python/cuml/ensemble/randomforestclassifier.pyx Outdated Show resolved Hide resolved
Comment on lines 191 to 194
Number of parallel streams used for forest building.
For almost reproducible results, n_streams = 1 is recommended. If
n_streams is > 1, results may vary due to stream/thread timing
differences, even when random_state is set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Number of parallel streams used for forest building.
For almost reproducible results, n_streams = 1 is recommended. If
n_streams is > 1, results may vary due to stream/thread timing
differences, even when random_state is set.
Number of parallel streams used for forest building.
For almost reproducible results, it is recommended to set n_streams = 1.
If n_streams is set to a value greater than 1, there could be variations
in the results due to unpredictable differences in stream/thread timing,
even if random_state is specified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a user I might be confused by the term "almost reproducible results". Is there a better way to describe or quantify this? Or should we just not promise reproducibility at all? I understand that this question might be outside of the scope of this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Carl, I couldn't say. The "almost reproducible results" statement was part of the original warning.

Hi @cjnolet , could you please help us with the above? Thanks!

@csadorf csadorf added doc Documentation non-breaking Non-breaking change labels Feb 14, 2023
miguelusque and others added 2 commits February 14, 2023 18:31
Co-authored-by: Carl Simon Adorf <sadorf@nvidia.com>
Co-authored-by: Carl Simon Adorf <sadorf@nvidia.com>
@csadorf
Copy link
Contributor

csadorf commented Feb 28, 2023

@miguelusque Can you update this branch, please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cython / Python Cython or Python issue doc Documentation non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Remove UserWarning from RandomForestClassifier
2 participants