[MRG] Add partial_fit function to DecisionTreeClassifier #18889
Thanks for the PR. Can you show that this is faster than building the tree from scratch?
@amueller Speed is not the main goal here. The VFDT name might cause some confusion, but this PR is a preliminary step that lets future algorithms focus on streaming data. In those cases, samples arrive continuously, and storing all of them to wait for a batch fit would be quite expensive. I will check the timing differences in benchmarks, though. Thanks for the advice!
If the goal is online learning, this should be implemented as partial_fit,
but you would need to show that multiple calls to partial_fit roughly
equate to fitting in batch.
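For reference, a minimal sketch of that kind of equivalence check. `DecisionTreeClassifier.partial_fit` is what this PR proposes and is not assumed to exist here, so `GaussianNB` (which already ships `partial_fit`) stands in as the estimator; the synthetic data and the chunk size of 100 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Synthetic data; sizes are illustrative only.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
classes = np.unique(y)

# Reference model: one batch fit on everything.
batch = GaussianNB().fit(X, y)

# Incremental model: same data, fed in chunks of 100 samples.
online = GaussianNB()
for start in range(0, len(X), 100):
    chunk = slice(start, start + 100)
    # The full set of classes must be known on the first call.
    online.partial_fit(X[chunk], y[chunk], classes=classes)

# Fraction of samples on which the two models predict the same label.
agreement = np.mean(batch.predict(X) == online.predict(X))
print(f"prediction agreement: {agreement:.3f}")
```

The same pattern (fit once in batch, fit in chunks, compare predictions or scores) would apply to a tree with `partial_fit`, although a tree grown incrementally would likely need a looser tolerance than this stand-in.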
@jnothman Thank you for the advice. So another I am working on benchmarking with
Yes,
The estimators in the tree module sets scikit-learn/sklearn/tree/_classes.py Lines 165 to 167 in f33fb0a
Have some tests been disabled? The code should be covered earlier...
The test error in test_different_endianness_joblib_pickle
doesn't seem to be related:
ValueError: Big-endian buffer not supported on little-endian compiler
This is actually related although in a not so trivial way. One likely fix would be to change More details:
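For context, this error message is what Cython buffer handling raises when it receives an array whose byte order differs from the machine's. The sketch below only illustrates the general situation with plain NumPy and is not necessarily the fix suggested in this thread; `to_native` is a hypothetical helper name.

```python
import numpy as np

# Simulate an array with the opposite byte order to this machine's,
# e.g. one unpickled from a file written on a different architecture.
swapped_dtype = np.dtype(np.float64).newbyteorder("S")  # "S" = swap order
foreign = np.arange(5, dtype=np.float64).astype(swapped_dtype)


def to_native(arr):
    """Return arr with native byte order, converting only when needed."""
    if arr.dtype.isnative:
        return arr
    # "=" requests the native byte order; values are preserved.
    return arr.astype(arr.dtype.newbyteorder("="))


fixed = to_native(foreign)
print(foreign.dtype.isnative, fixed.dtype.isnative)
```

Converting at the boundary where arrays are loaded keeps the compiled code free of byte-order handling.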
@lesteve Thanks!
This line:
X, y = fetch_california_housing(return_X_y=True)
causes the following error, which is definitely unrelated this time:
urllib.error.HTTPError: HTTP Error 403: Forbidden
This might be a temporary issue. We are planning to make a release with a retry mechanism at some point.
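Until then, a rough sketch of what a local retry wrapper could look like. `fetch_with_retries` and its parameters are hypothetical names for illustration, not the planned scikit-learn mechanism; it retries only on `HTTPError` and re-raises once the attempts are exhausted.

```python
import time
from urllib.error import HTTPError


def fetch_with_retries(fetch, n_retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying on HTTPError up to n_retries extra times."""
    for attempt in range(n_retries + 1):
        try:
            return fetch()
        except HTTPError:
            # Out of attempts: let the last error propagate.
            if attempt == n_retries:
                raise
            sleep(delay)
```

In a test it could wrap the failing line as `fetch_with_retries(lambda: fetch_california_housing(return_X_y=True))`. A retry only helps if the 403 is transient; a persistent block on the download host would still fail after the last attempt.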
Reference Issues/PRs
First step for #18888
What does this implement/fix? Explain your changes.
Add a partial_fit function to DecisionTreeClassifier.
Any other comments?
Collaboration of @neurodata
Thank you for feedback!