Add interaction constraint shortcuts to HistGradientBoosting*
#24849
Conversation
This is ready for review.
I think it would be awesome to have an example with "no interactions" and discuss how that leads to an additive model, which is therefore super-interpretable, and say that it's very closely related to the explainable boosting machine (EBM) of Caruana (but slightly different). Maybe @lorentzenchr can confirm or deny?
Hm, I'm thinking now that with all pairwise constraints, each leaf is an indicator with two features, and a tree is a weighted sum of these indicators, so it is actually possible to rewrite the function in this way, just in a less obvious way than for EBMs? If that's the case, that would be another interesting unit test maybe?
This could be a separate PR, but I would suggest breaking up https://scikit-learn.org/dev/auto_examples/inspection/plot_partial_dependence.html#sphx-glr-auto-examples-inspection-plot-partial-dependence-py into two examples, and showing that with the interaction constraints the predictions are actually the sum of the partial dependencies, both for the univariate case and the pairwise interaction case. For reference, I tried this with the no interactions estimator (when only training on the features that we compute partial dependence for):

```python
from sklearn.utils.extmath import cartesian

grid = cartesian(x['values'][0] for x in display.pd_results)
partials = cartesian(x['average'][0] for x in display.pd_results).sum(axis=-1)
preds = est.predict(grid)
preds - partials  # should be zero and isn't, but at least it's a constant
```

I'm not sure why there's a constant offset but it's still a cool illustration that it's actually an additive model now, aka the partial dependencies are all there is.
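For context, a self-contained sketch of that check (the toy dataset, `grid_resolution`, and variable names are placeholders I made up, not the example's actual setup; the `"values"` key matches the scikit-learn version this PR targets):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay
from sklearn.utils.extmath import cartesian

# Toy setup: train only on the features we compute partial dependence for.
X, y = make_regression(n_samples=500, n_features=3, random_state=0)

# "no_interactions" restricts every tree to split on a single feature,
# so the fitted model is additive in the input features.
est = HistGradientBoostingRegressor(
    interaction_cst="no_interactions", random_state=0
).fit(X, y)

display = PartialDependenceDisplay.from_estimator(
    est, X, features=[0, 1, 2], grid_resolution=20
)

# Evaluate the model on the cartesian product of the per-feature PD grids and
# compare with the sum of the univariate partial dependences.
grid = cartesian([res["values"][0] for res in display.pd_results])
partials = cartesian([res["average"][0] for res in display.pd_results]).sum(axis=-1)
preds = est.predict(grid)

# For an additive model the difference is constant over the grid (the constant
# offset is the one discussed later in this thread), so the spread is ~0.
print(np.ptp(preds - partials))
```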
I think this can be done as part of the current PR, right? It would be nice to link to such an example from the release highlights.
It might be related to the fact that the
Besides the concern with the whitespace, LGTM.
I would also like at least a quick update of an existing example (or a new example) to show the "no interactions" case, which might be quite common.
A full-blown example for EBM-style glassbox modeling might come in a follow-up PR.
The example removes the mean of y on the whole dataset. I can try removing just the training set mean and not using early stopping, but even without early stopping, the difference doesn't seem directly related to the mean of y. I was going to try to empirically confirm that the glass box assertion also holds for the pairwise model but got sidetracked with meetings :-/
@amueller Side note: Github has
I'll look at adding an example. I'd prefer to punt all the other ideas about changes to examples or new examples to a separate PR (or PRs). "hashtag one idea per PR"
@betatim I agree. I also prefer a new PR for an additional example to keep this PR small.
Any recommendations for a good didactic dataset/setup for an example? Otherwise I'll do a bit of research to find something.
Looks very good. 2 comments:

- Should we add a test that models fit with the string options and with their explicit (manual) equivalents are identical? A rough sketch follows this list.
- In the original PR for interaction constraints, concerns were raised that "pairwise" expands to a list with $n(n-1)/2$ elements. For 10000 features this is not great. One could use iterators/generators inside the code to improve the memory footprint, but I would not do so as long as the need does not arise.
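A rough sketch of such an equivalence test (the test name, toy data, and explicit expansions below are assumptions based on this discussion, not the actual test in the PR):

```python
import itertools
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor


def test_interaction_cst_string_shortcuts_equivalence():
    X, y = make_regression(n_samples=200, n_features=5, random_state=0)
    n_features = X.shape[1]
    # Assumed expansions: "pairwise" -> all 2-element subsets,
    # "no_interactions" -> one singleton group per feature.
    shortcuts = {
        "pairwise": list(itertools.combinations(range(n_features), 2)),
        "no_interactions": [(i,) for i in range(n_features)],
    }
    for name, explicit in shortcuts.items():
        est_str = HistGradientBoostingRegressor(
            interaction_cst=name, random_state=0
        ).fit(X, y)
        est_list = HistGradientBoostingRegressor(
            interaction_cst=explicit, random_state=0
        ).fit(X, y)
        # Same constraints and same seed should yield the same model.
        np.testing.assert_allclose(est_str.predict(X), est_list.predict(X))
```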
I saw that, not sure there is much we can do about it. A few lines later in

Rough estimate: 1000 features, 10000 samples leads to ~80000000 bytes (

In the case of 10000 features it takes a considerable amount of time to even run

So I'd wait and see how many people have datasets with 10000 features and for whom using ~500MB of memory to represent the constraints is a problem. My guess is that most people with that many features use a large amount of memory just for their data, so another 500MB here or there is a rounding error in the total budget.
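For what it's worth, a rough way to reproduce that kind of estimate, assuming the constraints are materialized as a plain Python list of 2-tuples (the estimator's internal representation may differ, e.g. sets):

```python
import itertools
import sys

n_features = 1_000  # the 10000-feature case from the discussion takes noticeably longer

pairs = list(itertools.combinations(range(n_features), 2))

# Only counts the list object and the tuple objects themselves; small ints are shared.
total_bytes = sys.getsizeof(pairs) + sum(sys.getsizeof(p) for p in pairs)
print(f"{len(pairs)} pairs, ~{total_bytes / 1e6:.0f} MB")  # 499500 pairs, a few tens of MB
```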
As said earlier, I would wait if memory problems with
The reason I picked a (random) value for
Regarding the example: I updated the existing PDP example that illustrates the "no interactions" case to use the shortcut. Is that the kind of example you were after @ogrisel?

Regarding EBMs and "no interaction" boosted trees: from watching https://www.youtube.com/watch?v=MREiHgHgl0k I think there is a difference. EBMs are forced to use every feature in the dataset: they fit one tree per feature per "iteration" (one iteration is completed when each feature has been used). For boosted trees with "no interactions" we have no promise regarding which feature gets used/not used. TL;DR: something for a new PR after a bit of thinking.
@betatim Could you fix the linting errors (activating pre-commit hooks might help with black). |
Formatting fixed. Before we merge this we should look at (and decide on a way to resolve) #24899.
That was my understanding as well :) so the main interpretability aspect is still there, but it's "just different" from the EBMs. I think it's worth saying both of these in the new example in the new PR :)
HistGradientBoosting*: Allow users to specify common interaction constraints as a string.
Renamed parameter value to use an underscore and use `StrOptions` to describe valid parameter values.
4cabe16 to 152e622
Updated now that #24899 is merged. I think all the comments about examples are meant for a new PR, so I think this is ready.
LGTM!
sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py (outdated review comment, resolved)
I directly pushed the required fixes for the CI and some nitpicks. We can remove the test checking for the error message when an invalid string is passed for the parameter; this is checked by a common test.
LGTM
Thanks @betatim
Very cool to have this PR in 1.2. Thanks @betatim and everyone involved.
Allow users to specify common interaction constraints as a string. Users can specify `"pairwise"` or `"no interactions"` as a shortcut instead of having to explicitly provide the interactions.

Reference Issues/PRs
Closes #24845
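For illustration, a minimal usage sketch of the shortcut (using the parameter values as merged, "pairwise" and "no_interactions", on a made-up toy dataset; requires a scikit-learn version that includes this PR):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=4, random_state=0)

# Shortcut for listing every pair of features as an allowed interaction group.
pairwise_model = HistGradientBoostingRegressor(
    interaction_cst="pairwise", random_state=0
).fit(X, y)

# Shortcut for one singleton group per feature, i.e. no interactions at all
# (the resulting model is additive in the features).
additive_model = HistGradientBoostingRegressor(
    interaction_cst="no_interactions", random_state=0
).fit(X, y)
```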