
Add interaction constraint shortcuts to HistGradientBoosting* #24849

Merged
merged 18 commits into from Nov 25, 2022

Conversation

betatim
Member

@betatim betatim commented Nov 7, 2022

Allow users to specify common interaction constraints as a string: "pairwise" or "no interactions" can be passed as a shortcut instead of having to explicitly provide the full list of interactions.

Reference Issues/PRs

Closes #24845
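For context, a rough sketch of what the two shortcut strings stand for. The helper name below is made up, and the exact string value for the no-interactions case is whatever the PR settles on; this only illustrates the expansion into the explicit list-of-sets form:

```python
from itertools import combinations

def expand_interaction_cst(interaction_cst, n_features):
    # Hypothetical helper: turn a shortcut string into the explicit
    # list-of-sets form that interaction constraints are specified in.
    if interaction_cst == "pairwise":
        # every unordered pair of features may interact
        return [{i, j} for i, j in combinations(range(n_features), 2)]
    if interaction_cst == "no_interactions":
        # each feature is alone in its group: a purely additive model
        return [{i} for i in range(n_features)]
    # already an explicit specification
    return [set(group) for group in interaction_cst]

print(expand_interaction_cst("no_interactions", 3))  # [{0}, {1}, {2}]
print(len(expand_interaction_cst("pairwise", 4)))    # 6
```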

@betatim betatim changed the title from "Add interaction constraint shortcuts" to "Add interaction constraint shortcuts to HistGradientBoosting*" Nov 7, 2022
@betatim betatim marked this pull request as ready for review November 7, 2022 13:20
@betatim
Member Author

betatim commented Nov 9, 2022

This is ready for review.

@amueller
Member

amueller commented Nov 9, 2022

I think it would be awesome to have an example with "no interactions" and discuss how that leads to an additive model, which is therefore super-interpretable, and say that it's very closely related to the explainable boosting machine (EBM) of Caruana (but slightly different).
For pairwise interactions, I think it would be nice to maybe have a separate example that explains what it means. Honestly I don't have a good intuition, and I think it would be good to explain that it's not an additive model on pairwise interactions (if I remember correctly), i.e. the models can't be written as
$$m(x) = h\Big(\sum_i f_i(x_i) + \sum_{i \lt j} f_{i,j}(x_i, x_j)\Big)$$

Maybe @lorentzenchr can confirm or deny?

@amueller
Member

amueller commented Nov 9, 2022

Hm I'm thinking now that with all pairwise constraints, each leaf is an indicator with two features, and a tree is a weighted sum of these indicators, so it is actually possible to rewrite the function in this way, just in a less obvious way than for EBMs?

If that's the case that would be another interesting unit test maybe?
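For what it's worth, the argument above can be sketched as follows (assuming each root-to-leaf path in a constrained tree tests at most one pair of features, which is what the pairwise constraint enforces):

```latex
% A tree is a weighted sum of leaf indicators; under pairwise constraints
% each leaf region R_l is defined by thresholds on at most two features
% (i_l, j_l), so the whole ensemble regroups, in link space, into a sum
% of pairwise terms -- just not one term per tree as in an EBM.
T_t(x) = \sum_{l} w_{t,l}\, \mathbf{1}\big[(x_{i_l}, x_{j_l}) \in R_l\big]
\quad\Longrightarrow\quad
\sum_t T_t(x) = \sum_{i \lt j} f_{i,j}(x_i, x_j)
```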

@amueller
Member

amueller commented Nov 9, 2022

This could be a separate PR, but I would suggest breaking up https://scikit-learn.org/dev/auto_examples/inspection/plot_partial_dependence.html#sphx-glr-auto-examples-inspection-plot-partial-dependence-py into two examples, and showing that with the interaction constraints the predictions are actually the sum of the partial dependencies, both for the univariate case and the pairwise interaction case.

For reference, I tried this with the no interactions estimator (when only training on the features that we compute partial dependence for):

```python
from sklearn.utils.extmath import cartesian

# Build the full grid of feature values used for the partial dependence plots
grid = cartesian([x["values"][0] for x in display.pd_results])
# Sum the per-feature partial dependence curves on the same grid
partials = cartesian([x["average"][0] for x in display.pd_results]).sum(axis=-1)
preds = est.predict(grid)
preds - partials  # should be zero and isn't, but at least it's a constant
```

I'm not sure why there's a constant offset but it's still a cool illustration that it's actually an additive model now, aka the partial dependencies are all there is.
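The constant offset itself is easy to reproduce with a toy additive function (a pure-Python stand-in for the estimator, nothing sklearn-specific here): for f(x0, x1) = g0(x0) + g1(x1) + c, summing the two partial dependence curves double-counts the baseline, so preds - partials equals minus the grand mean of f on the grid, a constant:

```python
from itertools import product

# Toy additive function standing in for a "no interactions" model
def f(x0, x1):
    return 2 * x0 + 0.5 * x1 ** 2 + 3.0

xs0 = [0.0, 1.0, 2.0]
xs1 = [0.0, 1.0]

# Partial dependence along each feature: average f over the other feature,
# which is what partial_dependence computes for an additive model.
def pd0(v):
    return sum(f(v, x1) for x1 in xs1) / len(xs1)

def pd1(v):
    return sum(f(x0, v) for x0 in xs0) / len(xs0)

offsets = {f(a, b) - (pd0(a) + pd1(b)) for a, b in product(xs0, xs1)}
print(offsets)  # {-5.25}: a single constant, minus the grand mean of f on the grid
```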

@ogrisel
Member

ogrisel commented Nov 10, 2022

> I think it would be awesome to have an example with "no interactions" and discuss how that leads to an additive model, which is therefore super-interpretable, and say that it's very closely related to the explainable boosting machine (EBM) of Caruana (but slightly different).

I think this can be done as part of the current PR, right? That would be nice to link to such an example from the release highlights.

@ogrisel
Member

ogrisel commented Nov 10, 2022

> I'm not sure why there's a constant offset.

It might be related to the fact that the method="recursion" for tree based models does not account for the level 0 boosting iteration of gradient boosting that basically predicts the marginal mean of y IIRC. Subtracting the mean y before fitting might "fix" the discrepancy.


@ogrisel ogrisel left a comment


Besides the concern with the whitespace, LGTM.

I would also like at least a quick update of an existing example (or new example) to show the "no interaction" which might be quite common.

A full-blown example for EBM-style glassbox modeling might come in a follow-up PR.

@amueller
Member

> Subtracting the mean y before fitting might "fix" the discrepancy.

The example removes the mean of y on the whole dataset. I can try removing just the training set mean and not using early stopping, but even without early stopping, the difference doesn't seem directly related to the mean of y.

I was going to try to empirically confirm that the glass-box assertion also holds for the pairwise model, but got sidetracked with meetings :-/

@lorentzenchr
Member

lorentzenchr commented Nov 11, 2022

@amueller Side note: GitHub has $\LaTeX$ ($) support now.
Main note: If I'm not mistaken, specifying pairwise interaction constraints gives a model $m$ that is pairwise additive in link space: $m(x) = h(\sum_i f_i(x_i) + \sum_{i\lt j} f_{i,j}(x_i,x_j))$. The functions $f$, however, are not identical to the trees. It may be the case that different branches (always including the root) of a tree belong to different $f$.

@betatim
Member Author

betatim commented Nov 11, 2022

> I think it would be awesome to have an example with "no interactions" and discuss how that leads to an additive model, which is therefore super-interpretable, and say that it's very closely related to the explainable boosting machine (EBM) of Caruana (but slightly different).

> I think this can be done as part of the current PR, right? That would be nice to link to such an example from the release highlights.

I'll look at adding an example.

I'd prefer to punt all the other ideas about changes to examples or new examples to a separate PR (or PRs). "hashtag one idea per PR"

@lorentzenchr
Member

@betatim I agree. I also prefer a new PR for an additional example to keep this PR small.

@betatim
Member Author

betatim commented Nov 11, 2022

Any recommendations for a good didactic dataset/setup for an example? Otherwise I'll do a bit of research to find something.


@lorentzenchr lorentzenchr left a comment


Looks very good. Two comments:

  • Should we add a test that the models are identical for the string options and their manual equivalents?
  • In the original PR for interaction constraints, concerns were raised that "pairwise" expands to a list with $n(n-1)/2$ elements. For 10000 features this is not great. One could use iterators/generators inside the code to improve the memory footprint, but I would not do so as long as the need does not arise.

Review comment on doc/whats_new/v1.2.rst (outdated, resolved)
@betatim
Member Author

betatim commented Nov 11, 2022

> • In the original PR for interaction constraints, concerns were raised that "pairwise" expands to a list ..

I saw that, but I'm not sure there is much we can do about it. A few lines later, in _check_interaction_cst, the code computes constraints = [set(group) for group in interaction_cst], which will consume any iterator/generator, so the memory will get used there anyway. And since there is nothing to stop users from passing in an equivalent explicit pairwise interaction constraint, I wouldn't stop them from using the shortcut.

Rough estimate: 1000 features and 10000 samples leads to ~80000000 bytes (np.ones((10_000, 1_000), dtype=np.float64).nbytes) for the input data and ~4167352 bytes (sys.getsizeof([set(g) for g in itertools.combinations(range(1000), 2)])) for the pairwise constraints.

In the case of 10000 features it takes a considerable amount of time to even run sys.getsizeof([set(g) for g in itertools.combinations(range(10_000), 2)]), long enough to write this sentence and still have to wait. It uses about 400MB of memory. I think that to fix this you'd have to replace the list of sets with some other data structure representing all the constraints, or restructure the code so that the string "pairwise" can be passed through to the low(est) levels of the tree-building code.

So I'd wait and see how many people have datasets with 10000 features and for whom using ~500MB of memory to represent the constraints is a problem. My guess is that most people with that many features already use a large amount of memory just for their data, so another 500MB here or there is a rounding error in the total budget.
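The ballpark numbers above can be reproduced directly (illustrative only; note that sys.getsizeof on the list counts just the list object and its pointers, not the sets themselves, which actually dominate the footprint):

```python
import sys
from itertools import combinations

def pairwise_constraints(n_features):
    # What the "pairwise" shortcut has to materialize: one set per feature pair
    return [set(p) for p in combinations(range(n_features), 2)]

cst = pairwise_constraints(1_000)
print(len(cst))  # 499500 == 1000 * 999 / 2

list_bytes = sys.getsizeof(cst)  # the list object alone (~4 MB here)
total_bytes = list_bytes + sum(sys.getsizeof(s) for s in cst)
print(list_bytes < total_bytes)  # True: the sets dominate the real footprint
```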

@lorentzenchr
Member

As said earlier, I would wait and see whether memory problems with interaction_cst="pairwise" arise at all. (BTW, it only scales with n_features, not with n_samples.)
If this need arises, a way to go may be to pass the string "pairwise" instead of a list of sets to all helper functions for interaction constraints.

@betatim
Member Author

betatim commented Nov 11, 2022

The reason I picked a (random) value for n_samples was to estimate how much memory users need just to hold their data, which gives a sense of scale for how much extra memory the constraint list uses. If your dataset is 16GB, then 400MB is nothing; if the dataset only consumes 40MB, then an additional 400MB is a big deal.

@betatim
Member Author

betatim commented Nov 11, 2022

Regarding the example: I updated the existing PDP example that illustrates the "no interactions" case to use the shortcut. Is that the kind of example you were after @ogrisel?

Regarding EBMs and "no interaction" boosted trees: from watching https://www.youtube.com/watch?v=MREiHgHgl0k I think there is a difference. EBMs are forced to use every feature in the dataset, they fit one tree per feature per "iteration" (one iteration is completed when each feature has been used). For boosted trees with "no interactions" we have no promise regarding which feature gets used/not used. TL;DR: something for a new PR after a bit of thinking.

@lorentzenchr
Member

@betatim Could you fix the linting errors? (Activating the pre-commit hooks might help with black.)

@betatim
Member Author

betatim commented Nov 14, 2022

Formatting fixed. Before we merge this we should look at (and decide on a way to resolve) #24899.

@amueller
Member

amueller commented Nov 14, 2022

> @amueller If I'm not mistaken, specifying pairwise interaction constraints gives a model $m$ that is pairwise additive in link space: $m(x) = h(\sum_i f_i(x_i) + \sum_{i\lt j} f_{i,j}(x_i, x_j))$. The functions $f$, however, are not identical to the trees. It may be the case that different branches (always including the root) of a tree belong to different $f$.

That was my understanding as well :) so the main interpretability aspect is still there, but it's "just different" from the EBMs. I think it's worth saying both of these in the new example in the new PR :)
The main reason I brought this up here was that if this property didn't hold, there might be little reason to do all pairwise interactions, but it seems to make sense :)

@betatim betatim changed the title Add interaction constraint shortcuts to HistGradientBoosting* Add interaction constraint shortcuts to HistGradientBoosting* Nov 17, 2022
@betatim
Member Author

betatim commented Nov 25, 2022

Updated now that #24899 is merged. I think all the comments about examples are meant for a new PR, so I think this is ready.


@ogrisel ogrisel left a comment


LGTM!

@ogrisel ogrisel added the "Quick Review" (for PRs that are quick to review) and "Waiting for Second Reviewer" (first reviewer is done, need a second one!) labels Nov 25, 2022
@jeremiedbb
Member

I directly pushed the required fixes for the CI and some nitpicks. We can remove the test checking for the error message when an invalid string is passed for the parameter; this is covered by a common test.


@jeremiedbb jeremiedbb left a comment


LGTM

@jeremiedbb jeremiedbb added this to the 1.2 milestone Nov 25, 2022
@jeremiedbb jeremiedbb removed the "Waiting for Second Reviewer" (first reviewer is done, need a second one!) label Nov 25, 2022
@jeremiedbb jeremiedbb merged commit df14322 into scikit-learn:main Nov 25, 2022
@jeremiedbb
Member

Thanks @betatim

@lorentzenchr
Member

Very cool to have this PR in 1.2. Thanks @betatim and everyone involved.

@betatim betatim deleted the tree-interaction-shortcut branch November 28, 2022 10:25
Successfully merging this pull request may close these issues.

Add user friendly string options for interaction constraints in HistGradientBoosting*
5 participants