
Add interaction constraint shortcuts to HistGradientBoosting* #24849

Merged
merged 18 commits into from Nov 25, 2022

Conversation

betatim
Member

@betatim betatim commented Nov 7, 2022

Allow users to specify common interaction constraints as a string: "pairwise" or "no interactions" can be passed as a shortcut instead of having to explicitly provide the full list of interactions.

Reference Issues/PRs

Closes #24845
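For context, a rough sketch of what the two shortcut strings stand for. The helper name below is made up, and the exact string value for the no-interactions case is whatever the PR settles on; this only illustrates the expansion into the explicit list-of-sets form:

```python
from itertools import combinations

def expand_interaction_cst(interaction_cst, n_features):
    # Hypothetical helper: turn a shortcut string into the explicit
    # list-of-sets form that interaction constraints are specified in.
    if interaction_cst == "pairwise":
        # every unordered pair of features may interact
        return [{i, j} for i, j in combinations(range(n_features), 2)]
    if interaction_cst == "no_interactions":
        # each feature is alone in its group: a purely additive model
        return [{i} for i in range(n_features)]
    # already an explicit specification
    return [set(group) for group in interaction_cst]

print(expand_interaction_cst("no_interactions", 3))  # [{0}, {1}, {2}]
print(len(expand_interaction_cst("pairwise", 4)))    # 6
```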

@betatim betatim changed the title from "Add interaction constraint shortcuts" to "Add interaction constraint shortcuts to HistGradientBoosting*" Nov 7, 2022
@betatim betatim marked this pull request as ready for review November 7, 2022 13:20
@betatim
Member Author

betatim commented Nov 9, 2022

This is ready for review.

@amueller
Member

amueller commented Nov 9, 2022

I think it would be awesome to have an example with "no interactions" and discuss how that leads to an additive model, which is therefore super-interpretable, and say that it's very closely related to the explainable boosting machine (EBM) of Caruana (but slightly different).
For pairwise interactions, I think it would be nice to maybe have a separate example that explains what it means. Honestly I don't have a good intuition, and I think it would be good to explain that it's not an additive model on pairwise interactions (if I remember correctly), i.e. the models can't be written as
$$m(x) = h\Big(\sum_i f_i(x_i) + \sum_{i \lt j} f_{i,j}(x_i, x_j)\Big)$$

Maybe @lorentzenchr can confirm or deny?

@amueller
Member

amueller commented Nov 9, 2022

Hm I'm thinking now that with all pairwise constraints, each leaf is an indicator with two features, and a tree is a weighted sum of these indicators, so it is actually possible to rewrite the function in this way, just in a less obvious way than for EBMs?

If that's the case that would be another interesting unit test maybe?
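For what it's worth, the argument above can be sketched as follows (assuming each root-to-leaf path in a constrained tree tests at most one pair of features, which is what the pairwise constraint enforces):

```latex
% A tree is a weighted sum of leaf indicators; under pairwise constraints
% each leaf region R_l is defined by thresholds on at most two features
% (i_l, j_l), so the whole ensemble regroups, in link space, into a sum
% of pairwise terms -- just not one term per tree as in an EBM.
T_t(x) = \sum_{l} w_{t,l}\, \mathbf{1}\big[(x_{i_l}, x_{j_l}) \in R_l\big]
\quad\Longrightarrow\quad
\sum_t T_t(x) = \sum_{i \lt j} f_{i,j}(x_i, x_j)
```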

@amueller
Member

amueller commented Nov 9, 2022

This could be a separate PR, but I would suggest breaking up https://scikit-learn.org/dev/auto_examples/inspection/plot_partial_dependence.html#sphx-glr-auto-examples-inspection-plot-partial-dependence-py into two examples, and showing that with the interaction constraints the predictions are actually the sum of the partial dependencies, both for the univariate case and the pairwise interaction case.

For reference, I tried this with the no interactions estimator (when only training on the features that we compute partial dependence for):

```python
from sklearn.utils.extmath import cartesian

# Build the full grid of feature values used for the partial dependence plots
grid = cartesian([x["values"][0] for x in display.pd_results])
# Sum the per-feature partial dependence curves on the same grid
partials = cartesian([x["average"][0] for x in display.pd_results]).sum(axis=-1)
preds = est.predict(grid)
preds - partials  # should be zero and isn't, but at least it's a constant
```

I'm not sure why there's a constant offset but it's still a cool illustration that it's actually an additive model now, aka the partial dependencies are all there is.
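The constant offset itself is easy to reproduce with a toy additive function (a pure-Python stand-in for the estimator, nothing sklearn-specific here): for f(x0, x1) = g0(x0) + g1(x1) + c, summing the two partial dependence curves double-counts the baseline, so preds - partials equals minus the grand mean of f on the grid, a constant:

```python
from itertools import product

# Toy additive function standing in for a "no interactions" model
def f(x0, x1):
    return 2 * x0 + 0.5 * x1 ** 2 + 3.0

xs0 = [0.0, 1.0, 2.0]
xs1 = [0.0, 1.0]

# Partial dependence along each feature: average f over the other feature,
# which is what partial_dependence computes for an additive model.
def pd0(v):
    return sum(f(v, x1) for x1 in xs1) / len(xs1)

def pd1(v):
    return sum(f(x0, v) for x0 in xs0) / len(xs0)

offsets = {f(a, b) - (pd0(a) + pd1(b)) for a, b in product(xs0, xs1)}
print(offsets)  # {-5.25}: a single constant, minus the grand mean of f on the grid
```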

@ogrisel
Member

ogrisel commented Nov 10, 2022

> I think it would be awesome to have an example with "no interactions" and discuss how that leads to an additive model, which is therefore super-interpretable, and say that it's very closely related to the explainable boosting machine (EBM) of Caruana (but slightly different).

I think this can be done as part of the current PR, right? That would be nice to link to such an example from the release highlights.

@ogrisel
Member

ogrisel commented Nov 10, 2022

> I'm not sure why there's a constant offset.

It might be related to the fact that the method="recursion" for tree based models does not account for the level 0 boosting iteration of gradient boosting that basically predicts the marginal mean of y IIRC. Subtracting the mean y before fitting might "fix" the discrepancy.


@ogrisel ogrisel left a comment


Besides the concern with the whitespace, LGTM.

I would also like at least a quick update of an existing example (or new example) to show the "no interaction" which might be quite common.

A full-blown example for EBM-style glassbox modeling might come in a follow-up PR.

@amueller
Member

> Subtracting the mean y before fitting might "fix" the discrepancy.

The example removes the mean of y on the whole dataset. I can try removing just the training set mean and not using early stopping, but even without early stopping, the difference doesn't seem directly related to the mean of y.

I was going to try to empirically confirm that the glass-box assertion also holds for the pairwise model, but got sidetracked with meetings :-/

@lorentzenchr
Member

lorentzenchr commented Nov 11, 2022

@amueller Side note: GitHub has $\LaTeX$ ($) support now.
Main note: If I'm not mistaken, specifying pairwise interaction constraints gives a model $m$ that is pairwise additive in link space: $m(x) = h(\sum_i f_i(x_i) + \sum_{i\lt j} f_{i,j}(x_i,x_j))$. The functions $f$, however, are not identical to the trees. It may be the case that different branches (always including the root) of a tree belong to different $f$.

@betatim
Member Author

betatim commented Nov 11, 2022

> I think it would be awesome to have an example with "no interactions" and discuss how that leads to an additive model, which is therefore super-interpretable, and say that it's very closely related to the explainable boosting machine (EBM) of Caruana (but slightly different).

> I think this can be done as part of the current PR, right? That would be nice to link to such an example from the release highlights.

I'll look at adding an example.

I'd prefer to punt all the other ideas about changes to examples or new examples to a separate PR (or PRs). "hashtag one idea per PR"

@lorentzenchr
Member

@betatim I agree. I also prefer a new PR for an additional example to keep this PR small.

@betatim
Member Author

betatim commented Nov 11, 2022

Any recommendations for a good didactic dataset/setup for an example? Otherwise I'll do a bit of research to find something.


@lorentzenchr lorentzenchr left a comment


Looks very good. Two comments:

  • Should we add a test that the models are identical for the string options and their manual equivalents?
  • In the original PR for interaction constraints, concerns were raised that "pairwise" expands to a list with $n(n-1)/2$ elements. For 10000 features this is not great. One could use iterators/generators inside the code to improve the memory footprint, but I would not do so as long as the need does not arise.

Review comment on doc/whats_new/v1.2.rst (outdated, resolved)
@betatim
Member Author

betatim commented Nov 11, 2022

> • In the original PR for interaction constraints, concerns were raised that "pairwise" expands to a list ..

I saw that, but I'm not sure there is much we can do about it. A few lines later, in _check_interaction_cst, the code computes constraints = [set(group) for group in interaction_cst], which will consume any iterator/generator, so the memory will get used there anyway. And since there is nothing to stop users from passing in an equivalent explicit pairwise interaction constraint, I wouldn't stop them from using the shortcut.

Rough estimate: 1000 features and 10000 samples leads to ~80000000 bytes (np.ones((10_000, 1_000), dtype=np.float64).nbytes) for the input data and ~4167352 bytes (sys.getsizeof([set(g) for g in itertools.combinations(range(1000), 2)])) for the pairwise constraints.

In the case of 10000 features it takes a considerable amount of time to even run sys.getsizeof([set(g) for g in itertools.combinations(range(10_000), 2)]), long enough to write this sentence and still have to wait. It uses about 400MB of memory. I think that to fix this you'd have to replace the list of sets with some other data structure representing all the constraints, or restructure the code so that the string "pairwise" can be passed through to the low(est) levels of the tree-building code.

So I'd wait and see how many people have datasets with 10000 features and for whom using ~500MB of memory to represent the constraints is a problem. My guess is that most people with that many features already use a large amount of memory just for their data, so another 500MB here or there is a rounding error in the total budget.
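The ballpark numbers above can be reproduced directly (illustrative only; note that sys.getsizeof on the list counts just the list object and its pointers, not the sets themselves, which actually dominate the footprint):

```python
import sys
from itertools import combinations

def pairwise_constraints(n_features):
    # What the "pairwise" shortcut has to materialize: one set per feature pair
    return [set(p) for p in combinations(range(n_features), 2)]

cst = pairwise_constraints(1_000)
print(len(cst))  # 499500 == 1000 * 999 / 2

list_bytes = sys.getsizeof(cst)  # the list object alone (~4 MB here)
total_bytes = list_bytes + sum(sys.getsizeof(s) for s in cst)
print(list_bytes < total_bytes)  # True: the sets dominate the real footprint
```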

@lorentzenchr
Member

As said earlier, I would wait and see whether memory problems with interaction_cst="pairwise" arise at all. (BTW, it only scales with n_features, not with n_samples.)
If this need arises, a way to go may be to pass the string "pairwise" instead of a list of sets to all helper functions for interaction constraints.

@betatim
Member Author

betatim commented Nov 11, 2022

The reason I picked a (random) value for n_samples was to estimate how much memory users need just to hold their data, which gives a sense of scale for how much extra memory the constraint list uses. If your dataset is 16GB, then 400MB is nothing; if the dataset only consumes 40MB, then an additional 400MB is a big deal.

@betatim
Member Author

betatim commented Nov 11, 2022

Regarding the example: I updated the existing PDP example that illustrates the "no interactions" case to use the shortcut. Is that the kind of example you were after @ogrisel?

Regarding EBMs and "no interaction" boosted trees: from watching https://www.youtube.com/watch?v=MREiHgHgl0k I think there is a difference. EBMs are forced to use every feature in the dataset, they fit one tree per feature per "iteration" (one iteration is completed when each feature has been used). For boosted trees with "no interactions" we have no promise regarding which feature gets used/not used. TL;DR: something for a new PR after a bit of thinking.

@lorentzenchr
Member

@betatim Could you fix the linting errors? (Activating the pre-commit hooks might help with black.)

@betatim
Member Author

betatim commented Nov 14, 2022

Formatting fixed. Before we merge this we should look at (and decide on a way to resolve) #24899.

@amueller
Member

amueller commented Nov 14, 2022

> @amueller If I'm not mistaken, specifying pairwise interaction constraints gives a model $m$ that is pairwise additive in link space: $m(x) = h(\sum_i f_i(x_i) + \sum_{i\lt j} f_{i,j}(x_i, x_j))$. The functions $f$, however, are not identical to the trees. It may be the case that different branches (always including the root) of a tree belong to different $f$.

That was my understanding as well :) so the main interpretability aspect is still there, but it's "just different" from the EBMs. I think it's worth saying both of these in the new example in the new PR :)
The main reason I brought this up here was that if this property didn't hold, there might be little reason to do all pairwise interactions, but it seems to make sense :)

@betatim betatim changed the title Add interaction constraint shortcuts to HistGradientBoosting* Add interaction constraint shortcuts to HistGradientBoosting* Nov 17, 2022
@betatim
Member Author

betatim commented Nov 25, 2022

Updated now that #24899 is merged. I think all the comments about examples are meant for a new PR, so I think this is ready.


@ogrisel ogrisel left a comment


LGTM!

@ogrisel ogrisel added the "Quick Review" (for PRs that are quick to review) and "Waiting for Second Reviewer" (first reviewer is done, need a second one!) labels Nov 25, 2022
@jeremiedbb
Member

I directly pushed the required fixes for the CI and some nitpicks. We can remove the test checking for the error message when an invalid string is passed for the parameter; this is covered by a common test.


@jeremiedbb jeremiedbb left a comment


LGTM

@jeremiedbb jeremiedbb added this to the 1.2 milestone Nov 25, 2022
@jeremiedbb jeremiedbb removed the "Waiting for Second Reviewer" (first reviewer is done, need a second one!) label Nov 25, 2022
@jeremiedbb jeremiedbb merged commit df14322 into scikit-learn:main Nov 25, 2022
@jeremiedbb
Member

Thanks @betatim

@lorentzenchr
Member

Very cool to have this PR in 1.2. Thanks @betatim and everyone involved.

@betatim betatim deleted the tree-interaction-shortcut branch November 28, 2022 10:25
Successfully merging this pull request may close these issues.

Add user friendly string options for interaction constraints in HistGradientBoosting*
5 participants