ENH Add support for feature names in monotonic_cst #24855

ogrisel · 2022-11-07T18:46:57Z

Towards #24852.

TODO

- update all the docstrings of the public API
- add tests
- update an example
- changelog

I did not bother moving the MonotonicConstraint enum to the sklearn.utils.validation module. Not sure if I should do it or not. Maybe.

ogrisel · 2022-11-10T10:43:28Z

examples/ensemble/plot_monotonic_constraints.py

@@ -28,7 +28,7 @@

 rng = np.random.RandomState(0)

-n_samples = 5000
+n_samples = 1000


I reduced the number of samples to make the plot less crowded. It conveys the same intuitions and makes the example run faster.

ogrisel · 2022-11-10T10:53:53Z

I did not bother moving the MonotonicConstraint enum to the sklearn.utils.validation module. Not sure if I should do it or not. Maybe.

Let's keep this PR focused for now.

Follow-up PR(s) should probably:

- move some tests to estimator-agnostic checks (both for input data errors and for monotonicity on random data);
- move the user guide to an estimator-agnostic section on monotonic_cst, with cross-linking between the estimators that support this parameter;
- use the MonotonicConstraint enum in those estimators.

ogrisel · 2022-11-10T15:22:24Z

sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py


+        - 1: monotonic increase
+        - 0: no constraint
+        - -1: monotonic decrease


I had to remove the indentation of the bullet list to avoid a warning from the old version of Sphinx...

jjerphan

LGTM modulo a few unitary negative and positive review comments, ahem.


jjerphan · 2022-11-10T20:02:11Z

sklearn/utils/validation.py

+                f"monotonic_cst has shape {monotonic_cst.shape} but the input data "
+                f"X has {estimator.n_features_in_} features."
+            )
+        unexpected_cst = np.setdiff1d(monotonic_cst, [-1, 0, 1])


TIL that numpy.setdiff1d is a thing!

I knew about it but I must admit that github copilot suggested it to me :) Using explicit variable names such as unexpected_cst makes it very smart.

(I knew it, Olivier is a robot! One powered by copilot ;))

scikit-learn 2.0.0: Human Learning in Python?
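For readers who have not met `np.setdiff1d` either, here is a minimal sketch of the kind of check shown in the diff above. The helper name `check_constraint_values` and the wording of the second error message are assumptions made for illustration, not the code merged in this PR.

```python
import numpy as np


def check_constraint_values(monotonic_cst, n_features):
    """Hypothetical helper: validate a positional array of constraints."""
    monotonic_cst = np.asarray(monotonic_cst)
    if monotonic_cst.shape[0] != n_features:
        raise ValueError(
            f"monotonic_cst has shape {monotonic_cst.shape} but the input data "
            f"X has {n_features} features."
        )
    # setdiff1d returns the sorted unique values of the first argument
    # that are absent from the second one.
    unexpected_cst = np.setdiff1d(monotonic_cst, [-1, 0, 1])
    if unexpected_cst.shape[0]:
        raise ValueError(
            "monotonic_cst must contain only -1, 0 or 1; "
            f"got unexpected values {unexpected_cst.tolist()}."
        )
    return monotonic_cst


# Duplicates are collapsed and the result is sorted.
unexpected = np.setdiff1d([1, 0, 2, -1, 0.5, 2], [-1, 0, 1])
print(unexpected.tolist())  # [0.5, 2.0]
```

Because the result contains unique values only, the error message stays short even when the same invalid value appears many times.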


thomasjpfan · 2022-11-10T23:59:26Z

sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py

+        If a dict with str keys, map feature to monotonic constraints by name.
+        If an array, the feature are mapped to constraints by position.


What do you think about linking to the subsection in the example from this PR that uses a dictionary for monotonic constraints?

How would you do so? With a reStructuredText reference anchor in the "markdown" cell just before the final code snippet?

Done in 1925f0b and fa814d0.
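To make the two accepted forms from the docstring concrete, here is a minimal sketch of how a name-keyed dict could be resolved into the positional array. `resolve_monotonic_cst` is a hypothetical helper written for this illustration, not the function added in this PR, and treating features absent from the dict as unconstrained (0) is an assumption of the sketch.

```python
import numpy as np


def resolve_monotonic_cst(monotonic_cst, feature_names):
    """Hypothetical helper: return one constraint per feature, by position."""
    n_features = len(feature_names)
    if monotonic_cst is None:
        # No constraint on any feature.
        return np.zeros(n_features, dtype=np.int8)
    if isinstance(monotonic_cst, dict):
        # Dict form: keys are feature names; reject names that X does not have.
        unknown = sorted(set(monotonic_cst) - set(feature_names))
        if unknown:
            raise ValueError(f"monotonic_cst contains unknown feature names: {unknown}")
        # Assumption of this sketch: missing names default to 0 (no constraint).
        return np.array(
            [monotonic_cst.get(name, 0) for name in feature_names], dtype=np.int8
        )
    # Array form: constraints are already given by position.
    cst = np.asarray(monotonic_cst, dtype=np.int8)
    if cst.shape[0] != n_features:
        raise ValueError(
            f"monotonic_cst has shape {cst.shape} but X has {n_features} features."
        )
    return cst


names = ["age", "income", "area"]
print(resolve_monotonic_cst({"area": 1, "age": -1}, names).tolist())  # [-1, 0, 1]
print(resolve_monotonic_cst([0, 1, 0], names).tolist())  # [0, 1, 0]
```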


ogrisel · 2022-11-13T08:46:25Z

sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py

+
+        If a dict with str keys, map feature to monotonic constraints by name.
+        If an array, the feature are mapped to constraints by position. See
+        :ref:`monotonic_cst_features_names` for a usage example.


The reference works as expected:

https://output.circle-artifacts.com/output/job/bc0fc35c-0030-48fe-a5b5-5c6f3fe752fd/artifacts/0/doc/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html

thomasjpfan

Otherwise LGTM


jjerphan

LGTM! Thank you, @ogrisel. To me, this is a notable UX enhancement.

Since @betatim did a review, I'll let him re-review and approve this PR before merging it.


betatim

LGTM, modulo typo resolution

betatim · 2022-11-14T09:29:33Z

One thing related but also not: I don't know much about when you should or shouldn't specify this constraint, or what traps you might fall into. After reading the example, I now think that it makes sense to specify the constraint when you know that a feature value will increase/decrease with the target value. It is a way to add information to the model instead of the model having to discover this relationship itself.

A trap might be that if there is underlying structure to the general trend then you might or might not want to specify the constraint. If the structure is noise, specify it. If the structure is real, don't specify it. The tricky thing is of course knowing which case it is (for real-world data). If you get it right, the performance of the model should improve.

In addition, there is a use-case that is driven by "business decisions". Not sure I can cook up a realistic example on the spot. Maybe something like "houses with bigger area should not be cheaper than ones with less land". Here you might decrease the performance, but you can natively include a constraint from the business side in your model.

Not sure if it is worth linking to a good guide about this from the docs. (New PR either way)


ogrisel · 2022-11-15T14:29:52Z

In addition, there is a use-case that is driven by "business decisions". Not sure I can cook up a realistic example on the spot. Maybe something like "houses with bigger area should not be cheaper than ones with less land". Here you might decrease the performance, but you can natively include a constraint from the business side in your model.

I think this is the main use case for this feature: enforcing a-priori business rules on the decisions of the machine learning model. They might decrease (or not) the predictive accuracy a bit, but they might make the model compliant with regulations, for instance.

Adding constraints can also act as a regularizer when labeled data is scarce. It could improve the test set accuracy if the training set is "noisy", and make the model more "robust" in a way.

ogrisel · 2022-11-15T15:33:17Z

Merged! Thanks for the reviews.

Add support for feature names in monotonic_cst

825400e

github-actions bot added module:ensemble module:utils labels Nov 7, 2022

ogrisel added 8 commits November 8, 2022 00:16

docstring format

e660969

docstring format

3b15660

Fix indentation in docstring?

cecf92e

More docstring tweaking

1cd97d0

Add a test for the nominal case

44c33d3

Test error messages

73d8a37

Changelog entry

7ed48f5

Update example

498ca43

ogrisel commented Nov 10, 2022


Merge branch 'main' into monotonic_cst-feature-names

0641514

ogrisel added 3 commits November 10, 2022 15:20

Docstring tweak for sphinx?

9f37dd7

More indentation tweaking

c754b80

Update the regressors' docstring

5eb8174

ogrisel marked this pull request as ready for review November 10, 2022 15:21

ogrisel commented Nov 10, 2022


Fix docstring formating and phrasing

a523aaf

ogrisel added this to the 1.2 milestone Nov 10, 2022

ogrisel added the Quick Review label Nov 10, 2022

ogrisel mentioned this pull request Nov 10, 2022

ENH Specify categorical features with feature names in HGBDT #24889

Merged

jjerphan approved these changes Nov 10, 2022

ogrisel commented Nov 10, 2022

examples/ensemble/plot_monotonic_constraints.py

ogrisel commented Nov 10, 2022

examples/ensemble/plot_monotonic_constraints.py

ogrisel and others added 4 commits November 10, 2022 21:12

Apply suggestions from code review

f929848

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

Fix undefined variable

86c4cc7

Exclude invalid values in ]-1, 1[

5ac2617

Report number of unexpected feature names

36dc2b7

thomasjpfan reviewed Nov 11, 2022


ogrisel and others added 4 commits November 11, 2022 06:17

Update sklearn/ensemble/_hist_gradient_boosting/grower.py

766d1f8

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

Link to example from docstring

1925f0b

Add missing test case to increase coverage

7b2b3c3

Fix ref to example section

fa814d0

ogrisel commented Nov 13, 2022


Merge branch 'main' into monotonic_cst-feature-names

afd2fa6

thomasjpfan approved these changes Nov 13, 2022

sklearn/ensemble/_hist_gradient_boosting/tests/test_monotonic_contraints.py

ogrisel and others added 2 commits November 14, 2022 09:12

Update sklearn/ensemble/_hist_gradient_boosting/tests/test_monotonic_contraints.py

161e87c

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

Cosmetic change in error message

5c6a7ee

jjerphan changed the title ~~Add support for feature names in monotonic_cst~~ ENH Add support for feature names in monotonic_cst Nov 14, 2022

jjerphan approved these changes Nov 14, 2022

betatim reviewed Nov 14, 2022

sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py

betatim reviewed Nov 14, 2022

sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py

betatim approved these changes Nov 14, 2022

jjerphan removed the Quick Review label Nov 14, 2022

Apply suggestions from code review

90060da

Co-authored-by: Tim Head <betatim@gmail.com>

ogrisel merged commit 74ddf01 into scikit-learn:main Nov 15, 2022

ogrisel deleted the monotonic_cst-feature-names branch November 15, 2022 15:32

ogrisel mentioned this pull request Nov 15, 2022

Make it possible to specify interaction_cst and monotonic_cst with feature names. #24852

Closed

alxhslm mentioned this pull request Apr 16, 2024

Make it possible to specify monotonic_cst with feature names in all tree-based estimators #28850

Closed
