ENH Extend PDP for nominal categorical features #18298

Merged
merged 132 commits into from Nov 25, 2022
Changes from 80 commits
2f3b52d
Extend PDP for categorical features
madhuracj Aug 30, 2020
9f066db
PDP method specification only allows for lists
madhuracj Aug 30, 2020
0eb3053
Fix unit tests by adding missing parameter
madhuracj Aug 30, 2020
fafa9d2
Improve docs
madhuracj Aug 30, 2020
6a7e87d
Wrap a long line
madhuracj Aug 30, 2020
22d66a1
Revert 9f066dbd612
madhuracj Sep 4, 2020
eca5ac7
Update docs as suggested and use features_indices which is resolved
madhuracj Sep 4, 2020
6b0d7c2
Unit test for categorical support in partial_dependence
madhuracj Sep 4, 2020
284c901
Remove extra line at the end of the file
madhuracj Sep 4, 2020
ed254ca
Fix typo
madhuracj Sep 4, 2020
7bca03f
Tests for plot_partial_dependence()
madhuracj Sep 4, 2020
ecf2bf8
Remove redundant check
madhuracj Sep 4, 2020
b16aa23
Merge remote-tracking branch 'upstream/main' into categorical_pdp
madhuracj Feb 1, 2021
fe6580b
Fix linting
madhuracj Feb 2, 2021
cebadbf
Update version introduced
madhuracj Feb 3, 2021
708e40c
Add an example for PDP on categorical features
madhuracj Feb 6, 2021
7c1fe2e
PDP for two-way categorical features
madhuracj Feb 8, 2021
dc45e16
Wrap long lines
madhuracj Feb 8, 2021
819f221
Revert unnecessary removal in 7c1fe2e821d37e
madhuracj Feb 9, 2021
399ea0c
Add documentation for bars_ and heatmaps_
madhuracj Feb 9, 2021
88a12bf
Rotate tick labels on x asix for clarity
madhuracj Feb 10, 2021
67b5fbf
Extract code to plot a heatmap a new util module
madhuracj Feb 13, 2021
78ef2cf
Use new plot module to plot heatmaps
madhuracj Feb 13, 2021
d2d5f74
Fix caller name
madhuracj Feb 13, 2021
c2f0649
Fix linting
madhuracj Feb 13, 2021
9f61ff4
Remove redundant import
madhuracj Feb 13, 2021
e431998
Merge branch 'main' into categorical_pdp
ogrisel Jun 3, 2021
3acccb4
Merge branch 'master' into categorical_pdp
madhuracj Jun 10, 2021
d96640e
Merge branch 'master' into categorical_pdp
madhuracj Jun 10, 2021
10b1ef4
Remove unused imports
madhuracj Jun 10, 2021
058e429
Merge commit '0e7761cdc4f244adb4803f1a97f0a9fe4b365a99' into categori…
madhuracj Jun 24, 2021
0e8a535
MAINT Adds target_version to black config (#20293)
thomasjpfan Jun 17, 2021
6b4f51c
apply black
madhuracj Jun 24, 2021
fbec9a3
Merge remote-tracking branch 'upstream/main' into categorical_pdp
madhuracj Jun 24, 2021
fedf2b2
DOC add whats new entry
glemaitre Jun 30, 2021
6e79bb7
Merge remote-tracking branch 'origin/main' into pr/madhuracj/18298
glemaitre Jun 30, 2021
6b5d5ba
black
glemaitre Jun 30, 2021
dd3561e
FIX use categorical_features instead of is_categorical
glemaitre Jun 30, 2021
f79528e
style
glemaitre Jul 5, 2021
b8f0082
iter
glemaitre Jul 6, 2021
4c5d2dc
iter
glemaitre Jul 6, 2021
6170de9
iter
glemaitre Jul 6, 2021
6ab4e2d
iter
glemaitre Jul 8, 2021
49dbbf7
iter
glemaitre Jul 8, 2021
92653c0
iter
glemaitre Jul 9, 2021
c9ce64d
iter
glemaitre Jul 9, 2021
934d876
iter
glemaitre Jul 9, 2021
d6e3fb2
iter
glemaitre Jul 9, 2021
4009d5e
iter
glemaitre Jul 9, 2021
7996b6c
Merge remote-tracking branch 'origin/main' into pr/madhuracj/18298
glemaitre Jul 20, 2021
8a940cb
iter
glemaitre Jul 20, 2021
da270a3
TST add test for plot_heatmap
glemaitre Jul 20, 2021
8adeda3
simplify API
glemaitre Jul 29, 2021
c7e63a8
TST add a test to check the fetching of indexing
glemaitre Jul 29, 2021
4d2ae7e
TST check the error message for small grid_resolution
glemaitre Jul 29, 2021
68a062b
iter
glemaitre Jul 29, 2021
be810fb
use fixture
glemaitre Jul 29, 2021
dd8cf35
Merge remote-tracking branch 'origin/main' into pr/madhuracj/18298
glemaitre Sep 20, 2021
94fa738
update docstring
glemaitre Sep 20, 2021
a4b4bdb
Merge remote-tracking branch 'origin/main' into pr/madhuracj/18298
glemaitre Apr 22, 2022
6aaa3ed
pass test
glemaitre Apr 22, 2022
65fd37d
fixes
glemaitre Apr 22, 2022
a5f5319
doc
glemaitre Apr 22, 2022
c15d378
fix tests
glemaitre Apr 22, 2022
1512726
add test
glemaitre Apr 22, 2022
a65c4d3
fix doc test
glemaitre Apr 22, 2022
b3a2843
TST add heatmap extra test
glemaitre Apr 22, 2022
d120f15
Merge remote-tracking branch 'origin/main' into pr/madhuracj/18298
glemaitre Apr 22, 2022
add4890
DOC
glemaitre Apr 22, 2022
fbf82b2
revert name for consistency
glemaitre Apr 22, 2022
cf27475
remove duplicate
glemaitre Apr 22, 2022
3850da0
less diff
glemaitre Apr 22, 2022
cc268e1
iter
glemaitre Apr 22, 2022
178b377
TST check legend in heterogeneous case
glemaitre Apr 22, 2022
58fd04d
Merge remote-tracking branch 'origin/main' into pr/madhuracj/18298
glemaitre Apr 26, 2022
db7bb49
Fix figures. Improve docs
madhuracj Apr 26, 2022
0db445f
Update versionadded to 1.2
madhuracj May 18, 2022
2f62724
Merge branch 'main' into categorical_pdp
madhuracj May 18, 2022
5748cf8
ChangeLog entry should go in v1.2
madhuracj May 18, 2022
2c420d2
Merge branch 'main' into categorical_pdp
glemaitre Jul 27, 2022
80b42cb
Merge branch 'main' into categorical_pdp
adrinjalali Jul 27, 2022
78ab357
Apply suggestions from code review
glemaitre Aug 2, 2022
68f5f0e
Merge remote-tracking branch 'origin/main' into pr/madhuracj/18298
glemaitre Aug 2, 2022
25d92f7
review adrin comments
glemaitre Aug 2, 2022
be3be50
iter
glemaitre Aug 2, 2022
1c7f563
avoid future warning in the doc
glemaitre Aug 2, 2022
2bf4846
remove rst warning
glemaitre Aug 2, 2022
5383c96
Update v1.2.rst
glemaitre Aug 2, 2022
f59e282
Merge remote-tracking branch 'origin/main' into pr/madhuracj/18298
glemaitre Sep 13, 2022
bfe6179
Merge remote-tracking branch 'origin/main' into pr/madhuracj/18298
glemaitre Oct 12, 2022
1ef227f
DOC use is_categorical in example
glemaitre Oct 12, 2022
170d642
MAINT remove refactoring plot_heatmap
glemaitre Oct 12, 2022
d68bd2d
MAINT remove the refactoring for heatmap
glemaitre Oct 12, 2022
11653c9
DOC reintroduce interactions constraints in example
glemaitre Oct 12, 2022
90b6055
Merge remote-tracking branch 'origin/main' into pr/madhuracj/18298
glemaitre Oct 19, 2022
b3d4fab
Apply suggestions from code review
glemaitre Oct 19, 2022
5f8a76b
Merge branch 'categorical_pdp' of github.com:madhuracj/scikit-learn i…
glemaitre Oct 19, 2022
85b2105
apply suggestion thomas
glemaitre Oct 19, 2022
7d95968
Merge branch 'main' into categorical_pdp
glemaitre Oct 27, 2022
bf208bd
Merge branch 'main' into categorical_pdp
glemaitre Oct 31, 2022
13d4991
Merge branch 'main' into categorical_pdp
ogrisel Nov 10, 2022
245f561
Apply suggestions from code review
glemaitre Nov 18, 2022
536adf7
API use feature_names and categorical_features
glemaitre Nov 18, 2022
00188e1
simplify
glemaitre Nov 18, 2022
108e50e
simplify
glemaitre Nov 18, 2022
99e139a
iter
glemaitre Nov 18, 2022
5fc53de
DOC fix docstring
glemaitre Nov 18, 2022
7c952ea
iter
glemaitre Nov 18, 2022
b269141
TST add test for utils
glemaitre Nov 18, 2022
292beec
TST add test for utils
glemaitre Nov 18, 2022
c44fea1
Apply suggestions from code review
glemaitre Nov 21, 2022
5cde742
Update sklearn/inspection/_pd_utils.py
glemaitre Nov 23, 2022
0237316
review pd_utils and tests
glemaitre Nov 23, 2022
fba1571
additional reviews by Olivier
glemaitre Nov 23, 2022
c7fc26d
blackify
glemaitre Nov 23, 2022
c439325
TST fix tests
glemaitre Nov 23, 2022
30c13ba
EXA figure rendering
glemaitre Nov 23, 2022
346c432
Merge remote-tracking branch 'origin/main' into pr/madhuracj/18298
glemaitre Nov 23, 2022
cf6ec31
Update plot_partial_dependence.py
glemaitre Nov 23, 2022
cd354a1
iter
glemaitre Nov 23, 2022
a865e20
DOC it works locally
glemaitre Nov 24, 2022
5b273de
DOC tweak
glemaitre Nov 24, 2022
bf3d741
better tweak
glemaitre Nov 24, 2022
92a6f90
Merge branch 'main' into categorical_pdp
glemaitre Nov 24, 2022
8b42d1f
Merge remote-tracking branch 'upstream/main' into pr/madhuracj/18298
jeremiedbb Nov 25, 2022
0c96552
fix what's new
jeremiedbb Nov 25, 2022
0bc73ed
filter the iloc warning
jeremiedbb Nov 25, 2022
d5f4c48
Pass categorical_features to HGBDT + avoid calling plt.subplots_adjust
ogrisel Nov 25, 2022
1ee652f
Fix pdp_lim
ogrisel Nov 25, 2022
a530c86
Fix test_partial_dependence_plot_limits_two_way
ogrisel Nov 25, 2022
5606e52
Fix test_partial_dependence_plot_limits_one_way
ogrisel Nov 25, 2022
bb608d5
Merge branch 'main' into categorical_pdp
jeremiedbb Nov 25, 2022
61 changes: 33 additions & 28 deletions doc/modules/partial_dependence.rst
@@ -25,34 +25,33 @@ of all other input features (the 'complement' features). Intuitively, we can
interpret the partial dependence as the expected target response as a
function of the input features of interest.

Due to the limits of human perception the size of the set of input feature of
Due to the limits of human perception, the size of the set of input features of
interest must be small (usually, one or two) thus the input features of interest
are usually chosen among the most important features.

The figure below shows two one-way and one two-way partial dependence plots for
the California housing dataset, with a :class:`HistGradientBoostingRegressor
<sklearn.ensemble.HistGradientBoostingRegressor>`:
the bike sharing dataset, with a
:class:`~sklearn.ensemble.HistGradientBoostingRegressor`:

.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_003.png
.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_005.png
:target: ../auto_examples/inspection/plot_partial_dependence.html
:align: center
:scale: 70

One-way PDPs tell us about the interaction between the target response and an
input feature of interest feature (e.g. linear, non-linear). The left plot
in the above figure shows the effect of the average occupancy on the median
house price; we can clearly see a linear relationship among them when the
average occupancy is inferior to 3 persons. Similarly, we could analyze the
effect of the house age on the median house price (middle plot). Thus, these
interpretations are marginal, considering a feature at a time.

PDPs with two input features of interest show the interactions among the two
features. For example, the two-variable PDP in the above figure shows the
dependence of median house price on joint values of house age and average
occupants per household. We can clearly see an interaction between the two
features: for an average occupancy greater than two, the house price is nearly
independent of the house age, whereas for values less than 2 there is a strong
dependence on age.
One-way PDPs tell us about the interaction between the target response and an input
feature of interest (e.g. linear, non-linear). The left plot in the above figure
shows the effect of the temperature on the number of bike rentals; we can clearly see
that a higher temperature is associated with a higher number of bike rentals. Similarly,
we could analyze the effect of the humidity on the number of bike rentals (middle plot).
Thus, these interpretations are marginal, considering one feature at a time.

PDPs with two input features of interest show the interactions between the two features.
For example, the two-variable PDP in the above figure shows the dependence of the number
of bike rentals on joint values of temperature and humidity. We can clearly see an
interaction between the two features: with a temperature higher than 20 degrees Celsius,
mainly the humidity has a strong impact on the number of bike rentals. For lower
temperatures, both the temperature and the humidity have an impact on the number of bike
rentals.
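
To make the two-way case concrete, here is a minimal brute-force sketch of how a
two-way partial dependence grid can be computed. It is not scikit-learn's
implementation. The toy ``predict`` function, its coefficients, the ``wind`` feature and
the tiny dataset are all assumptions made up for illustration; the interaction it
encodes is arbitrary and does not reproduce the fitted model behind the figure.

```python
from statistics import mean

# Hypothetical toy model (an assumption, not fitted on the bike sharing data):
# rentals rise with temperature, and humidity only matters when it is warm.
def predict(temp, humidity, wind):
    humidity_penalty = 20.0 * humidity if temp > 20 else 0.0
    return 10.0 + 1.5 * temp - humidity_penalty - 0.1 * wind

# Made-up samples: (temp, humidity, wind).
X = [(5.0, 0.4, 10.0), (15.0, 0.8, 5.0), (25.0, 0.6, 20.0), (30.0, 0.9, 0.0)]

def two_way_pd(temp_grid, humidity_grid):
    """Average prediction with (temp, humidity) forced to each grid point.

    The complement feature (wind) keeps the value observed in each sample.
    """
    return [
        [mean(predict(t, h, wind) for _, _, wind in X) for h in humidity_grid]
        for t in temp_grid
    ]

pd_grid = two_way_pd(temp_grid=[10.0, 30.0], humidity_grid=[0.2, 0.8])

# Change in the averaged prediction when humidity goes from 0.2 to 0.8,
# computed separately for each temperature on the grid.
humidity_effect = [row[1] - row[0] for row in pd_grid]
```

In this sketch the humidity effect is zero at 10 degrees and negative at 30 degrees.
An effect of one feature that changes with the value of the other is the kind of
interaction a two-way PDP is meant to reveal.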

The :mod:`sklearn.inspection` module provides a convenience function
:func:`~PartialDependenceDisplay.from_estimator` to create one-way and two-way partial
@@ -74,6 +73,12 @@ and a two-way PDP between the two features::
You can access the newly created figure and Axes objects using ``plt.gcf()``
and ``plt.gca()``.

If you wish to plot partial dependence of categorical features, you need to specify
which features are to be considered as such using the parameter `categorical_features`.
This parameter takes a list of indices or names of the categorical features, or a boolean
mask. The graphical representation of partial dependence for categorical features is
a bar plot or a 2D heatmap.
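
The bar plot mentioned above follows from what partial dependence means for a
categorical feature: there is one averaged prediction per category instead of a curve
over a numeric grid. The following is a minimal sketch of that computation in plain
Python, not scikit-learn's code. The ``predict`` function, the ``SEASON_BASE`` table and
the samples are hypothetical and exist only for illustration.

```python
from statistics import mean

# Hypothetical toy model (an assumption for illustration): a base demand per
# season plus a linear temperature effect.
SEASON_BASE = {"spring": 40.0, "summer": 60.0, "fall": 55.0, "winter": 20.0}

def predict(season, temp):
    return SEASON_BASE[season] + 1.5 * temp

# Made-up samples: (season, temp).
X = [("spring", 12.0), ("summer", 28.0), ("fall", 15.0), ("winter", 1.0)]

def categorical_pd(categories):
    """Return one averaged prediction per category (the bar heights).

    For each category, every sample's season is replaced by that category
    while its temperature is left unchanged.
    """
    return {
        cat: mean(predict(cat, temp) for _, temp in X)
        for cat in categories
    }

bars = categorical_pd(sorted(SEASON_BASE))
```

Each key of ``bars`` would become one bar. With two categorical features, the same loop
over pairs of categories yields a 2D table of averages, which is what a heatmap
displays.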

For multi-class classification, you need to set the class label for which
the PDPs should be created via the ``target`` argument::

@@ -120,23 +125,23 @@ feature for each sample separately with one line per sample.
Due to the limits of human perception, only one input feature of interest is
supported for ICE plots.

The figures below show four ICE plots for the California housing dataset,
with a :class:`HistGradientBoostingRegressor
<sklearn.ensemble.HistGradientBoostingRegressor>`. The second figure plots
the corresponding PD line overlaid on ICE lines.
The figures below show two ICE plots for the bike sharing dataset,
with a :class:`~sklearn.ensemble.HistGradientBoostingRegressor`.
The figures plot the corresponding PD line overlaid on ICE lines.

.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_002.png
.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_004.png
:target: ../auto_examples/inspection/plot_partial_dependence.html
:align: center
:scale: 70

While the PDPs are good at showing the average effect of the target features,
they can obscure a heterogeneous relationship created by interactions.
When interactions are present the ICE plot will provide many more insights.
For example, we could observe a linear relationship between the median income
and the house price in the PD line. However, the ICE lines show that there
are some exceptions, where the house price remains constant in some ranges of
the median income.
For example, we see that the ICE for the temperature feature gives us some
additional information: some of the ICE lines are flat while others
show a decrease in the dependence for temperatures above 35 degrees Celsius.
We observe a similar pattern for the humidity feature: some of the ICE
lines show a sharp decrease when the humidity is above 80%.
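
The relationship between ICE lines and the PD line can be sketched in a few lines of
plain Python: the PD line is the pointwise average of the ICE lines, so opposite
individual trends can cancel out. The toy ``predict`` function, the ``is_holiday``
feature and the samples below are assumptions chosen to make the cancellation obvious.
They are not taken from the bike sharing example.

```python
from statistics import mean

# Hypothetical toy model with an interaction (an assumption for illustration):
# the sign of the temperature effect depends on a binary feature.
def predict(temp, is_holiday):
    slope = 1.0 if is_holiday else -1.0
    return 50.0 + slope * (temp - 20.0)

# Made-up samples: (temp, is_holiday), half holidays and half working days.
X = [(10.0, True), (18.0, False), (25.0, True), (32.0, False)]
temp_grid = [0.0, 20.0, 40.0]

# One ICE line per sample: vary temp over the grid and keep the sample's
# other feature at its observed value.
ice_lines = [[predict(t, is_holiday) for t in temp_grid] for _, is_holiday in X]

# The PD line is the pointwise average of the ICE lines.
pd_line = [mean(line[i] for line in ice_lines) for i in range(len(temp_grid))]
```

Here the ICE lines rise for the holiday samples and fall for the others, while
``pd_line`` is constant. This is the situation in which a PDP on its own would hide a
heterogeneous relationship.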

The :mod:`sklearn.inspection` module's :meth:`PartialDependenceDisplay.from_estimator`
convenience function can be used to create ICE plots by setting
8 changes: 8 additions & 0 deletions doc/whats_new/v1.2.rst
@@ -196,6 +196,14 @@ Changelog
to `"highs"` in version 1.4.
:pr:`23637` by :user:`Guillaume Lemaitre <glemaitre>`.

:mod:`sklearn.inspection`
.........................

- |Enhancement| Extend :func:`plot_partial_dependence` and
:class:`PartialDependenceDisplay` to handle categorical features.
:pr:`18298` by :user:`Madhura Jayaratne <madhuracj>` and
:user:`Guillaume Lemaitre <glemaitre>`.

:mod:`sklearn.metrics`
......................
