ENH Extend PDP for nominal categorical features (#18298)
Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Closes #14969
madhuracj committed Nov 25, 2022
1 parent 7d1c318 commit c1cfc4d
Showing 10 changed files with 1,143 additions and 322 deletions.
61 changes: 33 additions & 28 deletions doc/modules/partial_dependence.rst
@@ -25,34 +25,33 @@ of all other input features (the 'complement' features). Intuitively, we can
interpret the partial dependence as the expected target response as a
function of the input features of interest.

Due to the limits of human perception the size of the set of input feature of
Due to the limits of human perception, the size of the set of input features of
interest must be small (usually, one or two) thus the input features of interest
are usually chosen among the most important features.

The figure below shows two one-way and one two-way partial dependence plots for
the California housing dataset, with a :class:`HistGradientBoostingRegressor
<sklearn.ensemble.HistGradientBoostingRegressor>`:
the bike sharing dataset, with a
:class:`~sklearn.ensemble.HistGradientBoostingRegressor`:

.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_003.png
.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_005.png
:target: ../auto_examples/inspection/plot_partial_dependence.html
:align: center
:scale: 70

One-way PDPs tell us about the interaction between the target response and an
input feature of interest feature (e.g. linear, non-linear). The left plot
in the above figure shows the effect of the average occupancy on the median
house price; we can clearly see a linear relationship among them when the
average occupancy is inferior to 3 persons. Similarly, we could analyze the
effect of the house age on the median house price (middle plot). Thus, these
interpretations are marginal, considering a feature at a time.

PDPs with two input features of interest show the interactions among the two
features. For example, the two-variable PDP in the above figure shows the
dependence of median house price on joint values of house age and average
occupants per household. We can clearly see an interaction between the two
features: for an average occupancy greater than two, the house price is nearly
independent of the house age, whereas for values less than 2 there is a strong
dependence on age.
One-way PDPs tell us about the interaction between the target response and an input
feature of interest (e.g. linear, non-linear). The left plot in the above figure
shows the effect of the temperature on the number of bike rentals; we can clearly see
that a higher temperature is associated with a higher number of bike rentals. Similarly, we
could analyze the effect of the humidity on the number of bike rentals (middle plot).
Thus, these interpretations are marginal, considering a feature at a time.

PDPs with two input features of interest show the interactions among the two features.
For example, the two-variable PDP in the above figure shows the dependence of the number
of bike rentals on joint values of temperature and humidity. We can clearly see an
interaction between the two features: with a temperature higher than 20 degrees Celsius,
mainly the humidity has a strong impact on the number of bike rentals. For lower
temperatures, both the temperature and the humidity have an impact on the number of bike
rentals.

The :mod:`sklearn.inspection` module provides a convenience function
:func:`~PartialDependenceDisplay.from_estimator` to create one-way and two-way partial
@@ -74,6 +73,12 @@ and a two-way PDP between the two features::
You can access the newly created figure and Axes objects using ``plt.gcf()``
and ``plt.gca()``.

To make a partial dependence plot with categorical features, you need to specify
which features are categorical using the parameter `categorical_features`. This
parameter takes a list of indices, names of the categorical features or a boolean
mask. The graphical representation of partial dependence for categorical features is
a bar plot or a 2D heatmap.
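
To illustrate what each bar in such a plot represents, here is a minimal
brute-force sketch in plain Python. It is not scikit-learn's implementation:
``toy_model``, ``categorical_partial_dependence`` and the small dataset are
hypothetical names and values introduced only for this example. For each
category, the feature is overwritten with that category in every sample and
the predictions are averaged.

```python
from statistics import mean


def toy_model(row):
    """Hypothetical 'fitted' model: rentals from (season, temperature)."""
    season_effect = {"spring": 10.0, "summer": 30.0, "fall": 20.0, "winter": 0.0}
    season, temperature = row
    return season_effect[season] + 2.0 * temperature


def categorical_partial_dependence(predict, X, feature_idx):
    """Brute-force partial dependence for one categorical feature.

    The grid is simply the set of categories observed in X.
    """
    categories = sorted({row[feature_idx] for row in X})
    pd_values = {}
    for category in categories:
        predictions = []
        for row in X:
            modified = list(row)
            modified[feature_idx] = category  # force the category for every sample
            predictions.append(predict(modified))
        pd_values[category] = mean(predictions)  # average over the samples
    return pd_values


# Assumed toy data: (season, temperature in degrees Celsius)
X = [
    ("spring", 10.0),
    ("summer", 25.0),
    ("fall", 15.0),
    ("winter", 0.0),
    ("summer", 30.0),
]
pd_season = categorical_partial_dependence(toy_model, X, feature_idx=0)
for season, value in pd_season.items():
    print(f"{season}: {value:.1f}")
```

Because the categories have no natural ordering, one averaged value per
category is all there is to show, which is why a bar plot (or a heatmap for
two categorical features) is used instead of a line.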

For multi-class classification, you need to set the class label for which
the PDPs should be created via the ``target`` argument::

Expand Down Expand Up @@ -120,23 +125,23 @@ feature for each sample separately with one line per sample.
Due to the limits of human perception, only one input feature of interest is
supported for ICE plots.

The figures below show four ICE plots for the California housing dataset,
with a :class:`HistGradientBoostingRegressor
<sklearn.ensemble.HistGradientBoostingRegressor>`. The second figure plots
the corresponding PD line overlaid on ICE lines.
The figures below show two ICE plots for the bike sharing dataset,
with a :class:`~sklearn.ensemble.HistGradientBoostingRegressor`.
The figures plot the corresponding PD line overlaid on ICE lines.

.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_002.png
.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_004.png
:target: ../auto_examples/inspection/plot_partial_dependence.html
:align: center
:scale: 70

While the PDPs are good at showing the average effect of the target features,
they can obscure a heterogeneous relationship created by interactions.
When interactions are present the ICE plot will provide many more insights.
For example, we could observe a linear relationship between the median income
and the house price in the PD line. However, the ICE lines show that there
are some exceptions, where the house price remains constant in some ranges of
the median income.
For example, we see that the ICE for the temperature feature gives us some
additional information: some of the ICE lines are flat, while others
show a decrease of the dependence for temperatures above 35 degrees Celsius.
We observe a similar pattern for the humidity feature: some of the ICE
lines show a sharp decrease when the humidity is above 80%.
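
The way an averaged PD line can hide heterogeneous ICE lines can be sketched
in plain Python. This is an illustration under stated assumptions, not
scikit-learn's implementation: ``toy_model``, ``ice_curves``, ``pd_from_ice``
and the data are hypothetical, and the model is built so that temperature has
opposite effects depending on a second feature.

```python
from statistics import mean


def toy_model(row):
    """Hypothetical model with an interaction between its two features.

    Temperature raises the prediction on working days and lowers it otherwise.
    """
    temperature, workingday = row
    slope = 1.0 if workingday else -1.0
    return 50.0 + slope * temperature


def ice_curves(predict, X, feature_idx, grid):
    """One ICE curve per sample: vary one feature, keep the others fixed."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature_idx] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def pd_from_ice(curves):
    """The PD line is the point-wise average of the ICE curves."""
    return [mean(column) for column in zip(*curves)]


# Assumed toy data: (temperature, workingday)
X = [(10.0, True), (20.0, False), (30.0, True), (15.0, False)]
grid = [0.0, 10.0, 20.0]

curves = ice_curves(toy_model, X, feature_idx=0, grid=grid)
pd_line = pd_from_ice(curves)
print("ICE:", curves)
print("PD: ", pd_line)
```

In this constructed case the PD line is flat, while the individual ICE curves
rise for half of the samples and fall for the other half, so only the ICE
view reveals the interaction.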

The :mod:`sklearn.inspection` module's :meth:`PartialDependenceDisplay.from_estimator`
convenience function can be used to create ICE plots by setting
10 changes: 9 additions & 1 deletion doc/whats_new/v1.2.rst
@@ -405,6 +405,14 @@ Changelog
containing only missing values when transforming.
:pr:`16695` by :user:`Vitor Santa Rosa <vitorsrg>`.

:mod:`sklearn.inspection`
.........................

- |Enhancement| Extended :func:`inspection.partial_dependence` and
:class:`inspection.PartialDependenceDisplay` to handle categorical features.
:pr:`18298` by :user:`Madhura Jayaratne <madhuracj>` and
:user:`Guillaume Lemaitre <glemaitre>`.

:mod:`sklearn.kernel_approximation`
...................................

@@ -416,7 +424,7 @@ Changelog

- |Enhancement| :class:`kernel_approximation.RBFSampler` now accepts
`'scale'` option for parameter `gamma`.
:pr:`24755` by :user:`Gleb Levitski <GLevV>`
:pr:`24755` by :user:`Gleb Levitski <GLevV>`.

:mod:`sklearn.linear_model`
...........................