FEA add PredictionErrorDisplay #18020

Merged
merged 182 commits into from Nov 25, 2022
Changes from 171 commits
Commits
182 commits
48d8d8a
FEA add plot_prediction_error
glemaitre Jul 28, 2020
ebb3a78
iter
glemaitre Jul 28, 2020
6828a2b
iter
glemaitre Jul 29, 2020
edcc3f5
Merge remote-tracking branch 'origin/master' into predicted_actual_re…
glemaitre Jul 29, 2020
3275dfa
iter
glemaitre Jul 29, 2020
0e9c06a
check with an example
glemaitre Jul 29, 2020
7f245a2
reset previous style
glemaitre Jul 29, 2020
246d0b3
iter
glemaitre Jul 29, 2020
11fe8c2
change another example
glemaitre Jul 29, 2020
f9e912a
DOC update another example
glemaitre Jul 29, 2020
08705ee
PEP8
glemaitre Jul 29, 2020
0aace91
add some tests
glemaitre Jul 29, 2020
4a5463d
TST add test and improve documentation
glemaitre Jul 30, 2020
1008644
compatibility matplotlib 2.2
glemaitre Jul 30, 2020
1b58be3
avoid computation in display class
glemaitre Jul 30, 2020
017d7b5
EXA remove subsample from example
glemaitre Jul 30, 2020
b6767d9
Merge remote-tracking branch 'origin/main' into predicted_actual_regr…
glemaitre Aug 7, 2021
ca71ccf
iter
glemaitre Aug 7, 2021
41143d2
iter
glemaitre Aug 7, 2021
9c2ab2e
iter
glemaitre Aug 7, 2021
e2cc928
iter
glemaitre Aug 7, 2021
4f1fa00
iter
glemaitre Aug 7, 2021
c9a06db
iter
glemaitre Aug 7, 2021
51cea2c
iter
glemaitre Aug 7, 2021
cb5a1be
docstring
glemaitre Aug 7, 2021
3e47dc4
docstring
glemaitre Aug 7, 2021
3213752
Merge remote-tracking branch 'origin/main' into predicted_actual_regr…
glemaitre Apr 20, 2022
07c5e46
Merge remote-tracking branch 'origin/main' into predicted_actual_regr…
glemaitre Apr 20, 2022
79d0e59
revert pre-commit change
glemaitre Apr 20, 2022
5e3a6ee
black
glemaitre Apr 20, 2022
79e0cf1
iter
glemaitre Apr 20, 2022
8905137
improve example and change API
glemaitre Apr 21, 2022
fb804e7
doc fix
glemaitre Apr 21, 2022
6a56df6
fix tests
glemaitre Apr 21, 2022
b279c9a
TST add residuals test
glemaitre Apr 21, 2022
c9bb7bb
fixes
glemaitre Apr 21, 2022
15c2e18
Merge remote-tracking branch 'origin/main' into predicted_actual_regr…
glemaitre Apr 26, 2022
4a3b4cb
Merge remote-tracking branch 'upstream/main' into pr/glemaitre/18020
jeremiedbb Sep 16, 2022
e6e46cd
target 1.2
jeremiedbb Sep 16, 2022
6c32fd9
Merge branch 'main' into predicted_actual_regression_plot
jeremiedbb Sep 21, 2022
96dec4f
Merge remote-tracking branch 'origin/main' into predicted_actual_regr…
glemaitre Sep 28, 2022
0e942b7
iter
glemaitre Sep 28, 2022
4ba8c77
switch color
glemaitre Sep 28, 2022
68a5c5f
add horizontal line for residuals plot
glemaitre Sep 28, 2022
154d191
add horizontal line for residuals plot
glemaitre Sep 28, 2022
b204d3e
update style regarding scores
glemaitre Sep 28, 2022
e56d3ac
Add the option to swap x- y-axsi
glemaitre Sep 28, 2022
cd70c93
DOC add some more documentation
glemaitre Sep 28, 2022
41f3d16
Merge remote-tracking branch 'origin/main' into predicted_actual_regr…
glemaitre Sep 28, 2022
72218b4
DOC outdated versionadded
glemaitre Sep 28, 2022
ab178e6
Update doc/modules/model_evaluation.rst
glemaitre Oct 31, 2022
d315a0e
Apply suggestions from code review
glemaitre Oct 31, 2022
02f0a6c
DOC Clarify sorting order of PCA components (#24531)
anntzer Sep 28, 2022
3e3ccec
DOC Mention pandas dataframe support in ColumnTransformer in FAQ (#24…
Dev-Khant Sep 28, 2022
eb1723e
MAINT modify import raccoon face for SciPy >= 1.10 (#24521)
glemaitre Sep 29, 2022
0aa7afa
CI Remove unneeded setup-python (#24544)
thomasjpfan Sep 30, 2022
66942bd
FIX Treat gradient boosting categoricals outside the bounds as unknow…
thomasjpfan Sep 30, 2022
91d361e
MNT update security.md (#24551)
adrinjalali Sep 30, 2022
6b684e9
BLD Import from numpy/arrayobject.h directly (#24547)
thomasjpfan Sep 30, 2022
1a13ee8
FIX Fixes HTML repr when get_params contains classes (#24512)
thomasjpfan Sep 30, 2022
c012047
MAINT remove redundant items from iterable (#24554)
rhelmeczi Sep 30, 2022
1dc22a2
MAINT Clean deprecation for 1.2: get_feature_names (#24395)
jeremiedbb Oct 3, 2022
05343b1
DOC Ensure that sklearn.utils.axis0_safe_slice passes numpydoc (#24561)
rjuer Oct 3, 2022
a080ed8
DOC Ensures that sklearn.tree._export.plot_tree passes numpydoc valid…
mauroantonioserrano Oct 3, 2022
3f8bf59
DOC Ensure that sklearn.utils.is_scalar_nan passes numpydoc validatio…
rjuer Oct 3, 2022
389cfd5
DOC make `plot_mean_shift.py` more colourblind friendly (#24553)
rprkh Oct 4, 2022
6be5770
ENH change Ridge tol to 1e-4 (#24465)
lorentzenchr Oct 6, 2022
b5819a7
MNT Optimize safe_indexing for slices (#24587)
thomasjpfan Oct 6, 2022
edf203a
CI remove LGTM (#24592)
adrinjalali Oct 6, 2022
7cc44ed
CLN Remove unneeded variable definition in DictVectorizer.fit (#24590)
thomasjpfan Oct 6, 2022
87f723b
DOC Ensures that available_if passes numpydoc validation (#24586)
irene000 Oct 6, 2022
6be933b
CLN Migrate avaliable_if to it's own file (#24594)
thomasjpfan Oct 6, 2022
00080cc
API Deprecate the extra keyword arguments of utils.extmath.density (#…
clytaemnestra Oct 7, 2022
fa5e4e4
CLN Do not override signature for _visual_block_ (#24588)
thomasjpfan Oct 7, 2022
2ba5fcc
TST Removed sklearn.util.fixes.linspace from numpydoc ignore list (#2…
mansi1597 Oct 7, 2022
dc682e3
DOC Ensures that sklearn.utils.extmath.safe_sparse_dot passes numpydo…
awinml Oct 7, 2022
5e3e3ca
CLN Remove unnecessary operation in mutual_info (#24569)
RowanMankoo Oct 7, 2022
683544d
TST Relax `test_gradient_boosting_early_stopping` (#24541)
jjerphan Oct 7, 2022
5484a80
DOC Ensures that sklearn.utils.extmath.weighted_mode passes numpydoc …
awinml Oct 7, 2022
e9faa62
TST Make test_function_docstrings ignore functions from utils.fixes (…
jeremiedbb Oct 7, 2022
0facca3
MAINT Clean deprecation for 1.2: cv_results_ keys (#24599)
jeremiedbb Oct 7, 2022
c67c59b
MAINT Clean deprecation for 1.2: load_boston (#24603)
jeremiedbb Oct 10, 2022
c701ad4
DOC Ensure that gen_even_slices passes numpydoc validation (#24608)
thatgeeman Oct 10, 2022
d576284
DOC Check sha256 digests of tarballs in tutorial and examples before …
ogrisel Oct 10, 2022
2fc9bbb
MAINT set joblib min to 1.1.1 (#24621)
ogrisel Oct 10, 2022
17f12a3
API Add `"auto"` value and deprecate default value for `normalized_st…
Micky774 Oct 10, 2022
c0bf1b0
DOC Ensures that svd_flip passes numpydoc validation (#24581)
mansi1597 Oct 11, 2022
82579e2
ENH FEA add interaction constraints to HGBT (#21020)
lorentzenchr Oct 11, 2022
63b1b01
DOC add a docstring example for the learning_curve function (#24546)
glemaitre Oct 12, 2022
6d3d44e
MAINT Clean deprecation for 1.2: normalize in linear models (#24391)
jeremiedbb Oct 12, 2022
9245703
Simplify Tempita preprocessing (#24624)
jjerphan Oct 12, 2022
178849f
GEMMTermComputer.{_compute_distances_on_chunks→_compute_dist_middle_t…
jjerphan Oct 12, 2022
550b653
ENH Introduces set_output API for pandas output (#23734)
thomasjpfan Oct 12, 2022
69ea8c8
CI Use GITHUB_OUTPUT instead of deprecated set-output (#24644)
thomasjpfan Oct 13, 2022
1935f2b
DOC Promote Meekail Zain to the Core Contributor Team (#24649)
jjerphan Oct 13, 2022
3400d89
CI Remove Windows 32 bit support (#24627)
thomasjpfan Oct 13, 2022
d8b0775
DOC Use :doi: directive for KMeans (#24641)
ArturoAmorQ Oct 13, 2022
7b58086
DOC Rework plot_roc.py example (#24200)
ArturoAmorQ Oct 13, 2022
8e79cd2
MAINT clean deprecation for 1.2: leftovers (#24647)
jeremiedbb Oct 13, 2022
317da38
DOC Ensure that gen_batches passes numpydoc validation (#24609)
thatgeeman Oct 13, 2022
61649ab
MAINT `PairwiseDistancesReduction`: Update comments and remove unused…
jjerphan Oct 13, 2022
39a9360
MAINT Fix full doc build by avoiding plot_set_output.py side-effect (…
lesteve Oct 13, 2022
274d537
MAINT Clean deprecation for 1.2: load_boston follow-up (#24653)
jeremiedbb Oct 13, 2022
b0cbd54
MAINT set plotly min to 5.10 (#24629)
ArturoAmorQ Oct 13, 2022
9629a14
OPTIM use pairwise_distances_argmin in NearestCentroid.predict (#24645)
ogrisel Oct 13, 2022
32d0619
MAINT use nanmin to replace nan by finite values in ranking of Search…
glemaitre Oct 13, 2022
ba8b09c
MAINT `PairwiseDistancesReduction`: Rename some symbols and files (#2…
jjerphan Oct 13, 2022
e9c57b4
DOC Ensures that `sklearn.utils.extmath.randomized_svd` passes numpyd…
awinml Oct 13, 2022
64937d2
DOC Ensures that if_delegate_has_method passes numpydoc validation (#…
michpara Oct 13, 2022
e656747
Add msvcp140.dll to Windows 64 bit wheels (#24631)
cmarmo Oct 14, 2022
a1dfc64
DOC Ensures that sklearn.utils.extmath.fast_logdet passes numpydoc va…
awinml Oct 14, 2022
d0c3bf3
MNT Do not update docs with deprecated decorator (#24410)
thomasjpfan Oct 14, 2022
2adf799
MAINT Clean deprecation for 1.2: default random_state in randomized_s…
jeremiedbb Oct 14, 2022
cb1318c
MAINT Clean deprecation for 1.2: feature names exact match (#24660)
jeremiedbb Oct 14, 2022
66a1b8b
MAINT Bump min dependencies for 1.2 (#24650)
jeremiedbb Oct 14, 2022
a5e8294
API Remove `sklearn.metrics.manhattan_distances` option `sum_over_fea…
rusdes Oct 14, 2022
7d5eda2
EFF avoid computing inertia in KMeans' predict (#24666)
jeremiedbb Oct 14, 2022
dbc1fe6
DOC make `plot_agglomerative_clustering_metrics.py` colorblind friend…
rprkh Oct 17, 2022
2eee0c7
DOC use KBinsDiscretizer in lieu of KMeans in vector quantization exa…
x110 Oct 17, 2022
b736d2e
TST use global_random_seed in sklearn/cluster/tests/test_dbscan.py (#…
OmarManzoor Oct 17, 2022
803d679
HOTFIX Temporarily disable py38_conda_defaults_openblas build (#24693)
lesteve Oct 18, 2022
0154fbc
DOC Fix typo and adjust wording in `set_output` example (#24689)
betatim Oct 18, 2022
19de56a
DOC Improve docstring around set_output (#24672)
thomasjpfan Oct 18, 2022
d7e6fce
ENH Makes OneToOneFeatureMixin and ClassNamePrefixFeaturesOutMixin pu…
thomasjpfan Oct 18, 2022
f5d0d71
DOC Fix typo in plot_set_output.py example (#24704)
AlessandroMiola Oct 19, 2022
7538a17
MAINT Fix build when SKLEARN_OPENMP_PARALLELISM_ENABLED=False (#24682)
lesteve Oct 19, 2022
08f2663
MAINT renable Linux + Python 3.8 build with OpenBLAS (#24705)
glemaitre Oct 19, 2022
495bf16
CI Add wheel builds for Python 3.11 (#24446)
cmarmo Oct 20, 2022
f2fddd7
CI Remove remaining windows 32 references (#24657)
thomasjpfan Oct 20, 2022
99b423b
DOC fix typo inside Pipeline docstring (#24730)
zaznaczony Oct 23, 2022
3dd21c1
DOC fix title underline too short in Gaussian Process kernel (#24726)
cmarmo Oct 23, 2022
87c7615
DOC correct bound of sum LinearSVR in formula in svm.rst (#24722)
ftorres16 Oct 23, 2022
a668c52
DOC fix sphinx directive in function (#24733)
glemaitre Oct 23, 2022
7fd64c5
DOC fix deprecation warning raised by KMeans and Matplotlib (#24692)
glemaitre Oct 24, 2022
241e2d5
Add sphinx_highlight.js to the search page (needed since sphinx 5.2.0…
cmarmo Oct 24, 2022
f7bdb96
DOC fix a missing final fullstop in docstring (#24739)
glemaitre Oct 24, 2022
cf0d263
DOC Improve narrative of plot_roc_crossval example (#24710)
ArturoAmorQ Oct 24, 2022
382e61f
FEA add (single) Cholesky Newton solver to GLMs (#24637)
lorentzenchr Oct 24, 2022
d4d3b65
MAINT force NumPy version for building scikit-learn for CPython 3.10 …
glemaitre Oct 25, 2022
f77b3c6
API add named_transformers attribute to FeatureUnion (#20331)
crflynn Oct 25, 2022
11f670e
DOC fix deprecated log loss argument in user guide (#24753)
lorentzenchr Oct 25, 2022
b8ffee1
FIX check_estimator fails when validating SGDClassifier with log_loss…
MaxwellLZH Oct 25, 2022
e9d358e
DOC Add links to DBSCAN references. (#24758)
cmarmo Oct 25, 2022
a07e555
FIX Fixes common test for requires_positive_X (#24667)
thomasjpfan Oct 26, 2022
be10b25
DOC add entries for the 1.1.3 release (#24744)
glemaitre Oct 26, 2022
25b726b
DOC add more info about the drop of support for 32-bit Python on Wind…
glemaitre Oct 26, 2022
e4cf715
DOC convert confusion matrix to y_true/y_pred for classification_repo…
joaocmd Oct 26, 2022
76beb65
DOC update index showing new release
glemaitre Oct 26, 2022
02248fc
DOC Use show_config instead of numpy.distutils's get_info (#24762)
thomasjpfan Oct 27, 2022
11e8d6b
ENH Add dtype preservation for Isomap (#24714)
rprkh Oct 27, 2022
a31665a
FIX Improves nan support in LabelEncoder (#22629)
thomasjpfan Oct 28, 2022
243fa04
DOC changed marker colors for calibration comparison (#24766)
star1327p Oct 28, 2022
9dfbe6a
DOC Fix inline interpreted text start-string without end-string. (#24…
cmarmo Oct 28, 2022
8daea4c
ENH Add gamma='scale' option to RBFSampler (#24755)
glevv Oct 28, 2022
35f9ea0
TST Fix uniform rng in kmeans test_scaled_weights (#24778)
fcharras Oct 28, 2022
2f432a8
DOC FIX Consistent formulae for metrics in the user guide (#24673)
awinml Oct 28, 2022
8f65a20
Improve error message in _search_successive_halving.py (#24781)
ggrrll Oct 28, 2022
fb1b81f
FIX bagging with SGD and early stopping throws ZeroDivisionError (#23…
MaxwellLZH Oct 28, 2022
82e59a1
ENH Allow path-like objects in load_svmlight_file. (#19075)
vnmabus Oct 28, 2022
92fc23e
DOC Update SECURITY.md for 1.1.3 (#24783)
adrinjalali Oct 31, 2022
33512f9
Merge remote-tracking branch 'origin/main' into predicted_actual_regr…
glemaitre Oct 31, 2022
82697b3
address missing comments from jeremie
glemaitre Oct 31, 2022
6387cba
address remaining comments Arturo
glemaitre Oct 31, 2022
c3b7950
DOC fix suptitle
glemaitre Oct 31, 2022
e20746a
rever tight_layout
glemaitre Oct 31, 2022
a6600cd
Apply suggestions from code review
glemaitre Nov 23, 2022
6dbee14
Merge remote-tracking branch 'origin/main' into predicted_actual_regr…
glemaitre Nov 23, 2022
de13efb
iter
glemaitre Nov 23, 2022
1ca8acf
iter
glemaitre Nov 23, 2022
b1ef9f0
iter
glemaitre Nov 23, 2022
8ef441b
iter
glemaitre Nov 23, 2022
417160b
Apply suggestions from code review
glemaitre Nov 24, 2022
0d7ba2a
DOC addressed most of the concerns
glemaitre Nov 24, 2022
3ed01d6
FIX/TST respond reviews
glemaitre Nov 24, 2022
026c77e
fix
glemaitre Nov 24, 2022
fcbb4ef
Apply suggestions from code review
glemaitre Nov 24, 2022
5f68884
DOC apply same changes to other docstrings
glemaitre Nov 24, 2022
52c3bd2
Merge remote-tracking branch 'origin/main' into predicted_actual_regr…
glemaitre Nov 24, 2022
d1a6c96
Merge remote-tracking branch 'upstream/main' into pr/glemaitre/18020
jeremiedbb Nov 25, 2022
f40bde8
fix what's new
jeremiedbb Nov 25, 2022
df16ab8
Apply suggestions from code review
ogrisel Nov 25, 2022
90a1a44
fix broken links
jeremiedbb Nov 25, 2022
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -1126,6 +1126,7 @@ See the :ref:`visualizations` section of the user guide for further details.
metrics.ConfusionMatrixDisplay
metrics.DetCurveDisplay
metrics.PrecisionRecallDisplay
metrics.PredictionErrorDisplay
metrics.RocCurveDisplay
calibration.CalibrationDisplay

64 changes: 64 additions & 0 deletions doc/modules/model_evaluation.rst
@@ -2711,6 +2711,70 @@ Here are some usage examples of the :func:`d2_absolute_error_score` function::
>>> d2_absolute_error_score(y_true, y_pred)
0.0

Visual evaluation of regression models
--------------------------------------

The :class:`~sklearn.metrics.PredictionErrorDisplay` class allows users to
visually inspect the quality of regression models. Two different plots, shown
below, can be used to assess the quality of a regression model:

.. image:: ../auto_examples/model_selection/images/sphx_glr_plot_cv_predict_001.png
   :target: ../auto_examples/model_selection/plot_cv_predict.html
   :scale: 75
   :align: center

The plot on the left shows the actual values vs. the predicted values. For a
noise-free regression task, a perfect regression model would display all data
points on the diagonal defined by predicted = actual values. The farther a point
lies from this optimal line, the larger the error of the model on that sample.
In a more realistic setting with irreducible noise, that is, when not all the
variations of `y` can be explained by the features in `X`, the best model would
lead to a cloud of points densely arranged around the diagonal.
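
The following NumPy-only sketch, on hypothetical data with a known regression
function, contrasts the two settings: without noise, the perfect model leaves no
gap to the diagonal; with Gaussian noise, even a model that predicts the true
`E[y|X]` leaves residuals whose spread matches the noise level:

```python
import numpy as np

rng = np.random.RandomState(42)
x = rng.uniform(0, 10, size=10_000)
true_mean = 3.0 * x + 1.0  # hypothetical regression function E[y|x]

# Noise-free task: the perfect model reproduces y exactly, so every
# (predicted, actual) point lies on the diagonal.
y_noise_free = true_mean
y_pred_perfect = true_mean
max_gap_noise_free = np.max(np.abs(y_noise_free - y_pred_perfect))

# Irreducible Gaussian noise: even the best model, which predicts E[y|x],
# leaves a cloud of points around the diagonal.
noise_std = 2.0
y_noisy = true_mean + rng.normal(scale=noise_std, size=x.shape)
y_pred_best = true_mean
residuals = y_noisy - y_pred_best

print(f"largest vertical gap to the diagonal (no noise): {max_gap_noise_free}")
print(f"residual mean: {residuals.mean():.3f}, std: {residuals.std():.3f}")
```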

Note that the above holds when the predicted values are the expected value of
`y` given `X`. This is typically the case for regression models that
asymptotically minimize the mean squared error objective function or, more
generally, the :ref:`mean Tweedie deviance <mean_tweedie_deviance>` for any
value of its "power" parameter.
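
As a quick sanity check of this statement, the sketch below (assuming a
synthetic Gamma-distributed target) searches a grid for the best constant
prediction under the mean squared error and under the mean Tweedie deviance with
`power=1.5`; both land on the sample mean, up to the grid resolution:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_tweedie_deviance

# Skewed, strictly positive target, assumed for illustration only.
rng = np.random.RandomState(0)
y = rng.gamma(shape=2.0, scale=3.0, size=5_000)

# Grid of constant predictions to search over (all strictly positive,
# as required by the Tweedie deviance with 1 < power < 2).
candidates = np.linspace(1.0, 15.0, 1_001)
step = candidates[1] - candidates[0]

mse = [mean_squared_error(y, np.full_like(y, c)) for c in candidates]
deviance = [
    mean_tweedie_deviance(y, np.full_like(y, c), power=1.5) for c in candidates
]

best_for_mse = candidates[np.argmin(mse)]
best_for_deviance = candidates[np.argmin(deviance)]

print(f"sample mean:                        {y.mean():.3f}")
print(f"best constant (squared error):      {best_for_mse:.3f}")
print(f"best constant (Tweedie, power=1.5): {best_for_deviance:.3f}")
```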

When plotting the predictions of an estimator that predicts a quantile
of `y` given `X`, e.g. :class:`~sklearn.linear_model.QuantileRegressor`
or any other model asymptotically minimizing the :ref:`pinball loss
<pinball_loss>`, a fraction of the points is expected to lie either above or
below the diagonal, depending on the estimated quantile. For instance, for the
0.9 quantile, roughly 90% of the actual values are expected to be smaller than
their predictions.
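
A NumPy-only sketch of this behavior, assuming Gaussian noise so that the ideal
0.9 quantile model is known in closed form (the true regression function plus
the 0.9 quantile of the noise distribution):

```python
from statistics import NormalDist

import numpy as np

# Hypothetical data: linear signal plus Gaussian noise.
rng = np.random.RandomState(0)
x = rng.uniform(0, 10, size=20_000)
noise_std = 1.5
y = 2.0 * x + rng.normal(scale=noise_std, size=x.shape)

# Ideal model for the 0.9 quantile of y given x under Gaussian noise.
z = NormalDist().inv_cdf(0.9)  # 0.9 quantile of the standard normal, about 1.2816
y_pred_q90 = 2.0 * x + z * noise_std

# Fraction of samples whose actual value falls below the predicted quantile.
fraction_below = np.mean(y < y_pred_q90)
print(f"actual < predicted: {fraction_below:.3f}")
print(f"actual >= predicted: {1.0 - fraction_below:.3f}")
```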

All in all, while intuitive to read, this plot does not really tell us what to
change to obtain a better model.

The right-hand side plot shows the residuals, i.e. the difference between the
actual values and the predicted values, vs. the predicted values.

This plot makes it easier to visualize whether the residuals follow a
`homoscedastic or heteroscedastic
<https://en.wikipedia.org/wiki/Homoscedasticity_and_heteroscedasticity>`_
distribution. In particular, if the true distribution of `y|X` is Poisson
or Gamma, the variance of the residuals of the optimal model is expected to
grow with the predicted value of `E[y|X]` (either linearly for Poisson or
quadratically for Gamma).
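
The following sketch simulates the residuals of the optimal model (which
predicts the true mean `mu`) at two mean levels, for a Poisson target and for a
Gamma target with a fixed shape parameter (an assumption of this sketch).
Quadrupling the mean roughly quadruples the Poisson residual variance and
multiplies the Gamma one by about 16:

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples = 50_000
means = np.array([2.0, 8.0])
shape = 4.0  # fixed Gamma shape parameter (assumption of this sketch)

poisson_var, gamma_var = [], []
for mu in means:
    y_poisson = rng.poisson(lam=mu, size=n_samples)
    # Gamma with mean mu: scale = mu / shape, hence variance = mu**2 / shape.
    y_gamma = rng.gamma(shape=shape, scale=mu / shape, size=n_samples)
    # The optimal model predicts E[y|X] = mu, so the residuals are y - mu.
    poisson_var.append(np.var(y_poisson - mu))
    gamma_var.append(np.var(y_gamma - mu))

poisson_var = np.array(poisson_var)
gamma_var = np.array(gamma_var)
print("Poisson residual variance:", poisson_var)  # grows linearly with mu
print("Gamma residual variance:", gamma_var)  # grows quadratically with mu
```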

We can also use this plot to check whether the residuals have a zero mean, are
uncorrelated and have a constant variance (homoscedastic residuals), which are
the usual assumptions behind linear least squares regression models (see
Review comment (Member):

Due to the Gauss-Markov theorem, we don't need the Gaussian. And we don't even need Gauss-Markov. I don't even know the most important assumptions, maybe:

- `Y|X` has a finite 2nd moment

Note that even time series are allowed.

:class:`sklearn.linear_model.LinearRegression` and
:class:`sklearn.linear_model.Ridge`). If this is not the case, and in particular
if the residuals plot shows some banana-shaped structure, this is a hint that
the model is likely mis-specified and that non-linear feature engineering or
switching to a non-linear regression model might be useful.

Refer to the example below for a model evaluation that makes use of this
display.

.. topic:: Example:

   * See :ref:`sphx_glr_auto_examples_compose_plot_transformed_target.py` for
     an example of how to use :class:`~sklearn.metrics.PredictionErrorDisplay`
     to visualize the prediction quality improvement of a regression model
     obtained by transforming the target before learning.

.. _clustering_metrics:

1 change: 1 addition & 0 deletions doc/visualizations.rst
@@ -86,4 +86,5 @@ Display Objects
metrics.ConfusionMatrixDisplay
metrics.DetCurveDisplay
metrics.PrecisionRecallDisplay
metrics.PredictionErrorDisplay
metrics.RocCurveDisplay
7 changes: 7 additions & 0 deletions doc/whats_new/v1.2.rst
@@ -492,6 +492,13 @@ Changelog
of a binary classification problem. :pr:`22518` by
:user:`Arturo Amor <ArturoAmorQ>`.

- |Feature| Add :class:`metrics.PredictionErrorDisplay` to plot the predicted
  and actual values to qualitatively assess the behavior of a regressor. The
  display can be created with the class methods
  :func:`metrics.PredictionErrorDisplay.from_estimator` and
  :func:`metrics.PredictionErrorDisplay.from_predictions`.
  :pr:`18020` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Fix| Allows `csr_matrix` as input for the parameter `y_true` of
  the :func:`metrics.label_ranking_average_precision_score` metric.
  :pr:`23442` by :user:`Sean Atukorala <ShehanAT>`.