
FEA add CumulativeAccuracyDisplay #28752

Closed

Conversation

JosephBARBIERDARNAL

Reference Issue

fix for #10003

What does this implement/fix?

creation of a CumulativeAccuracyDisplay class for plots

"The CAP of a model represents the cumulative number of positive outcomes along the y-axis versus the corresponding cumulative number of a classifying parameter along the x-axis. The output is called a CAP curve.[1] The CAP is distinct from the receiver operating characteristic (ROC) curve, which plots the true-positive rate against the false-positive rate." (wikipedia definition)

It's mainly inspired by the RocCurveDisplay class.

other

It's currently a work in progress.

@JosephBARBIERDARNAL JosephBARBIERDARNAL marked this pull request as draft April 2, 2024 16:56

github-actions bot commented Apr 2, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 73ca739. Link to the linter CI: here

@glemaitre glemaitre changed the title create cumulative accuracy profile class FEA add CumulativeAccuracyDisplay Apr 2, 2024
Member

@glemaitre glemaitre left a comment

Thanks for the PR.

On a first pass, could you fix the linter CI? I would advise directly using pre-commit; it corresponds to the optional step 10 of the contributing guidelines (https://scikit-learn.org/dev/developers/contributing.html#how-to-contribute). It is optional, but it makes life easier :)

Member

@ogrisel ogrisel left a comment

Thanks very much for the PR. Here is a quick first pass of reviews.

sklearn/metrics/_plot/cap_curve.py (outdated, resolved)

Parameters
----------
cumulative_true_positives : ndarray
Member

I wonder how to change that name so that it would make sense both for classification problems and for positive regression problems (e.g. as in a Lorenz curve).

Any opinion @lorentzenchr?

Member

y_true_cumulative, and mention in the parameter docstring that, for binary classification, a true positive has a value of 1.

sklearn/metrics/_plot/cap_curve.py (resolved)
sklearn/metrics/_plot/cap_curve.py (resolved)

CumulativeAccuracyDisplay.from_predictions(y_test, y_scores, name='Logistic Regression')
plt.title('CAP Curve using from_predictions')
plt.show()
Member

@ogrisel ogrisel Apr 2, 2024

For information, when running this code I get:

[screenshot: the CAP curve rendered by the code above]

This looks fine but I would like to also have an option to display the curve for oracle predictions and the curve for the random predictions baseline.

EDIT: we need to check the literature to see if it's common to normalize the cumulated values to 1 instead of plotting absolute numbers. Maybe an option to select between relative and absolute scales would make sense.
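
For illustration, an option to switch between the two scales could look like this minimal sketch (the function and variable names are illustrative, not from the PR):

import numpy as np

def cap_y_values(y_true_sorted, sample_weight_sorted, normalize=True):
    # cumulated (weighted) positives, in descending predicted-score order
    y_cum = np.cumsum(y_true_sorted * sample_weight_sorted)
    if normalize:
        y_cum = y_cum / y_cum[-1]  # relative scale: the curve ends at 1.0
    return y_cum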

Member

In addition, I think we should set the x- and y-limits better (as in the ROC curve display).
Also, is it common for this curve to be in a square axis? I see this is kind of different from the ROC and PR curves in this regard, where the x- and y-axes are not in the same unit.

Author

  • An option for relative vs. absolute scales sounds relevant to me. I don't have an opinion on which should be the default.

  • +1 for a square axis (especially with a normalized scale). This paper (Tasche 2006, "Validation of internal rating systems and PD estimates") contains one example of a CAP curve with a square axis:

[screenshot: square-axis CAP curve from Tasche 2006]

It's reminiscent of the ROC curve, but maybe there are reasons not to do so?

Member

An option for relative vs. absolute scales sounds relevant to me. I don't have an opinion on which should be the default.

Intuitively, I would use relative as the default.

@JosephBARBIERDARNAL
Author

I've implemented some of the elements mentioned, but I don't have the time to do everything quickly.

I also have several concerns:

  • I haven't yet understood how to use this type of curve in the case of a regressor, and I don't think I've seen any article on the subject?
  • I'm not sure I understand how to use pre-commit/linters properly (I haven't managed to commit with pre-commit).
  • Being a total beginner in the use of tests, I haven't written the code for this yet, and could greatly use a simple practical explanation.

@ogrisel
Member

ogrisel commented Apr 4, 2024

I haven't yet understood how to use this type of curve in the case of a regressor, and I don't think I've seen any article on the subject?

The example I linked above is precisely an example of a Lorenz curve, which, based on the linked discussion, is very closely related to the CAP curve.

EDIT: I also posted an extra reference in the parent issue: #10003 (comment).

@ogrisel
Member

ogrisel commented Apr 4, 2024

Being a total beginner in the use of tests, I haven't written the code for this yet, and could greatly use a simple practical explanation.

For an example, you can have a look at the tests for RocCurveDisplay in sklearn/metrics/_plot/tests/test_roc_curve_display.py.

You can run all the tests in this module with the pytest command (pip install pytest first if needed):

$ pytest sklearn/metrics/_plot/tests/test_roc_curve_display.py -vl

You can run a specific test with:

$ pytest sklearn/metrics/_plot/tests/test_roc_curve_display.py -vl -k test_roc_curve_display_plotting
================================================================================================ test session starts =================================================================================================
platform darwin -- Python 3.11.7, pytest-8.1.1, pluggy-1.4.0 -- /Users/ogrisel/miniforge3/envs/dev/bin/python3.11
cachedir: .pytest_cache
Using --randomly-seed=2393278150
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/Users/ogrisel/code/scikit-learn/.hypothesis/examples'))
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/ogrisel/code/scikit-learn
configfile: setup.cfg
plugins: snapshot-0.9.0, httpserver-1.0.8, repeat-0.9.3, randomly-3.15.0, cov-4.1.0, clarity-1.0.1, mock-3.12.0, hypothesis-6.93.0, xdist-3.5.0, anyio-4.2.0, benchmark-4.0.0
collected 53 items / 21 deselected / 32 selected                                                                                                                                                                     

sklearn/metrics/_plot/tests/test_roc_curve_display.py::test_roc_curve_display_plotting[from_estimator-LogisticRegression-False-False-True-decision_function] PASSED                                            [  3%]
sklearn/metrics/_plot/tests/test_roc_curve_display.py::test_roc_curve_display_plotting[from_predictions-Classifier-True-False-True-decision_function] PASSED                                                   [  6%]
sklearn/metrics/_plot/tests/test_roc_curve_display.py::test_roc_curve_display_plotting[from_estimator-LogisticRegression-False-True-True-predict_proba] PASSED                                                 [  9%]
sklearn/metrics/_plot/tests/test_roc_curve_display.py::test_roc_curve_display_plotting[from_estimator-LogisticRegression-False-False-False-decision_function] PASSED                                           [ 12%]
...
sklearn/metrics/_plot/tests/test_roc_curve_display.py::test_roc_curve_display_plotting[from_estimator-LogisticRegression-True-True-False-predict_proba] PASSED                                                 [ 96%]
sklearn/metrics/_plot/tests/test_roc_curve_display.py::test_roc_curve_display_plotting[from_estimator-LogisticRegression-True-True-True-predict_proba] PASSED                                                  [100%]

========================================================================================= 32 passed, 21 deselected in 0.67s ==========================================================================================

Since this is a parametrized test (with @pytest.mark.parametrize decorators), this actually runs 32 variations of the same test function. To run a single variation you can do:

$ pytest sklearn/metrics/_plot/tests/test_roc_curve_display.py -vl -k "test_roc_curve_display_plotting and from_estimator-LogisticRegression-True-True-True-predict_proba"

So for your branch, you can create a file named sklearn/metrics/_plot/tests/test_cap_curve_display.py or similar and write some test functions that make assertions similar to what we do for the ROC curve display.

More details on pytest usage in https://scikit-learn.org/dev/developers/tips.html#pytest-tips or on the pytest website itself.
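
For instance, a minimal sketch of such a test (pyplot is scikit-learn's matplotlib test fixture; the import path and the y_true_cumulative / line_ attribute names follow suggestions made in this thread, not a finalized API):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def test_cap_curve_display_from_estimator(pyplot):
    from sklearn.metrics import CumulativeAccuracyDisplay  # assumed public path
    X, y = make_classification(random_state=0)
    clf = LogisticRegression().fit(X, y)
    display = CumulativeAccuracyDisplay.from_estimator(clf, X, y)
    # the cumulated statistic should be monotonically non-decreasing
    assert np.all(np.diff(display.y_true_cumulative) >= 0)
    # the plotted artist should have been created
    assert display.line_ is not None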

@ogrisel
Member

ogrisel commented Apr 4, 2024

I'm not sure I understand how to use pre-commit/linters properly (I haven't managed to commit with pre-commit).

Some pre-commit tasks, such as the black formatter and the ruff import statement fixer, can auto-fix the code when you type git commit. If that happens, the commit command fails but the files have been fixed: you need to git add the auto-fixed files and then git commit again.
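
In practice the loop typically looks like this (commands only; hook output omitted):

$ git commit -m "add CAP display"   # hooks run; black/ruff rewrite files, commit aborts
$ git add -u                        # stage the auto-fixed files
$ git commit -m "add CAP display"   # hooks pass on the fixed files, commit succeeds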

Member

@ogrisel ogrisel left a comment

A few more comments:

sklearn/metrics/_plot/cap_curve.py (outdated, resolved)

# compute cumulative sums for true positives and all cases
cumulative_true_positives = np.cumsum(y_true_sorted * sample_weight_sorted)
cumulative_total = np.cumsum(sample_weight_sorted)
Member

This is very similar to a Lorenz curve that we manually define in this example.

def lorenz_curve(y_true, y_pred, exposure):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    exposure = np.asarray(exposure)
    # order samples by increasing predicted risk:
    ranking = np.argsort(y_pred)
    ranked_frequencies = y_true[ranking]
    ranked_exposure = exposure[ranking]
    cumulated_claims = np.cumsum(ranked_frequencies * ranked_exposure)
    cumulated_claims /= cumulated_claims[-1]
    cumulated_exposure = np.cumsum(ranked_exposure)
    cumulated_exposure /= cumulated_exposure[-1]
    return cumulated_exposure, cumulated_claims

The differences are (as far as I can see):

  • y_pred is the estimated probability of positive class membership given by a classifier for CAP/Lift/Gain curves, while y_pred holds the continuous prediction of a positive target variable for a regressor.
  • sorted_indices = np.argsort(y_pred)[::-1] is reversed in CAP, while it stays in ascending order in a Lorenz curve.

So I think we could indeed reuse the same class both for CAP/Lift/Gain curves and Lorenz curves.

To switch between both, we can probably introduce a keyword parameter x_ordering_style="cap" vs x_ordering_style="lorenz", for instance; see the sketch below.
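
A minimal sketch of that switch (the x_ordering_style parameter is only a proposal at this point, and the helper name is illustrative):

import numpy as np

def ranking_indices(y_pred, x_ordering_style="cap"):
    if x_ordering_style == "cap":
        # descending predicted scores: best-ranked samples accumulate first
        return np.argsort(y_pred)[::-1]
    if x_ordering_style == "lorenz":
        # ascending order, as in the Lorenz curve
        return np.argsort(y_pred)
    raise ValueError(f"unknown x_ordering_style: {x_ordering_style!r}")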

Member

As discussed in the main thread of the PR, let's keep that for a follow-up PR.

@JosephBARBIERDARNAL
Author

I'm not sure I understand how to use pre-commit/linters properly (I haven't managed to commit with pre-commit).

Some pre-commit tasks, such as the black formatter and the ruff import statement fixer, can auto-fix the code when you type git commit. If that happens, the commit command fails but the files have been fixed: you need to git add the auto-fixed files and then git commit again.

thanks! That actually helps a lot (also for the tests)

@ogrisel
Member

ogrisel commented Apr 5, 2024

Great. Now that you have tests, you can see in the diff view that the test coverage analysis has resulted in a few automated inline comments that highlight sections of your code that are currently not tested:

https://github.com/scikit-learn/scikit-learn/pull/28752/files

To merge this PR we will need to have improved test coverage.

However, I would rather put the focus on upgrading the code itself to:

  • make it possible to use the display on positively valued regression tasks;
  • make it possible to swap the order of the x-axis to use a Lorenz curve style instead of a CAP/Lift curve style;
  • update the linked example to use the display instead of reimplementing a Lorenz curve from scratch.

@lorentzenchr
Member

make it possible to swap the order of the x-axis to use a Lorenz curve style instead of a CAP/Lift curve style;

To keep this PR minimal, I would postpone this to a follow-up PR.

@ogrisel
Member

ogrisel commented Apr 5, 2024

Alright, I agree to limit the scope of this PR to the case of CAP / Lift / Gain curve on classification tasks for now.

Member

@ogrisel ogrisel left a comment

As I said above, I am OK to focus this PR on the classification case and leave regression for later. However, I think we should already choose public attribute names for the cumulated statistics that are compatible with the anticipated regression/Lorenz curve case (see the inline comment below).

Furthermore, I think we should update one of the existing examples that displays a ROC curve for a binary classifier to also include a CAP curve. The following example, which already displays both a DET curve and a ROC curve, is a good candidate:

https://scikit-learn.org/stable/auto_examples/model_selection/plot_det.html

We should probably refactor it a bit to change the title, and even the filename, to make it explicit that it's about more than DET curves.

The alternative would be to create a new example file dedicated to CAP, but I think we all agree that the example gallery already has too many short examples and that it's better to consolidate them into larger examples that combine related components of the scikit-learn API and better contrast their commonalities and differences.

if plot_chance_level:
    assert (
        cap_display.chance_level_ is not None
    ), "Chance level line should be present"
Member

Let's also extend the tests to check the presence, shape, and dtype of other public attributes such as y_true_cumulative (new name suggested in https://github.com/scikit-learn/scikit-learn/pull/28752/files#r1553497449).

It might also make sense to check that the values of those attributes match (with assert_allclose) when calling CumulativeAccuracyDisplay.from_estimator or CumulativeAccuracyDisplay.from_predictions.
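
A self-contained sketch of that consistency check (the y_true_cumulative attribute name is the one assumed above):

from numpy.testing import assert_allclose
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import CumulativeAccuracyDisplay  # assumed public path

X, y = make_classification(random_state=0)
clf = LogisticRegression().fit(X, y)
disp_est = CumulativeAccuracyDisplay.from_estimator(clf, X, y)
disp_pred = CumulativeAccuracyDisplay.from_predictions(
    y, clf.predict_proba(X)[:, 1]
)
# both construction paths should yield the same cumulated values
assert_allclose(disp_est.y_true_cumulative, disp_pred.y_true_cumulative)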

Comment on lines +233 to +234
This is also known as a Gain or Lift Curve for classification, and a Lorenz
curve for regression with a positively valued target.
Member

@ogrisel ogrisel Apr 5, 2024

Suggested change
This is also known as a Gain or Lift Curve for classification, and a Lorenz
curve for regression with a positively valued target.
This is also known as a (cumulative) gain curve.

Let's not mention the regression case for now.

Comment on lines +146 to +147
This is also known as a Gain or Lift Curve for classification, and a Lorenz
curve for regression with a positively valued target.
Member

@ogrisel ogrisel Apr 5, 2024

Suggested change
This is also known as a Gain or Lift Curve for classification, and a Lorenz
curve for regression with a positively valued target.
This is also known as a (cumulative) gain curve.

To be expanded later in the expected follow-up PR.

Member

Actually, the traditional lift curve for classifiers is distinct from the CAP curve (although it shares some of the computation, and the x-axis in particular): the lift curve has y-values typically larger than one. It is the ratio of the CAP curve values to the values of the x-axis:

https://github.com/reiinakano/scikit-plot/blob/2dd3e6a76df77edcbd724c4db25575f70abb57cb/scikitplot/metrics.py#L1194-L1204

So it is indeed better not to mention the lift curve in this docstring yet. To implement lift curves, we can either add a parameter to CumulativeAccuracyDisplay or create a dedicated display class in a follow-up PR and cross-reference them in their docstrings (e.g. in the "See also" section).
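
In code, the ratio relationship described above is roughly this (a sketch with illustrative names, not the scikit-plot implementation):

import numpy as np

def lift_from_cap(y_true_sorted):
    # y_true_sorted: binary labels sorted by descending predicted score
    n = len(y_true_sorted)
    fractions = np.arange(1, n + 1) / n  # x-axis: fraction of samples targeted
    gains = np.cumsum(y_true_sorted) / np.sum(y_true_sorted)  # CAP/gain y-values
    return gains / fractions  # lift: typically >= 1 for a good ranking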

Member

@lorentzenchr lorentzenchr left a comment

Review of from_estimator.

Comment on lines +238 to +240
estimator : estimator instance
    Fitted classifier or a fitted :class:`~sklearn.pipeline.Pipeline`
    in which the last estimator is a classifier.
Member

Suggested change
estimator : estimator instance
    Fitted classifier or a fitted :class:`~sklearn.pipeline.Pipeline`
    in which the last estimator is a classifier.
estimator : BaseEstimator
    A fitted estimator object implementing :term:`predict`,
    :term:`predict_proba`, or :term:`decision_function`.
    Multiclass classifiers are not supported.

Comment on lines +254 to +256
:term:`decision_function` as the target response. If set to 'auto',
:term:`predict_proba` is tried first and if it does not exist
:term:`decision_function` is tried next.
Member

Suggested change
:term:`decision_function` as the target response. If set to 'auto',
:term:`predict_proba` is tried first and if it does not exist
:term:`decision_function` is tried next.
:term:`decision_function` as the target response. For regressors
this parameter is ignored and the response is always the output of
:term:`predict`. By default, :term:`predict_proba` is tried first
and we revert to :term:`decision_function` if it doesn't exist.


pos_label : int, float, bool or str, default=None
    The class considered as the positive class when computing metrics.
Member

Suggested change
    The class considered as the positive class when computing metrics.
    The class considered as the positive class when computing CAP.

" method."
)

if response_method == "predict_proba":
Member

Suggested change
if response_method == "predict_proba":
if not is_classifier(estimator):
    y_pred = estimator.predict(X)
elif response_method == "predict_proba":

and add from sklearn.base import is_classifier at the top of the file.

Comment on lines +233 to +234
This is also known as a Gain or Lift Curve for classification, and a Lorenz
curve for regression with a positively valued target.
Member

Suggested change
This is also known as a Gain or Lift Curve for classification, and a Lorenz
curve for regression with a positively valued target.
This is also known as a Gain or Lift Curve, or the mirrored Lorenz
curve of Machine Learning for regression with a positively valued target.

Member

Since we agree to leave the Lorenz curve / regression case for a follow-up PR, I would defer this change for later. Speaking about regression on a class that is designed and documented to accept only classifier predictions would be a source of confusion.

@lorentzenchr
Member

The alternative would be to create a new example file dedicated to CAP, but I think we all agree that the example gallery already has too many short examples

I would also not add another example. We can replace the Lorenz curve with CAP in 2 examples (Poisson and Tweedie).

@ogrisel
Member

ogrisel commented Apr 8, 2024

We can replace the Lorenz curve with CAP in 2 examples (Poisson and Tweedie).

But this is only for regression. I would like to illustrate CAP in a classification context, and updating an example with ROC and DET curves would make that possible and also improve the discoverability of the plot. Furthermore, I find it interesting to visually compare the 3 plots for the same models / data.

Member

@ogrisel ogrisel left a comment

The sklearn.metrics module should expose the CumulativeAccuracyDisplay class as part of the public API and include it in its __all__ module-level attribute.
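
For instance (a sketch; the module path follows the sklearn/metrics/_plot/cap_curve.py file in this PR, and the rest of __all__ is elided):

# in sklearn/metrics/__init__.py
from ._plot.cap_curve import CumulativeAccuracyDisplay

__all__ = [
    # ... existing public names ...
    "CumulativeAccuracyDisplay",
]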

I think the docstrings of the 2 class methods should:

  • mention that this curve assesses the ranking power of classifiers, that is, the ability to correctly order the individual predictions by their probability of actually being labeled with the positive class;
  • make it explicit that this curve is invariant under any monotonic transformation of the predictions, and therefore cannot be used to assess the calibration of the predictions, only their ranking power;
  • have a "See also" section (at the end of the docstring) and cross-link RocCurveDisplay as an alternative curve for assessing the ranking power of classifiers.

*,
sample_weight=None,
pos_label=None,
normalize_scale=False,
Member

I think we should normalize by default.

Suggested change
normalize_scale=False,
normalize_scale=True,

We also need to document this parameter in both docstrings.

@lorentzenchr
Member

I just want to avoid a new example. Therefore, either extend existing examples or postpone to a follow-up PR. But please, no new example! (Strong opinion there.)

from ...utils._plotting import _BinaryClassifierCurveDisplayMixin


class CumulativeAccuracyDisplay(_BinaryClassifierCurveDisplayMixin):
Member

Suggested change
class CumulativeAccuracyDisplay(_BinaryClassifierCurveDisplayMixin):
class CapCurveDisplay(_BinaryClassifierCurveDisplayMixin):

This name would align with RocCurveDisplay and DetCurveDisplay.

Side remark: I would have preferred capital letters for acronyms, i.e. ROCCurveDisplay and CAPCurveDisplay, but this ship has sailed.

Member

@ogrisel ogrisel Apr 9, 2024

I also would have preferred capital letters for acronyms... Maybe it's not too late to avoid propagating a bad convention to new names? We could also rename and alias the old names with a silent deprecation (just in the doc, without a warning with a deadline).

@lorentzenchr
Member

Alright, I agree to limit the scope of this PR to the case of CAP / Lift / Gain curve on classification tasks for now.

I'm more in favor of including regression from the start. That's where it adds additional value; for classification, it is kind of equivalent to ROC. On top of that, it helps to find good names.

@JosephBARBIERDARNAL JosephBARBIERDARNAL closed this by deleting the head repository Apr 27, 2024
@lorentzenchr
Member

@JosephBARBIERDARNAL Why did you close? This would be a valuable addition.

@JosephBARBIERDARNAL
Author

I forgot that this PR was still open when I deleted my fork, sorry! Any solution for this type of case?

@JosephBARBIERDARNAL
Author

I created a new PR (#28972) with the same code. Hopefully I'll keep this one open. Again, sorry for the mistake!
