[MRG] Add Detection Error Tradeoff (DET) curve classification metrics (#10591)

* Initial add DET curve to classification metrics

* Add DET to exports

* Fix DET-curve doctest errors

- Sample snippet in model_evaluation documentation was outdated.

* Clarify wording in DET-curve computation

- Align to the wording of ranking module to make it consistent.
- Add correct description of inputs and outputs.
- Update and fix broken links

* Beautify DET curve documentation source

- Limit line length to 80 characters.

* Expand DET curve documentation

- Add an example plot to show difference between ROC and DET curves.
- Expand Usage Note section with background information and properties
of DET curves.

* Update DET-curve documentation

- Fix typos and some grammar improvements.
- Use named references to avoid potential conflicts with other sections.
- Remove unneeded references and improve existing ones, e.g. by using
versioned links.

* Select relevant DET points using slice object

* Remove some ambiguity from DET curve docstring

* Add DET curve contributors

* Add tests for DET curves

* Streamline DET test by using parametrization

* Increase verbosity of DET curve error handling

- Explicitly sanity check input before computing a DET curve.
- Add test for perfect scores.
- Adapt indentation style to match the test module.

* Add reference for DET curves in invariance test

* Add automated invariance checks for DET curves

* Resolve merge artifacts

* Make doctest happy

* Fix whitespaces for doctest

* Revert unintended whitespace changes

* Revert unintended white space changes #2

* Fix typos and grammar

* Fix white space in doc

* Streamline test code

* Remove rebase artifacts

* Fix PR link in doc

* Fix test_ranking

* Fix rebase errors

* Fix import

* Bring back newlines

- Swallowed by copy/paste

* Remove uncited ref link

* Remove matplotlib deprecation warning

* Bring back hidden reference

* Add motivation to DET example

* Fix lint

* Add citation

* Use modern matplotlib API

Co-authored-by: Jeremy Karnowski <jeremy.karnowski@gmail.com>
Co-authored-by: Julien Cornebise <julien@cornebise.com>
Co-authored-by: Daniel Mohns <daniel.mohns@zenguard.org>
4 people committed Aug 16, 2020
1 parent eb7b158 commit 41d648e
Showing 8 changed files with 442 additions and 0 deletions.
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -946,6 +946,7 @@ details.
metrics.cohen_kappa_score
metrics.confusion_matrix
metrics.dcg_score
metrics.detection_error_tradeoff_curve
metrics.f1_score
metrics.fbeta_score
metrics.hamming_loss
88 changes: 88 additions & 0 deletions doc/modules/model_evaluation.rst
@@ -306,6 +306,7 @@ Some of these are restricted to the binary classification case:

precision_recall_curve
roc_curve
detection_error_tradeoff_curve


Others also work in the multiclass case:
@@ -1437,6 +1438,93 @@ to the given limit.
In Data Mining, 2001.
Proceedings IEEE International Conference, pp. 131-138.
.. _det_curve:

Detection error tradeoff (DET)
------------------------------

The function :func:`detection_error_tradeoff_curve` computes the
detection error tradeoff (DET) curve [WikipediaDET2017]_.
Quoting Wikipedia:

"A detection error tradeoff (DET) graph is a graphical plot of error rates for
binary classification systems, plotting false reject rate vs. false accept
rate. The x- and y-axes are scaled non-linearly by their standard normal
deviates (or just by logarithmic transformation), yielding tradeoff curves
that are more linear than ROC curves, and use most of the image area to
highlight the differences of importance in the critical operating region."

DET curves are a variation of receiver operating characteristic (ROC) curves
where False Negative Rate is plotted on the ordinate instead of True Positive
Rate.
DET curves are commonly plotted in normal deviate scale by transformation with
:math:`\phi^{-1}` (with :math:`\phi` being the cumulative distribution
function of the standard normal distribution).
The resulting performance curves explicitly visualize the tradeoff of error
types for given classification algorithms.
See [Martin1997]_ for examples and further motivation.
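
A minimal sketch of this transformation, assuming ``scipy`` is available (the
variable names are illustrative)::

    import numpy as np
    from scipy.stats import norm
    from sklearn.metrics import detection_error_tradeoff_curve

    y_true = np.array([0, 0, 1, 1])
    y_score = np.array([0.1, 0.4, 0.35, 0.8])
    fpr, fnr, _ = detection_error_tradeoff_curve(y_true, y_score)

    # map the error rates onto the normal deviate scale via the percent
    # point function (inverse CDF) of the standard normal distribution;
    # rates of exactly 0 or 1 map to -inf/+inf and are typically clipped
    # or dropped before plotting
    fpr_scaled = norm.ppf(fpr)
    fnr_scaled = norm.ppf(fnr)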

This figure compares the ROC and DET curves of two example classifiers on the
same classification task:

.. image:: ../auto_examples/model_selection/images/sphx_glr_plot_det_001.png
:target: ../auto_examples/model_selection/plot_det.html
:scale: 75
:align: center

**Properties:**

* DET curves form a straight line in normal deviate scale if the detection
  scores are normally (or close-to-normally) distributed.
  [Navratil2007]_ showed that the reverse is not necessarily true; even more
  general distributions can produce linear DET curves (see the sketch after
  this list for a quick empirical check).

* The normal deviate scale transformation spreads out the points such that a
  comparatively larger area of the plot is occupied.
  Therefore, curves with similar classification performance may be easier to
  distinguish on a DET plot.

* Since the False Negative Rate is the complement of the True Positive Rate,
  the point of perfection for DET curves is the origin (in contrast to the
  top left corner for ROC curves).
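
The first of these properties can be checked empirically with a short sketch
(the Gaussian parameters below are arbitrary, chosen only for illustration)::

    import numpy as np
    from scipy.stats import norm
    from sklearn.metrics import detection_error_tradeoff_curve

    rng = np.random.RandomState(0)
    # scores drawn from two equal-variance Gaussians should yield an
    # approximately straight DET curve in normal deviate scale
    y_true = np.repeat([0, 1], 1000)
    y_score = np.concatenate(
        [rng.normal(0., 1., 1000), rng.normal(2., 1., 1000)])
    fpr, fnr, _ = detection_error_tradeoff_curve(y_true, y_score)

    # keep points strictly inside (0, 1) before the probit transform
    mask = (fpr > 0) & (fpr < 1) & (fnr > 0) & (fnr < 1)
    slope, intercept = np.polyfit(
        norm.ppf(fpr[mask]), norm.ppf(fnr[mask]), 1)
    # slope should be close to -1 for equal class variances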

**Applications and limitations:**

DET curves are intuitive to read and hence allow quick visual assessment of a
classifier's performance.
Additionally, DET curves can be consulted for threshold analysis and operating
point selection, as sketched below.
This is particularly helpful if a comparison of error types is required.
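
For instance, a common operating point is the equal error rate (EER), at which
the two error rates coincide. A minimal sketch, reusing ``y_true`` and
``y_score`` from the example above::

    import numpy as np
    from sklearn.metrics import detection_error_tradeoff_curve

    fpr, fnr, thresholds = detection_error_tradeoff_curve(y_true, y_score)
    # choose the threshold whose false positive and false negative rates
    # are closest to each other
    eer_index = np.argmin(np.abs(fnr - fpr))
    eer_threshold = thresholds[eer_index]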

On the other hand, DET curves do not summarize performance in a single number.
Therefore, for automated evaluation or for comparison with other
classification tasks, derived metrics such as the area under the ROC curve
may be better suited.
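
For example, :func:`roc_auc_score` condenses the ranking induced by the same
scores into one scalar (a sketch, using the variables from the example
above)::

    from sklearn.metrics import roc_auc_score

    # a single scalar summary of ranking quality, convenient for
    # automated model comparison
    auc = roc_auc_score(y_true, y_score)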

.. topic:: Examples:

* See :ref:`sphx_glr_auto_examples_model_selection_plot_det.py`
for an example comparison between receiver operating characteristic (ROC)
curves and detection error tradeoff (DET) curves.

.. topic:: References:

.. [WikipediaDET2017] Wikipedia contributors. Detection error tradeoff.
Wikipedia, The Free Encyclopedia. September 4, 2017, 23:33 UTC.
Available at: https://en.wikipedia.org/w/index.php?title=Detection_error_tradeoff&oldid=798982054.
Accessed February 19, 2018.
.. [Martin1997] A. Martin, G. Doddington, T. Kamm, M. Ordowski, and M. Przybocki,
`The DET Curve in Assessment of Detection Task Performance
<http://www.dtic.mil/docs/citations/ADA530509>`_,
NIST 1997.
.. [Navratil2007] J. Navratil and D. Klusacek,
   "`On Linear DETs
   <http://www.research.ibm.com/CBG/papers/icassp07_navratil.pdf>`_",
2007 IEEE International Conference on Acoustics,
Speech and Signal Processing - ICASSP '07, Honolulu,
HI, 2007, pp. IV-229-IV-232.
.. _zero_one_loss:

5 changes: 5 additions & 0 deletions doc/whats_new/v0.24.rst
@@ -270,6 +270,11 @@ Changelog
:mod:`sklearn.metrics`
......................

- |Feature| Added :func:`metrics.detection_error_tradeoff_curve` to compute
the Detection Error Tradeoff curve, a classification metric.
:pr:`10591` by :user:`Jeremy Karnowski <jkarnows>` and
:user:`Daniel Mohns <dmohns>`.

- |Feature| Added :func:`metrics.mean_absolute_percentage_error` metric and
the associated scorer for regression problems. :issue:`10708` fixed with the
PR :pr:`15007` by :user:`Ashutosh Hathidara <ashutosh1919>`. The scorer and
145 changes: 145 additions & 0 deletions examples/model_selection/plot_det.py
@@ -0,0 +1,145 @@
"""
=======================================
Detection error tradeoff (DET) curve
=======================================

In this example, we compare receiver operating characteristic (ROC) and
detection error tradeoff (DET) curves for different classification algorithms
for the same classification task.

DET curves are commonly plotted in normal deviate scale.
To achieve this we transform the error rates as returned by the
``detection_error_tradeoff_curve`` function and the axis scale using
``scipy.stats.norm``.

The point of this example is to demonstrate two properties of DET curves,
namely:

1. It might be easier to visually assess the overall performance of different
   classification algorithms using DET curves rather than ROC curves.
   Due to the linear scale used for plotting ROC curves, different classifiers
   usually only differ in the top left corner of the graph and appear similar
   for a large part of the plot. On the other hand, DET curves represent
   straight lines in normal deviate scale, so they tend to be distinguishable
   as a whole and the area of interest spans a large part of the plot.

2. DET curves give the user direct feedback on the detection error tradeoff to
   aid in operating point analysis.
   The user can read directly from the DET-curve plot by how much the
   false-negative error rate will improve when accepting an increase in the
   false-positive error rate (or vice versa).

The plots in this example compare ROC curves on the left side to corresponding
DET curves on the right.
There is no particular reason why these classifiers have been chosen for the
example plot over other classifiers available in scikit-learn.

.. note::

    - See :func:`sklearn.metrics.roc_curve` for further information about ROC
      curves.
    - See :func:`sklearn.metrics.detection_error_tradeoff_curve` for further
      information about DET curves.
    - This example is loosely based on
      :ref:`sphx_glr_auto_examples_classification_plot_classifier_comparison.py`.
"""
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import detection_error_tradeoff_curve
from sklearn.metrics import roc_curve

from scipy.stats import norm
from matplotlib.ticker import FuncFormatter

N_SAMPLES = 1000

names = [
"Linear SVM",
"Random Forest",
]

classifiers = [
SVC(kernel="linear", C=0.025),
RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
]

X, y = make_classification(
n_samples=N_SAMPLES, n_features=2, n_redundant=0, n_informative=2,
random_state=1, n_clusters_per_class=1)

# preprocess dataset, split into training and test part
X = StandardScaler().fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=.4, random_state=0)

# prepare plots
fig, [ax_roc, ax_det] = plt.subplots(1, 2, figsize=(10, 5))

# first prepare the ROC curve
ax_roc.set_title('Receiver Operating Characteristic (ROC) curves')
ax_roc.set_xlabel('False Positive Rate')
ax_roc.set_ylabel('True Positive Rate')
ax_roc.set_xlim(0, 1)
ax_roc.set_ylim(0, 1)
ax_roc.grid(linestyle='--')
ax_roc.yaxis.set_major_formatter(
FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
ax_roc.xaxis.set_major_formatter(
FuncFormatter(lambda y, _: '{:.0%}'.format(y)))

# second prepare the DET curve
ax_det.set_title('Detection Error Tradeoff (DET) curves')
ax_det.set_xlabel('False Positive Rate')
ax_det.set_ylabel('False Negative Rate')
ax_det.set_xlim(-3, 3)
ax_det.set_ylim(-3, 3)
ax_det.grid(linestyle='--')

# customized ticks for DET curve plot to represent normal deviate scale
ticks = [0.001, 0.01, 0.05, 0.20, 0.5, 0.80, 0.95, 0.99, 0.999]
tick_locs = norm.ppf(ticks)
tick_lbls = [
'{:.0%}'.format(s) if (100*s).is_integer() else '{:.1%}'.format(s)
for s in ticks
]
plt.sca(ax_det)
plt.xticks(tick_locs, tick_lbls)
plt.yticks(tick_locs, tick_lbls)

# iterate over classifiers
for name, clf in zip(names, classifiers):
clf.fit(X_train, y_train)

if hasattr(clf, "decision_function"):
y_score = clf.decision_function(X_test)
else:
y_score = clf.predict_proba(X_test)[:, 1]

roc_fpr, roc_tpr, _ = roc_curve(y_test, y_score)
det_fpr, det_fnr, _ = detection_error_tradeoff_curve(y_test, y_score)

ax_roc.plot(roc_fpr, roc_tpr)

# transform errors into normal deviate scale
ax_det.plot(
norm.ppf(det_fpr),
norm.ppf(det_fnr)
)

# add a single legend
plt.sca(ax_det)
plt.legend(names, loc="upper right")

# plot
plt.tight_layout()
plt.show()
2 changes: 2 additions & 0 deletions sklearn/metrics/__init__.py
@@ -7,6 +7,7 @@
from ._ranking import auc
from ._ranking import average_precision_score
from ._ranking import coverage_error
from ._ranking import detection_error_tradeoff_curve
from ._ranking import dcg_score
from ._ranking import label_ranking_average_precision_score
from ._ranking import label_ranking_loss
@@ -104,6 +105,7 @@
'coverage_error',
'dcg_score',
'davies_bouldin_score',
'detection_error_tradeoff_curve',
'euclidean_distances',
'explained_variance_score',
'f1_score',
88 changes: 88 additions & 0 deletions sklearn/metrics/_ranking.py
@@ -218,6 +218,94 @@ def _binary_uninterpolated_average_precision(
average, sample_weight=sample_weight)


def detection_error_tradeoff_curve(y_true, y_score, pos_label=None,
sample_weight=None):
"""Compute error rates for different probability thresholds.
Note: This metrics is used for ranking evaluation of a binary
classification task.
Read more in the :ref:`User Guide <det_curve>`.
Parameters
----------
y_true : array, shape = [n_samples]
True targets of binary classification in range {-1, 1} or {0, 1}.
y_score : array, shape = [n_samples]
Estimated probabilities or decision function.
pos_label : int, optional (default=None)
The label of the positive class
sample_weight : array-like of shape = [n_samples], optional
Sample weights.
Returns
-------
fpr : array, shape = [n_thresholds]
False positive rate (FPR) such that element i is the false positive
rate of predictions with score >= thresholds[i]. This is occasionally
referred to as false acceptance propability or fall-out.
fnr : array, shape = [n_thresholds]
False negative rate (FNR) such that element i is the false negative
rate of predictions with score >= thresholds[i]. This is occasionally
referred to as false rejection or miss rate.
thresholds : array, shape = [n_thresholds]
Decreasing score values.
See also
--------
roc_curve : Compute Receiver operating characteristic (ROC) curve
precision_recall_curve : Compute precision-recall curve
Examples
--------
>>> import numpy as np
>>> from sklearn.metrics import detection_error_tradeoff_curve
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, fnr, thresholds = detection_error_tradeoff_curve(y_true, y_scores)
>>> fpr
array([0.5, 0.5, 0. ])
>>> fnr
array([0. , 0.5, 0.5])
>>> thresholds
array([0.35, 0.4 , 0.8 ])
"""
if len(np.unique(y_true)) != 2:
raise ValueError("Only one class present in y_true. Detection error "
"tradeoff curve is not defined in that case.")

fps, tps, thresholds = _binary_clf_curve(y_true, y_score,
pos_label=pos_label,
sample_weight=sample_weight)

fns = tps[-1] - tps
p_count = tps[-1]
n_count = fps[-1]

# trim the leading flat region: keep only the last threshold among those
# sharing the minimal false-positive count
first_ind = (
    fps.searchsorted(fps[0], side='right') - 1
    if fps.searchsorted(fps[0], side='right') > 0
    else None
)
# stop once false negatives reach zero, i.e. at the first threshold that
# attains the maximal true-positive count
last_ind = tps.searchsorted(tps[-1]) + 1
sl = slice(first_ind, last_ind)

# reverse the output such that list of false positives is decreasing
return (
fps[sl][::-1] / n_count,
fns[sl][::-1] / p_count,
thresholds[sl][::-1]
)


def _binary_roc_auc_score(y_true, y_score, sample_weight=None, max_fpr=None):
"""Binary roc auc score."""
if len(np.unique(y_true)) != 2:
