Draft scikit-learn#2: suggested changes for score_samples and density estimator
yc2311 committed Apr 21, 2019
1 parent c3899da commit 7ee238c
Showing 1 changed file with 13 additions and 12 deletions.
doc/glossary.rst: 25 changes (13 additions, 12 deletions)
@@ -811,14 +811,14 @@ Class APIs and Estimator Types

density estimator
An :term:`unsupervised` estimation of input density without a labeled response.
- Most commonly used techniques are `Histograms <https://scikit-learn.org/stable/modules/density.html#density-estimation-histograms>`_,
- `GaussianMixture`,
+ Most commonly used techniques are `Histograms <https://scikit-learn.org/stable/modules/density.html#density-estimation-histograms>`_,
+ `GaussianMixture`,
and `KernelDensity` estimation.

* `Histograms <https://scikit-learn.org/stable/modules/density.html#density-estimation-histograms>`_
visually represent the density within each bin.
- * Gaussian Mixtures are discussed in `Clustering`.
- * Kernel density estimation has multiple forms to represent density based on bandwidth.
+ * Gaussian Mixtures are discussed in `Clustering`.
+ * Kernel density estimation has multiple forms to represent density based on the chosen kernel and associated bandwidth.

It can also be performed on multi-dimensional data.

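A minimal sketch of the three techniques listed above, assuming a recent scikit-learn and NumPy. The two-bump toy data, the evaluation grid, and the kernel and bandwidth values are illustrative choices, not anything the glossary prescribes. ``score_samples`` on ``GaussianMixture`` and ``KernelDensity`` returns log densities. Exponentiating those values and summing them over a fine grid should therefore give a total close to 1 for every estimate, whatever the kernel or bandwidth.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KernelDensity

rng = np.random.RandomState(0)
# Illustrative 1-D sample drawn from two Gaussian bumps.
X = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1.0, 700)])[:, None]

# Fine evaluation grid used to approximate integrals by Riemann sums.
grid = np.linspace(-10, 10, 2001)[:, None]
step = grid[1, 0] - grid[0, 0]

# Histogram: one constant density value per bin.
hist_density, edges = np.histogram(X[:, 0], bins=30, density=True)
totals = {"histogram": np.sum(hist_density * np.diff(edges))}

# Gaussian mixture: a parametric density made of two weighted Gaussians.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
totals["gmm"] = np.exp(gmm.score_samples(grid)).sum() * step

# Kernel density estimation: the shape depends on the kernel and the bandwidth.
for kernel in ("gaussian", "tophat", "epanechnikov"):
    for bandwidth in (0.2, 0.8):
        kde = KernelDensity(kernel=kernel, bandwidth=bandwidth).fit(X)
        totals[f"kde-{kernel}-{bandwidth}"] = np.exp(kde.score_samples(grid)).sum() * step

# Every estimate is a probability density, so each total should be close to 1.
for name, total in totals.items():
    print(f"{name}: {total:.3f}")
```

The printed totals only confirm that each estimate is normalized. Plotting the estimates over ``grid`` is what shows how the kernel and bandwidth change their shape.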
@@ -1147,11 +1147,11 @@ Methods
:term:`classes_`.
multilabel classification
Scikit-learn is inconsistent in its representation of multilabel
- decision functions. Multi-output multiclass classifiers
+ decision functions. Multi-output multiclass classifiers
(e.g. ``RandomForestClassifier``) represent it as a list of 2d arrays.
Multilabel classifiers (e.g. ``OneVsRestClassifier``)
- represent it as a single 2d array, where columns correspond to the
- individual binary classification decisions. These scores should be
+ represent it as a single 2d array, where columns correspond to the
+ individual binary classification decisions. These scores should be
thresholded at 0.
multioutput classification
A list of 2d arrays, corresponding to each multiclass decision
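A minimal sketch of the two layouts described above, assuming a recent scikit-learn. The toy data and estimator settings are illustrative. ``RandomForestClassifier`` does not implement ``decision_function``, so the sketch uses its ``predict_proba`` output instead. For multi-output targets, that output follows the same list-of-2d-arrays layout.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy multilabel problem: Y is an (n_samples, n_labels) 0/1 indicator matrix.
X, Y = make_multilabel_classification(n_samples=200, n_classes=3, random_state=0)

# OneVsRestClassifier: a single 2d array, one column per binary label decision.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
scores = ovr.decision_function(X)
print(scores.shape)  # (200, 3)

# Thresholding the scores at 0 recovers the per-label predictions.
print(np.array_equal((scores > 0).astype(int), ovr.predict(X)))  # True

# RandomForestClassifier on the same multi-output target: predict_proba returns
# a list with one (n_samples, n_classes_for_that_output) array per output.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, Y)
proba = forest.predict_proba(X)
print(type(proba).__name__, len(proba), proba[0].shape)
```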
@@ -1341,12 +1341,13 @@ Methods
often the likelihood of the data under the model.

``score_samples``
- A method on an array of data points, which evaluates its predictions on
- the given dataset, and returns an array consisting of log evaluations
- for each.
+ A method that returns a score for each of the given samples; its exact meaning depends on the estimator.

- It returns low values for high-dimensional data since evaluations are
- normalized to probability densities.
+ For density estimation, it returns the log of the estimated density at
+ each sample.
+
+ For outlier detection, it returns a score where the more abnormal a sample
+ is, the lower its score, so likely outliers receive the lowest scores.

If the estimator was not already :term:`fitted`, calling this method
should raise a :class:`exceptions.NotFittedError`.
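A minimal sketch of what ``score_samples`` returns for the estimator families mentioned above, assuming a recent scikit-learn. The toy data, bandwidth, and query points are illustrative. The sketch does four things:

* It compares ``KernelDensity`` against a hand-written Gaussian kernel density normalized by (2*pi*h**2)**(-d/2). That normalization is an assumption about the kernel convention, and the comparison is what checks it.
* It relates ``GaussianMixture.score`` to the mean of ``score_samples``.
* It shows that ``IsolationForest`` scores an obvious outlier lower than an inlier.
* It triggers ``NotFittedError`` on an unfitted estimator.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.exceptions import NotFittedError
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KernelDensity

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 2))  # illustrative 2-D inliers around the origin
queries = np.array([[0.0, 0.0], [3.0, 3.0]])  # one typical point, one far-away point

# Density estimation: score_samples returns the log density at each query.
h = 0.5
kde = KernelDensity(kernel="gaussian", bandwidth=h).fit(X)
log_dens = kde.score_samples(queries)

# Hand-written Gaussian KDE: log of the mean of N(q; x_i, h^2 I) over training points.
d = X.shape[1]
sq_dist = ((queries[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
kernel_vals = np.exp(-sq_dist / (2 * h**2)) / (2 * np.pi * h**2) ** (d / 2)
manual = np.log(kernel_vals.mean(axis=1))
print(np.allclose(log_dens, manual))  # True

# GaussianMixture: per-sample log-likelihoods; score() is their mean.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(np.isclose(gmm.score_samples(X).mean(), gmm.score(X)))  # True

# Outlier detection: the lower the score, the more abnormal the sample.
iso = IsolationForest(random_state=0).fit(X)
inlier_score, outlier_score = iso.score_samples(queries)
print(outlier_score < inlier_score)  # True

# Calling score_samples before fit raises NotFittedError.
raised = False
try:
    GaussianMixture().score_samples(queries)
except NotFittedError:
    raised = True
print(raised)  # True
```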