Commit 62b11e8

MNT Fix some easy-to-make typos (scikit-learn#15720)

Authored by bwignall, committed by Pan Jan on Mar 3, 2020
1 parent bf2972d

Showing 18 changed files with 25 additions and 25 deletions.
build_tools/azure/install.sh (1 addition, 1 deletion)

@@ -11,7 +11,7 @@ make_conda() {
 }
 
 version_ge() {
-    # The two version numbers are seperated with a new line is piped to sort
+    # The two version numbers are separated with a new line is piped to sort
     # -rV. The -V activates for version number sorting and -r sorts in
     # decending order. If the first argument is the top element of the sort, it
     # is greater than or equal to the second argument.
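For context, the shell trick the comment describes: piping two version strings through ``sort -rV`` puts the larger one first, so the first argument is greater than or equal to the second exactly when it stays on top. A minimal Python sketch of the same ordering check (a hypothetical helper, not code from the repository)::

    def version_ge(a, b):
        """Return True if dotted version string `a` is >= `b`."""
        def key(v):
            return [int(part) for part in v.split(".")]
        # Mirrors `printf '%s\n%s' "$a" "$b" | sort -rV | head -n1`:
        # the larger version sorts first in reverse version order.
        return key(a) >= key(b)

    assert version_ge("3.8.1", "3.8") and not version_ge("3.7", "3.8")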
doc/developers/advanced_installation.rst (1 addition, 1 deletion)

@@ -374,7 +374,7 @@ Finally, build the package using the standard command::
 
     pip install --verbose --editable .
 
-For the upcomming FreeBSD 12.1 and 11.3 versions, OpenMP will be included in
+For the upcoming FreeBSD 12.1 and 11.3 versions, OpenMP will be included in
 the base system and these steps will not be necessary.
 
 .. _OpenMP: https://en.wikipedia.org/wiki/OpenMP
doc/modules/computing.rst (2 additions, 2 deletions)

@@ -529,7 +529,7 @@ Joblib-based parallelism
 ........................
 
 When the underlying implementation uses joblib, the number of workers
-(threads or processes) that are spawned in parallel can be controled via the
+(threads or processes) that are spawned in parallel can be controlled via the
 ``n_jobs`` parameter.
 
 .. note::
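For reference, a minimal sketch of the ``n_jobs`` parameter discussed in this hunk (standard scikit-learn API; the toy dataset is only for illustration)::

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=1000, random_state=0)
    # n_jobs=2 asks joblib to run two workers (threads or processes,
    # depending on the backend) while fitting the trees.
    clf = RandomForestClassifier(n_estimators=100, n_jobs=2, random_state=0)
    clf.fit(X, y)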
@@ -666,7 +666,7 @@ Python runtime
 
 :working_memory:
 
-    the optimal size of temporary arrays used by some algoritms.
+    the optimal size of temporary arrays used by some algorithms.
 
 .. _environment_variable:
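The ``working_memory`` option shown above is set through ``sklearn.set_config``; a minimal sketch (the 512 MiB value is only illustrative)::

    import sklearn

    # Cap the temporary arrays of chunked computations such as
    # metrics.pairwise_distances_chunked at roughly 512 MiB.
    sklearn.set_config(working_memory=512)
    assert sklearn.get_config()["working_memory"] == 512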

doc/modules/model_evaluation.rst (1 addition, 1 deletion)

@@ -1720,7 +1720,7 @@ relevant), NDCG can be used.
 
 For one sample, given the vector of continuous ground-truth values for each
 target :math:`y \in \mathbb{R}^{M}`, where :math:`M` is the number of outputs, and
-the prediction :math:`\hat{y}`, which induces the ranking funtion :math:`f`, the
+the prediction :math:`\hat{y}`, which induces the ranking function :math:`f`, the
 DCG score is
 
 .. math::
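For reference, the conventional DCG definition matching the notation above, with :math:`K` the truncation level and :math:`f(r)` the index of the output ranked at position :math:`r`, is

.. math::

   \mathrm{DCG@K} = \sum_{r=1}^{\min(K, M)} \frac{y_{f(r)}}{\log(1 + r)}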
doc/modules/neighbors.rst (1 addition, 1 deletion)

@@ -581,7 +581,7 @@ implementation with special data types. The precomputed neighbors
 training point as its own neighbor in the count of `n_neighbors`. However,
 for compatibility reasons with other estimators which use the other
 definition, one extra neighbor will be computed when `mode == 'distance'`.
-To maximise compatiblity with all estimators, a safe choice is to always
+To maximise compatibility with all estimators, a safe choice is to always
 include one extra neighbor in a custom nearest neighbors estimator, since
 unnecessary neighbors will be filtered by following estimators.
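A minimal sketch of the pattern recommended above, precomputing one extra neighbor so that downstream estimators of either convention are served (the estimator choices and parameter values are illustrative)::

    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_blobs
    from sklearn.neighbors import KNeighborsTransformer
    from sklearn.pipeline import make_pipeline

    X, _ = make_blobs(n_samples=100, random_state=0)
    n_neighbors = 5
    # One extra neighbor: each training point counts itself at distance 0.
    graph = KNeighborsTransformer(n_neighbors=n_neighbors + 1,
                                  mode='distance')
    clusterer = DBSCAN(eps=1.5, metric='precomputed')
    labels = make_pipeline(graph, clusterer).fit_predict(X)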

doc/whats_new/v0.20.rst (1 addition, 1 deletion)

@@ -709,7 +709,7 @@ Support for Python 3.3 has been officially dropped.
 
 - |Feature| |Fix| :class:`decomposition.SparsePCA` now exposes
   ``normalize_components``. When set to True, the train and test data are
-  centered with the train mean repsectively during the fit phase and the
+  centered with the train mean respectively during the fit phase and the
   transform phase. This fixes the behavior of SparsePCA. When set to False,
   which is the default, the previous abnormal behaviour still holds. The False
   value is for backward compatibility and should not be used. :issue:`11585`
doc/whats_new/v0.21.rst (1 addition, 1 deletion)

@@ -295,7 +295,7 @@ Support for Python 3.4 and below has been officially dropped.
 ......................
 
 - |MajorFeature| A new clustering algorithm: :class:`cluster.OPTICS`: an
-  algoritm related to :class:`cluster.DBSCAN`, that has hyperparameters easier
+  algorithm related to :class:`cluster.DBSCAN`, that has hyperparameters easier
   to set and that scales better, by :user:`Shane <espg>`,
   `Adrin Jalali`_, :user:`Erich Schubert <kno10>`, `Hanmin Qin`_, and
   :user:`Assia Benbihi <assiaben>`.
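Basic usage of the new estimator, for reference (the parameter values are illustrative)::

    from sklearn.cluster import OPTICS
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=200, random_state=0)
    # min_samples is the main knob; unlike DBSCAN, no eps is required.
    labels = OPTICS(min_samples=10).fit_predict(X)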
doc/whats_new/v0.22.rst (1 addition, 1 deletion)

@@ -799,7 +799,7 @@ Changelog
 - |Fix| :class:`svm.SVC`, :class:`svm.SVR`, :class:`svm.NuSVR` and
   :class:`svm.OneClassSVM` when received values negative or zero
   for parameter ``sample_weight`` in method fit(), generated an
-  invalid model. This behavior occured only in some border scenarios.
+  invalid model. This behavior occurred only in some border scenarios.
   Now in these cases, fit() will fail with an Exception.
   :pr:`14286` by :user:`Alex Shacked <alexshacked>`.
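A sketch of the new behaviour described in this entry, assuming a border case in which every sample of one class receives a non-positive weight (toy data, illustrative only)::

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0.], [1.], [2.], [3.]])
    y = np.array([0, 0, 1, 1])
    try:
        # Class 1 ends up with no positive total weight, which would
        # previously have produced an invalid model; now fit() raises.
        SVC().fit(X, y, sample_weight=[1., 1., -1., -1.])
    except ValueError as exc:
        print(exc)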

examples/inspection/plot_partial_dependence.py (1 addition, 1 deletion)

@@ -14,7 +14,7 @@
 :class:`~sklearn.ensemble.HistGradientBoostingRegressor` trained on the
 California housing dataset. The example is taken from [1]_.
-The plots show four 1-way and two 1-way partial dependence plots (ommitted for
+The plots show four 1-way and two 1-way partial dependence plots (omitted for
 :class:`~sklearn.neural_network.MLPRegressor` due to computation time). The
 target variables for the one-way PDP are: median income (`MedInc`), average
 occupants per household (`AvgOccup`), median house age (`HouseAge`), and
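A condensed sketch of the example's setup, using the dataset and estimator named in the text and the plotting helper available in scikit-learn 0.22 (the feature indices are illustrative)::

    from sklearn.datasets import fetch_california_housing
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa
    from sklearn.ensemble import HistGradientBoostingRegressor
    from sklearn.inspection import plot_partial_dependence

    cal = fetch_california_housing()
    est = HistGradientBoostingRegressor().fit(cal.data, cal.target)
    # One-way PDPs for median income and average occupancy.
    plot_partial_dependence(est, cal.data, features=[0, 5],
                            feature_names=cal.feature_names)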
sklearn/decomposition/_dict_learning.py (2 additions, 2 deletions)

@@ -704,7 +704,7 @@ def dict_learning_online(X, n_components=2, alpha=1, n_iter=100,
     inner_stats : tuple of (A, B) ndarrays
         Inner sufficient statistics that are kept by the algorithm.
         Passing them at initialization is useful in online settings, to
-        avoid loosing the history of the evolution.
+        avoid losing the history of the evolution.
         A (n_components, n_components) is the dictionary covariance matrix.
         B (n_features, n_components) is the data approximation matrix
@@ -1351,7 +1351,7 @@ class MiniBatchDictionaryLearning(SparseCodingMixin, BaseEstimator):
     inner_stats_ : tuple of (A, B) ndarrays
         Internal sufficient statistics that are kept by the algorithm.
-        Keeping them is useful in online settings, to avoid loosing the
+        Keeping them is useful in online settings, to avoid losing the
         history of the evolution, but they shouldn't have any use for the
         end user.
         A (n_components, n_components) is the dictionary covariance matrix.
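For reference, in the online dictionary-learning formulation these two statistics are typically accumulated per mini-batch from the codes :math:`\alpha` and the data :math:`x` as

.. math::

   A \leftarrow A + \alpha \alpha^\top, \qquad
   B \leftarrow B + x \alpha^\top

which is why discarding them between calls loses the history of the evolution.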
sklearn/ensemble/_hist_gradient_boosting/binning.py (3 additions, 3 deletions)

@@ -32,7 +32,7 @@ def _find_binning_thresholds(data, max_bins, subsample, random_state):
         instead of the quantiles.
     subsample : int or None
         If ``n_samples > subsample``, then ``sub_samples`` samples will be
-        randomly choosen to compute the quantiles. If ``None``, the whole data
+        randomly chosen to compute the quantiles. If ``None``, the whole data
         is used.
     random_state: int or numpy.random.RandomState or None
         Pseudo-random number generator to control the random sub-sampling.
@@ -107,7 +107,7 @@ class _BinMapper(TransformerMixin, BaseEstimator):
         instead of the quantiles.
     subsample : int or None, optional (default=2e5)
         If ``n_samples > subsample``, then ``sub_samples`` samples will be
-        randomly choosen to compute the quantiles. If ``None``, the whole data
+        randomly chosen to compute the quantiles. If ``None``, the whole data
         is used.
     random_state: int or numpy.random.RandomState or None, \
         optional (default=None)
@@ -126,7 +126,7 @@ class _BinMapper(TransformerMixin, BaseEstimator):
         equal to ``n_bins - 1``.
     missing_values_bin_idx_ : uint8
         The index of the bin where missing values are mapped. This is a
-        constant accross all features. This corresponds to the last bin, and
+        constant across all features. This corresponds to the last bin, and
         it is always equal to ``n_bins - 1``. Note that if ``n_bins_missing_``
         is less than ``n_bins - 1`` for a given feature, then there are
         empty (and unused) bins.
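A NumPy-only sketch of the subsampling rule these docstrings describe (the helper name is illustrative, not the private implementation)::

    import numpy as np

    def binning_thresholds(col, max_bins=256, subsample=int(2e5), seed=0):
        # When there are more samples than `subsample`, estimate the
        # quantiles on a random subset rather than on the whole column.
        if subsample is not None and col.shape[0] > subsample:
            rng = np.random.RandomState(seed)
            col = col[rng.choice(col.shape[0], subsample, replace=False)]
        percentiles = np.linspace(0, 100, num=max_bins + 1)[1:-1]
        return np.percentile(col, percentiles)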
@@ -413,7 +413,7 @@ def test_infinite_values_missing_values():
     # High level test making sure that inf and nan values are properly handled
     # when both are present. This is similar to
     # test_split_on_nan_with_infinite_values() in test_grower.py, though we
-    # cannot check the predicitons for binned values here.
+    # cannot check the predictions for binned values here.
 
     X = np.asarray([-np.inf, 0, 1, np.inf, np.nan]).reshape(-1, 1)
     y_isnan = np.isnan(X.ravel())
sklearn/ensemble/tests/test_gradient_boosting.py (1 addition, 1 deletion)

@@ -1311,7 +1311,7 @@ def test_gradient_boosting_with_init(gb, dataset_maker, init_estimator):
     # Check that GradientBoostingRegressor works when init is a sklearn
     # estimator.
     # Check that an error is raised if trying to fit with sample weight but
-    # inital estimator does not support sample weight
+    # initial estimator does not support sample weight
 
     X, y = dataset_maker()
     sample_weight = np.random.RandomState(42).rand(100)
sklearn/externals/_arff.py (1 addition, 1 deletion)

@@ -98,7 +98,7 @@
 The above keys must follow the case which were described, i.e., the keys are
 case sensitive. The attribute type ``attribute_type`` must be one of these
 strings (they are not case sensitive): ``NUMERIC``, ``INTEGER``, ``REAL`` or
-``STRING``. For nominal attributes, the ``atribute_type`` must be a list of
+``STRING``. For nominal attributes, the ``attribute_type`` must be a list of
 strings.
 In this format, the XOR dataset presented above can be represented as a python
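A sketch of such a Python representation, applying the attribute rules just stated (the exact XOR object is assumed for illustration, not quoted from the file)::

    xor_dataset = {
        'description': 'XOR Dataset',
        'relation': 'XOR',
        'attributes': [
            ('input1', 'REAL'),     # numeric types are plain strings
            ('input2', 'REAL'),
            ('y', ['0.0', '1.0']),  # nominal: a list of strings
        ],
        'data': [
            [0.0, 0.0, '0.0'],
            [0.0, 1.0, '1.0'],
            [1.0, 0.0, '1.0'],
            [1.0, 1.0, '0.0'],
        ],
    }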
sklearn/metrics/_regression.py (1 addition, 1 deletion)

@@ -717,7 +717,7 @@ def mean_tweedie_deviance(y_true, y_pred, sample_weight=None, power=0):
     message = ("Mean Tweedie deviance error with power={} can only be used on "
                .format(power))
     if power < 0:
-        # 'Extreme stable', y_true any realy number, y_pred > 0
+        # 'Extreme stable', y_true any real number, y_pred > 0
         if (y_pred <= 0).any():
             raise ValueError(message + "strictly positive y_pred.")
         dev = 2 * (np.power(np.maximum(y_true, 0), 2 - power)
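Usage sketch for the function in this hunk; with ``power=0`` the Tweedie deviance reduces to the squared error (toy values, illustrative)::

    import numpy as np
    from sklearn.metrics import mean_squared_error, mean_tweedie_deviance

    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([1.5, 2.0, 2.5])
    # power=0 is the Gaussian case, identical to the MSE.
    assert np.isclose(mean_tweedie_deviance(y_true, y_pred, power=0),
                      mean_squared_error(y_true, y_pred))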
sklearn/metrics/tests/test_common.py (1 addition, 1 deletion)

@@ -115,7 +115,7 @@
     "unnormalized_accuracy_score": partial(accuracy_score, normalize=False),
 
     # `confusion_matrix` returns absolute values and hence behaves unnormalized
-    # . Naming it with an unnormalized_ prefix is neccessary for this module to
+    # . Naming it with an unnormalized_ prefix is necessary for this module to
     # skip sample_weight scaling checks which will fail for unnormalized
     # metrics.
     "unnormalized_confusion_matrix": confusion_matrix,
sklearn/metrics/tests/test_score_objects.py (3 additions, 3 deletions)

@@ -649,7 +649,7 @@ def predict(self, X):
 
 
 def test_multimetric_scorer_sanity_check():
-    # scoring dictionary returned is the same as calling each scorer seperately
+    # scoring dictionary returned is the same as calling each scorer separately
    scorers = {'a1': 'accuracy', 'a2': 'accuracy',
               'll1': 'neg_log_loss', 'll2': 'neg_log_loss',
               'ra1': 'roc_auc', 'ra2': 'roc_auc'}
@@ -664,13 +664,13 @@ def test_multimetric_scorer_sanity_check():
 
     result = multi_scorer(clf, X, y)
 
-    seperate_scores = {
+    separate_scores = {
         name: get_scorer(name)(clf, X, y)
         for name in ['accuracy', 'neg_log_loss', 'roc_auc']}
 
     for key, value in result.items():
         score_name = scorers[key]
-        assert_allclose(value, seperate_scores[score_name])
+        assert_allclose(value, separate_scores[score_name])
 
 
 @pytest.mark.parametrize('scorer_name, metric', [
sklearn/model_selection/_search.py (2 additions, 2 deletions)

@@ -948,7 +948,7 @@ class GridSearchCV(BaseSearchCV):
         returns the selected ``best_index_`` given ``cv_results_``. In that
         case, the ``best_estimator_`` and ``best_parameters_`` will be set
         according to the returned ``best_index_`` while the ``best_score_``
-        attribute will not be availble.
+        attribute will not be available.
         The refitted estimator is made available at the ``best_estimator_``
         attribute and permits using ``predict`` directly on this
@@ -1278,7 +1278,7 @@ class RandomizedSearchCV(BaseSearchCV):
         returns the selected ``best_index_`` given the ``cv_results``. In that
         case, the ``best_estimator_`` and ``best_parameters_`` will be set
         according to the returned ``best_index_`` while the ``best_score_``
-        attribute will not be availble.
+        attribute will not be available.
         The refitted estimator is made available at the ``best_estimator_``
         attribute and permits using ``predict`` directly on this
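A sketch of the ``refit`` callable these docstrings describe (the selection rule below is a placeholder; any function of the results dict that returns an index works)::

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    def select_best_index(cv_results):
        # Placeholder rule: simply the highest mean test score.
        return int(np.argmax(cv_results['mean_test_score']))

    X, y = load_iris(return_X_y=True)
    search = GridSearchCV(SVC(), {'C': [0.1, 1, 10]},
                          refit=select_best_index, cv=5).fit(X, y)
    print(search.best_index_)  # set by the callable; best_score_ is not set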
