scikit-learn changelog
May 2022
- Avoid timeouts in datasets.fetch_openml by not passing a timeout argument. #23358 by Loïc Estève <lesteve>.
- Avoid a spurious warning in decomposition.IncrementalPCA when n_samples == n_components. #23264 by Lucy Liu <lucyleeow>.
- The partial_fit method of feature_selection.SelectFromModel now validates the max_features and feature_names_in parameters. #23299 by Long Bao <lorentzbao>.
- Fixes metrics.precision_recall_curve to compute precision-recall at 100% recall. The precision-recall curve now displays the last point, corresponding to a classifier that always predicts the positive class: recall=100% and precision=class balance. #23214 by Stéphane Collot <stephanecollot> and Max Baak <mbaak>.
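A quick way to see the endpoint conventions of the returned curve; a minimal sketch with made-up scores, assuming a scikit-learn version that includes this fix:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy binary problem with hypothetical scores, just to inspect the
# endpoints of the returned curve.
y_true = np.array([0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# The curve starts at full recall (everything predicted positive) and,
# by convention, ends at recall=0.0 with precision=1.0.
```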
- preprocessing.PolynomialFeatures with degree equal to 0 now raises an error when include_bias is set to False, and outputs a single constant array when include_bias is set to True. #23370 by Zhehao Liu <MaxwellLZH>.
- Fixes a performance regression with low-cardinality features for tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.GradientBoostingClassifier, and ensemble.GradientBoostingRegressor. #23410 by Loïc Estève <lesteve>.
- utils.class_weight.compute_sample_weight now works with sparse y. #23115 by kernc <kernc>.
Version 1.1.0 (May 2022)

For a short description of the main highlights of the release, please refer to sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_1_0.py.

Version 1.1.0 of scikit-learn requires Python 3.8+, NumPy 1.17.3+ and SciPy 1.3.2+. The optional minimal dependency is Matplotlib 3.1.2+.
Changed models

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

- cluster.KMeans now defaults to algorithm="lloyd" instead of algorithm="auto", which was equivalent to algorithm="elkan". Lloyd's algorithm and Elkan's algorithm converge to the same solution, up to numerical rounding errors, but in general Lloyd's algorithm uses much less memory, and it is often faster.
- Fitting tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.GradientBoostingClassifier, and ensemble.GradientBoostingRegressor is on average 15% faster than in previous versions thanks to a new sort algorithm to find the best split. Models might be different because of a different handling of splits with tied criterion values: both the old and the new sorting algorithm are unstable sorting algorithms. #22868 by Thomas Fan.
- The eigenvectors initialization for cluster.SpectralClustering and manifold.SpectralEmbedding now samples from a Gaussian when using the 'amg' or 'lobpcg' solver. This change improves the numerical stability of the solver, but may result in a different model.
- feature_selection.f_regression and feature_selection.r_regression now return a finite score by default instead of np.nan and np.inf in some corner cases. Pass force_finite=False if you really want non-finite values and the old behavior.
- Pandas DataFrames with all non-string columns, such as a MultiIndex, no longer warn when passed into an estimator. Estimators will continue to ignore the column names in DataFrames with non-string columns. For feature_names_in_ to be defined, columns must all be strings. #22410 by Thomas Fan.
- preprocessing.KBinsDiscretizer changed the handling of bin edges slightly, which might result in a different encoding with the same data.
- calibration.calibration_curve changed the handling of bin edges slightly, which might result in a different output curve given the same data.
- discriminant_analysis.LinearDiscriminantAnalysis now uses the correct variance-scaling coefficient, which may result in different model behavior.
- feature_selection.SelectFromModel.fit and feature_selection.SelectFromModel.partial_fit can now be called with prefit=True. estimators_ will be a deep copy of estimator when prefit=True. #23271 by Guillaume Lemaitre <glemaitre>.
Low-level routines for reductions on pairwise distances for dense float64 datasets have been refactored. The following functions and estimators now benefit from improved performance in terms of hardware scalability and speed-ups:
sklearn.metrics.pairwise_distances_argmin
sklearn.metrics.pairwise_distances_argmin_min
sklearn.cluster.AffinityPropagation
sklearn.cluster.Birch
sklearn.cluster.MeanShift
sklearn.cluster.OPTICS
sklearn.cluster.SpectralClustering
sklearn.feature_selection.mutual_info_regression
sklearn.neighbors.KNeighborsClassifier
sklearn.neighbors.KNeighborsRegressor
sklearn.neighbors.RadiusNeighborsClassifier
sklearn.neighbors.RadiusNeighborsRegressor
sklearn.neighbors.LocalOutlierFactor
sklearn.neighbors.NearestNeighbors
sklearn.manifold.Isomap
sklearn.manifold.LocallyLinearEmbedding
sklearn.manifold.TSNE
sklearn.manifold.trustworthiness
sklearn.semi_supervised.LabelPropagation
sklearn.semi_supervised.LabelSpreading
For instance sklearn.neighbors.NearestNeighbors.kneighbors and sklearn.neighbors.NearestNeighbors.radius_neighbors can respectively be up to ×20 and ×5 faster than previously. #21987, #22064, #22065, #22288 and #22320 by Julien Jerphanion <jjerphan>.

- All scikit-learn models now generate a more informative error message when some input contains unexpected NaN or infinite values. In particular, the message contains the input name ("X", "y" or "sample_weight") and, if an unexpected NaN value is found in X, the error message suggests potential solutions. #21219 by Olivier Grisel <ogrisel>.
- All scikit-learn models now generate a more informative error message when setting invalid hyper-parameters with set_params. #21542 by Olivier Grisel <ogrisel>.
- Removes random unique identifiers in the HTML representation. With this change, Jupyter notebooks are reproducible as long as the cells are run in the same order. #23098 by Thomas Fan.
- Estimators with the non_deterministic tag set to True will skip both the check_methods_sample_order_invariance and check_methods_subset_invariance tests. #22318 by Zhehao Liu <MaxwellLZH>.
- The option for using the log loss, aka binomial or multinomial deviance, via the loss parameter was made more consistent. The preferred way is by setting the value to "log_loss". Old option names are still valid and produce the same models, but are deprecated and will be removed in version 1.3.
  - For ensemble.GradientBoostingClassifier, the loss parameter name "deviance" is deprecated in favor of the new name "log_loss", which is now the default. #23036 by Christian Lorentzen <lorentzenchr>.
  - For ensemble.HistGradientBoostingClassifier, the loss parameter names "auto", "binary_crossentropy" and "categorical_crossentropy" are deprecated in favor of the new name "log_loss", which is now the default. #23040 by Christian Lorentzen <lorentzenchr>.
  - For linear_model.SGDClassifier, the loss parameter name "log" is deprecated in favor of the new name "log_loss". #23046 by Christian Lorentzen <lorentzenchr>.
- Rich HTML representation of estimators is now enabled by default in Jupyter notebooks. It can be deactivated by setting display='text' in sklearn.set_config. #22856 by Jérémie du Boisberranger <jeremiedbb>.
- The error message is improved when importing model_selection.HalvingGridSearchCV, model_selection.HalvingRandomSearchCV, or impute.IterativeImputer without importing the experimental flag. #23194 by Thomas Fan.
- Added an extension in doc/conf.py to automatically generate the list of estimators that handle NaN values. #23198 by Lise Kleiber, Zhehao Liu <MaxwellLZH> and Chiara Marmo <cmarmo>.
sklearn.calibration

- calibration.calibration_curve accepts a parameter pos_label to specify the positive class label. #21032 by Guillaume Lemaitre <glemaitre>.
- calibration.CalibratedClassifierCV.fit now supports passing fit_params, which are routed to the base_estimator. #18170 by Benjamin Bossan <BenjaminBossan>.
- calibration.CalibrationDisplay accepts a parameter pos_label to add this information to the plot. #21038 by Guillaume Lemaitre <glemaitre>.
- calibration.calibration_curve now handles bin edges more consistently. #14975 by Andreas Müller and #22526 by Meekail Zain <micky774>.
- calibration.calibration_curve's normalize parameter is now deprecated and will be removed in version 1.3. It is recommended that a proper probability (i.e. a classifier's predict_proba positive class) is used for y_prob. #23095 by Jordan Silke <jsilke>.
sklearn.cluster

- Added cluster.BisectingKMeans, introducing the Bisecting K-Means algorithm. #20031 by Michal Krawczyk <michalkrawczyk>, Tom Dupre la Tour <TomDLT> and Jérémie du Boisberranger <jeremiedbb>.
- cluster.SpectralClustering and cluster.spectral_clustering now include the new 'cluster_qr' method, which clusters samples in the embedding space as an alternative to the existing 'kmeans' and 'discrete' methods. See cluster.spectral_clustering for more details. #21148 by Andrew Knyazev <lobpcg>.
- Adds get_feature_names_out to cluster.Birch, cluster.FeatureAgglomeration, cluster.KMeans, cluster.MiniBatchKMeans. #22255 by Thomas Fan.
- cluster.SpectralClustering now raises consistent error messages when passed invalid values for n_clusters, n_init, gamma, n_neighbors, eigen_tol or degree. #21881 by Hugo Vassard <hvassard>.
- cluster.AffinityPropagation now returns cluster centers and labels if they exist, even if the model has not fully converged. When returning these potentially-degenerate cluster centers and labels, a new warning message is shown. If no cluster centers were constructed, then the cluster centers remain an empty list with labels set to -1 and the original warning message is shown. #22217 by Meekail Zain <micky774>.
- In cluster.KMeans, the default algorithm is now "lloyd", which is the full classical EM-style algorithm. Both "auto" and "full" are deprecated and will be removed in version 1.3. They are now aliases for "lloyd". The previous default was "auto", which relied on Elkan's algorithm. Lloyd's algorithm uses less memory than Elkan's, it is faster on many datasets, and its results are identical, hence the change. #21735 by Aurélien Geron <ageron>.
- cluster.KMeans's init parameter now properly supports array-like input and NumPy string scalars. #22154 by Thomas Fan.
sklearn.compose

- compose.ColumnTransformer now removes validation errors from the __init__ and set_params methods. #22537 by iofall <iofall> and Arisa Y. <arisayosh>.
- get_feature_names_out functionality in compose.ColumnTransformer was broken when columns were specified using slice. This is fixed in #22775 and #22913 by randomgeek78 <randomgeek78>.
sklearn.covariance

- covariance.GraphicalLassoCV now accepts NumPy arrays for the parameter alphas. #22493 by Guillaume Lemaitre <glemaitre>.
sklearn.cross_decomposition

- The inverse_transform method of cross_decomposition.PLSRegression, cross_decomposition.PLSCanonical and cross_decomposition.CCA now allows reconstruction of an X target when a Y parameter is given. #19680 by Robin Thibaut <robinthibaut>.
- Adds get_feature_names_out to all transformers in the sklearn.cross_decomposition module: cross_decomposition.CCA, cross_decomposition.PLSSVD, cross_decomposition.PLSRegression, and cross_decomposition.PLSCanonical. #22119 by Thomas Fan.
- The shape of the coef_ attribute of cross_decomposition.CCA, cross_decomposition.PLSCanonical and cross_decomposition.PLSRegression will change in version 1.3, from (n_features, n_targets) to (n_targets, n_features), to be consistent with other linear models and to make it work with interfaces expecting a specific shape for coef_ (e.g. feature_selection.RFE). #22016 by Guillaume Lemaitre <glemaitre>.
- Added the fitted attribute intercept_ to cross_decomposition.PLSCanonical, cross_decomposition.PLSRegression, and cross_decomposition.CCA. The method predict is indeed equivalent to Y = X @ coef_ + intercept_. #22015 by Guillaume Lemaitre <glemaitre>.
sklearn.datasets

- datasets.load_files now accepts an ignore list and an allow list based on file extensions. #19747 by Tony Attalla <tonyattalla> and #22498 by Meekail Zain <micky774>.
- datasets.make_swiss_roll now supports the optional argument hole; when set to True, it returns the swiss-hole dataset. #21482 by Sebastian Pujalte <pujaltes>.
- datasets.make_blobs no longer copies data during the generation process, and therefore uses less memory. #22412 by Zhehao Liu <MaxwellLZH>.
- datasets.load_diabetes now accepts the parameter scaled, to allow loading unscaled data. The scaled version of this dataset is now computed from the unscaled data, and can produce slightly different results than in previous versions (within a 1e-4 absolute tolerance). #16605 by Mandy Gu <happilyeverafter95>.
- datasets.fetch_openml now has two optional arguments, n_retries and delay. By default, datasets.fetch_openml will retry 3 times in case of a network failure, with a delay between each try. #21901 by Rileran <rileran>.
- datasets.fetch_covtype is now concurrent-safe: data is downloaded to a temporary directory before being moved to the data directory. #23113 by Ilion Beyst <iasoon>.
- datasets.make_sparse_coded_signal now accepts a parameter data_transposed to explicitly specify the shape of matrix X. The default behavior True is to return a transposed matrix X corresponding to a (n_features, n_samples) shape. The default value will change to False in version 1.3. #21425 by Gabriel Stefanini Vicente <g4brielvs>.
sklearn.decomposition

- Added a new estimator decomposition.MiniBatchNMF. It is a faster but less accurate version of non-negative matrix factorization, better suited for large datasets. #16948 by Chiara Marmo <cmarmo>, Patricio Cerda <pcerda> and Jérémie du Boisberranger <jeremiedbb>.
- decomposition.dict_learning, decomposition.dict_learning_online and decomposition.sparse_encode preserve dtype for numpy.float32. decomposition.DictionaryLearning, decomposition.MiniBatchDictionaryLearning and decomposition.SparseCoder preserve dtype for numpy.float32. #22002 by Takeshi Oura <takoika>.
- decomposition.PCA exposes a parameter n_oversamples to tune utils.randomized_svd and get accurate results when the number of features is large. #21109 by Smile <x-shadow-man>.
- decomposition.MiniBatchDictionaryLearning and decomposition.dict_learning_online have been refactored and now have a stopping criterion based on a small change of the dictionary or objective function, controlled by the new max_iter, tol and max_no_improvement parameters. In addition, some of their parameters and attributes are deprecated.
  - The n_iter parameter of both is deprecated. Use max_iter instead.
  - The iter_offset, return_inner_stats, inner_stats and return_n_iter parameters of decomposition.dict_learning_online serve internal purposes and are deprecated.
  - The inner_stats_, iter_offset_ and random_state_ attributes of decomposition.MiniBatchDictionaryLearning serve internal purposes and are deprecated.
  - The default value of the batch_size parameter of both will change from 3 to 256 in version 1.3.
  #18975 by Jérémie du Boisberranger <jeremiedbb>.
- decomposition.SparsePCA and decomposition.MiniBatchSparsePCA preserve dtype for numpy.float32. #22111 by Takeshi Oura <takoika>.
- decomposition.TruncatedSVD now allows n_components == n_features, if algorithm='randomized'. #22181 by Zach Deane-Mayer <zachmayer>.
- Adds get_feature_names_out to all transformers in the sklearn.decomposition module: decomposition.DictionaryLearning, decomposition.FactorAnalysis, decomposition.FastICA, decomposition.IncrementalPCA, decomposition.KernelPCA, decomposition.LatentDirichletAllocation, decomposition.MiniBatchDictionaryLearning, decomposition.MiniBatchSparsePCA, decomposition.NMF, decomposition.PCA, decomposition.SparsePCA, and decomposition.TruncatedSVD. #21334 by Thomas Fan.
- decomposition.TruncatedSVD exposes the parameters n_oversamples and power_iteration_normalizer to tune utils.randomized_svd and get accurate results when the number of features is large, the rank of the matrix is high, or other features of the matrix make low-rank approximation difficult. #21705 by Jay S. Stanley III <stanleyjs>.
- decomposition.PCA exposes the parameter power_iteration_normalizer to tune utils.randomized_svd and get more accurate results when low-rank approximation is difficult. #21705 by Jay S. Stanley III <stanleyjs>.
- decomposition.FastICA now validates input parameters in fit instead of __init__. #21432 by Hannah Bohle <hhnnhh> and Maren Westermann <marenwestermann>.
- decomposition.FastICA now accepts np.float32 data without silent upcasting. The dtype is preserved by fit and fit_transform, and the main fitted attributes use a dtype of the same precision as the training data. #22806 by Jihane Bennis <JihaneBennis> and Olivier Grisel <ogrisel>.
- decomposition.FactorAnalysis now validates input parameters in fit instead of __init__. #21713 by Haya <HayaAlmutairi> and Krum Arnaudov <krumeto>.
- decomposition.KernelPCA now validates input parameters in fit instead of __init__. #21567 by Maggie Chege <MaggieChege>.
- decomposition.PCA and decomposition.IncrementalPCA more safely calculate precision using the inverse of the covariance matrix if self.noise_variance_ is zero. #22300 by Meekail Zain <micky774> and #15948 by sysuresh.
- Greatly reduced peak memory usage in decomposition.PCA when calling fit or fit_transform. #22553 by Meekail Zain <micky774>.
- decomposition.FastICA now supports unit variance for whitening. The default value of its whiten argument will change from True (which behaves like 'arbitrary-variance') to 'unit-variance' in version 1.3. #19490 by Facundo Ferrin <fferrin> and Julien Jerphanion <jjerphan>.
sklearn.discriminant_analysis

- Adds get_feature_names_out to discriminant_analysis.LinearDiscriminantAnalysis. #22120 by Thomas Fan.
- discriminant_analysis.LinearDiscriminantAnalysis now uses the correct variance-scaling coefficient, which may result in different model behavior. #15984 by Okon Samuel <OkonSamuel> and #22696 by Meekail Zain <micky774>.
sklearn.dummy

- dummy.DummyRegressor no longer overrides the constant parameter during fit. #22486 by Thomas Fan.
sklearn.ensemble

- Added the additional option loss="quantile" to ensemble.HistGradientBoostingRegressor for modelling quantiles. The quantile level can be specified with the new parameter quantile. #21800 and #20567 by Christian Lorentzen <lorentzenchr>.
- fit of ensemble.GradientBoostingClassifier and ensemble.GradientBoostingRegressor now calls utils.check_array with parameter force_all_finite=False for non-initial warm-start runs, as the input has already been checked before. #22159 by Geoffrey Paris <Geoffrey-Paris>.
- ensemble.HistGradientBoostingClassifier is faster, for binary and in particular for multiclass problems, thanks to the new private loss function module. #20811, #20567 and #21814 by Christian Lorentzen <lorentzenchr>.
- Adds support for pre-fit models with cv="prefit" in ensemble.StackingClassifier and ensemble.StackingRegressor. #16748 by Siqi He <siqi-he> and #22215 by Meekail Zain <micky774>.
- ensemble.RandomForestClassifier and ensemble.ExtraTreesClassifier have the new criterion="log_loss", which is equivalent to criterion="entropy". #23047 by Christian Lorentzen <lorentzenchr>.
- Adds get_feature_names_out to ensemble.VotingClassifier, ensemble.VotingRegressor, ensemble.StackingClassifier, and ensemble.StackingRegressor. #22695 and #22697 by Thomas Fan.
- ensemble.RandomTreesEmbedding now has an informative get_feature_names_out function that includes both tree index and leaf index in the output feature names. #21762 by Zhehao Liu <MaxwellLZH> and Thomas Fan.
- Fitting an ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier, ensemble.ExtraTreesRegressor, or ensemble.RandomTreesEmbedding is now faster in a multiprocessing setting, especially for subsequent fits with warm_start enabled. #22106 by Pieter Gijsbers <PGijsbers>.
- Changed the parameter validation_fraction in ensemble.GradientBoostingClassifier and ensemble.GradientBoostingRegressor so that an error is raised if anything other than a float is passed in as an argument. #21632 by Genesis Valencia <genvalen>.
- Removed a potential source of CPU oversubscription in ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor when CPU resource usage is limited, for instance using cgroups quota in a Docker container. #22566 by Jérémie du Boisberranger <jeremiedbb>.
- ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor no longer warn when fitting on a pandas DataFrame with a non-default scoring parameter and early_stopping enabled. #22908 by Thomas Fan.
- Fixes the HTML repr for ensemble.StackingClassifier and ensemble.StackingRegressor. #23097 by Thomas Fan.
- The attribute loss_ of ensemble.GradientBoostingClassifier and ensemble.GradientBoostingRegressor has been deprecated and will be removed in version 1.3. #23079 by Christian Lorentzen <lorentzenchr>.
- Changed the default of max_features to 1.0 for ensemble.RandomForestRegressor and to "sqrt" for ensemble.RandomForestClassifier. Note that these give the same fit results as before, but are much easier to understand. The old default value "auto" has been deprecated and will be removed in version 1.3. The same changes are also applied to ensemble.ExtraTreesRegressor and ensemble.ExtraTreesClassifier. #20803 by Brian Sun <bsun94>.
- Improved runtime performance of ensemble.IsolationForest by skipping repetitive input checks. #23149 by Zhehao Liu <MaxwellLZH>.
sklearn.feature_extraction

- feature_extraction.FeatureHasher now supports PyPy. #23023 by Thomas Fan.
- feature_extraction.FeatureHasher now validates input parameters in transform instead of __init__. #21573 by Hannah Bohle <hhnnhh> and Maren Westermann <marenwestermann>.
- feature_extraction.text.TfidfVectorizer now does not create a feature_extraction.text.TfidfTransformer at __init__, as required by our API. #21832 by Guillaume Lemaitre <glemaitre>.
sklearn.feature_selection

- Added an auto mode to feature_selection.SequentialFeatureSelector. If the argument n_features_to_select is 'auto', features are selected until the score improvement does not exceed the argument tol. The default value of n_features_to_select changed from None to 'warn' in 1.1 and will become 'auto' in 1.3. None and 'warn' will be removed in 1.3. #20145 by murata-yu <murata-yu>.
- Added the ability to pass callables to the max_features parameter of feature_selection.SelectFromModel. Also introduced the new attribute max_features_, which is inferred from max_features and the data during fit. If max_features is an integer, then max_features_ = max_features. If max_features is a callable, then max_features_ = max_features(X). #22356 by Meekail Zain <micky774>.
- feature_selection.GenericUnivariateSelect preserves float32 dtype. #18482 by Thierry Gameiro <titigmr> and Daniel Kharsa <aflatoune> and #22370 by Meekail Zain <micky774>.
- Added a parameter force_finite to feature_selection.f_regression and feature_selection.r_regression. This parameter allows forcing the output to be finite in the case where a feature or the target is constant, or where the feature and target are perfectly correlated (only for the F-statistic). #17819 by Juan Carlos Alfaro Jiménez <alfaro96>.
- Improved runtime performance of feature_selection.chi2 with boolean arrays. #22235 by Thomas Fan.
- Reduced memory usage of feature_selection.chi2. #21837 by Louis Wagner <lrwagner>.
sklearn.gaussian_process

- The predict and sample_y methods of gaussian_process.GaussianProcessRegressor now return arrays of the correct shape in single-target and multi-target cases, and for both normalize_y=False and normalize_y=True. #22199 by Guillaume Lemaitre <glemaitre>, Aidar Shakerimoff <AidarShakerimoff> and Tenavi Nakamura-Zimmerer <Tenavi>.
- gaussian_process.GaussianProcessClassifier raises a more informative error if a CompoundKernel is passed via kernel. #22223 by MarcoM <marcozzxx810>.
sklearn.impute

- impute.SimpleImputer now warns with feature names when features are skipped due to the lack of any observed values in the training set. #21617 by Christian Ritter <chritter>.
- Added support for pd.NA in impute.SimpleImputer. #21114 by Ying Xiong <yxiong>.
- Adds get_feature_names_out to impute.SimpleImputer, impute.KNNImputer, impute.IterativeImputer, and impute.MissingIndicator. #21078 by Thomas Fan.
- The verbose parameter was deprecated for impute.SimpleImputer. A warning will always be raised upon the removal of empty columns. #21448 by Oleh Kozynets <OlehKSS> and Christian Ritter <chritter>.
sklearn.inspection

- Added a display to plot the decision boundary of a classifier by using the method inspection.DecisionBoundaryDisplay.from_estimator. #16061 by Thomas Fan.
- In inspection.PartialDependenceDisplay.from_estimator, allow kind to accept a list of strings to specify which type of plot to draw for each feature interaction. #19438 by Guillaume Lemaitre <glemaitre>.
- inspection.PartialDependenceDisplay.from_estimator, inspection.PartialDependenceDisplay.plot, and inspection.plot_partial_dependence now support plotting centered Individual Conditional Expectation (cICE) and centered PDP curves, controlled by setting the parameter centered. #18310 by Johannes Elfner <JoElfner> and Guillaume Lemaitre <glemaitre>.
sklearn.isotonic

- Adds get_feature_names_out to isotonic.IsotonicRegression. #22249 by Thomas Fan.
sklearn.kernel_approximation

- Adds get_feature_names_out to kernel_approximation.AdditiveChi2Sampler, kernel_approximation.Nystroem, kernel_approximation.PolynomialCountSketch, kernel_approximation.RBFSampler, and kernel_approximation.SkewedChi2Sampler. #22137 and #22694 by Thomas Fan.
sklearn.linear_model

- linear_model.ElasticNet, linear_model.ElasticNetCV, linear_model.Lasso and linear_model.LassoCV support sample_weight for sparse input X. #22808 by Christian Lorentzen <lorentzenchr>.
- linear_model.Ridge with solver="lsqr" now supports fitting on sparse input with fit_intercept=True. #22950 by Christian Lorentzen <lorentzenchr>.
- linear_model.QuantileRegressor supports sparse input for the highs-based solvers. #21086 by Venkatachalam Natchiappan <venkyyuvy>. In addition, those solvers now use the CSC matrix right from the beginning, which speeds up fitting. #22206 by Christian Lorentzen <lorentzenchr>.
- linear_model.LogisticRegression is faster for solver="lbfgs" and solver="newton-cg", for binary and in particular for multiclass problems, thanks to the new private loss function module. In the multiclass case, the memory consumption has also been reduced for these solvers, as the target is now label encoded (mapped to integers) instead of label binarized (one-hot encoded). The more classes, the larger the benefit. #21808, #20567 and #21814 by Christian Lorentzen <lorentzenchr>.
- linear_model.GammaRegressor, linear_model.PoissonRegressor and linear_model.TweedieRegressor are faster for solver="lbfgs". #22548, #21808 and #20567 by Christian Lorentzen <lorentzenchr>.
- Renamed the parameter base_estimator to estimator in linear_model.RANSACRegressor to improve readability and consistency. base_estimator is deprecated and will be removed in 1.3. #22062 by Adrian Trujillo <trujillo9616>.
- linear_model.ElasticNet and other linear model classes using coordinate descent show error messages when non-finite parameter weights are produced. #22148 by Christian Ritter <chritter> and Norbert Preining <norbusan>.
- linear_model.ElasticNet and linear_model.Lasso now raise consistent error messages when passed invalid values for l1_ratio, alpha, max_iter and tol. #22240 by Arturo Amor <ArturoAmorQ>.
- linear_model.BayesianRidge and linear_model.ARDRegression now preserve float32 dtype. #9087 by Arthur Imbert <Henley13> and #22525 by Meekail Zain <micky774>.
- linear_model.RidgeClassifier now supports multilabel classification. #19689 by Guillaume Lemaitre <glemaitre>.
- linear_model.RidgeCV and linear_model.RidgeClassifierCV now raise a consistent error message when passed invalid values for alphas. #21606 by Arturo Amor <ArturoAmorQ>.
- linear_model.Ridge and linear_model.RidgeClassifier now raise a consistent error message when passed invalid values for alpha, max_iter and tol. #21341 by Arturo Amor <ArturoAmorQ>.
- linear_model.orthogonal_mp_gram preserves dtype for numpy.float32. #22002 by Takeshi Oura <takoika>.
- linear_model.LassoLarsIC now correctly computes AIC and BIC. An error is now raised when n_features > n_samples and the noise variance is not provided. #21481 by Guillaume Lemaitre <glemaitre> and Andrés Babino <ababino>.
- linear_model.TheilSenRegressor now validates the input parameter max_subpopulation in fit instead of __init__. #21767 by Maren Westermann <marenwestermann>.
- linear_model.ElasticNetCV now produces the correct warning when l1_ratio=0. #21724 by Yar Khine Phyo <yarkhinephyo>.
- linear_model.LogisticRegression and linear_model.LogisticRegressionCV now set the n_iter_ attribute with a shape that respects the docstring and that is consistent with the shape obtained when using the other solvers in the one-vs-rest setting. Previously, it would record only the maximum of the number of iterations for each binary sub-problem, while now all of them are recorded. #21998 by Olivier Grisel <ogrisel>.
- The property family of linear_model.TweedieRegressor is no longer validated in __init__. Instead, this (private) property is deprecated in linear_model.GammaRegressor, linear_model.PoissonRegressor and linear_model.TweedieRegressor, and will be removed in 1.3. #22548 by Christian Lorentzen <lorentzenchr>.
- The coef_ and intercept_ attributes of linear_model.LinearRegression are now correctly computed in the presence of sample weights when the input is sparse. #22891 by Jérémie du Boisberranger <jeremiedbb>.
- The coef_ and intercept_ attributes of linear_model.Ridge with solver="sparse_cg" and solver="lbfgs" are now correctly computed in the presence of sample weights when the input is sparse. #22899 by Jérémie du Boisberranger <jeremiedbb>.
- linear_model.SGDRegressor and linear_model.SGDClassifier now compute the validation error correctly when early stopping is enabled. #23256 by Zhehao Liu <MaxwellLZH>.
- linear_model.LassoLarsIC now exposes noise_variance as a parameter in order to provide an estimate of the noise variance. This is particularly relevant when n_features > n_samples and the estimator of the noise variance cannot be computed. #21481 by Guillaume Lemaitre <glemaitre>.
sklearn.manifold

- manifold.Isomap now supports radius-based neighbors via the radius argument. #19794 by Zhehao Liu <MaxwellLZH>.
- manifold.spectral_embedding and manifold.SpectralEmbedding support np.float32 dtype and will preserve this dtype. #21534 by Andrew Knyazev <lobpcg>.
- Adds get_feature_names_out to manifold.Isomap and manifold.LocallyLinearEmbedding. #22254 by Thomas Fan.
- Added metric_params to the manifold.TSNE constructor for additional parameters of the distance metric to use in optimization. #21805 by Jeanne Dionisi <jeannedionisi> and #22685 by Meekail Zain <micky774>.
- manifold.trustworthiness raises an error if n_neighbors >= n_samples / 2, to ensure correct support for the function. #18832 by Hong Shao Yang <hongshaoyang> and #23033 by Meekail Zain <micky774>.
- manifold.spectral_embedding now uses Gaussian instead of the previous uniform-on-[0, 1] random initial approximations to eigenvectors in the eigen_solvers lobpcg and amg, to improve their numerical stability. #21565 by Andrew Knyazev <lobpcg>.
sklearn.metrics

- metrics.r2_score and metrics.explained_variance_score have a new force_finite parameter. Setting this parameter to False will return the actual non-finite score in case of perfect predictions or constant y_true, instead of the finite approximation (1.0 and 0.0 respectively) currently returned by default. #17266 by Sylvain Marié <smarie>.
- metrics.d2_pinball_score and metrics.d2_absolute_error_score calculate the D2 regression score for the pinball loss and the absolute error respectively. metrics.d2_absolute_error_score is a special case of metrics.d2_pinball_score with a fixed quantile parameter alpha=0.5, for ease of use and discovery. The D2 scores are generalizations of the r2_score and can be interpreted as the fraction of deviance explained. #22118 by Ohad Michel <ohadmich>.
- metrics.top_k_accuracy_score raises an improved error message when y_true is binary and y_score is 2d. #22284 by Thomas Fan.
- metrics.roc_auc_score now supports average=None in the multiclass case when multi_class='ovr', which will return the score per class. #19158 by Nicki Skafte <SkafteNicki>.
- Adds an im_kw parameter to metrics.ConfusionMatrixDisplay.from_estimator, metrics.ConfusionMatrixDisplay.from_predictions, and metrics.ConfusionMatrixDisplay.plot. The im_kw parameter is passed to the matplotlib.pyplot.imshow call when plotting the confusion matrix. #20753 by Thomas Fan.
- metrics.silhouette_score now supports integer input for precomputed distances. #22108 by Thomas Fan.
- Fixed a bug in metrics.normalized_mutual_info_score which could return unbounded values. #22635 by Jérémie du Boisberranger <jeremiedbb>.
- Fixes metrics.precision_recall_curve and metrics.average_precision_score when true labels are all negative. #19085 by Varun Agrawal <varunagrawal>.
- metrics.SCORERS is now deprecated and will be removed in 1.3. Please use metrics.get_scorer_names to retrieve the names of all available scorers. #22866 by Adrin Jalali.
- The parameters sample_weight and multioutput of metrics.mean_absolute_percentage_error are now keyword-only, in accordance with SLEP009. A deprecation cycle was introduced. #21576 by Paul-Emile Dugnat <pedugnat>.
- The "wminkowski" metric of metrics.DistanceMetric is deprecated and will be removed in version 1.3. Instead, the existing "minkowski" metric now takes in an optional w parameter for weights. This deprecation aims at remaining consistent with SciPy 1.8 conventions. #21873 by Yar Khine Phyo <yarkhinephyo>.
- metrics.DistanceMetric has been moved from sklearn.neighbors to sklearn.metrics. Using neighbors.DistanceMetric for imports is still valid for backward compatibility, but this alias will be removed in 1.3. #21177 by Julien Jerphanion <jjerphan>.
- mixture.GaussianMixture and mixture.BayesianGaussianMixture can now be initialized using k-means++ and random data points. #20408 by Gordon Walsh <g-walsh>, Alberto Ceballos <alceballosa> and Andres Rios <ariosramirez>.
- Fixed a bug so that precisions_cholesky_ is correctly initialized in mixture.GaussianMixture when providing precisions_init, by taking its square root. #22058 by Guillaume Lemaitre <glemaitre>.
- mixture.GaussianMixture now normalizes weights_ more safely, preventing rounding errors when calling mixture.GaussianMixture.sample with n_components=1. #23034 by Meekail Zain <micky774>.
- It is now possible to pass scoring="matthews_corrcoef" to all model selection tools with a scoring argument to use the Matthews correlation coefficient (MCC). #22203 by Olivier Grisel <ogrisel>.
- Raise an error during cross-validation when the fits for all the splits failed. Similarly, raise an error during grid-search when the fits for all the models and all the splits failed. #21026 by Loïc Estève <lesteve>.
- model_selection.GridSearchCV and model_selection.HalvingGridSearchCV now validate input parameters in fit instead of __init__. #21880 by Mrinal Tyagi <MrinalTyagi>.
- model_selection.learning_curve now supports partial_fit with regressors. #22982 by Thomas Fan.
- multiclass.OneVsRestClassifier now supports a verbose parameter so progress on fitting can be seen. #22508 by Chris Combs <combscCode>.
- multiclass.OneVsOneClassifier.predict returns correct predictions when the inner classifier only has a predict_proba method. #22604 by Thomas Fan.
- Adds get_feature_names_out to neighbors.RadiusNeighborsTransformer, neighbors.KNeighborsTransformer and neighbors.NeighborhoodComponentsAnalysis. #22212 by Meekail Zain <micky774>.
- neighbors.KernelDensity now validates input parameters in fit instead of __init__. #21430 by Desislava Vasileva <DessyVV> and Lucy Jimenez <LucyJimenez>.
- neighbors.KNeighborsRegressor.predict now works properly when given an array-like input if the KNeighborsRegressor is first constructed with a callable passed to the weights parameter. #22687 by Meekail Zain <micky774>.
- neural_network.MLPClassifier and neural_network.MLPRegressor show error messages when optimizers produce non-finite parameter weights. #22150 by Christian Ritter <chritter> and Norbert Preining <norbusan>.
- Adds get_feature_names_out to neural_network.BernoulliRBM. #22248 by Thomas Fan.
- Added support for "passthrough" in pipeline.FeatureUnion. Setting a transformer to "passthrough" will pass the features unchanged. #20860 by Shubhraneel Pal <shubhraneel>.
- pipeline.Pipeline now does not validate hyper-parameters in __init__ but in .fit(). #21888 by iofall <iofall> and Arisa Y. <arisayosh>.
- pipeline.FeatureUnion does not validate hyper-parameters in __init__. Validation is now handled in .fit() and .fit_transform(). #21954 by iofall <iofall> and Arisa Y. <arisayosh>.
- Defines __sklearn_is_fitted__ in pipeline.FeatureUnion to return the correct result with utils.validation.check_is_fitted. #22953 by randomgeek78 <randomgeek78>.
- preprocessing.OneHotEncoder now supports grouping infrequent categories into a single feature. Grouping infrequent categories is enabled by specifying how to select infrequent categories with min_frequency or max_categories. #16018 by Thomas Fan.
- Adds a subsample parameter to preprocessing.KBinsDiscretizer. This allows specifying a maximum number of samples to be used while fitting the model. The option is only available when strategy is set to quantile. #21445 by Felipe Bidu <fbidu> and Amanda Dsouza <amy12xx>.
- Adds encoded_missing_value to preprocessing.OrdinalEncoder to configure the encoded value for missing data. #21988 by Thomas Fan.
- Added the get_feature_names_out method and a new parameter feature_names_out to preprocessing.FunctionTransformer. You can set feature_names_out to 'one-to-one' to use the input feature names as the output feature names, or you can set it to a callable that returns the output feature names. This is especially useful when the transformer changes the number of features. If feature_names_out is None (which is the default), then get_feature_names_out is not defined. #21569 by Aurélien Geron <ageron>.
- Adds get_feature_names_out to preprocessing.Normalizer, preprocessing.KernelCenterer, preprocessing.OrdinalEncoder, and preprocessing.Binarizer. #21079 by Thomas Fan.
- preprocessing.PowerTransformer with method='yeo-johnson' better supports significantly non-Gaussian data when searching for an optimal lambda. #20653 by Thomas Fan.
- preprocessing.LabelBinarizer now validates input parameters in fit instead of __init__. #21434 by Krum Arnaudov <krumeto>.
- preprocessing.FunctionTransformer with check_inverse=True now provides an informative error message when input has mixed dtypes. #19916 by Zhehao Liu <MaxwellLZH>.
- preprocessing.KBinsDiscretizer handles bin edges more consistently now. #14975 by Andreas Müller and #22526 by Meekail Zain <micky774>.
- Adds preprocessing.KBinsDiscretizer.get_feature_names_out support when encode="ordinal". #22735 by Thomas Fan.
- Adds an inverse_transform method and a compute_inverse_transform parameter to random_projection.GaussianRandomProjection and random_projection.SparseRandomProjection. When the parameter is set to True, the pseudo-inverse of the components is computed during fit and stored as inverse_components_. #21701 by Aurélien Geron <ageron>.
- random_projection.SparseRandomProjection and random_projection.GaussianRandomProjection preserve dtype for numpy.float32. #22114 by Takeshi Oura <takoika>.
- Adds get_feature_names_out to all transformers in the sklearn.random_projection module: random_projection.GaussianRandomProjection and random_projection.SparseRandomProjection. #21330 by Loïc Estève <lesteve>.
- svm.OneClassSVM, svm.NuSVC, svm.NuSVR, svm.SVC and svm.SVR now expose n_iter_, the number of iterations of the libsvm optimization routine. #21408 by Juan Martín Loyola <jmloyola>.
- svm.SVR, svm.SVC, svm.NuSVR, svm.OneClassSVM and svm.NuSVC now raise an error when the dual-gap estimation produces non-finite parameter weights. #22149 by Christian Ritter <chritter> and Norbert Preining <norbusan>.
- svm.NuSVC, svm.NuSVR, svm.SVC, svm.SVR and svm.OneClassSVM now validate input parameters in fit instead of __init__. #21436 by Haidar Almubarak <Haidar13>.
- tree.DecisionTreeClassifier and tree.ExtraTreeClassifier have the new criterion="log_loss", which is equivalent to criterion="entropy". #23047 by Christian Lorentzen <lorentzenchr>.
- Fixed a bug in the Poisson splitting criterion for tree.DecisionTreeRegressor. #22191 by Christian Lorentzen <lorentzenchr>.
- Changed the default value of max_features to 1.0 for tree.ExtraTreeRegressor and to "sqrt" for tree.ExtraTreeClassifier, which will not change the fit result. The original default value "auto" has been deprecated and will be removed in version 1.3. Setting max_features to "auto" is also deprecated for tree.DecisionTreeClassifier and tree.DecisionTreeRegressor. #22476 by Zhehao Liu <MaxwellLZH>.
- utils.check_array and utils.multiclass.type_of_target now accept an input_name parameter to make the error message more informative when passed invalid input data (e.g. with NaN or infinite values). #21219 by Olivier Grisel <ogrisel>.
- utils.check_array returns a float ndarray with np.nan when passed a Float32 or Float64 pandas extension array with pd.NA. #21278 by Thomas Fan.
- utils.estimator_html_repr shows a more helpful error message when running in a Jupyter notebook that is not trusted. #21316 by Thomas Fan.
- utils.estimator_html_repr displays an arrow on the top left corner of the HTML representation to show how the elements are clickable. #21298 by Thomas Fan.
- utils.check_array with dtype=None returns numeric arrays when passed a pandas DataFrame with mixed dtypes. dtype="numeric" will also better infer the dtype when the DataFrame has mixed dtypes. #22237 by Thomas Fan.
- utils.check_scalar now has better messages when displaying the type. #22218 by Thomas Fan.
- Changes the error message of the ValidationError raised by utils.check_X_y when y is None, so that it is compatible with the check_requires_y_none estimator check. #22578 by Claudio Salvatore Arcidiacono <ClaudioSalvatoreArcidiacono>.
- utils.class_weight.compute_class_weight now only requires that all classes in y have a weight in class_weight. An error is still raised when a class is present in y but not in class_weight. #22595 by Thomas Fan.
- utils.estimator_html_repr has an improved visualization for nested meta-estimators. #21310 by Thomas Fan.
- utils.check_scalar raises an error when include_boundaries={"left", "right"} and the boundaries are not set. #22027 by Marie Lanternier <mlant>.
- utils.metaestimators.available_if correctly returns a bound method that can be pickled. #23077 by Thomas Fan.
- The argument of utils.estimator_checks.check_estimator is now called estimator (its previous name was Estimator). #22188 by Mathurin Massias <mathurinm>.
- utils.metaestimators.if_delegate_has_method is deprecated and will be removed in version 1.3. Use utils.metaestimators.available_if instead. #22830 by Jérémie du Boisberranger <jeremiedbb>.
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.0, including:
2357juan, Abhishek Gupta, adamgonzo, Adam Li, adijohar, Aditya Kumawat, Aditya Raghuwanshi, Aditya Singh, Adrian Trujillo Duron, Adrin Jalali, ahmadjubair33, AJ Druck, aj-white, Alan Peixinho, Alberto Mario Ceballos-Arroyo, Alek Lefebvre, Alex, Alexandre Gramfort, alexanmv, almeidayoel, Amanda Dsouza, Aman Sharma, Amar pratap singh, Amit, amrcode, András Simon, Andreas Mueller, Andrew Knyazev, Andriy, Angus L'Herrou, Ankit Sharma, Anne Ducout, Arisa, Arth, arthurmello, Arturo Amor, ArturoAmor, Atharva Patil, aufarkari, Aurélien Geron, avm19, Ayan Bag, baam, Behrouz B, Ben3940, Benjamin Bossan, Bharat Raghunathan, Bijil Subhash, bmreiniger, Brandon Truth, Brenden Kadota, Brian Sun, cdrig, Chalmer Lowe, Chiara Marmo, Chitteti Srinath Reddy, Chloe-Agathe Azencott, Christian Lorentzen, Christian Ritter, christopherlim98, Christoph T. Weidemann, Christos Aridas, Claudio Salvatore Arcidiacono, combscCode, Daniela Fernandes, Dave Eargle, David Poznik, Dea María Léon, Dennis Osei, DessyVV, Dev514, Dimitri Papadopoulos Orfanos, Diwakar Gupta, Dr. Felix M. 
Riese, drskd, Emiko Sano, Emmanouil Gionanidis, EricEllwanger, Erich Schubert, Eric Larson, Eric Ndirangu, Estefania Barreto-Ojeda, eyast, Fatima GASMI, Federico Luna, Felix Glushchenkov, fkaren27, Fortune Uwha, FPGAwesome, francoisgoupil, Frans Larsson, Gabor Berei, Gabor Kertesz, Gabriel Stefanini Vicente, Gabriel S Vicente, Gael Varoquaux, GAURAV CHOUDHARY, Gauthier I, genvalen, Geoffrey-Paris, Giancarlo Pablo, glennfrutiz, gpapadok, Guillaume Lemaitre, Guillermo Tomás Fernández Martín, Gustavo Oliveira, Haidar Almubarak, Hannah Bohle, Haoyin Xu, Haya, Helder Geovane Gomes de Lima, henrymooresc, Hideaki Imamura, Himanshu Kumar, Hind-M, hmasdev, hvassard, i-aki-y, iasoon, Inclusive Coding Bot, Ingela, iofall, Ishan Kumar, Jack Liu, Jake Cowton, jalexand3r, J Alexander, Jauhar, Jaya Surya Kommireddy, Jay Stanley, Jeff Hale, je-kr, JElfner, Jenny Vo, Jérémie du Boisberranger, Jihane, Jirka Borovec, Joel Nothman, Jon Haitz Legarreta Gorroño, Jordan Silke, Jorge Ciprián, Jorge Loayza, Joseph Chazalon, Joseph Schwartz-Messing, JSchuerz, Juan Carlos Alfaro Jiménez, Juan Martin Loyola, Julien Jerphanion, katotten, Kaushik Roy Chowdhury, Ken4git, kernc, Kevin Doucet, KimAYoung, Koushik Joshi, Kranthi Sedamaki, krumetoft, lesnee, Long Bao, Logan Thomas, Loic Esteve, Louis Wagner, LucieClair, Lucy Liu, Luiz Eduardo Amaral, Magali, MaggieChege, Mai, mandjevant, Mandy Gu, Manimaran, MarcoM, Maren Westermann, Maria Boerner, MarieS-WiMLDS, Martel Corentin, mathurinm, Matías, matjansen, Matteo Francia, Maxwell, Max Baak, Meekail Zain, Megabyte, Mehrdad Moradizadeh, melemo2, Michael I Chen, michalkrawczyk, Micky774, milana2, millawell, Ming-Yang Ho, Mitzi, miwojc, Mizuki, mlant, Mohamed Haseeb, Mohit Sharma, Moonkyung94, mpoemsl, MrinalTyagi, Mr. 
Leu, msabatier, murata-yu, N, Nadirhan Şahin, NartayXD, nastegiano, nathansquan, nat-salt, Nicki Skafte Detlefsen, Nicolas Hug, Niket Jain, Nikhil Suresh, Nikita Titov, Nikolay Kondratyev, Ohad Michel, Oleksandr Husak, Olivier Grisel, partev, Patrick Ferreira, Paul, pelennor, PierreAttard, Pieter Gijsbers, Pinky, poloso, Pramod Anantharam, puhuk, Purna Chandra Mansingh, QuadV, Rahil Parikh, Randall Boyes, randomgeek78, Raz Hoshia, Reshama Shaikh, Ricardo Ferreira, Richard Taylor, Rileran, Rishabh, Robin Thibaut, Roman Feldbauer, Roman Yurchak, Ross Barnowski, rsnegrin, Sachin Yadav, sakinaOuisrani, Sam Adam Day, Sanjay Marreddi, Sebastian Pujalte, SEELE, Seyedsaman (Sam) Emami, ShanDeng123, Shao Yang Hong, sharmadharmpal, shaymerNaturalint, Shubhraneel Pal, siavrez, slishak, Smile, spikebh, sply88, Stéphane Collot, Sultan Orazbayev, Sumit Saha, Sven Eschlbeck, Swapnil Jha, Sylvain Marié, Takeshi Oura, Tamires Santana, Tenavi, teunpe, Theis Ferré Hjortkjær, Thiruvenkadam, Thomas J. Fan, t-jakubek, Tom Dupré la Tour, TONY GEORGE, Tyler Martin, Tyler Reddy, Udit Gupta, Ugo Marchand, Varun Agrawal, Venkatachalam N, Vera Komeyer, victoirelouis, Vikas Vishwakarma, Vikrant khedkar, Vladimir Chernyy, Vladimir Kim, WeijiaDu, Xiao Yuan, Yar Khine Phyo, Ying Xiong, yiyangq, Yosshi999, Yuki Koyama, Zach Deane-Mayer, Zeel B Patel, zempleni, zhenfisher, 赵丰 (Zhao Feng)