.. currentmodule:: sklearn
In Development
Version 1.1.0 of scikit-learn requires python 3.7+, numpy 1.14.6+ and scipy 1.1.0+. Optional minimal dependency is matplotlib 2.2.3+.
Put the changes in their relevant module.
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
- |Efficiency| :class:`cluster.KMeans` now defaults to
algorithm="lloyd"
instead ofalgorithm="auto"
, which was equivalent toalgorithm="elkan"
. Lloyd's algorithm and Elkan's algorithm converge to the same solution, up to numerical rounding errors, but in general Lloyd's algorithm uses much less memory, and it is often faster.
- |Enhancement| All scikit-learn models now generate a more informative error message when some input contains unexpected NaN or infinite values. In particular the message contains the input name ("X", "y" or "sample_weight") and if an unexpected NaN value is found in X, the error message suggests potential solutions. :pr:`21219` by :user:`Olivier Grisel <ogrisel>`.
- |Enhancement| All scikit-learn models now generate a more informative error message when setting invalid hyper-parameters with set_params. :pr:`21542` by :user:`Olivier Grisel <ogrisel>`.
- |Enhancement| :func:`calibration.calibration_curve` accepts a parameter pos_label to specify the positive class label. :pr:`21032` by :user:`Guillaume Lemaitre <glemaitre>`.
- |Enhancement| :class:`CalibrationDisplay` accepts a parameter pos_label to add this information to the plot. :pr:`21038` by :user:`Guillaume Lemaitre <glemaitre>`.
- |Enhancement| :class:`cluster.SpectralClustering` and :func:`cluster.spectral` now include the new 'cluster_qr' method from :func:`cluster.cluster_qr` that clusters samples in the embedding space as an alternative to the existing 'kmeans' and 'discrete' methods. See :func:`cluster.spectral_clustering` for more details. :pr:`21148` by :user:`Andrew Knyazev <lobpcg>`
- |Efficiency| In :class:`cluster.KMeans`, the default
algorithm
is now"lloyd"
which is the full classical EM-style algorithm. Both"auto"
and"full"
are deprecated and will be removed in version 1.3. They are now aliases for"lloyd"
. The previous default was"auto"
, which relied on Elkan's algorithm. Lloyd's algorithm uses less memory than Elkan's, it is faster on many datasets, and its results are identical, hence the change. :pr:`21735` by :user:`Aurélien Geron <ageron>`.
- |Enhancement| :func:`cross_decomposition._PLS.inverse_transform` now allows reconstruction of a X target when a Y parameter is given. :pr:`19680` by :user:`Robin Thibaut <robinthibaut>`.
- |Enhancement| :func:`datasets.make_swiss_roll` now supports the optional argument hole; when set to True, it returns the swiss-hole dataset. :pr:`21482` by :user:`Sebastian Pujalte <pujaltes>`.
- |Enhancement| :class:`decomposition.PCA` exposes a parameter n_oversamples to tune :func:`sklearn.decomposition.randomized_svd` and get accurate results when the number of features is large. :pr:`21109` by :user:`Smile <x-shadow-man>`.
- |Fix| :class:`decomposition.FastICA` now validates input parameters in fit instead of __init__. :pr:`21432` by :user:`Hannah Bohle <hhnnhh>` and :user:`Maren Westermann <marenwestermann>`.
- |Fix| :class:`decomposition.FactorAnalysis` now validates input parameters in fit instead of __init__. :pr:`21713` by :user:`Haya <HayaAlmutairi>` and :user:`Krum Arnaudov <krumeto>`.
- |Fix| :class:`decomposition.KernelPCA` now validates input parameters in fit instead of __init__. :pr:`21567` by :user:`Maggie Chege <MaggieChege>`.
- |API| Adds :term:`get_feature_names_out` to all transformers in the :mod:`~sklearn.decomposition` module: :class:`~sklearn.decomposition.DictionaryLearning`, :class:`~sklearn.decomposition.FactorAnalysis`, :class:`~sklearn.decomposition.FastICA`, :class:`~sklearn.decomposition.IncrementalPCA`, :class:`~sklearn.decomposition.KernelPCA`, :class:`~sklearn.decomposition.LatentDirichletAllocation`, :class:`~sklearn.decomposition.MiniBatchDictionaryLearning`, :class:`~sklearn.decomposition.MiniBatchSparsePCA`, :class:`~sklearn.decomposition.NMF`, :class:`~sklearn.decomposition.PCA`, :class:`~sklearn.decomposition.SparsePCA`, and :class:`~sklearn.decomposition.TruncatedSVD`. :pr:`21334` by `Thomas Fan`_.
- |API| :func:`decomposition.FastICA` now supports unit variance for whitening. The default value of its whiten argument will change from True (which behaves like 'arbitrary-variance') to 'unit-variance' in version 1.3. :pr:`19490` by :user:`Facundo Ferrin <fferrin>` and :user:`Julien Jerphanion <jjerphan>`
- |Fix| :class:`ensemble.RandomForestClassifier`,
:class:`ensemble.RandomForestRegressor`,
:class:`ensemble.ExtraTreesClassifier`, :class:`ensemble.ExtraTreesRegressor`,
and :class:`ensemble.RandomTreesEmbedding` now raise a
ValueError
whenbootstrap=False
andmax_samples
is notNone
. :pr:`21295` :user:`Haoyin Xu <PSSF23>`. - |API| Changed the default of :func:`max_features` to 1.0 for :class:`ensemble.RandomForestRegressor` and to "sqrt" for :class:`ensemble.RandomForestClassifier`. Note that these give the same fit results as before, but are much easier to understand. The old default value "auto" has been deprecated and will be removed in version 1.3. The same changes are also applied for :class:`ensemble.ExtraTreesRegressor` and :class:`ensemble.ExtraTreesClassifier`. :pr:`20803` by :user:`Brian Sun <bsun94>`.
- |Enhancement| Added support for pd.NA in :class:`SimpleImputer`. :pr:`21114` by :user:`Ying Xiong <yxiong>`.
- |API| Adds :meth:`get_feature_names_out` to :class:`impute.SimpleImputer`, :class:`impute.KNNImputer`, :class:`impute.IterativeImputer`, and :class:`impute.MissingIndicator`. :pr:`21078` by `Thomas Fan`_.
- |API| The verbose parameter was deprecated for :class:`impute.SimpleImputer`. A warning will always be raised upon the removal of empty columns. :pr:`21448` by :user:`Oleh Kozynets <OlehKSS>` and :user:`Christian Ritter <chritter>`.
- |Enhancement| :class:`SimpleImputer` now warns with feature names when features which are skipped due to the lack of any observed values in the training set. :pr:`21617` by :user:`Christian Ritter <chritter>`.
- |Fix| Fix a bug in :class:`linear_model.RidgeClassifierCV` where the method predict was performing an argmax on the scores obtained from decision_function instead of returning the multilabel indicator matrix. :pr:`19869` by :user:`Guillaume Lemaitre <glemaitre>`.
- |Enhancement| :class:`linear_model.RidgeClassifier` is now supporting multilabel classification. :pr:`19689` by :user:`Guillaume Lemaitre <glemaitre>`.
- |Enhancement| :class:`linear_model.RidgeCV` and :class:`linear_model.RidgeClassifierCV` now raise consistent error message when passed invalid values for alphas. :pr:`21606` by :user:`Arturo Amor <ArturoAmorQ>`.
- |Enhancement| :class:`linear_model.Ridge` and :class:`linear_model.RidgeClassifier` now raise consistent error message when passed invalid values for alpha, max_iter and tol. :pr:`21341` by :user:`Arturo Amor <ArturoAmorQ>`.
- |API| :class:`linear_model.LassoLarsIC` now exposes noise_variance as a parameter in order to provide an estimate of the noise variance. This is particularly relevant when n_features > n_samples and the estimator of the noise variance cannot be computed. :pr:`21481` by :user:`Guillaume Lemaitre <glemaitre>`
- |Fix| :class:`linear_model.LassoLarsIC` now correctly computes AIC and BIC. An error is now raised when n_features > n_samples and when the noise variance is not provided. :pr:`21481` by :user:`Guillaume Lemaitre <glemaitre>` and :user:`Andrés Babino <ababino>`.
- |API| :class:`metrics.DistanceMetric` has been moved from :mod:`sklearn.neighbors` to :mod:`sklearn.metric`. Using neighbors.DistanceMetric for imports is still valid for backward compatibility, but this alias will be removed in 1.3. :pr:`21177` by :user:`Julien Jerphanion <jjerphan>`.
- |API| Parameters
sample_weight
andmultioutput
of :func:`metrics. mean_absolute_percentage_error` are now keyword-only, in accordance with SLEP009 <https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep009/proposal.html>. A deprecation cycle was introduced. :pr:`21576` by :user:`Paul-Emile Dugnat <pedugnat>`.
- |Enhancement| :func:`manifold.spectral_embedding` and :class:`manifold.SpectralEmbedding` supports np.float32 dtype and will preserve this dtype. :pr:`21534` by :user:`Andrew Knyazev <lobpcg>`.
- |Enhancement| raise an error during cross-validation when the fits for all the splits failed. Similarly raise an error during grid-search when the fits for all the models and all the splits failed. :pr:`21026` by :user:`Loïc Estève <lesteve>`.
- |Enhancement| Added support for "passthrough" in :class:`FeatureUnion`. Setting a transformer to "passthrough" will pass the features unchanged. :pr:`20860` by :user:`Shubhraneel Pal <shubhraneel>`.
- |Enhancement| Added :func:`partial_fit` to :class:`tree.DecisionTreeClassifier` and :class:`tree.ExtraTreeClassifier`. :pr:`18889` by :user:`Haoyin Xu <PSSF23>`.
- |Enhancement| Adds a subsample parameter to :class:`preprocessing.KBinsDiscretizer`. This allows specifying a maximum number of samples to be used while fitting the model. The option is only available when strategy is set to quantile. :pr:`21445` by :user:`Felipe Bidu <fbidu>` and :user:`Amanda Dsouza <amy12xx>`.
- |Fix| :class:`preprocessing.LabelBinarizer` now validates input parameters in fit instead of __init__. :pr:`21434` by :user:`Krum Arnaudov <krumeto>`.
- |Enhancement| Added the get_feature_names_out method and a new parameter feature_names_out to :class:`preprocessing.FunctionTransformer`. You can set feature_names_out to 'one-to-one' to use the input features names as the output feature names, or you can set it to a callable that returns the output feature names. This is especially useful when the transformer changes the number of features. If feature_names_out is None (which is the default), then get_output_feature_names is not defined. :pr:`21569` by :user:`Aurélien Geron <ageron>`.
- |Fix| :class:`smv.NuSVC`, :class:`svm.NuSVR`, :class:`svm.SVC`, :class:`svm.SVR`, :class:`svm.OneClassSVM` now validate input parameters in fit instead of __init__. :pr:`21436` by :user:`Haidar Almubarak <Haidar13 >`.
- |Enhancement| :func:`utils.estimator_html_repr` shows a more helpful error message when running in a jupyter notebook that is not trusted. :pr:`21316` by `Thomas Fan`_.
- |Enhancement| :func:`utils.estimator_html_repr` displays an arrow on the top left corner of the HTML representation to show how the elements are clickable. :pr:`21298` by `Thomas Fan`_.
- |Fix| :class:`neighbors.KernelDensity` now validates input parameters in fit instead of __init__. :pr:`21430` by :user:`Desislava Vasileva <DessyVV>` and :user:`Lucy Jimenez <LucyJimenez>`.
- |Enhancement| utils.validation.check_array and utils.validation.type_of_target now accept an input_name parameter to make the error message more informative when passed invalid input data (e.g. with NaN or infinite values). :pr:`21219` by :user:`Olivier Grisel <ogrisel>`.
- |Enhancement| :func:`utils.validation.check_array` returns a float ndarray with np.nan when passed a Float32 or Float64 pandas extension array with pd.NA. :pr:`21278` by `Thomas Fan`_.
- |API| Adds :term:`get_feature_names_out` to all transformers in the :mod:`~sklearn.random_projection` module: :class:`~sklearn.random_projection.GaussianRandomProjection` and :class:`~sklearn.random_projection.SparseRandomProjection`. :pr:`21330` by :user:`Loïc Estève <lesteve>`.
- |Fix| Support loading pickles of decision tree models when the pickle has been generated on a platform with a different bitness. A typical example is to train and pickle the model on 64 bit machine and load the model on a 32 bit machine for prediction. :pr:`21552` by :user:`Loïc Estève <lesteve>`.
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.0, including:
TODO: update at the time of the release.