sklearn
In Development
Put the changes in their relevant module.
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
- decomposition.KernelPCA behaviour is now more consistent between 32-bit and 64-bit data when the kernel has small positive eigenvalues.
Details are listed in the changelog below.
(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)
- calibration.CalibratedClassifierCV.fit now supports parallelization via joblib.Parallel using the argument n_jobs. #17107 by Julien Jerphanion <jjerphan>.
- Allow calibration.CalibratedClassifierCV to be used with a prefit pipeline.Pipeline where the data X is not array-like, a sparse matrix or a dataframe at the start. #17546 by Lucy Liu <lucyleeow>.
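A minimal sketch of the n_jobs option, assuming a toy dataset from make_classification and GaussianNB as the base classifier (both chosen only for illustration):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Toy binary classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# With cv=3, three (classifier, calibrator) pairs are fitted;
# n_jobs=2 lets joblib run those fits in parallel.
calibrated = CalibratedClassifierCV(GaussianNB(), cv=3, n_jobs=2)
calibrated.fit(X, y)

proba = calibrated.predict_proba(X[:5])
print(proba.shape)
```

The result should not depend on n_jobs; only the wall-clock time of fit is expected to change.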
- cluster.AgglomerativeClustering has a new parameter compute_distances. When set to True, distances between clusters are computed and stored in the distances_ attribute even when the parameter distance_threshold is not used. This new parameter is useful for producing dendrogram visualizations, but introduces a computational and memory overhead. #17984 by Michael Riedmann <mriedmann>, Emilie Delattre <EmilieDel>, and Francesco Casalegno <FrancescoCasalegno>.
- cluster.SpectralClustering and cluster.spectral_clustering have a new keyword argument verbose. When set to True, additional messages are displayed, which can aid debugging. #18052 by Sean O. Stalley <sstalley>.
- The cluster.MiniBatchKMeans attributes counts_ and init_size_ are deprecated and will be removed in 0.26. #17864 by Jérémie du Boisberranger <jeremiedbb>.
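As a rough illustration of compute_distances, the sketch below fits on random 2D points (an assumption made for the example) and reads the merge distances without setting distance_threshold:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Random toy points (illustrative only).
rng = np.random.RandomState(0)
X = rng.rand(20, 2)

# distance_threshold is left unset; compute_distances=True still
# populates distances_ with one value per merge in the tree.
model = AgglomerativeClustering(n_clusters=3, compute_distances=True).fit(X)

print(model.distances_.shape)  # one distance per row of model.children_
print(model.children_.shape)
```

Together, children_ and distances_ carry the information needed to build a linkage matrix for a dendrogram plot.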
- compose.ColumnTransformer will skip transformers whose column selector is a list of bools that are all False. #17616 by Thomas Fan.
- Deprecates cv_alphas_ in favor of cv_results_['alphas'] and grid_scores_ in favor of the split scores in cv_results_ in covariance.GraphicalLassoCV. cv_alphas_ and grid_scores_ will be removed in version 0.26. #16392 by Thomas Fan.
- datasets.fetch_openml now allows the argument as_frame to be 'auto', which tries to convert the returned data to a pandas DataFrame unless the data is sparse. #17396 by Jiaxiang <fujiaxiang>.
- datasets.fetch_openml now validates the md5 checksum of ARFF files downloaded or cached to ensure data integrity. #14800 by Shashank Singh <shashanksingh28> and Joel Nothman.
- datasets.fetch_covtype now supports the optional argument as_frame; when it is set to True, the returned Bunch object's data and frame members are pandas DataFrames, and the target member is a pandas Series. #17491 by Alex Liang <tianchuliang>.
- The default value of as_frame in datasets.fetch_openml is changed from False to 'auto'. #17610 by Jiaxiang <fujiaxiang>.
- decomposition.KernelPCA behaviour is now more consistent between 32-bit and 64-bit data input when the kernel has small positive eigenvalues. Small positive eigenvalues were not correctly discarded for 32-bit data. #18149 by Sylvain Marié <smarie>.
- Fix decomposition.SparseCoder such that it follows the scikit-learn API and supports cloning. The attribute components_ is deprecated in 0.24 and will be removed in 0.26. This attribute was redundant with the dictionary attribute and constructor parameter. #17679 by Xavier Dupré <sdpython>.
- decomposition.FactorAnalysis now supports the optional argument rotation, which can take the value None, 'varimax' or 'quartimax'. #11064 by Jona Sassenhagen <jona-sassenhagen>.
- decomposition.NMF now supports the optional parameter regularization, which can take the values None, 'components', 'transformation' or 'both', in accordance with decomposition.non_negative_factorization. #17414 by Bharat Raghunathan <Bharat123rox>.
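A short sketch of the rotation argument of decomposition.FactorAnalysis, assuming the iris dataset and two components purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X, _ = load_iris(return_X_y=True)

# rotation may be None (the default), "varimax" or "quartimax".
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
X_fa = fa.fit_transform(X)

print(X_fa.shape)           # (n_samples, n_components)
print(fa.components_.shape)  # (n_components, n_features), rotated loadings
```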
- ensemble.HistGradientBoostingRegressor and ensemble.HistGradientBoostingClassifier now support staged_predict, which allows monitoring of each stage. #16985 by Hao Chun Chang <haochunchang>.
- The attribute n_classes_ is now deprecated in ensemble.GradientBoostingRegressor and returns 1. #17702 by Simona Maggio <simonamaggio>.
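The following sketch shows one way staged_predict could be used to monitor training error per boosting stage. The synthetic data and max_iter=20 are assumptions for the example, and the guarded import is only there because the estimator required an explicit opt-in while it was experimental:

```python
try:
    # Opt-in import needed on releases where the estimator is experimental.
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
except ImportError:
    pass
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Synthetic regression data (illustrative only).
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

model = HistGradientBoostingRegressor(max_iter=20, random_state=0).fit(X, y)

# staged_predict yields the prediction after each boosting iteration.
stage_errors = [mean_squared_error(y, pred) for pred in model.staged_predict(X)]
print(len(stage_errors), stage_errors[0], stage_errors[-1])
```

Evaluating the staged predictions on a held-out set instead of the training data would give a per-stage validation curve.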
- exceptions.ChangedBehaviorWarning and exceptions.NonBLASDotWarning are deprecated and will be removed in v0.26. #17804 by Adrin Jalali.
- feature_extraction.DictVectorizer accepts multiple values for one categorical feature. #17367 by Peng Yu <yupbank> and Chiara Marmo <cmarmo>.
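For instance, a feature whose value is a list of strings is expanded into one indicator column per string. The records below are made up for illustration:

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records: "category" holds several values per sample.
records = [
    {"category": ["thriller", "drama"], "year": 2003},
    {"category": ["animation"], "year": 2011},
]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)
names = vec.feature_names_

print(names)
print(X)
```

Each string in the list becomes its own "category=<value>" column, while the numeric "year" feature is passed through unchanged.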
- A new parameter importance_getter was added to feature_selection.RFE, feature_selection.RFECV and feature_selection.SelectFromModel, allowing the user to specify an attribute name/path or a callable for extracting feature importance from the estimator. #15361 by Venkatachalam N <venkyyuvy>.
- Added the option for n_features_to_select to be given as a float between 0 and 1 representing the fraction of features to select. #17090 by Lisa Schwetlick <lschwetlick> and Marija Vlajic Wheeler <marijavlajic>.
- Reduce memory footprint in feature_selection.mutual_info_classif and feature_selection.mutual_info_regression by calling neighbors.KDTree for counting nearest neighbors. #17878 by Noel Rogers <noelano>.
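A small sketch combining importance_getter with a float n_features_to_select, assuming feature_selection.RFE as the selector and a synthetic regression problem (both are illustrative choices):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data with 10 features (illustrative only).
X, y = make_regression(n_samples=100, n_features=10, n_informative=3, random_state=0)

selector = RFE(
    LinearRegression(),
    n_features_to_select=0.5,   # keep half of the 10 features
    importance_getter="coef_",  # attribute of the estimator used as importance
)
selector.fit(X, y)

print(selector.n_features_)
print(selector.support_)
```

importance_getter also accepts a callable, which helps when the importances live somewhere other than coef_ or feature_importances_.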
- A new method gaussian_process.Kernel._check_bounds_params is called after fitting a Gaussian Process and raises a ConvergenceWarning if the bounds of the hyperparameters are too tight. #12638 by Sylvain Lannuzel <SylvainLan>.
- Replace the default values of the min_value and max_value parameters in impute.IterativeImputer with -np.inf and np.inf, respectively, instead of None. The behaviour of the class does not change, since None already defaulted to these values. #16493 by Darshan N <DarshanGowda0>.
- impute.SimpleImputer now supports a list of strings when strategy='most_frequent' or strategy='constant'. #17526 by Ayako YAGI <yagi-3> and Juan Carlos Alfaro Jiménez <alfaro96>.
- impute.SimpleImputer now supports inverse_transform functionality to revert imputed data to the original when instantiated with add_indicator=True. #17612 by Srimukh Sripada <d3b0unce>.
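The round trip through inverse_transform can be sketched as follows, assuming a tiny array with one missing value per column:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy array with one missing entry in each column (illustrative only).
X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0]])

# add_indicator=True appends missing-value indicator columns, which
# inverse_transform uses to locate the entries that were imputed.
imputer = SimpleImputer(strategy="mean", add_indicator=True)
X_imputed = imputer.fit_transform(X)
X_restored = imputer.inverse_transform(X_imputed)

print(X_imputed.shape)  # 2 feature columns + 2 indicator columns
print(X_restored)       # imputed entries are set back to NaN
```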
- inspection.partial_dependence and inspection.plot_partial_dependence now support calculating and plotting Individual Conditional Expectation (ICE) curves controlled by the kind parameter. #16619 by Madhura Jayratne <madhuracj>.
- Add sample_weight parameter to inspection.permutation_importance. #16906 by Roei Kahny <RoeiKa>.
- Expose fitted attributes X_thresholds_ and y_thresholds_ that hold the de-duplicated interpolation thresholds of an isotonic.IsotonicRegression instance for model inspection purposes. #16289 by Masashi Kishimoto <kishimoto-banana> and Olivier Grisel <ogrisel>.
- isotonic.IsotonicRegression now accepts a 2d array with 1 feature as input array. #17379 by Jiaxiang <fujiaxiang>.
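Both isotonic changes can be seen in one small sketch. The data values are made up for the example:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

x = np.arange(6, dtype=float)
y = np.array([1.0, 3.0, 2.0, 4.0, 6.0, 5.0])

# A 2d array with a single feature column is accepted as input.
iso = IsotonicRegression().fit(x.reshape(-1, 1), y)

# The fitted thresholds define the piecewise-linear interpolation.
print(iso.X_thresholds_)
print(iso.y_thresholds_)
```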
- linear_model.LinearRegression now forces coefficients to be all positive when positive is set to True. #17578 by Joseph Knox <jknox13>, Nelle Varoquaux <NelleV> and Chiara Marmo <cmarmo>.
- linear_model.RidgeCV now supports finding an optimal regularization value alpha for each target separately by setting alpha_per_target=True. This is only supported when using the default efficient leave-one-out cross-validation scheme cv=None. #6624 by Marijn van Vliet <wmvanvliet>.
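A sketch of both options on synthetic two-target data. The coefficient matrix, noise level and alpha grid are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV

rng = np.random.RandomState(0)
X = rng.rand(50, 3)
true_coef = np.array([[1.0, 0.5], [2.0, 0.0], [0.0, 3.0]])
Y = X @ true_coef + 0.01 * rng.randn(50, 2)

# positive=True constrains the fitted coefficients to be >= 0.
nn = LinearRegression(positive=True).fit(X, Y[:, 0])
print(nn.coef_)

# alpha_per_target=True selects one alpha per column of Y
# (requires the default leave-one-out scheme, cv=None).
ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0], alpha_per_target=True).fit(X, Y)
print(ridge.alpha_)
```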
- Add square_distances parameter to manifold.TSNE, which provides backward compatibility during deprecation of legacy squaring behavior. Distances will be squared by default in 0.26, and this parameter will be removed in 0.28. #17662 by Joshua Newton <joshuacwnewton>.
- Fixed #10493. Improve Locally Linear Embedding (LLE), which raised a MemoryError exception when used with large inputs. #17997 by Bertrand Maisonneuve <bmaisonn>.
- Added metrics.detection_error_tradeoff_curve to compute the Detection Error Tradeoff curve classification metric. #10591 by Jeremy Karnowski <jkarnows> and Daniel Mohns <dmohns>.
- Added metrics.mean_absolute_percentage_error metric and the associated scorer for regression problems. #10708 fixed with the PR #15007 by Ashutosh Hathidara <ashutosh1919>. The scorer and some practical test cases were taken from PR #10711 by Mohamed Ali Jamaoui <mohamed-ali>.
- Fixed a bug in metrics.classification_report which was raising AttributeError when called with output_dict=True for 0-length values. #17777 by Shubhanshu Mishra <napsternxg>.
- Add sample_weight parameter to metrics.median_absolute_error. #17225 by Lucy Liu <lucyleeow>.
- Add pos_label parameter in metrics.plot_precision_recall_curve in order to specify the positive class to be used when computing the precision and recall statistics. #17569 by Guillaume Lemaitre <glemaitre>.
- metrics.plot_confusion_matrix now supports making colorbar optional in the matplotlib plot by setting colorbar=False. #17192 by Avi Gupta <avigupta2612>.
- Add pos_label parameter in metrics.plot_roc_curve in order to specify the positive class to be used when computing the ROC AUC statistics. #17651 by Clara Matos <claramatos>.
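As a quick illustration of two of the regression metric changes, the sketch below uses hand-picked values so the arithmetic is easy to follow:

```python
from sklearn.metrics import mean_absolute_percentage_error, median_absolute_error

y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]

# Relative errors are 0.10, 0.10 and 0.00; MAPE is their mean.
mape = mean_absolute_percentage_error(y_true, y_pred)

# Absolute errors are 10, 20 and 0. The heavy weight on the last
# sample pulls the weighted median towards its error of 0.
weighted_medae = median_absolute_error(y_true, y_pred, sample_weight=[1.0, 1.0, 5.0])

print(mape, weighted_medae)
```

Note that mean_absolute_percentage_error returns a fraction rather than a value scaled to 100.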
- model_selection.TimeSeriesSplit has two new keyword arguments test_size and gap. test_size allows the out-of-sample time series length to be fixed for all folds. gap removes a fixed number of samples between the train and test set on each fold. #13204 by Kyle Kosic <kykosic>.
- model_selection.RandomizedSearchCV and model_selection.GridSearchCV now have the method score_samples. #17478 by Teon Brooks <teonbrooks> and Mohamed Maskani <maskani-moh>.
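The effect of test_size and gap is easiest to see on a short index range. The sketch below assumes 12 samples, 3 splits, a test size of 2 and a gap of 1:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)

# Every test fold has exactly 2 samples, and 1 sample is dropped
# between the end of each train set and the start of its test set.
tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=1)
splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]

for train, test in splits:
    print(train, test)
```

In the first fold the training indices stop at 4 and the test indices start at 6, so index 5 is the sample removed by the gap.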
- A fix to allow multiclass.OutputCodeClassifier to accept sparse input data in its fit and predict methods. The check for validity of the input is now delegated to the base estimator. #17233 by Zolisa Bleki <zoj613>.
- The attributes coef_ and intercept_ are now deprecated in naive_bayes.MultinomialNB, naive_bayes.ComplementNB, naive_bayes.BernoulliNB and naive_bayes.CategoricalNB, and will be removed in v0.26. #17427 by Juan Carlos Alfaro Jiménez <alfaro96>.
- Speed up seuclidean, wminkowski, mahalanobis and haversine metrics in neighbors.DistanceMetric by avoiding unexpected GIL acquiring in Cython when setting n_jobs>1 in neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor, neighbors.RadiusNeighborsClassifier, neighbors.RadiusNeighborsRegressor, metrics.pairwise_distances and by validating data out of loops. #17038 by Wenbo Zhao <webber26232>.
- neighbors.NeighborsBase benefits from an improved algorithm='auto' heuristic. In addition to the previous set of rules, brute is now selected when the number of features exceeds 15, assuming the data's intrinsic dimensionality is too high for tree-based methods. #17148 by Geoffrey Bolmier <gbolmier>.
- Neural net training and prediction are now a little faster. #17603, #17604, #17606, #17608, #17609, #17633, #17661, #17932 by Alex Henrie <alexhenrie>.
- Avoid converting float32 input to float64 in neural_network.BernoulliRBM. #16352 by Arthur Imbert <Henley13>.
- Support 32-bit computations in neural_network.MLPClassifier and neural_network.MLPRegressor. #17759 by Srimukh Sripada <d3b0unce>.
- Add a new handle_unknown parameter with a use_encoded_value option, along with a new unknown_value parameter, to preprocessing.OrdinalEncoder to allow unknown categories during transform and set the encoded value of the unknown categories. #17406 by Felix Wick <FelixWick>.
- Add clip parameter to preprocessing.MinMaxScaler, which clips the transformed values of test data to feature_range. #17833 by Yashika Sharma <yashika51>.
- Verbose output of model_selection.GridSearchCV has been improved for readability. #16935 by Raghav Rajagopalan <raghavrv> and Chiara Marmo <cmarmo>.
- Add unit_variance to preprocessing.RobustScaler, which scales output data such that normally distributed features have a variance of 1. #17193 by Lucy Liu <lucyleeow> and Mabel Villalba <mabelvj>.
- Add dtype parameter to preprocessing.KBinsDiscretizer. #16335 by Arthur Imbert <Henley13>.
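A minimal sketch of the OrdinalEncoder and MinMaxScaler additions. The colour names, the sentinel value -1 and the numeric inputs are all made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder

# Categories not seen during fit are encoded as unknown_value.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
encoder.fit([["red"], ["green"], ["blue"]])
codes = encoder.transform([["green"], ["purple"]])  # "purple" is unseen
print(codes)

# clip=True keeps transformed values inside feature_range (default (0, 1)),
# even for inputs outside the range seen during fit.
scaler = MinMaxScaler(clip=True).fit(np.array([[0.0], [10.0]]))
scaled = scaler.transform(np.array([[-5.0], [5.0], [20.0]]))
print(scaled)
```

Categories are ordered alphabetically here, so "green" maps to 1, while the unseen "purple" maps to the chosen sentinel.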
- Invoke the SciPy BLAS API for the SVM kernel function in fit, predict and related methods of svm.SVC, svm.NuSVC, svm.SVR, svm.NuSVR and svm.OneClassSVM. #16530 by Shuhua Fan <jim0421>.
- tree.plot_tree now uses colors from the matplotlib configuration settings. #17187 by Andreas Müller.
- The parameter X_idx_sorted is now deprecated in tree.DecisionTreeClassifier.fit and tree.DecisionTreeRegressor.fit, and has no effect. #17614 by Juan Carlos Alfaro Jiménez <alfaro96>.
- Allow serialized tree based models to be unpickled on a machine with different endianness. #17644 by Qi Zhang <qzhang90>.
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.20, including: