Version 0.24.0

In Development

Put the changes in their relevant module.

Changed models

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

  • decomposition.KernelPCA behaviour is now more consistent between 32-bit and 64-bit data when the kernel has small positive eigenvalues.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)

Changelog

sklearn.calibration

  • calibration.CalibratedClassifierCV.fit now supports parallelization via joblib.Parallel using the n_jobs argument. 17107 by Julien Jerphanion <jjerphan>.
  • Allow calibration.CalibratedClassifierCV to be used with a prefit pipeline.Pipeline whose input data X is not array-like, a sparse matrix or a dataframe at the start. 17546 by Lucy Liu <lucyleeow>.
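A minimal sketch of the new n_jobs argument, assuming scikit-learn 0.24 or later. The toy dataset and the GaussianNB base estimator are illustrative choices, not part of the changelog. With cv=3, one calibrated classifier is fitted per fold, and those fits can run in parallel joblib workers.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# The base estimator is passed positionally, which works whether the
# parameter is named base_estimator (0.24) or estimator (later releases).
# n_jobs=2 lets the per-fold calibrations run in parallel.
calibrated = CalibratedClassifierCV(GaussianNB(), cv=3, n_jobs=2)
calibrated.fit(X, y)

proba = calibrated.predict_proba(X[:5])
```

Parallelism only changes how the per-fold fits are scheduled. The calibrated probabilities are the same as with n_jobs=None.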

sklearn.cluster

  • cluster.AgglomerativeClustering has a new parameter compute_distances. When set to True, distances between clusters are computed and stored in the distances_ attribute even when the parameter distance_threshold is not used. This new parameter is useful to produce dendrogram visualizations, but introduces a computational and memory overhead. 17984 by Michael Riedmann <mriedmann>, Emilie Delattre <EmilieDel>, and Francesco Casalegno <FrancescoCasalegno>.
  • cluster.SpectralClustering and cluster.spectral_clustering have a new keyword argument verbose. When set to True, additional messages will be displayed which can aid with debugging. 18052 by Sean O. Stalley <sstalley>.
  • The cluster.MiniBatchKMeans attributes counts_ and init_size_ are deprecated and will be removed in 0.26. 17864 by Jérémie du Boisberranger <jeremiedbb>.
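A minimal sketch of compute_distances, assuming scikit-learn 0.24 or later and a tiny made-up 1D dataset. Here n_clusters is used instead of distance_threshold, so before 0.24 the merge distances were not stored. With compute_distances=True they appear in distances_, one value per merge recorded in children_, which is what dendrogram plotting needs.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two obvious groups: values near 0-1 and values near 5.
X = np.array([[0.0], [0.1], [1.0], [1.2], [5.0], [5.3]])

# compute_distances=True stores merge distances even though
# distance_threshold is not set (n_clusters is used instead).
model = AgglomerativeClustering(n_clusters=2, compute_distances=True)
labels = model.fit_predict(X)

# One distance per merge, i.e. per row of children_ (n_samples - 1 rows
# here, because the full tree is built for such a small n_clusters).
merge_distances = model.distances_
```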

sklearn.compose

  • compose.ColumnTransformer now skips transformers whose column selector is a list of booleans that are all False. 17616 by Thomas Fan.
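A minimal sketch of the skipping behaviour, assuming scikit-learn 0.24 or later. The transformer names "scale" and "keep" and the toy array are illustrative. A boolean-list selector that is entirely False selects no columns, so that transformer is skipped instead of being fitted on an empty array.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

ct = ColumnTransformer([
    # All-False boolean mask: selects no columns, so this step is skipped.
    ("scale", StandardScaler(), [False, False, False]),
    # Columns 0 and 1 pass through unchanged; column 2 is dropped
    # because the default remainder is "drop".
    ("keep", "passthrough", [True, True, False]),
])
Xt = ct.fit_transform(X)
```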

sklearn.covariance

  • In covariance.GraphicalLassoCV, cv_alphas_ is deprecated in favor of cv_results_['alphas'], and grid_scores_ is deprecated in favor of the per-split scores in cv_results_. cv_alphas_ and grid_scores_ will be removed in version 0.26. 16392 by Thomas Fan.

sklearn.datasets

  • datasets.fetch_openml now allows argument as_frame to be 'auto', which tries to convert returned data to pandas DataFrame unless data is sparse. 17396 by Jiaxiang <fujiaxiang>.
  • datasets.fetch_openml now validates the MD5 checksum of downloaded or cached ARFF files to ensure data integrity. 14800 by Shashank Singh <shashanksingh28> and Joel Nothman.
  • datasets.fetch_covtype now supports the optional argument as_frame; when it is set to True, the returned Bunch object's data and frame members are pandas DataFrames, and the target member is a pandas Series. 17491 by Alex Liang <tianchuliang>.
  • The default value of as_frame in datasets.fetch_openml is changed from False to 'auto'. 17610 by Jiaxiang <fujiaxiang>.

sklearn.decomposition

  • decomposition.KernelPCA behaviour is now more consistent between 32-bit and 64-bit data input when the kernel has small positive eigenvalues. Small positive eigenvalues were not correctly discarded for 32-bit data. 18149 by Sylvain Marié <smarie>.
  • Fixed decomposition.SparseCoder so that it follows the scikit-learn API and supports cloning. The attribute components_ is deprecated in 0.24 and will be removed in 0.26. This attribute was redundant with the dictionary attribute and constructor parameter. 17679 by Xavier Dupré <sdpython>.
  • decomposition.FactorAnalysis now supports the optional argument rotation, which can take the value None, 'varimax' or 'quartimax'. 11064 by Jona Sassenhagen <jona-sassenhagen>.
  • decomposition.NMF now supports the optional parameter regularization, which can take the values None, 'components', 'transformation' or 'both', in accordance with decomposition.non_negative_factorization. 17414 by Bharat Raghunathan <Bharat123rox>.
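A minimal sketch of two of the decomposition changes, assuming scikit-learn 0.24 or later: the new rotation argument of FactorAnalysis and cloning of SparseCoder. The Iris data and the identity dictionary are illustrative choices only.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis, SparseCoder

X, _ = load_iris(return_X_y=True)

# rotation accepts None, "varimax" or "quartimax".
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
X_factors = fa.fit_transform(X)

# SparseCoder keeps the dictionary as a constructor parameter,
# so clone() can rebuild an equivalent, unfitted estimator.
dictionary = np.eye(4)
coder = SparseCoder(dictionary=dictionary)
coder_copy = clone(coder)
```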

sklearn.ensemble

  • ensemble.HistGradientBoostingRegressor and ensemble.HistGradientBoostingClassifier now support staged_predict, which allows monitoring of each stage. 16985 by Hao Chun Chang <haochunchang>.
  • The attribute n_classes_ is now deprecated in ensemble.GradientBoostingRegressor and returns 1. 17702 by Simona Maggio <simonamaggio>.
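A minimal sketch of staged_predict, assuming scikit-learn 0.24 or later. In 0.24 the histogram-based estimators were still experimental and required the enable import. The try/except keeps the sketch runnable on later releases where that import is no longer needed. The regression data is synthetic.

```python
import numpy as np

try:
    # Required on scikit-learn < 1.0, where these estimators were experimental.
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
except ImportError:
    pass

from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)

# Early stopping is off by default for fewer than 10000 samples,
# so all 10 boosting iterations are run.
model = HistGradientBoostingRegressor(max_iter=10, random_state=0).fit(X, y)

# staged_predict yields one prediction array per boosting iteration.
staged = list(model.staged_predict(X[:3]))
```

The last staged prediction matches predict. The earlier stages show how the fit improves as each iteration is added.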

sklearn.exceptions

  • exceptions.ChangedBehaviorWarning and exceptions.NonBLASDotWarning are deprecated and will be removed in v0.26. 17804 by Adrin Jalali.

sklearn.feature_extraction

  • feature_extraction.DictVectorizer accepts multiple values for one categorical feature. 17367 by Peng Yu <yupbank> and Chiara Marmo <cmarmo>
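A minimal sketch of multiple values for one categorical feature, assuming scikit-learn 0.24 or later. The "color" and "size" records are made up. Each string in the list becomes its own one-hot column, named with the default "=" separator, and numeric values are kept as is.

```python
from sklearn.feature_extraction import DictVectorizer

records = [
    {"color": ["red", "blue"], "size": 1.0},  # two values for one categorical feature
    {"color": ["red"], "size": 2.0},
]

vec = DictVectorizer(sparse=False)  # dense output for readability
X = vec.fit_transform(records)

# Feature names are sorted by default:
# column 0 = "color=blue", column 1 = "color=red", column 2 = "size".
vocabulary = vec.vocabulary_
```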

sklearn.feature_selection

  • A new parameter importance_getter was added to feature_selection.RFE, feature_selection.RFECV and feature_selection.SelectFromModel, allowing the user to specify an attribute name/path or a callable for extracting feature importance from the estimator. 15361 by Venkatachalam N <venkyyuvy>.
  • feature_selection.RFE now accepts n_features_to_select as a float, representing the fraction of features to select. 17090 by Lisa Schwetlick <lschwetlick> and Marija Vlajic Wheeler <marijavlajic>.
  • Reduced the memory footprint of feature_selection.mutual_info_classif and feature_selection.mutual_info_regression by using neighbors.KDTree for counting nearest neighbors. 17878 by Noel Rogers <noelano>.
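A minimal sketch combining importance_getter and a float n_features_to_select in feature_selection.RFE, assuming scikit-learn 0.24 or later. The synthetic data and the LinearRegression estimator are illustrative. A float is read as a fraction of the input features, so 0.5 of 6 features keeps 3.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=6, n_informative=3, random_state=0)

# n_features_to_select=0.5 -> int(0.5 * 6) = 3 features are kept.
# importance_getter="coef_" names the fitted attribute used to rank features.
selector = RFE(LinearRegression(), n_features_to_select=0.5, importance_getter="coef_")
selector.fit(X, y)
```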

sklearn.gaussian_process

  • A new method gaussian_process.Kernel._check_bounds_params is called after fitting a Gaussian Process and raises a ConvergenceWarning if the bounds of the hyperparameters are too tight. 12638 by Sylvain Lannuzel <SylvainLan>

sklearn.impute

  • The default values of the min_value and max_value parameters of impute.IterativeImputer are now -np.inf and np.inf, respectively, instead of None. The behaviour of the class does not change, since None was already interpreted as these values. 16493 by Darshan N <DarshanGowda0>.
  • impute.SimpleImputer now supports a list of strings when strategy='most_frequent' or strategy='constant'. 17526 by Ayako YAGI <yagi-3> and Juan Carlos Alfaro Jiménez <alfaro96>.
  • impute.SimpleImputer now supports inverse_transform to revert imputed data to the original when instantiated with add_indicator=True. 17612 by Srimukh Sripada <d3b0unce>.
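A minimal sketch of SimpleImputer.inverse_transform, assuming scikit-learn 0.24 or later and a small made-up array. The indicator columns added by add_indicator=True record where values were missing. inverse_transform uses them to put NaN back in those positions.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [np.nan, 4.0],
              [3.0, 6.0]])

# Mean imputation plus one binary indicator column per feature
# that had missing values during fit (both features here).
imputer = SimpleImputer(strategy="mean", add_indicator=True)
Xt = imputer.fit_transform(X)  # 2 imputed columns + 2 indicator columns

# The indicator columns mark which entries to reset to NaN.
X_restored = imputer.inverse_transform(Xt)
```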

sklearn.inspection

  • inspection.partial_dependence and inspection.plot_partial_dependence now support calculating and plotting Individual Conditional Expectation (ICE) curves controlled by the kind parameter. 16619 by Madhura Jayratne <madhuracj>.
  • Add sample_weight parameter to inspection.permutation_importance. 16906 by Roei Kahny <RoeiKa>.
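A minimal sketch of both inspection changes, assuming scikit-learn 0.24 or later. The synthetic regression data, the LinearRegression model and the doubled weights on the first ten samples are illustrative. kind="individual" returns one ICE curve per sample instead of the averaged partial dependence. permutation_importance now takes sample_weight.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.inspection import partial_dependence, permutation_importance
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=50, n_features=3, random_state=0)
model = LinearRegression().fit(X, y)

# One ICE curve per sample for feature 0, evaluated on a 10-point grid.
ice = partial_dependence(model, X, features=[0], kind="individual", grid_resolution=10)
curves = ice["individual"]  # shape: (n_outputs, n_samples, n_grid_points)

# Per-sample weights are used when scoring the permuted data.
weights = np.ones(X.shape[0])
weights[:10] = 2.0
result = permutation_importance(
    model, X, y, n_repeats=5, random_state=0, sample_weight=weights
)
```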

sklearn.isotonic

  • Expose fitted attributes X_thresholds_ and y_thresholds_ that hold the de-duplicated interpolation thresholds of an isotonic.IsotonicRegression instance for model inspection purposes. 16289 by Masashi Kishimoto <kishimoto-banana> and Olivier Grisel <ogrisel>.
  • isotonic.IsotonicRegression now accepts a 2D array with one feature as input. 17379 by Jiaxiang <fujiaxiang>.
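A minimal sketch of both isotonic changes, assuming scikit-learn 0.24 or later and a made-up five-point dataset. The out-of-order pair (3, 2) is pooled to its mean 2.5 by the isotonic fit. The exposed thresholds show the resulting non-decreasing step function.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# A 2D array with a single feature is now accepted as input.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.0, 3.0, 2.0, 4.0, 5.0])

iso = IsotonicRegression().fit(X, y)

# Interpolation thresholds used for prediction, after removing
# points that lie strictly inside flat segments.
thresholds_x = iso.X_thresholds_
thresholds_y = iso.y_thresholds_
```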

sklearn.linear_model

  • linear_model.LinearRegression now forces coefficients to be all positive when positive is set to True. 17578 by Joseph Knox <jknox13>, Nelle Varoquaux <NelleV> and Chiara Marmo <cmarmo>.
  • linear_model.RidgeCV now supports finding an optimal regularization value alpha for each target separately by setting alpha_per_target=True. This is only supported when using the default efficient leave-one-out cross-validation scheme cv=None. 6624 by Marijn van Vliet <wmvanvliet>.
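A minimal sketch of both linear_model additions, assuming scikit-learn 0.24 or later. The data is synthetic: the second true coefficient is deliberately negative, and the second target is pure noise. positive=True forces every coefficient to be non-negative. alpha_per_target=True gives RidgeCV one alpha per target column.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
# The second feature truly has a negative effect on y.
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.randn(100)

# positive=True constrains every coefficient to be >= 0.
positive_model = LinearRegression(positive=True).fit(X, y)

# Two targets: alpha_per_target=True selects a separate alpha for each
# (only supported with the default efficient leave-one-out CV, cv=None).
Y = np.column_stack([y, 0.01 * rng.randn(100)])
ridge = RidgeCV(alphas=[0.1, 1.0, 10.0], alpha_per_target=True).fit(X, Y)
```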

sklearn.manifold

  • Add square_distances parameter to manifold.TSNE, which provides backward compatibility during deprecation of legacy squaring behavior. Distances will be squared by default in 0.26, and this parameter will be removed in 0.28. 17662 by Joshua Newton <joshuacwnewton>.
  • Fixed 10493: Locally Linear Embedding (LLE) no longer raises a MemoryError exception when used with large inputs. 17997 by Bertrand Maisonneuve <bmaisonn>.

sklearn.metrics

  • Added metrics.detection_error_tradeoff_curve to compute the Detection Error Tradeoff (DET) curve classification metric. 10591 by Jeremy Karnowski <jkarnows> and Daniel Mohns <dmohns>.
  • Added the metrics.mean_absolute_percentage_error metric and the associated scorer for regression problems. Fixes 10708 via PR 15007 by Ashutosh Hathidara <ashutosh1919>. The scorer and some practical test cases were taken from PR 10711 by Mohamed Ali Jamaoui <mohamed-ali>.
  • Fixed a bug in metrics.classification_report which raised an AttributeError when called with output_dict=True for 0-length values. 17777 by Shubhanshu Mishra <napsternxg>.
  • Add sample_weight parameter to metrics.median_absolute_error. 17225 by Lucy Liu <lucyleeow>.
  • Add pos_label parameter in metrics.plot_precision_recall_curve in order to specify the positive class to be used when computing the precision and recall statistics. 17569 by Guillaume Lemaitre <glemaitre>.
  • metrics.plot_confusion_matrix now supports making the colorbar optional in the matplotlib plot by setting colorbar=False. 17192 by Avi Gupta <avigupta2612>.
  • Add pos_label parameter in metrics.plot_roc_curve in order to specify the positive class to be used when computing the ROC AUC statistics. 17651 by Clara Matos <claramatos>.
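A minimal sketch of the new MAPE metric, assuming scikit-learn 0.24 or later. The three-point inputs are made up so that the result is easy to check by hand: (0.5/1 + 0/2 + 1/4) / 3 = 0.25. scikit-learn returns a fraction, not a percentage, and divides by max(|y_true|, eps) to avoid division by zero.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([1.5, 2.0, 3.0])

# MAPE = mean(|y_true - y_pred| / |y_true|), returned as a fraction.
mape = mean_absolute_percentage_error(y_true, y_pred)

# Same quantity computed by hand for comparison.
manual = np.mean(np.abs(y_true - y_pred) / np.abs(y_true))
```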

sklearn.model_selection

  • model_selection.TimeSeriesSplit has two new keyword arguments test_size and gap. test_size allows the out-of-sample time series length to be fixed for all folds. gap removes a fixed number of samples between the train and test set on each fold. 13204 by Kyle Kosic <kykosic>.
  • model_selection.RandomizedSearchCV and model_selection.GridSearchCV now have the method score_samples. 17478 by Teon Brooks <teonbrooks> and Mohamed Maskani <maskani-moh>.
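A minimal sketch of the two model_selection additions, assuming scikit-learn 0.24 or later. The index data, the KernelDensity estimator and the bandwidth grid are illustrative choices. With 10 samples, n_splits=3 and test_size=2, the test folds start at 4, 6 and 8. gap=1 drops the sample just before each test fold from training. The search object forwards score_samples to the refitted best estimator.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neighbors import KernelDensity

X = np.arange(10).reshape(-1, 1)

# test_size fixes every test fold length; gap removes samples
# between the end of the training set and the start of the test set.
tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=1)
folds = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]
# folds == [([0, 1, 2], [4, 5]),
#           ([0, 1, 2, 3, 4], [6, 7]),
#           ([0, 1, 2, 3, 4, 5, 6], [8, 9])]

# score_samples on the search is delegated to best_estimator_.
rng = np.random.RandomState(0)
data = rng.randn(50, 1)
search = GridSearchCV(KernelDensity(), {"bandwidth": [0.5, 1.0]}, cv=3)
search.fit(data)
log_density = search.score_samples(data[:5])
```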

sklearn.multiclass

  • A fix to allow multiclass.OutputCodeClassifier to accept sparse input data in its fit and predict methods. The check for validity of the input is now delegated to the base estimator. 17233 by Zolisa Bleki <zoj613>.

sklearn.naive_bayes

  • The attributes coef_ and intercept_ are now deprecated in naive_bayes.MultinomialNB, naive_bayes.ComplementNB, naive_bayes.BernoulliNB and naive_bayes.CategoricalNB, and will be removed in v0.26. 17427 by Juan Carlos Alfaro Jiménez <alfaro96>.

sklearn.neighbors

  • Speed up seuclidean, wminkowski, mahalanobis and haversine metrics in neighbors.DistanceMetric by avoiding unexpected GIL acquisition in Cython when setting n_jobs>1 in neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor, neighbors.RadiusNeighborsClassifier, neighbors.RadiusNeighborsRegressor, metrics.pairwise_distances and by validating data out of loops. 17038 by Wenbo Zhao <webber26232>.
  • neighbors.NeighborsBase benefits from an improved algorithm='auto' heuristic. In addition to the previous set of rules, 'brute' is now selected when the number of features exceeds 15, on the assumption that the intrinsic dimensionality of the data is too high for tree-based methods. 17148 by Geoffrey Bolmier <gbolmier>.
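A minimal sketch of the algorithm='auto' heuristic, assuming scikit-learn 0.24 or later and random data. It reads _fit_method, a private attribute, only to show which algorithm was picked. That attribute is an implementation detail and may change between releases. With 20 features the new rule selects 'brute'. With 3 features and a small n_neighbors, a KD-tree is still chosen for the Euclidean metric.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
y = rng.randint(0, 2, size=60)
X_low = rng.randn(60, 3)    # few features: tree-based search is still used
X_high = rng.randn(60, 20)  # more than 15 features: brute force is chosen

low = KNeighborsClassifier(n_neighbors=5, algorithm="auto").fit(X_low, y)
high = KNeighborsClassifier(n_neighbors=5, algorithm="auto").fit(X_high, y)

# _fit_method is private; inspected here only to illustrate the heuristic.
chosen = (low._fit_method, high._fit_method)
```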

sklearn.neural_network

  • Neural net training and prediction are now a little faster. 17603, 17604, 17606, 17608, 17609, 17633, 17661, 17932 by Alex Henrie <alexhenrie>.
  • Avoid converting float32 input to float64 in neural_network.BernoulliRBM. 16352 by Arthur Imbert <Henley13>.
  • Support 32-bit computations in neural_network.MLPClassifier and neural_network.MLPRegressor. 17759 by Srimukh Sripada <d3b0unce>.
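A minimal sketch of 32-bit support in the MLP estimators, assuming scikit-learn 0.24 or later and synthetic data. When the input is float32, the learned weight matrices stay float32 instead of being upcast to float64. A ConvergenceWarning may be emitted on this tiny problem. It does not affect the dtype check.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
X32 = X.astype(np.float32)

# Training on float32 input keeps the weights in float32.
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=200, random_state=0).fit(X32, y)
weight_dtypes = {coef.dtype for coef in clf.coefs_}
```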

sklearn.preprocessing

  • Add a new handle_unknown parameter with a use_encoded_value option, along with a new unknown_value parameter, to preprocessing.OrdinalEncoder to allow unknown categories during transform and set the encoded value of the unknown categories. 17406 by Felix Wick <FelixWick>.
  • Add clip parameter to preprocessing.MinMaxScaler, which clips the transformed values of test data to feature_range. 17833 by Yashika Sharma <yashika51>.
  • Verbose output of model_selection.GridSearchCV has been improved for readability. 16935 by Raghav Rajagopalan <raghavrv> and Chiara Marmo <cmarmo>.
  • Add unit_variance parameter to preprocessing.RobustScaler, which scales output data such that normally distributed features have a variance of 1. 17193 by Lucy Liu <lucyleeow> and Mabel Villalba <mabelvj>.
  • Add dtype parameter to preprocessing.KBinsDiscretizer. 16335 by Arthur Imbert <Henley13>.
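A minimal sketch of two of the preprocessing additions, assuming scikit-learn 0.24 or later. The categories and values are made up. handle_unknown="use_encoded_value" maps categories not seen during fit to unknown_value. clip=True keeps transformed test values inside feature_range, which defaults to (0, 1).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder

# Unknown categories at transform time are encoded as unknown_value.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit([["a"], ["b"]])
codes = enc.transform([["b"], ["z"]])  # "z" was never seen during fit

# Values outside the training range are clipped to feature_range.
scaler = MinMaxScaler(clip=True).fit([[0.0], [10.0]])
scaled = scaler.transform([[20.0], [-5.0], [5.0]])
```

Without clip=True, the same scaler would map 20.0 to 2.0 and -5.0 to -0.5.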

sklearn.svm

  • Invoke the SciPy BLAS API for the SVM kernel function in fit, predict and related methods of svm.SVC, svm.NuSVC, svm.SVR, svm.NuSVR and svm.OneClassSVM. 16530 by Shuhua Fan <jim0421>.

sklearn.tree

  • tree.plot_tree now uses colors from the matplotlib configuration settings. 17187 by Andreas Müller.
  • The parameter X_idx_sorted is now deprecated in tree.DecisionTreeClassifier.fit and tree.DecisionTreeRegressor.fit, and has no effect. 17614 by Juan Carlos Alfaro Jiménez <alfaro96>.
  • Allow serialized tree based models to be unpickled on a machine with different endianness. 17644 by Qi Zhang <qzhang90>.

Code and Documentation Contributors

Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.23, including: