
Ensure all attributes are documented #14312

Closed
amueller opened this issue Jul 12, 2019 · 82 comments · Fixed by #14320
Labels
Documentation, Easy (Well-defined and straightforward way to resolve), good first issue (Easy with clear instructions to resolve), help wanted

Comments

@amueller
Member

amueller commented Jul 12, 2019

As discussed in #13385, we need to ensure all attributes are documented.

If you want to work on this, pick a specific submodule and fix all the attribute documentation mismatches in that submodule.

Here's a script to find the remaining ones (there might be some false positives):

import numpy as np
from sklearn.base import clone
from sklearn.utils.testing import all_estimators
from sklearn.utils.estimator_checks import pairwise_estimator_convert_X, enforce_estimator_tags_y
from numpydoc import docscrape

ests = all_estimators()

for name, Est in ests:
    try:
        estimator_orig = Est()
    except Exception:
        # Skip estimators that cannot be constructed with default parameters.
        continue
    rng = np.random.RandomState(0)
    X = pairwise_estimator_convert_X(rng.rand(40, 10), estimator_orig)
    X = X.astype(object)
    y = (X[:, 0] * 4).astype(int)
    est = clone(estimator_orig)
    y = enforce_estimator_tags_y(est, y)
    try:
        est.fit(X, y)
    except Exception:
        # Skip estimators that cannot be fitted on this toy data.
        continue
    # Public fitted attributes end with a single trailing underscore.
    fitted_attrs = [(x, getattr(est, x, None))
                    for x in est.__dict__.keys()
                    if x.endswith("_") and not x.startswith("_")]
    doc = docscrape.ClassDoc(type(est))
    doc_attributes = []
    incorrect = []
    for att_name, type_definition, param_doc in doc['Attributes']:
        if not type_definition.strip():
            if ':' in att_name and att_name[:att_name.index(':')][-1:].strip():
                incorrect += [name +
                              ' There was no space between the param name and '
                              'colon (%r)' % att_name]
            elif att_name.rstrip().endswith(':'):
                incorrect += [name +
                              ' Parameter %r has an empty type spec. '
                              'Remove the colon' % (att_name.lstrip())]

        if '*' not in att_name:
            doc_attributes.append(att_name.split(':')[0].strip('` '))
    assert incorrect == []
    fitted_attrs_names = [x[0] for x in fitted_attrs]

    # Symmetric difference: attributes that are fitted but undocumented,
    # or documented but never set during fit.
    bad = sorted(set(fitted_attrs_names) ^ set(doc_attributes))
    if len(bad) > 0:
        msg = '{}\n'.format(name) + '\n'.join(bad)
        print("Docstring Error: Attribute mismatch in " + msg)
amueller added the Easy (Well-defined and straightforward way to resolve), Documentation, good first issue (Easy with clear instructions to resolve), help wanted, and Sprint labels on Jul 12, 2019
@alexitkes
Contributor

alexitkes commented Jul 12, 2019

I have already found at least one mismatch in attribute documentation in the NMF class description. I think I can take on some of this work. I am almost ready to propose some changes within the decomposition and random_projection submodules.

@thomasjpfan
Member

thomasjpfan commented Jul 13, 2019

Missing attribute docstrings for each estimator

Reference this issue in your PR

  • ARDRegression, [intercept_]
  • AdaBoostClassifier, [base_estimator_]
  • AdaBoostRegressor, [base_estimator_]
  • AdditiveChi2Sampler, [sample_interval_]
  • AgglomerativeClustering, [n_components_] (deprecated)
  • BaggingClassifier, [n_features_]
  • BaggingRegressor, [base_estimator_, n_features_]
  • BayesianGaussianMixture, [mean_precision_prior, mean_precision_prior_]
  • BayesianRidge, [X_offset_, X_scale_]
  • BernoulliNB, [coef_, intercept_]
  • BernoulliRBM, [h_samples_]
  • Birch, [fit_, partial_fit_]
  • CCA, [coef_, x_mean_, x_std_, y_mean_, y_std_]
  • CheckingClassifier, [classes_]
  • ComplementNB, [coef_, intercept_]
  • CountVectorizer, [stop_words_, vocabulary_]
  • DecisionTreeRegressor, [classes_, n_classes_]
  • DictVectorizer, [feature_names_, vocabulary_]
  • DummyClassifier, [output_2d_]
  • DummyRegressor, [output_2d_]
  • ElasticNet, [dual_gap_]
  • ElasticNetCV, [dual_gap_]
  • EllipticEnvelope, [dist_, raw_covariance_, raw_location_, raw_support_]
  • ExtraTreeClassifier, [feature_importances_]
  • ExtraTreeRegressor, [classes_, feature_importances_, n_classes_]
  • ExtraTreesClassifier, [base_estimator_]
  • ExtraTreesRegressor, [base_estimator_]
  • FactorAnalysis, [mean_]
  • FeatureAgglomeration, [n_components_]
  • GaussianProcessClassifier, [base_estimator_]
  • GaussianRandomProjection, [components_]
  • GradientBoostingClassifier, [max_features_, n_classes_, n_features_, oob_improvement_]
  • GradientBoostingRegressor, [max_features_, n_classes_, n_estimators_, n_features_, oob_improvement_]
  • HistGradientBoostingClassifier, [bin_mapper_, classes_, do_early_stopping_, loss_, n_features_, scorer_]
  • HistGradientBoostingRegressor, [bin_mapper_, do_early_stopping_, loss_, n_features_, scorer_]
  • IncrementalPCA, [batch_size_]
  • IsolationForest, [base_estimator_, estimators_features_, n_features_]
  • IsotonicRegression, [X_max_, X_min_, f_]
  • IterativeImputer, [random_state_]
  • KNeighborsClassifier, [classes_, effective_metric_, effective_metric_params_, outputs_2d_]
  • KNeighborsRegressor, [effective_metric_, effective_metric_params_]
  • KernelCenterer, [K_fit_all_, K_fit_rows_]
  • KernelDensity, [tree_]
  • KernelPCA, [X_transformed_fit_, dual_coef_]
  • LabelBinarizer, [classes_, sparse_input_, y_type_]
  • LabelEncoder, [classes_]
  • LarsCV, [active_]
  • Lasso, [dual_gap_]
  • LassoLarsCV, [active_]
  • LassoLarsIC, [alphas_]
  • LatentDirichletAllocation, [bound_, doc_topic_prior_, exp_dirichlet_component_, random_state_, topic_word_prior_]
  • LinearDiscriminantAnalysis, [covariance_]
  • LinearRegression, [rank_, singular_]
  • LinearSVC, [classes_]
  • LocalOutlierFactor, [effective_metric_, effective_metric_params_]
  • MDS, [dissimilarity_matrix_, n_iter_]
  • MLPClassifier, [best_loss_, loss_curve_, t_]
  • MLPRegressor, [best_loss_, loss_curve_, t_]
  • MinMaxScaler, [n_samples_seen_]
  • MiniBatchDictionaryLearning, [iter_offset_]
  • MiniBatchKMeans, [counts_, init_size_, n_iter_]
  • MultiLabelBinarizer, [classes_]
  • MultiTaskElasticNet, [dual_gap_, eps_, sparse_coef_]
  • MultiTaskElasticNetCV, [dual_gap_]
  • MultiTaskLasso, [dual_gap_, eps_, sparse_coef_]
  • MultiTaskLassoCV, [dual_gap_]
  • NearestCentroid, [classes_]
  • NearestNeighbors, [effective_metric_, effective_metric_params_]
  • NeighborhoodComponentsAnalysis, [random_state_]
  • NuSVC, [class_weight_, fit_status_, probA_, probB_, shape_fit_]
  • NuSVR, [class_weight_, fit_status_, n_support_, probA_, probB_, shape_fit_]
  • OAS, [location_]
  • OneClassSVM, [class_weight_, fit_status_, n_support_, probA_, probB_, shape_fit_]
  • OneVsOneClassifier, [n_classes_]
  • OneVsRestClassifier, [coef_, intercept_, n_classes_]
  • OrthogonalMatchingPursuit, [n_nonzero_coefs_]
  • PLSCanonical, [coef_, x_mean_, x_std_, y_mean_, y_std_]
  • PLSRegression, [x_mean_, x_std_, y_mean_, y_std_]
  • PLSSVD, [x_mean_, x_std_, y_mean_, y_std_]
  • PassiveAggressiveClassifier, [loss_function_, t_]
  • PassiveAggressiveRegressor, [t_]
  • Perceptron, [loss_function_]
  • QuadraticDiscriminantAnalysis, [classes_, covariance_]
  • RBFSampler, [random_offset_, random_weights_]
  • RFE, [classes_]
  • RFECV, [classes_]
  • RadiusNeighborsClassifier, [classes_, effective_metric_, effective_metric_params_, outputs_2d_]
  • RadiusNeighborsRegressor, [effective_metric_, effective_metric_params_]
  • RandomForestClassifier, [oob_decision_function_, oob_score_]
  • RandomForestRegressor, [oob_prediction_, oob_score_]
  • RandomTreesEmbedding, [base_estimator_, feature_importances_, n_features_, n_outputs_, one_hot_encoder_]
  • RidgeCV, [cv_values_]
  • RidgeClassifier, [classes_]
  • RidgeClassifierCV, [cv_values_]
  • SGDClassifier, [classes_, t_]
  • SGDRegressor, [average_coef_, average_intercept_]
  • SVC, [class_weight_, shape_fit_]
  • SVR, [class_weight_, fit_status_, n_support_, probA_, probB_, shape_fit_]
  • SelectKBest, [pvalues_, scores_]
  • ShrunkCovariance, [shrinkage]
  • SkewedChi2Sampler, [random_offset_, random_weights_]
  • SparseRandomProjection, [components_, density_]
  • SpectralEmbedding, [n_neighbors_]
  • TfidfVectorizer, [stop_words_, vocabulary_]

thomasjpfan added this to "To do" in Sprint Scipy 2019 on Jul 13, 2019
@mepa
Contributor

mepa commented Jul 13, 2019

I can take up the tree submodule attribute documentation mismatches, which include:

  • DecisionTreeRegressor, [classes_, n_classes_]
  • ExtraTreeClassifier, [classes_, max_features_, n_classes_, n_features_, n_outputs_, tree_]
  • ExtraTreeRegressor, [classes_, max_features_, n_classes_, n_features_, n_outputs_, tree_]

@wendyhhu
Contributor

I'm working on LinearRegression, [rank_, singular_].

@wendyhhu
Contributor

I'm working on LinearSVC, [n_iter_] and LinearSVR, [n_iter_]

@matsmaiwald

I'll take up gradient boosting, i.e.

  • GradientBoostingClassifier [base_estimator_, max_features_, n_classes_, n_features_]
  • GradientBoostingRegressor [base_estimator_, classes_, max_features_, n_estimators_, n_features_]

TomDLT reopened this on Jul 14, 2019
@matsmaiwald

Never mind, I misread where attributes are missing and where they are not.

@alexitkes
Contributor

It looks like the classes_ attribute is also undocumented for the classifiers in the naive_bayes submodule. I have started to fix it.

@mandalbiswadip
Contributor

I will work on TfidfVectorizer, [fixed_vocabulary_]

@rcwoolston
Contributor

rcwoolston commented Jul 14, 2019

I will work on:

  • RandomForestClassifier, [base_estimator_]
  • RandomForestRegressor, [base_estimator_, n_classes_]
  • ExtraTreesClassifier, [base_estimator_]
  • ExtraTreesRegressor, [base_estimator_, n_classes_]

@wendyhhu
Contributor

wendyhhu commented Jul 14, 2019

I'm working on:

  • SGDClassifier, [average_coef_, average_intercept_, standard_coef_, standard_intercept_]
  • SGDRegressor, [standard_coef_, standard_intercept_]

EDIT: opened an issue to change these attributes from public to private (reference: #14364)

@SwordKnight6216
Contributor

I am working on:

  • KernelCenterer, [K_fit_all_, K_fit_rows_]
  • MinMaxScaler, [n_samples_seen_]

@rcwoolston
Contributor

I will work on:

  • RandomTreesEmbedding, [base_estimator_, classes_, feature_importances_, n_classes_, n_features_, n_outputs_, one_hot_encoder_]

@marenwestermann
Member

I'm working on Lasso.

@marenwestermann
Member

I'm now working on adding the attribute sparse_coef_ to MultiTaskElasticNet and MultiTaskLasso.

@marenwestermann
Member

I'm working on LarsCV.

@marenwestermann
Member

@thomasjpfan the docstrings of the SVR and OneClassSVM classes say:
"The probA_ attribute is deprecated in version 0.23 and will be removed in version 0.25." and
"The probB_ attribute is deprecated in version 0.23 and will be removed in version 0.25."

Therefore, these attributes probably don't need documentation anymore, right?
Going from here, will these two attributes also be deprecated in the class NuSVR?

@marenwestermann
Member

The attributes classes_ and n_classes_ for ExtraTreeRegressor are false positives.

@thomasjpfan
Member

Therefore, these attributes probably don't need documentation anymore, right?
Going from here, will these two attributes also be deprecated in the class NuSVR?

Since we are deprecating them, I would say we do not need to document them.

The attributes classes_ and n_classes_ for ExtraTreeRegressor are false positives.

Yup, those should be deprecated and then removed if they are not already.
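For reference, a hedged sketch of the deprecation pattern typically used for fitted attributes in scikit-learn; the class and attribute here are hypothetical, and the decorator's import path has varied between releases:

from sklearn.utils import deprecated

class SomeRegressor:
    # @deprecated turns the property into one that warns on access; the message
    # text mirrors the wording quoted from the docstrings above.
    @deprecated("Attribute classes_ was deprecated in version 0.22 and "
                "will be removed in 0.24.")
    @property
    def classes_(self):
        return self._classes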

@Abilityguy

The DecisionTreeRegressor class says:
"the n_classes_ attribute is to be deprecated from version 0.22 and will be removed in 0.24."
"the classes_ attribute is to be deprecated from version 0.22 and will be removed in 0.24."

So these attributes also don't need documentation, right?

@cmarmo
Member

cmarmo commented Sep 16, 2020

So these attributes also don't need documentation, right?

Right @Abilityguy, thanks for pointing that out.

@mynkdsi1011

I can see the below mismatch in RidgeGCV:

Docstring Error: Attribute mismatch in RidgeGCV
alpha
best_score
coef_
dual_coef_
intercept_
n_features_in_

and in BaseRidgeCV:

Docstring Error: Attribute mismatch in BaseRidgeCV
alpha
best_score
coef_
intercept_
n_features_in_

Can I take it up? I am a first-timer and want to contribute.

@srivathsa729

srivathsa729 commented Sep 26, 2020

@marenwestermann in the class FeatureAgglomeration it is said that, in version 0.21, n_connected_components_ was added to replace n_components_, so n_components_ would be a false positive, right?

@marenwestermann
Member

@srivathsa729 from my understanding, yes. However, it would be good if one of the core developers could double-check.

@disha4u

disha4u commented Oct 5, 2020

I will take up ElasticNet

@marenwestermann
Member

marenwestermann commented Nov 4, 2020

Documentation of the attributes X_offset_ and X_scale_ for BayesianRidge has been added with #18607 .

@marenwestermann
Member

The attribute output_2d_ is deprecated in DummyClassifier and DummyRegressor (see #14933).

@marenwestermann
Member

I ran the script provided by @amueller at the top of this issue (the code needs to be slightly modified because things have moved around; see the sketch below). I couldn't find any more attributes that need to be documented, with the exception of n_features_in_, which I see was introduced in #16112. This attribute is undocumented in, I think, all classes it was introduced to. Should it be documented?
ping @NicolasHug
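In case it helps anyone re-running the check, a rough sketch of how the imports can be updated for a newer scikit-learn; the private helper names below are assumptions and may differ between releases:

# all_estimators is now public in sklearn.utils; the estimator-check helpers
# became private (leading underscore). The exact names are assumptions here.
from sklearn.utils import all_estimators
from sklearn.utils.estimator_checks import (
    _enforce_estimator_tags_y,      # formerly enforce_estimator_tags_y
    _pairwise_estimator_convert_X,  # formerly pairwise_estimator_convert_X
)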

@ShyamDesai
Contributor

Hello. I wanted to take this on as a first issue, but it seems that all attributes have already been documented?

@cmarmo
Member

cmarmo commented Feb 23, 2021

Thanks @marenwestermann for checking! This is very helpful.
n_features_in_ documentation is now tracked in #19333.

@cmarmo
Member

cmarmo commented Feb 23, 2021

It turns out that all detections from the script in the description are false positives, so I'm closing this one. Thanks to all the contributors for their helpful work!
