Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC Fix warnings about references and links #14976

Merged
merged 15 commits into from
Sep 23, 2019

Conversation

cmarmo
Copy link
Member

@cmarmo cmarmo commented Sep 13, 2019

Reference Issues/PRs

Close #14975

What does this implement/fix? Explain your changes.

Fix warnings on references and links in the documentation.
Add referenced missing entries in the Glossary.

@cmarmo
Copy link
Member Author

cmarmo commented Sep 13, 2019

CircleCI failure is my bad... I will fix it as soon as possible.

``pos_label``
Value with which positive labels must be encoded in binary
classification problems in which the positive class is not assumed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add something like. "This value is typically required to compute asymmetric evaluation metrics such as precision and recall".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in bdd700f

[1] "Online Learning for Latent Dirichlet Allocation", Matthew D. Hoffman,
David M. Blei, Francis Bach, 2010
.. [1] "Online Learning for Latent Dirichlet Allocation", Matthew D.
Hoffman, David M. Blei, Francis Bach, 2010
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make such bibliographic references with number indices, we then have indentifier conflicts that automatically get resolved by sphinx using a hash of the text of the reference:

Screenshot from 2019-09-16 09-33-14

Which leads to:

Screenshot from 2019-09-16 09-33-02

in the body of the docstring when the reference is cited.

So we have two options:

  • either we keep using numbers as references indices in such docstring but then we remove the the _ suffix of mentions such as [1]_ and the .. prefix in the entries so as to not make those actual references and let the user scroll down and scan the docstring manually instead of generating a link.

  • alternatively we stop using integer index in such references in docstrings and instead use unique identifiers such as [Hoffman2010].

For short class / function docstrings both options are possible. Whenever the text is long to the point the reader has to scroll by more than 1 screen length, I believe the second option makes most sense.

Updating all references to follow the second option (explicit identifiers) would lead to a large PR, so if we want to do this I would do it in several small localized PRs that can be merged progressively, starting with the PRs that actually cause sphinx warnings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the second option. But then all refs (even those not linked in the text) must be updated for consistency (and future use).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ogrisel, do you mind if I address the citing problem in a new issue? Just to focus here on the sphinx warnings... and close that one ASAP?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the core of the referencing problem is also reported there #4344

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree let's focus on fixing the sphinx warnings first and see about the general consistency of references across the full code-base for later PRs.

@@ -932,13 +932,13 @@ class HistGradientBoostingClassifier(BaseHistGradientBoosting,
The number of tree that are built at each iteration. This is equal to 1
for binary classification, and to ``n_classes`` for multiclass
classification.
train_score_ : ndarray, shape (n_iter_ + 1,)
train_score_ : ndarray, shape (`n_iter_ + 1`,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't code escape shapes in general see #14744 Was this producing a warning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes: the n_iter_ was interpreted as a label for a link producing a warning.

WARNING: Unknown target name: "n_iter".

Maybe another fix is possible....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that simply removing spaces solved the problem (765e0fe)

@@ -108,7 +108,7 @@ class BayesianRidge(RegressorMixin, LinearModel):
sigma_ : array-like of shape (n_features, n_features)
Estimated variance-covariance matrix of the weights

scores_ : array-like of shape (n_iter_ + 1,)
scores_ : array-like of shape (`n_iter_ + 1`,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

doc/glossary.rst Outdated
operations involve using a large amount of temporary memory.
Where computations can be performed in fixed-memory chunks the user is
allowed to hint at the maximum size of this working memory (defaulting
to 1GB).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we use this outside of set_config and pairwise_distances_chunked? I don't know if we want to add it to the glossary.

If we do add it, we would need to reference the glossary from those docstrings, but then the behavior is not consistent as in functions outside of set_config

When None (default), the value of sklearn.get_config()['working_memory'] is used.

So postponing this until we have more occurrences might be simplest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not sure to understand... Anyway I have added working_memory in the glossary because of the warning

scikit-learn/doc/modules/computing.rst:568: WARNING: term not in glossary: working_memory

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After re-reading... understood. Let's do the other way around. I have removed the link to working_memory and deleted working_memory from the glossary (bdd700f)

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments but otherwise, LGTM.

[1] "Online Learning for Latent Dirichlet Allocation", Matthew D. Hoffman,
David M. Blei, Francis Bach, 2010
.. [1] "Online Learning for Latent Dirichlet Allocation", Matthew D.
Hoffman, David M. Blei, Francis Bach, 2010
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree let's focus on fixing the sphinx warnings first and see about the general consistency of references across the full code-base for later PRs.

The scores at each iteration on the training data. The first entry
is the score of the ensemble before the first iteration. Scores are
computed according to the ``scoring`` parameter. If ``scoring`` is
not 'loss', scores are computed on a subset of at most 10 000
samples. Empty if no early stopping.
validation_score_ : ndarray, shape (n_iter_ + 1,)
validation_score_ : ndarray, shape (`n_iter_ + 1`,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
validation_score_ : ndarray, shape (`n_iter_ + 1`,)
validation_score_ : ndarray, shape (n_iter_+1,)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry... just fixed

@@ -932,13 +932,13 @@ class HistGradientBoostingClassifier(BaseHistGradientBoosting,
The number of tree that are built at each iteration. This is equal to 1
for binary classification, and to ``n_classes`` for multiclass
classification.
train_score_ : ndarray, shape (n_iter_ + 1,)
train_score_ : ndarray, shape (`n_iter_ + 1`,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
train_score_ : ndarray, shape (`n_iter_ + 1`,)
train_score_ : ndarray, shape (n_iter_+1,)

@cmarmo cmarmo changed the title [DOC] Fix warnings about references and links DOC Fix warnings about references and links Sep 20, 2019
@cmarmo
Copy link
Member Author

cmarmo commented Sep 23, 2019

Thanks @ogrisel! Maybe someone else could check?
Recent merges have already generated new sphinx warnings... this is likely to become a Sisyphus punishement... :(

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating all references to follow the second option (explicit identifiers) would lead to a large PR...

Having the references use a [NAMEYEAR] style through out the repo woud be great.

Thank you @cmarmo on working on reducing the warnings in sphinx! Next would be to get the CI to fail when there are any warnings, so we do not need to keep on fixing the warnings manually.

@thomasjpfan thomasjpfan merged commit 6bd8df0 into scikit-learn:master Sep 23, 2019
@glemaitre
Copy link
Member

Thank you @cmarmo on working on reducing the warnings in sphinx! Next would be to get the CI to fail when there are any warnings, so we do not need to keep on fixing the warnings manually.

We were discussing this in IRL. We should be able to quickly introspect the warning created by a PR somehow.

@thomasjpfan
Copy link
Member

We were discussing this in IRL. We should be able to quickly introspect the warning created by a PR somehow.

Agreed. It would most likely involve some fun bash/text processing, so one does not need to download the complete stdout to get the warnings.

@cmarmo cmarmo deleted the warningdoc branch September 24, 2019 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DOC] WARNING: citation not found
5 participants