Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docstrings checking #2011

Closed
xmnlab opened this issue Oct 25, 2019 · 29 comments
Closed

Docstrings checking #2011

xmnlab opened this issue Oct 25, 2019 · 29 comments

Comments

@xmnlab
Copy link
Contributor

xmnlab commented Oct 25, 2019

Accordingly pydocstyle, currently there are 3445 docstings issues inside the project.

I have started to fix that #1996

I will open one PR for each backend dosctring fixing

@xmnlab xmnlab changed the title Docstrings issues Docstrings checking Oct 31, 2019
@xmnlab
Copy link
Contributor Author

xmnlab commented Oct 31, 2019

Currently there are a lot of docstrings issue inside ibis. Accordingly pydocstyle, currently there are 3064 docstrings error inside the project.

Ibis CI now is checking just files at ibis/ and ibis/omniscidb but for the other backends, a "precise" validation should be done locally using pydocstyle. and it takes times.

My point of view is that a docstring validations is similar to code validation. In ibis we have already code format validation, and black checking, also pydocstyle validation for two folders.

Ibis also has pre-commit hooks for black, flake8 and isort.

That helps a lot to keep review simple and easier.

I think that would be good to work on that into 2 phases.

  1. fix in a simple way the docstring issues and
  2. improve and standardize (as much as possible) the docstrings.

the 1st step is important because after this fixing every new contributions will automatically be checked the docstrings on CI.

for the phase 2, we need to discuss best practices and create a document specifying the ibis documentation guideline.

@xmnlab
Copy link
Contributor Author

xmnlab commented Oct 31, 2019

Following the discussion on #2018 :

There are a lot of functions and method inside Ibis with no docstrings or with docstrings that doesn't fit in the numpy specifications.

it is an evidence that it is hard for reviewers to check that manually also maybe it also not clear for reviewers that ibis use numpy docstring standard for docstrings.

@jreback
Copy link
Contributor

jreback commented Oct 31, 2019

pls show a sample of the warnings

the linters oftentimes show IMHO a lot of useless things that simply should be ignored

we already do quite a lot of linking so need to be quite mindful of adding baggage

@xmnlab
Copy link
Contributor Author

xmnlab commented Oct 31, 2019

in the PR #2018 also I am proposing to keep as private functions that don't need to be visible to end-users.

But if that functions need to be visible to these functions should be very well documented and the documentation should be updated.

https://docs.ibis-project.org/api.html seems to be out of date.

@xmnlab
Copy link
Contributor Author

xmnlab commented Oct 31, 2019

here is the current log from pydocstyle: https://pastebin.com/uUsRHkhP

@xmnlab
Copy link
Contributor Author

xmnlab commented Oct 31, 2019

the linters oftentimes show IMHO a lot of useless things that simply should be ignored

the problem is do it in a subjective approach .. also my opinion could be different than yours.

using something objective makes the review workflow easier and it guarantees at least a minimum standard level for the code.

@jreback
Copy link
Contributor

jreback commented Oct 31, 2019

you are missing the point

the linters check for too much stuff here

you are going to cause massive changes for IMHO not much gain and a lot of pain from contributors

@xmnlab
Copy link
Contributor Author

xmnlab commented Oct 31, 2019

the checks just contemplate the numpy docstring specification.

I think all these changes will have a good impact, for the documentation, for new contributors, for reviewers, for users ...

as I am proposing that .. I am working on fixing these issues .. I am trying to work on one backend each week.

@jreback
Copy link
Contributor

jreback commented Oct 31, 2019

what exactly is the problem with the current lint checks?

@xmnlab
Copy link
Contributor Author

xmnlab commented Oct 31, 2019

currently on CI pydocstyle just run for ibis/ and ibis/omniscidb/ .. the docstrings there were already fixed.

related to #2018 these are the current docstring errors: https://pastebin.com/pB7Qgc4y

as you can see numpy docstring specification specify that all public functions methods or classes need a docstring.

@xmnlab
Copy link
Contributor Author

xmnlab commented Nov 1, 2019

@toryhaavik @scottcode @cpcloud any thought about this?

@scottcode
Copy link
Contributor

@xmnlab I appreciate your enthusiasm for docstrings as well as automated style checkers. I especially like our use of black and isort, because they impose some consistency without any added burden to the developer. If something doesn't fit the style for those two, they automatically fix them. The fixes are pretty uncontroversial: for example, reorder the import statements or change whitespace/linebreaks in the code.

The problem with pydocstyle is that it complains about a whole bunch of stuff that I don't see much value in. I don't care if the first line ends in a period or not. It doesn't bother me much if some docstrings fit in one line including the quotes while others don't. If the style checker doesn't automatically make uncontroversial fixes to these kind of issues, then I'd prefer not having my commits hung up on such tiny style details.

Some things it finds can be important, like missing docstrings, but even in that case, sometimes method names are self-explanatory enough. Especially in a batch of many very similar methods, ensuring docstrings on every single one might actually more distracting than helpful.

The few ways in which standardization would be helpful don't seem to be checked by pydocstyle anyway. Like consistently using either :param arg: style argument documentation versus more informal-looking as below:

Parameters
-----------
arg:   str Name of table

(Which are we supposed to use, by the way?)

In short, I'd be on board with taking a little more care with our docstrings in ways that add value, but let's not overburden ourselves with minutiae.

@scottcode
Copy link
Contributor

Also, I think using pydocstyle in the way you have with fixing docstrings for the Pyspark backend can lead to unintended consequences, like making a bunch of methods private just so the style checker will ignore them. I don't think all those methods that you changed to private in #2018 really ought to be private.

@xmnlab
Copy link
Contributor Author

xmnlab commented Nov 3, 2019

hey @scottcode thanks for the feedback :)

personally I don't worry also if the summary ends with period or not ... but it is good to be consistent on the entire docstring/documentation .. some places that ends with period .. and other places that doesn't end with period seems to be inconsistent (I am using the period here just as an example).

Currently we use numpydoc style as our default style https://docs.ibis-project.org/contributing.html#style-and-formatting

if we decide to add exceptions we should update the documentation.

using pydocstyle we can specify to ignore some kind of errors .. and IMHO this info should be described into the documentation.

as you mentioned pydocstyle doesn't check some important stuff. but we can contribute pydocstyle to improve it, I think that would be a great help (also we can motivate other people to contribute).

Currently there is an issue that makes hard to add pydocstyle into git pre-commit hook. I am planing to help into that*

related to #2018 I change that methods to private because It didn't look as it need to be public or that is used externally .. ibis will register it and it is not used directly ... but I think that also private functions/methods should be documented. but I don't worry to keep all these functions as public ... but my first guess is that if it is public it should be public into the API documentation ... and so .. the docstrings should be improved.

detailed documentation helps contributors, users and also reviewers.

@saulshanabrook
Copy link
Contributor

saulshanabrook commented Nov 3, 2019

@datapythonista was telling me about a lot of work they are doing in Pandas to automate and standardize their docstrings, to make it easier for review changes to them and verify they are accurate. @datapythonista do you have a link that summarizes that work that might be useful for ibis to learn from?

@datapythonista
Copy link
Contributor

numpydoc master will have a python -m numpydoc --validate foo.Bar that you can use to get a list of "errors"

So far can just be called from Python, see this PR: numpy/numpydoc#238

Let me know if you have any question, I'll probably put together a blog post when everything is in place.

@xmnlab
Copy link
Contributor Author

xmnlab commented Nov 4, 2019

thanks @datapythonista ! It looks interesting! I will play with that!

@xmnlab
Copy link
Contributor Author

xmnlab commented Nov 4, 2019

@datapythonista is pandas using already this new feature of numpydoc?
I think it makes sense using the same/similar approach used by pandas.

@xmnlab
Copy link
Contributor Author

xmnlab commented Nov 4, 2019

@datapythonista also could you give an overview about the decisions you made on pandas group related to docstrings verification? it would be good to learn more about the experience you had and implement good practices also here.

@datapythonista
Copy link
Contributor

pandas has been adding new validations for around two years while building that script. Now we're moving it to numpydoc so the rest of the community can also use it. There is still some discussion, because numpydoc allows several options here and there, and we are more strict and want "one and only one way to do things". But the points under discussion are minimal, and you can ignore the errors you don't care about.

In conclusion, the script added a lot of value for pandas in keeping docstrings right and consistent. For other projects it's still "in beta" and if you start using right now, you'll be part of the discussions and testing.

@xmnlab
Copy link
Contributor Author

xmnlab commented Nov 4, 2019

thanks @datapythonista for the details !

@jreback @scottcode @cpcloud any thoughts about using the same/similar approach used by pandas?

@jreback
Copy link
Contributor

jreback commented Nov 4, 2019

@xmnlab +1 in having some doc string validations

there are currently pretty good validations in place. sure i think some of this can be improved

but let’s go slow with a POC and not wholesale just change things

for example having copy pasted doc strings is actually a big -1; this causes a lot of maintenance costs

so happy for you or others to propose things

but as i said let’s go step by step

@xmnlab
Copy link
Contributor Author

xmnlab commented Nov 4, 2019

there are currently pretty good validations in place.

@jreback not sure I am following you ... could you give more details/examples?

@xmnlab
Copy link
Contributor Author

xmnlab commented Nov 4, 2019

for example having copy pasted doc strings is actually a big -1; this causes a lot of maintenance costs

and how people are dealing with these cases in pandas?

by the other hand, a good documentation could avoid contribution costs (eg: #1989)

it is not related but, for example, pyopensci review requires that all external package functions, classes, and methods should be fully documented with examples. https://www.pyopensci.org/dev_guide/packaging/packaging_guide#Documentation

I understand your point .. but I think our current state about the documentation is not good .. and it is harder to maintain that than spending time to improve docstrings and documentation.

@jreback
Copy link
Contributor

jreback commented Nov 5, 2019

I understand your point .. but I think our current state about the documentation is not good .. and it is harder to maintain that than spending time to improve docstrings and documentation.

how so? what exactly is problematic about the current docs / documentation?

@xmnlab
Copy link
Contributor Author

xmnlab commented Nov 6, 2019

@jreback good point. I am analyzing the documentation (at least an overview) and I will share some points here today or tomorrow.

@datapythonista
Copy link
Contributor

Probably worth sharing our experience with pandas. We've got around 1,600 docstrings, and many of them could benefit from.having better content (too short descriptions, lack of examples...).

There is no way senior people can fix all those, there are just too many. So we encourage first time contributors to help us. But that takes a lot of our of time reviewing code, and most of that commenting about formatting and punctuation errors.

We are adding validation to the CI (which requires fixing the errors first) to be able to have the contributions we need in a way we don't need to waste lots of hours in reviews we can automate.

We spent lots of hours in this for more than two years, and there is still a lot to do. I don't know much about ibis documentation, but my recommendation is to avoid a quest to fix docstrings unless you really have a good reason to or you're very bored. ;)

@xmnlab
Copy link
Contributor Author

xmnlab commented Nov 9, 2019

I created an issue to organize some points related to documentation #2025.

@datapythonista thank you for your feedback. You are doing a very nice job related to the docstrings. the validation option looks very nice.

In the case of ibis ... I think that probably we could fix and organize the docstrings texts and validations soon. There are a lot of internal functions that don't need docstrings or their docstrings could be something simple. Maybe the hard work here would be define some strategies to do it.

just for sharing ... I was reading today sympy's documentation about documentation style and it seems very interesting:
https://github.com/sympy/sympy/blob/master/doc/src/documentation-style-guide.rst#id15

@cpcloud
Copy link
Member

cpcloud commented Dec 17, 2021

Feel free to make things conform to pydocstyle, but a wholesale change to enforce this is out of scope.

@cpcloud cpcloud closed this as completed Dec 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants