-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added variable docstring support #101
Conversation
logical lines that begin with a string and don't follow other docstrings are now considered to be docstrings fixes zheller#100 requires flake8>=3.0 drop support for python 2.7 and 3.4 added some type assertions update tests for new docstring rules and to call flake8 directly
I wasn't able to remove the old docstring logic completely, it's still needed to detect inline docstrings. But a lot of other code was able to go away. Given that we now rely on flake8 to determine what a logical line is, the testing code had to change to call flake8 rather than the checker directly. The checker now also returns 1 based column numbers (using the line and column reported by flake8s tokenizer). Now any logical line with a leading string is considered a docstring, unless if follows another docstring. I did the previous line check to match previous behavior, but I'm not sure if I should have. A second docstring is not processed as a docstring (AFAICT), but it's still logically a docstring since it's ignored by the python interpreter, so it probably shouldn't be considered a multiline string either as it is now. I'm thinking to add another error code for duplicate docstrings, but still consider them docstrings. Though checking for duplicate docstrings seems out of scope for this module... I welcome your feedback on that point. If this PR works for you, I'll add code to fix #82 shortly. I know it's a significant change. I also added type annotations (mostly) which required dropping support for Python 3.4 as well. This should still work with 3.4 if we just strip the type annotations or switch them all to comment style. |
Afk for a while today. Might not be able to review until tomorrow or the
weekend even
At first glance, seems like overreach in the PR =/ If we're just changing
from physical to logical lines and dropping our own parsing support, then
I'd imagine that only the init parameter changes and we drop some
conditional logic
One example of overreach is this looks like it includes changes like adding
a typing convention which I feel is a whole other discussion since it
should require willingness of contributors to maintain and lint that
…On Thu, Jun 4, 2020, 12:17 PM Peter Linss ***@***.***> wrote:
I wasn't able to remove the old docstring logic completely, it's still
needed to detect inline docstrings. But a lot of other code was able to go
away.
Given that we now rely on flake8 to determine what a logical line is, the
testing code had to change to call flake8 rather than the checker directly.
The checker now also returns 1 based column numbers (using the line and
column reported by flake8s tokenizer).
Now any logical line with a leading string is considered a docstring,
unless if follows another docstring. I did the previous line check to match
previous behavior, but I'm not sure if I should have.
A second docstring is not processed as a docstring (AFAICT), but it's
still logically a docstring since it's ignored by the python interpreter,
so it probably shouldn't be considered a multiline string either as it is
now.
I'm thinking to add another error code for duplicate docstrings, but still
consider them docstrings. Though checking for duplicate docstrings seems
out of scope for this module... I welcome your feedback on that point.
If this PR works for you, I'll add code to fix #82
<#82> shortly. I know it's
a significant change.
I also added type annotations (mostly) which required dropping support for
Python 3.4 as well. This should still work with 3.4 if we just strip the
type annotations or switch them all to comment style.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#101 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAG4KWF535E2G5Z65FV4OFDRU7XMRANCNFSM4NS4URPA>
.
|
No problem on the timing, it's a significant change in behavior, take your time. I can run on my fork in the interim (or if you decide you don't want to go this route, which is also totally fine). The type annotations are minor (and just a habit at this point, I find they really do help eliminate bugs), I'm happy to remove them or move them to a separate PR. I don't mean to impose extra workflow or maintenance burden on you. They're really ignored by everything in Python 3.5+ unless you add specific type checks (like mypy or flake8 extensions to require and validate them). If I convert them to comments then they're really ignored by everything except the type check tooling, I just find the comment variant less easy to read and maintain than the annotation. Aside from that, the only changes not related to the parsing change are dropping Python 2 support; dropping old flake8 support, since it's now dependent on >=3.0 for the logical line processing (the change was simplifying the option processing); and dropping the noqa support, since flake8 handles that for us now too. As I suspected the primary quote logic didn't change at all (except adjusting the return types to match what flake8 expects since it's calling the iterator directly now ( I did bump the major version as you mentioned due to the flake8 3.0 dependency, I'm not sure if a major version increment is warranted, but totally your call. The bulk of the diff is the testing code, which was needed since we can't call directly into the checker without computing the logical lines, so we just run flake8 now and parse its output (no sense in trying to replicate that behavior and getting it wrong/out of sync over time). Aside from how the tests are run, the only changes are updating the column numbers, and syncing to the new docstring detection which turns some previously multiline strings into valid docstrings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Started reviewing but getting a bit overwhelmed by all the changes still. As a sanity check, this is only moving from physical lines to logical lines and not yet handling variable docstrings. Is that correct?
Also, can you explain how logical lines make it easier to handle variable docstrings as opposed to physical lines? My current take is that they're not too different and we usually need full file state awareness for the docstring parser to work (i.e. don't just care about previous line but all previous lines in file -- e.g. |
Sure, when flake8 breaks the file up into logical lines, each complete statement is passed as one line (even when it spans multiple physical lines), so any docstring is it's own statement and will the the first significant token on the line. The docstring detection code for module docstrings sees variable docstrings as module docstrings when only looking at a single logcial line in isolation. If it weren't for the edge case of inline docstrings, e.g. There's no need for full file awareness and the only reason to look at the previous line at all is to detect:
where each docstring is passed in as a separate logical line. Frankly, I consider the second docstring in that example to still be a docstring, just one that will be ignored by pretty much all documentation processors. I'd rather flag it as a 'bad docstring continuation', but still consider it a docstring. e.g. there's no reason for it to have the multiline string style. If you agree with that reasoning, we can drop the previous line handling entirely and just let them be docstrings, or use it to surface a different error about continuation docstrings. This PR does in fact detect all the variable docstrings, as well as other non-standard docstrings, e.g.
as evidenced by the test suite. Look for places where I changed """not a docstring""" to """variable docstring""" or """non-standard docstring""". Python still ignores those non-standard docstrings (as well as docstring continuations), they're just not considered part of the standard documentation. |
Did some more research and found out that what I was referring to as 'continuation docstrings' are actually called 'additional docstrings' and are part of PEP258. While that PEP was rejected, it's still referred to in PEP257 which is still active, and 'additional docstrings' are explicitly mentioned along with 'attribute docstrings'. So I strongly suggest we consider them docstrings and remove the previous line check entirely. |
Was just about to review/process replies but Internet has cut out again
today. Hopefully will be able to go through this tomorrow or Sunday
…On Fri, Jun 5, 2020, 11:44 AM Peter Linss ***@***.***> wrote:
Did some more research and found out that what I was referring to as
'continuation docstrings' are actually called 'additional docstrings' and
are part of PEP258
<https://www.python.org/dev/peps/pep-0258/#additional-docstrings>. While
that PEP was rejected, it's still referred to in PEP257
<https://www.python.org/dev/peps/pep-0257/#what-is-a-docstring> which is
still active, and 'additional docstrings' are explicitly mentioned along
with 'attribute docstrings'.
So I strongly suggest we consider them docstrings and remove the previous
line check entirely.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#101 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAG4KWFTAMXLRJDMLEIQSUDRVE4HVANCNFSM4NS4URPA>
.
|
My internet is back, going to give reviewing a go Updated the title to better reflect the intention of the PR since logical lines seems more auxiliary based on your comment |
Yea, I concur that our goal should be to be in alignment with active PEPs (it doesn't look like there's a finalized state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should really break up this PR into 2:
- Dropping
flake8@2
support - Adding variable docstring support
The reasoning here is while taking the attempt at I saw a bunch of flake8@2
polyfills which were missed in removal =/
|
||
def __iter__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to find a reference to this in the flake8
docs but I can't seem to. Where's this documented and can we add a comment linking to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find it referenced in the flake8 docs. Frankly the plugin docs are somewhat lacking. I learned about how logical line plugins work by looking at other examples and reading the flake8 source.
import subprocess | ||
from unittest import TestCase | ||
|
||
OUTPUT_REGEX = re.compile(r'.*?:([\d]+):([\d]+): (.*)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really prefer to use flake8
APIs instead of running the command directly =/ It slows down the tests and makes debugging weirder/harder (e.g. print statements are now in assert
blocks instead of just stdout
Unfortunately, I took a shot but don't have time to dig to find a nice default options for FileChecker
or FileProcessor
=/
https://flake8.pycqa.org/en/3.8.2/internal/checker.html#processing-files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it's good practice to avoid using .*
when you can. Better to use not class identifiers to avoid accidentally greedy code:
OUTPUT_REGEX = re.compile(r'.*?:([\d]+):([\d]+): (.*)') | |
OUTPUT_REGEX = re.compile(r'[^:]+:([\d]+):([\d]+): (.*)') |
Also can we add an example comment line before so someone can better understand what the regex is doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .*?
is a non-greedy match, that change doesn't hurt but is unnecessary as the variants are functionally equivalent. If it's a preferred style thing, have at it, it's your project.
Personally I don't see the value in spending time optimizing unit test code, it's not something users will be dealing with and effort spent there should be balanced against developer pain and the amount of development effort going in to the project.
Hi Todd, I appreciate the effort you put into reviewing this PR, and the work you've done on this project already, but I'm not going to spend the effort breaking this PR up. Also, when switching to logical_line handling, you already get the variable docstring handling for free, so they're not really distinct changes. You could in theory drop flake8@2 support first without switching to logical_line handling, but I don't see the value. I had a need for better docstring handling and rather than just complain, I decided to take care of it myself. As a way of partly paying back the value I have already received from this plugin, I decided to contribute my changes. My needs have been met and I'm using my fork in my projects at this point. It's your project, do what you want with it. You have my contribution. It's a gift, along with my thanks. Feel free to take my code and do what you want with it, or not. I won't be offended either way, and I don't have a need to be credited by having my commits land directly, so feel free to cherry-pick changes in your own way. I have too many other projects going on right now to spend any more time on this facet of this one. I am happy to continue to answer any questions you may have about my code. FWIW, if you decide to accept my suggestion of treating additional docstrings as docstrings, you can simply remove all the |
Unsure what you mean by "it's your project", open source with broad licensing is the public's project imo -- it just so happens that I'm a maintainer here. Using phrasing like "it's your project" seems like a way to sneak out of responsibility here and in future bug fixes for said relevant code ._. I similarly don't have time to build out this enhancement fully, hence the "help wanted" tag on the original issue Closing this PR and making a comment on the original issue for future contributors Thanks for your time and contributions =) |
logical lines that begin with a string and don't follow other docstrings are now considered to be docstrings
fixes #100
requires flake8>=3.0
drop support for python 2.7 and 3.4
added some type assertions
update tests for new docstring rules and to call flake8 directly