Added variable docstring support #101

plinss · 2020-06-04T18:58:28Z

logical lines that begin with a string and don't follow other docstrings are now considered to be docstrings
fixes #100
requires flake8>=3.0
drop support for python 2.7 and 3.4
added some type assertions
update tests for new docstring rules and to call flake8 directly

logical lines that begin with a string and don't follow other docstrings are now considered to be docstrings fixes zheller#100 requires flake8>=3.0 drop support for python 2.7 and 3.4 added some type assertions update tests for new docstring rules and to call flake8 directly

…on 3.5

plinss · 2020-06-04T19:17:13Z

I wasn't able to remove the old docstring logic completely, it's still needed to detect inline docstrings. But a lot of other code was able to go away.

Given that we now rely on flake8 to determine what a logical line is, the testing code had to change to call flake8 rather than the checker directly. The checker now also returns 1 based column numbers (using the line and column reported by flake8s tokenizer).

Now any logical line with a leading string is considered a docstring, unless if follows another docstring. I did the previous line check to match previous behavior, but I'm not sure if I should have.

A second docstring is not processed as a docstring (AFAICT), but it's still logically a docstring since it's ignored by the python interpreter, so it probably shouldn't be considered a multiline string either as it is now.

I'm thinking to add another error code for duplicate docstrings, but still consider them docstrings. Though checking for duplicate docstrings seems out of scope for this module... I welcome your feedback on that point.

If this PR works for you, I'll add code to fix #82 shortly. I know it's a significant change.

I also added type annotations (mostly) which required dropping support for Python 3.4 as well. This should still work with 3.4 if we just strip the type annotations or switch them all to comment style.

twolfson · 2020-06-04T19:45:46Z

Afk for a while today. Might not be able to review until tomorrow or the weekend even At first glance, seems like overreach in the PR =/ If we're just changing from physical to logical lines and dropping our own parsing support, then I'd imagine that only the init parameter changes and we drop some conditional logic One example of overreach is this looks like it includes changes like adding a typing convention which I feel is a whole other discussion since it should require willingness of contributors to maintain and lint that

…

On Thu, Jun 4, 2020, 12:17 PM Peter Linss ***@***.***> wrote: I wasn't able to remove the old docstring logic completely, it's still needed to detect inline docstrings. But a lot of other code was able to go away. Given that we now rely on flake8 to determine what a logical line is, the testing code had to change to call flake8 rather than the checker directly. The checker now also returns 1 based column numbers (using the line and column reported by flake8s tokenizer). Now any logical line with a leading string is considered a docstring, unless if follows another docstring. I did the previous line check to match previous behavior, but I'm not sure if I should have. A second docstring is not processed as a docstring (AFAICT), but it's still logically a docstring since it's ignored by the python interpreter, so it probably shouldn't be considered a multiline string either as it is now. I'm thinking to add another error code for duplicate docstrings, but still consider them docstrings. Though checking for duplicate docstrings seems out of scope for this module... I welcome your feedback on that point. If this PR works for you, I'll add code to fix #82 <#82> shortly. I know it's a significant change. I also added type annotations (mostly) which required dropping support for Python 3.4 as well. This should still work with 3.4 if we just strip the type annotations or switch them all to comment style. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#101 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAG4KWF535E2G5Z65FV4OFDRU7XMRANCNFSM4NS4URPA> .

plinss · 2020-06-04T20:34:43Z

No problem on the timing, it's a significant change in behavior, take your time. I can run on my fork in the interim (or if you decide you don't want to go this route, which is also totally fine).

The type annotations are minor (and just a habit at this point, I find they really do help eliminate bugs), I'm happy to remove them or move them to a separate PR. I don't mean to impose extra workflow or maintenance burden on you. They're really ignored by everything in Python 3.5+ unless you add specific type checks (like mypy or flake8 extensions to require and validate them). If I convert them to comments then they're really ignored by everything except the type check tooling, I just find the comment variant less easy to read and maintain than the annotation. ~~Let me know, it's minutes of work for me~~ I went ahead and removed them and re-added Python 3.4 support, you're right, let's keep that change separate.

Aside from that, the only changes not related to the parsing change are dropping Python 2 support; dropping old flake8 support, since it's now dependent on >=3.0 for the logical line processing (the change was simplifying the option processing); and dropping the noqa support, since flake8 handles that for us now too.

As I suspected the primary quote logic didn't change at all (except adjusting the return types to match what flake8 expects since it's calling the iterator directly now (~~I just noticed I forgot to return the type, I'll add that~~ actually logical line plugins can only return a location and a string, so the existing code is correct on that front))

I did bump the major version as you mentioned due to the flake8 3.0 dependency, I'm not sure if a major version increment is warranted, but totally your call.

The bulk of the diff is the testing code, which was needed since we can't call directly into the checker without computing the logical lines, so we just run flake8 now and parse its output (no sense in trying to replicate that behavior and getting it wrong/out of sync over time). Aside from how the tests are run, the only changes are updating the column numbers, and syncing to the new docstring detection which turns some previously multiline strings into valid docstrings.

twolfson

Started reviewing but getting a bit overwhelmed by all the changes still. As a sanity check, this is only moving from physical lines to logical lines and not yet handling variable docstrings. Is that correct?

flake8_quotes/__about__.py

flake8_quotes/__init__.py

twolfson · 2020-06-05T07:42:17Z

Also, can you explain how logical lines make it easier to handle variable docstrings as opposed to physical lines?

My current take is that they're not too different and we usually need full file state awareness for the docstring parser to work (i.e. don't just care about previous line but all previous lines in file -- e.g. class statement could take a lot of line breaks to finish its closing parens)

plinss · 2020-06-05T16:56:38Z

Sure, when flake8 breaks the file up into logical lines, each complete statement is passed as one line (even when it spans multiple physical lines), so any docstring is it's own statement and will the the first significant token on the line.

The docstring detection code for module docstrings sees variable docstrings as module docstrings when only looking at a single logcial line in isolation. If it weren't for the edge case of inline docstrings, e.g. class Foo: """docstring""" we could drop the docstring detection code entirely and just look for lines that start with strings.

There's no need for full file awareness and the only reason to look at the previous line at all is to detect:

"""docstring"""
"""not a docstring"""

where each docstring is passed in as a separate logical line.

Frankly, I consider the second docstring in that example to still be a docstring, just one that will be ignored by pretty much all documentation processors. I'd rather flag it as a 'bad docstring continuation', but still consider it a docstring. e.g. there's no reason for it to have the multiline string style.

If you agree with that reasoning, we can drop the previous line handling entirely and just let them be docstrings, or use it to surface a different error about continuation docstrings.

This PR does in fact detect all the variable docstrings, as well as other non-standard docstrings, e.g.

if something:
    """non-standard docstring"""

as evidenced by the test suite. Look for places where I changed """not a docstring""" to """variable docstring""" or """non-standard docstring""".

Python still ignores those non-standard docstrings (as well as docstring continuations), they're just not considered part of the standard documentation.

plinss · 2020-06-05T18:43:52Z

Did some more research and found out that what I was referring to as 'continuation docstrings' are actually called 'additional docstrings' and are part of PEP258. While that PEP was rejected, it's still referred to in PEP257 which is still active, and 'additional docstrings' are explicitly mentioned along with 'attribute docstrings'.

So I strongly suggest we consider them docstrings and remove the previous line check entirely.

twolfson · 2020-06-06T04:57:11Z

Was just about to review/process replies but Internet has cut out again today. Hopefully will be able to go through this tomorrow or Sunday

…

On Fri, Jun 5, 2020, 11:44 AM Peter Linss ***@***.***> wrote: Did some more research and found out that what I was referring to as 'continuation docstrings' are actually called 'additional docstrings' and are part of PEP258 <https://www.python.org/dev/peps/pep-0258/#additional-docstrings>. While that PEP was rejected, it's still referred to in PEP257 <https://www.python.org/dev/peps/pep-0257/#what-is-a-docstring> which is still active, and 'additional docstrings' are explicitly mentioned along with 'attribute docstrings'. So I strongly suggest we consider them docstrings and remove the previous line check entirely. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#101 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAG4KWFTAMXLRJDMLEIQSUDRVE4HVANCNFSM4NS4URPA> .

twolfson · 2020-06-06T07:23:54Z

My internet is back, going to give reviewing a go

Updated the title to better reflect the intention of the PR since logical lines seems more auxiliary based on your comment

twolfson · 2020-06-06T07:28:45Z

Yea, I concur that our goal should be to be in alignment with active PEPs (it doesn't look like there's a finalized state)

twolfson

We should really break up this PR into 2:

Dropping flake8@2 support
Adding variable docstring support

The reasoning here is while taking the attempt at I saw a bunch of flake8@2 polyfills which were missed in removal =/

twolfson · 2020-06-06T07:32:15Z

flake8_quotes/__init__.py


+    def __iter__(self):


I tried to find a reference to this in the flake8 docs but I can't seem to. Where's this documented and can we add a comment linking to it?

I don't find it referenced in the flake8 docs. Frankly the plugin docs are somewhat lacking. I learned about how logical line plugins work by looking at other examples and reading the flake8 source.

twolfson · 2020-06-06T07:51:33Z

test/test_checks.py

 import subprocess
 from unittest import TestCase

+OUTPUT_REGEX = re.compile(r'.*?:([\d]+):([\d]+): (.*)')


I'd really prefer to use flake8 APIs instead of running the command directly =/ It slows down the tests and makes debugging weirder/harder (e.g. print statements are now in assert blocks instead of just stdout

Unfortunately, I took a shot but don't have time to dig to find a nice default options for FileChecker or FileProcessor =/

https://flake8.pycqa.org/en/3.8.2/internal/checker.html#processing-files

Also, it's good practice to avoid using .* when you can. Better to use not class identifiers to avoid accidentally greedy code:

Suggested change

OUTPUT_REGEX = re.compile(r'.*?:([\d]+):([\d]+): (.*)')

OUTPUT_REGEX = re.compile(r'[^:]+:([\d]+):([\d]+): (.*)')

Also can we add an example comment line before so someone can better understand what the regex is doing?

The .*? is a non-greedy match, that change doesn't hurt but is unnecessary as the variants are functionally equivalent. If it's a preferred style thing, have at it, it's your project.

Personally I don't see the value in spending time optimizing unit test code, it's not something users will be dealing with and effort spent there should be balanced against developer pain and the amount of development effort going in to the project.

plinss · 2020-06-06T16:04:00Z

Hi Todd,

I appreciate the effort you put into reviewing this PR, and the work you've done on this project already, but I'm not going to spend the effort breaking this PR up.

Also, when switching to logical_line handling, you already get the variable docstring handling for free, so they're not really distinct changes. You could in theory drop flake8@2 support first without switching to logical_line handling, but I don't see the value.

I had a need for better docstring handling and rather than just complain, I decided to take care of it myself. As a way of partly paying back the value I have already received from this plugin, I decided to contribute my changes. My needs have been met and I'm using my fork in my projects at this point.

It's your project, do what you want with it. You have my contribution. It's a gift, along with my thanks. Feel free to take my code and do what you want with it, or not. I won't be offended either way, and I don't have a need to be credited by having my commits land directly, so feel free to cherry-pick changes in your own way. I have too many other projects going on right now to spend any more time on this facet of this one. I am happy to continue to answer any questions you may have about my code.

FWIW, if you decide to accept my suggestion of treating additional docstrings as docstrings, you can simply remove all the previous_line handling code in the constructor, and in get_docstring_tokens remove the for token in prev_tokens loop and simply initialize state to STATE_EXPECT_MODULE_DOCSTRING and you should be set. It's all just additional code to preserve old behavior. You'll also have to update the tests, of course, to match the change in behavior.

twolfson · 2020-06-06T23:27:25Z

Unsure what you mean by "it's your project", open source with broad licensing is the public's project imo -- it just so happens that I'm a maintainer here. Using phrasing like "it's your project" seems like a way to sneak out of responsibility here and in future bug fixes for said relevant code ._.

I similarly don't have time to build out this enhancement fully, hence the "help wanted" tag on the original issue

Closing this PR and making a comment on the original issue for future contributors

Thanks for your time and contributions =)

plinss added 2 commits June 4, 2020 11:41

switch to comment based type annotations for variables to run on pyth…

b614331

…on 3.5

remove type annotations, restore python 3.4 support

da2602a

twolfson reviewed Jun 5, 2020

View reviewed changes

flake8_quotes/__about__.py Outdated Show resolved Hide resolved

flake8_quotes/__init__.py Outdated Show resolved Hide resolved

flake8_quotes/__init__.py Outdated Show resolved Hide resolved

address PR feedback

0abe576

twolfson changed the title ~~switch to processing logical_lines instead of entire files~~ Added variable docstring support Jun 6, 2020

twolfson reviewed Jun 6, 2020

View reviewed changes

twolfson closed this Jun 6, 2020

twolfson mentioned this pull request Jun 6, 2020

Support variable docstrings #100

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added variable docstring support #101

Added variable docstring support #101

plinss commented Jun 4, 2020

plinss commented Jun 4, 2020

twolfson commented Jun 4, 2020 via email

plinss commented Jun 4, 2020 •

edited

twolfson left a comment

twolfson commented Jun 5, 2020

plinss commented Jun 5, 2020

plinss commented Jun 5, 2020

twolfson commented Jun 6, 2020 via email

twolfson commented Jun 6, 2020

twolfson commented Jun 6, 2020

twolfson left a comment

twolfson Jun 6, 2020

plinss Jun 6, 2020

twolfson Jun 6, 2020

twolfson Jun 6, 2020

plinss Jun 6, 2020

plinss commented Jun 6, 2020

twolfson commented Jun 6, 2020

	OUTPUT_REGEX = re.compile(r'.?:([\d]+):([\d]+): (.)')
	OUTPUT_REGEX = re.compile(r'[^:]+:([\d]+):([\d]+): (.*)')

Added variable docstring support #101

Added variable docstring support #101

Conversation

plinss commented Jun 4, 2020

plinss commented Jun 4, 2020

twolfson commented Jun 4, 2020 via email

plinss commented Jun 4, 2020 • edited

twolfson left a comment

Choose a reason for hiding this comment

twolfson commented Jun 5, 2020

plinss commented Jun 5, 2020

plinss commented Jun 5, 2020

twolfson commented Jun 6, 2020 via email

twolfson commented Jun 6, 2020

twolfson commented Jun 6, 2020

twolfson left a comment

Choose a reason for hiding this comment

twolfson Jun 6, 2020

Choose a reason for hiding this comment

plinss Jun 6, 2020

Choose a reason for hiding this comment

twolfson Jun 6, 2020

Choose a reason for hiding this comment

twolfson Jun 6, 2020

Choose a reason for hiding this comment

plinss Jun 6, 2020

Choose a reason for hiding this comment

plinss commented Jun 6, 2020

twolfson commented Jun 6, 2020

plinss commented Jun 4, 2020 •

edited