Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added variable docstring support #101

Closed
wants to merge 4 commits into from

Conversation

plinss
Copy link

@plinss plinss commented Jun 4, 2020

logical lines that begin with a string and don't follow other docstrings are now considered to be docstrings
fixes #100
requires flake8>=3.0
drop support for python 2.7 and 3.4
added some type assertions
update tests for new docstring rules and to call flake8 directly

logical lines that begin with a string and don't follow other docstrings are now considered to be docstrings
fixes zheller#100
requires flake8>=3.0
drop support for python 2.7 and 3.4
added some type assertions
update tests for new docstring rules and to call flake8 directly
@plinss
Copy link
Author

plinss commented Jun 4, 2020

I wasn't able to remove the old docstring logic completely, it's still needed to detect inline docstrings. But a lot of other code was able to go away.

Given that we now rely on flake8 to determine what a logical line is, the testing code had to change to call flake8 rather than the checker directly. The checker now also returns 1 based column numbers (using the line and column reported by flake8s tokenizer).

Now any logical line with a leading string is considered a docstring, unless if follows another docstring. I did the previous line check to match previous behavior, but I'm not sure if I should have.

A second docstring is not processed as a docstring (AFAICT), but it's still logically a docstring since it's ignored by the python interpreter, so it probably shouldn't be considered a multiline string either as it is now.

I'm thinking to add another error code for duplicate docstrings, but still consider them docstrings. Though checking for duplicate docstrings seems out of scope for this module... I welcome your feedback on that point.

If this PR works for you, I'll add code to fix #82 shortly. I know it's a significant change.

I also added type annotations (mostly) which required dropping support for Python 3.4 as well. This should still work with 3.4 if we just strip the type annotations or switch them all to comment style.

@twolfson
Copy link
Collaborator

twolfson commented Jun 4, 2020 via email

@plinss
Copy link
Author

plinss commented Jun 4, 2020

No problem on the timing, it's a significant change in behavior, take your time. I can run on my fork in the interim (or if you decide you don't want to go this route, which is also totally fine).

The type annotations are minor (and just a habit at this point, I find they really do help eliminate bugs), I'm happy to remove them or move them to a separate PR. I don't mean to impose extra workflow or maintenance burden on you. They're really ignored by everything in Python 3.5+ unless you add specific type checks (like mypy or flake8 extensions to require and validate them). If I convert them to comments then they're really ignored by everything except the type check tooling, I just find the comment variant less easy to read and maintain than the annotation. Let me know, it's minutes of work for me I went ahead and removed them and re-added Python 3.4 support, you're right, let's keep that change separate.

Aside from that, the only changes not related to the parsing change are dropping Python 2 support; dropping old flake8 support, since it's now dependent on >=3.0 for the logical line processing (the change was simplifying the option processing); and dropping the noqa support, since flake8 handles that for us now too.

As I suspected the primary quote logic didn't change at all (except adjusting the return types to match what flake8 expects since it's calling the iterator directly now (I just noticed I forgot to return the type, I'll add that actually logical line plugins can only return a location and a string, so the existing code is correct on that front))

I did bump the major version as you mentioned due to the flake8 3.0 dependency, I'm not sure if a major version increment is warranted, but totally your call.

The bulk of the diff is the testing code, which was needed since we can't call directly into the checker without computing the logical lines, so we just run flake8 now and parse its output (no sense in trying to replicate that behavior and getting it wrong/out of sync over time). Aside from how the tests are run, the only changes are updating the column numbers, and syncing to the new docstring detection which turns some previously multiline strings into valid docstrings.

Copy link
Collaborator

@twolfson twolfson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started reviewing but getting a bit overwhelmed by all the changes still. As a sanity check, this is only moving from physical lines to logical lines and not yet handling variable docstrings. Is that correct?

flake8_quotes/__about__.py Outdated Show resolved Hide resolved
flake8_quotes/__init__.py Outdated Show resolved Hide resolved
flake8_quotes/__init__.py Outdated Show resolved Hide resolved
@twolfson
Copy link
Collaborator

twolfson commented Jun 5, 2020

Also, can you explain how logical lines make it easier to handle variable docstrings as opposed to physical lines?

My current take is that they're not too different and we usually need full file state awareness for the docstring parser to work (i.e. don't just care about previous line but all previous lines in file -- e.g. class statement could take a lot of line breaks to finish its closing parens)

@plinss
Copy link
Author

plinss commented Jun 5, 2020

Sure, when flake8 breaks the file up into logical lines, each complete statement is passed as one line (even when it spans multiple physical lines), so any docstring is it's own statement and will the the first significant token on the line.

The docstring detection code for module docstrings sees variable docstrings as module docstrings when only looking at a single logcial line in isolation. If it weren't for the edge case of inline docstrings, e.g. class Foo: """docstring""" we could drop the docstring detection code entirely and just look for lines that start with strings.

There's no need for full file awareness and the only reason to look at the previous line at all is to detect:

"""docstring"""
"""not a docstring"""

where each docstring is passed in as a separate logical line.

Frankly, I consider the second docstring in that example to still be a docstring, just one that will be ignored by pretty much all documentation processors. I'd rather flag it as a 'bad docstring continuation', but still consider it a docstring. e.g. there's no reason for it to have the multiline string style.

If you agree with that reasoning, we can drop the previous line handling entirely and just let them be docstrings, or use it to surface a different error about continuation docstrings.

This PR does in fact detect all the variable docstrings, as well as other non-standard docstrings, e.g.

if something:
    """non-standard docstring"""

as evidenced by the test suite. Look for places where I changed """not a docstring""" to """variable docstring""" or """non-standard docstring""".

Python still ignores those non-standard docstrings (as well as docstring continuations), they're just not considered part of the standard documentation.

@plinss
Copy link
Author

plinss commented Jun 5, 2020

Did some more research and found out that what I was referring to as 'continuation docstrings' are actually called 'additional docstrings' and are part of PEP258. While that PEP was rejected, it's still referred to in PEP257 which is still active, and 'additional docstrings' are explicitly mentioned along with 'attribute docstrings'.

So I strongly suggest we consider them docstrings and remove the previous line check entirely.

@twolfson
Copy link
Collaborator

twolfson commented Jun 6, 2020 via email

@twolfson twolfson changed the title switch to processing logical_lines instead of entire files Added variable docstring support Jun 6, 2020
@twolfson
Copy link
Collaborator

twolfson commented Jun 6, 2020

My internet is back, going to give reviewing a go

Updated the title to better reflect the intention of the PR since logical lines seems more auxiliary based on your comment

@twolfson
Copy link
Collaborator

twolfson commented Jun 6, 2020

Yea, I concur that our goal should be to be in alignment with active PEPs (it doesn't look like there's a finalized state)

Copy link
Collaborator

@twolfson twolfson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should really break up this PR into 2:

  • Dropping flake8@2 support
  • Adding variable docstring support

The reasoning here is while taking the attempt at I saw a bunch of flake8@2 polyfills which were missed in removal =/


def __iter__(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to find a reference to this in the flake8 docs but I can't seem to. Where's this documented and can we add a comment linking to it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find it referenced in the flake8 docs. Frankly the plugin docs are somewhat lacking. I learned about how logical line plugins work by looking at other examples and reading the flake8 source.

import subprocess
from unittest import TestCase

OUTPUT_REGEX = re.compile(r'.*?:([\d]+):([\d]+): (.*)')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really prefer to use flake8 APIs instead of running the command directly =/ It slows down the tests and makes debugging weirder/harder (e.g. print statements are now in assert blocks instead of just stdout

Unfortunately, I took a shot but don't have time to dig to find a nice default options for FileChecker or FileProcessor =/

https://flake8.pycqa.org/en/3.8.2/internal/checker.html#processing-files

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's good practice to avoid using .* when you can. Better to use not class identifiers to avoid accidentally greedy code:

Suggested change
OUTPUT_REGEX = re.compile(r'.*?:([\d]+):([\d]+): (.*)')
OUTPUT_REGEX = re.compile(r'[^:]+:([\d]+):([\d]+): (.*)')

Also can we add an example comment line before so someone can better understand what the regex is doing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .*? is a non-greedy match, that change doesn't hurt but is unnecessary as the variants are functionally equivalent. If it's a preferred style thing, have at it, it's your project.

Personally I don't see the value in spending time optimizing unit test code, it's not something users will be dealing with and effort spent there should be balanced against developer pain and the amount of development effort going in to the project.

@plinss
Copy link
Author

plinss commented Jun 6, 2020

Hi Todd,

I appreciate the effort you put into reviewing this PR, and the work you've done on this project already, but I'm not going to spend the effort breaking this PR up.

Also, when switching to logical_line handling, you already get the variable docstring handling for free, so they're not really distinct changes. You could in theory drop flake8@2 support first without switching to logical_line handling, but I don't see the value.

I had a need for better docstring handling and rather than just complain, I decided to take care of it myself. As a way of partly paying back the value I have already received from this plugin, I decided to contribute my changes. My needs have been met and I'm using my fork in my projects at this point.

It's your project, do what you want with it. You have my contribution. It's a gift, along with my thanks. Feel free to take my code and do what you want with it, or not. I won't be offended either way, and I don't have a need to be credited by having my commits land directly, so feel free to cherry-pick changes in your own way. I have too many other projects going on right now to spend any more time on this facet of this one. I am happy to continue to answer any questions you may have about my code.

FWIW, if you decide to accept my suggestion of treating additional docstrings as docstrings, you can simply remove all the previous_line handling code in the constructor, and in get_docstring_tokens remove the for token in prev_tokens loop and simply initialize state to STATE_EXPECT_MODULE_DOCSTRING and you should be set. It's all just additional code to preserve old behavior. You'll also have to update the tests, of course, to match the change in behavior.

@twolfson
Copy link
Collaborator

twolfson commented Jun 6, 2020

Unsure what you mean by "it's your project", open source with broad licensing is the public's project imo -- it just so happens that I'm a maintainer here. Using phrasing like "it's your project" seems like a way to sneak out of responsibility here and in future bug fixes for said relevant code ._.

I similarly don't have time to build out this enhancement fully, hence the "help wanted" tag on the original issue

Closing this PR and making a comment on the original issue for future contributors

Thanks for your time and contributions =)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support variable docstrings
2 participants