Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Latest black changes ASTs #2150

Closed
godlygeek opened this issue Apr 26, 2021 · 37 comments
Closed

Latest black changes ASTs #2150

godlygeek opened this issue Apr 26, 2021 · 37 comments
Labels
F: docstrings How we format docstrings R: rejected This will not be worked on T: style What do we want Blackened code to look like?

Comments

@godlygeek
Copy link
Contributor

godlygeek commented Apr 26, 2021

Describe the bug

The README says:

Also, as a temporary safety measure, Black will check that the reformatted code still produces a valid AST that is equivalent to the original. This slows it down. If you're feeling confident, use --fast.

But after #1740, docstrings beginning with a double quote have an extra space prepended to them. This produces a different AST than the file had before, but --safe doesn't complain about this.

Is Black no longer promising to produce identical ASTs?

To Reproduce

$ cat test.py
def foo():
    '''"something" is what this does.'''


print(foo.__doc__)
$ python test.py
"something" is what this does.
$ black --safe test.py
reformatted test.py
All done! ✨ 🍰 ✨
1 file reformatted.
$ cat test.py
def foo():
    """ "something" is what this does."""


print(foo.__doc__)
$ python test.py
 "something" is what this does.
$

Expected behavior:

I believe that black is never supposed to change the AST, and therefore rewriting the docstring to contain this extra space is not a valid transformation for black to perform.

Environment:

  • Version: black, version 21.4b0
  • OS and Python version: Linux, Python 3.8.8

Does this bug also happen on master?

Yes

Additional context

$ python -m ast test.py >ast1.out
$ black test.py
reformatted test.py
All done! ✨ 🍰 ✨
1 file reformatted.
$ python -m ast test.py >ast2.out
$ diff -U0 ast1.out ast2.out
--- ast1.out    2021-04-26 19:05:58.174110800 -0400
+++ ast2.out    2021-04-26 19:08:57.324110800 -0400
@@ -13 +13 @@
-               value=Constant(value='"something" is what this does.'))],
+               value=Constant(value=' "something" is what this does.'))],
@godlygeek godlygeek added the T: bug Something isn't working label Apr 26, 2021
@JelleZijlstra
Copy link
Collaborator

Black does change the AST in a few places. This was so even before the most recent release; the previous release already made some changes to docstrings. We should update that sentence in the documentation.

@godlygeek
Copy link
Contributor Author

That's disappointing. The fact that I could trust Black to never change the behavior of a program was one of its greatest selling points, and made me feel comfortable applying Black indiscriminately without needing to check its work.

Now that I know it may change my program's behavior, I'll be much more cautious applying it to new projects.

@JelleZijlstra
Copy link
Collaborator

We do still have the safety check (in the _stringify_ast function), but it allows some changes in the AST. The most important one is that it allows changes in whitespace (only) within strings: before comparing strings it strips all whitespace from the beginning and end of each line in the string. There is another allowed change in del statements: del a, b and del (a, b) are equivalent but have different ASTs, and Black sometimes changes one to the other.

@godlygeek
Copy link
Contributor Author

Well, I'm not sure what to say beyond that I'm very dismayed to learn this. The fact that Black would prove that it did not change my program's behavior was an important - probably the most important - feature to me, and I'm disappointed to hear that it has been lost.

Tools like Sphinx read docstrings, and now I can't know whether running Black on my library has changed what Sphinx will generate without diffing and checking its work. That's a really serious loss. Better formatting is not a win if I can't trust that it didn't break anything in the process.

@JelleZijlstra
Copy link
Collaborator

That's a fair point! We at least should have updated the documentation, and I would consider adding an option to disable docstring formatting.

@ambv
Copy link
Collaborator

ambv commented Apr 27, 2021

Do you have a concrete problem with the changes to docstrings or are you here only to voice your disappointment and dismay?

The changes made to docstrings are pretty conservative and have to do with indentation and leading/trailing whitespace.
Re-aligning docstrings when re-indenting code was one of the most requested features of Black. How does this break Sphinx?

@godlygeek
Copy link
Contributor Author

Do you have a concrete problem with the changes to docstrings or are you here only to voice your disappointment and dismay?

I came here because I noticed that the new version of Black had made a change that resulted in a different AST, which as far as I could tell from the docs and from a PyCon lightning talk I remembered, it's never supposed to do.

I originally thought this was just a bug, but after finding the 21.4b0 changelog I saw that this was intentional, and @JelleZijlstra has clarified that this wasn't the first time - other AST changes have been allowed before, and --safe has been changed from checking whether the AST is the same to checking if it's similar.

When I learned that, I switched to expressing my disappointment and dismay at a feature I consider so important being quietly lost. I have been confident using and advocating for Black because I could trust that it would never change a module's behavior. This lets me apply it indiscriminately to any project without checking its work, since it would check for me that the AST was the same and therefore the behavior almost certainly will be as well. But above, I've got a 3 line program that exhibits different behavior before and after running Black on it. I find this scary.

How does this break Sphinx?

I didn't say that this would break Sphinx, but I said that it could change what Sphinx will generate. After some experimentation, it doesn't seem to - it seems as though Sphinx is stripping the leading whitespace before rendering the docstring to HTML or to LaTeX. I didn't know that going into this. I'm still not entirely convinced that there isn't some Sphinx plugin whose behavior could change because it sees the docstring in an intermediate state before the leading whitespace has been stripped, though.

@jaklan
Copy link

jaklan commented Apr 27, 2021

@godlygeek But above, I've got a 3 line program that exhibits different behavior before and after running Black on it. I find this scary. - as for now you haven't proved any different --> behaviour <--. If there's no case when additional leading whitespace changes anything - it means the behaviour is the same (even if AST not).

@godlygeek
Copy link
Contributor Author

as for now you haven't proved any different --> behaviour <--.

The user-observable behavior of that program is that something is printed to stdout. The thing that's printed to stdout is different after running Black.

@godlygeek
Copy link
Contributor Author

Another example of somewhere where this might cause issues is PLY, which has a parsing DSL that uses docstrings as regular expressions to match tokens - the addition of an extra leading space to a PLY token regex like:

def t_DOUBLE_QUOTED_STRING(t):
    '"[^"]*"'
    t.value = t.value[1:-2]
    return t

would absolutely change the program's behavior.

@jaklan
Copy link

jaklan commented Apr 27, 2021

@godlygeek I could argue about the first point, but definitely not about the latter. But actually... it's not the case, because this docstring is not reformatted. And playing a bit more - it seems to be even more strange:

For:

def t_DOUBLE_QUOTED_STRING(t):
    '"[^"]*"'
    t.value = t.value[1:-2]
    return t
    
def foo():
    '''"something" is what this does.'''
    pass
    
    
def foo2():
    ''' "something" '''
    pass
    
def foo3():
    """ 'something' is what this does."""
    pass

we get:

def t_DOUBLE_QUOTED_STRING(t):
    '"[^"]*"'
    t.value = t.value[1:-2]
    return t


def foo():
    """ "something" is what this does."""
    pass


def foo2():
    '''"something"'''
    pass


def foo3():
    """'something' is what this does."""
    pass

For foo2 and foo3 the implicit trailing whitespace is removed, and for foo2 quotes are also not normalised...
I can understand why it's removed - readability of """" vs """' (however it's arguable if the difference is worth of such a inconsistency), however the lack of quotes normalisation is surprising for me.

Playground with the code is here.

@jaklan
Copy link

jaklan commented Apr 27, 2021

@godlygeek to be precise - your example should look this way (in yours we don't have docstrings at all):

def t_LEFT_QUOTED_STRING(t):
    """"[^"]*"""
    t.value = t.value[1:-2]
    return t

and output:

def t_LEFT_QUOTED_STRING(t):
    """ "[^"]*"""
    t.value = t.value[1:-2]
    return t

@godlygeek
Copy link
Contributor Author

godlygeek commented Apr 27, 2021

in yours we don't have docstrings at all

Well, we do have a docstring, just not a triple-quoted string - but fair enough; I didn't realize that single quoted strings wouldn't be affected by this, but should have. Thanks for the correction!

@ichard26

This comment has been minimized.

@JelleZijlstra
Copy link
Collaborator

Let's keep this issue focused on the fact that we're changing the AST for docstrings. Feel free to open new issues for any unexpected behavior in the docstring formatting logic.

@ambv
Copy link
Collaborator

ambv commented Apr 27, 2021

For foo2 and foo3 the implicit trailing whitespace is removed, and for foo2 quotes are also not normalised...

There's a lot of bugs / edge cases with docstring handling so I'm guessing this is one of them :( Consistency is definitely lacking right now.

None of the examples above are bugs.

  1. t_DOUBLE_QUOTED_STRING doesn't get normalized to avoid backslash escapes.
  2. The space in foo() is needed to differentiate between the triple quote and its content which uses the same character.
  3. foo2() doesn't need a differentiating leading and trailing whitespace so it's removed.
  4. foo3() also doesn't need differentiating leading whitespace so it's removed.

Note that string normalization doesn't replace single strings with triple strings.

@godlygeek
Copy link
Contributor Author

Focusing back on the question of adding these leading/trailing spaces to docstrings, Black does change:

def t_QUOTED_STRING(t):
    """"[^"]*\""""
    t.value = t.value[1:-1]
    return t

to

def t_QUOTED_STRING(t):
    """ "[^"]*\" """
    t.value = t.value[1:-1]
    return t

Which, in the context of PLY, definitely does result in different behavior for the program.

Granted that case is contrived, but it's still an example of a place where modifying the docstring would change the behavior of a working program, and supports my argument that changes to docstrings are not necessarily safe.

@felix-hilden
Copy link
Collaborator

I'm somewhat sympathetic to @godlygeek despite the esoteric use case. I see why Black is not configurable. Given the fact that most of the time docstring modifications are not only safe but actually very welcome, and that Black doesn't really provide much configuration options, the existing behavior is surely here to stay.

That being said, I share the concern that modifying behavior even slightly is on a different level to strict formatting. This could be addressed at least in a few ways:

  • An "extra" safe mode to warn or fail on any AST changes being made - this could be a quick bandaid solution
  • A toggle for disabling docstring processing (and any other AST unsafe changes) - probably the better way to go

Again, having more configuration runs against Black's philosophy, but I think this could be a proper place to give in a bit, because it isn't really a formatting preference. Would this kind of non-default toggle suffice, Matt? Łukasz and others, would this be an acceptable choice for the library?

@godlygeek
Copy link
Contributor Author

I'd prefer if the unsafe stuff was opt-in, rather than needing to opt into extra safety - but I suspect I won't win that battle. So in the interest of a compromise that everyone can agree on, a way to opt out of changes that affect the AST seems like a reasonable middle ground.

@jaklan
Copy link

jaklan commented Apr 28, 2021

But then (AST strict mode) Black wouldn't be able to e.g. add the last comma when exploding a list (and as well remove it when the list can fit single line again + magic commas are turned off):

array = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]

Output:

array = [
    0,
    1,
    2,
    3,
    4,
    5,
    6,
    7,
    8,
    9,
    10,
    11,
    12,
    13,
    14,
    15,
    16,
    17,
    18,
    19,
    20,
    21,
    22,
]

So disabling all AST-unsafe operations in one flag just to avoid docstrings manipulation doesn't seem to be actually practical. From the other side, to just skip docstrings we would need an additional option - and as mentioned, case by case, we would get further and further away from the initial Black's philosophy.

foo2() doesn't need a differentiating leading and trailing whitespace so it's removed.

@ambv But why single quotes are not replaced with double quotes like in foo()?

@JelleZijlstra
Copy link
Collaborator

Changing the trailing comma does not change the AST.

@jaklan
Copy link

jaklan commented Apr 28, 2021

Oh right, of course it doesn't, stupid me. Sorry, no more comments before morning coffee ;)

@pradyunsg
Copy link
Contributor

pradyunsg commented Apr 28, 2021

Which, in the context of PLY, definitely does result in different behavior for the program.

No, it does not. From https://ply.readthedocs.io/en/latest/ply.html#specification-of-tokens (emphasis mine):

Internally, lex.py uses the re module to do its pattern matching. Patterns are compiled using the re.VERBOSE flag which can be used to help readability. However, be aware that unescaped whitespace is ignored and comments are allowed in this mode. If your pattern involves whitespace, make sure you use \s. If you need to match the # character, use [#].

@pradyunsg
Copy link
Contributor

pradyunsg commented Apr 28, 2021

While there's still been not been any use case pointed out where modifying trailing/leading spaces in the doc-string has resulted in a meaningful behaviour change... I do think it'd be nice if black formatted as:

-    """"[^"]*"""
+    '''"[^"]*'''

It's likely clearer this way, and doesn't require modifying the string in this case. It's edge-casey, but it'd be nice. In a similar vein, it'd be nice to have the following changed to the above style (note the addition space getting trimmed):

-    """ "[^"]*"""
+    '''"[^"]*'''

This might be a separate request though.

@JelleZijlstra
Copy link
Collaborator

#2168 gives a concrete example where changing docstrings is problematic.

@pradyunsg
Copy link
Contributor

Looks like the behaviour of adding a space is the problematic part, in both issues.

@sbliven
Copy link

sbliven commented May 1, 2021

#2168 gives a concrete example where changing docstrings is problematic.

For context, sphinx/autodoc uses empty docstrings to disable documenting specific functions. Thus """"""" and """ """ are semantically different.

@ambv
Copy link
Collaborator

ambv commented May 4, 2021

What does Sphinx/autodoc do if there is no docstring at all?

@JelleZijlstra JelleZijlstra added the F: strings Related to our handling of strings label May 30, 2021
@JelleZijlstra
Copy link
Collaborator

Is there anything left to do here? We added documentation that better explains Black's behavior here and we made a formatting change to allow for empty docstrings (since people brought up concrete problems with those).

@felix-hilden
Copy link
Collaborator

I think practically speaking, there isn't. If we are worried about other possible use cases and the maintainers would "OK" having some sort of AST strict mode for docstrings (or even all code [likely not though]), that could be discussed. But I doubt it's worth the effort without concrete problems.

@Elvynzs
Copy link

Elvynzs commented Jun 1, 2021

When using pdoc3 (and maybe other similar tools for generating documentation from docstrings), a double trailing whitespace at the end of a line is used for soft breaks (markdown syntax). As a result I am a still using black 19.10 for now, as I did not find a way to preserve trailing whitespaces in docstrings with recent versions.

@onerandomusername
Copy link
Contributor

Hey, is still being considered? I would like to see an option to have black not change the ast at all, given that this can cause problems with automatic documentation and anything that uses docstrings. I'm of the opinion that there should be an extra safe ast mode.

@ghost
Copy link

ghost commented Sep 1, 2021 via email

@pradyunsg
Copy link
Contributor

@onerandomusername Could you elaborate on the problems?

#2168 fixed the only actual non-philosophical issue brought up in this discussion, so if you’re still facing issues, it’d be helpful if you could elaborate on what they are.

@felix-hilden
Copy link
Collaborator

Hi, we've discussed the docstring issue once more in #3087 and decided that it will keep happening. And we will not provide a toggle for disabling that behavior. #2168 fixed a concrete problem, and we'll be happy to consider any real-world issues. But I'll close this issue for housekeeping.

@felix-hilden felix-hilden added R: rejected This will not be worked on T: style What do we want Blackened code to look like? F: docstrings How we format docstrings and removed T: bug Something isn't working F: strings Related to our handling of strings labels Aug 29, 2022
@bryevdv
Copy link

bryevdv commented Mar 21, 2023

the only actual non-philosophical issue brought up in this discussion

Here is another: #144 (comment)

It seems Black isn't compatible with latest Docformatter (PyCQA/docformatter#94), because it removes the empty line after block docstrings. (Though also my personal preference would be to keep them)

Black had a golden opportunity to be the model of interoperability in this space. "black transformations will not [1] modify your AST" is an unambiguous, well-defined contract, that would have allowed every other formatting, testing, or docs tool to operate independently alongside it with the confidence only clear expectations can provide. Bummer.

[1] or even "will not, by default" or at the very least "will not, if you explicitly ask for AST-safety"

@kaddkaka
Copy link

As seen in #4276 linked above, I noticed that black breaks some ascii art I have in docstring. This is making me reluctant to adopt black. I wouldn't wanna read through all docstrings to find broken information. A configuration for this would be very valuable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F: docstrings How we format docstrings R: rejected This will not be worked on T: style What do we want Blackened code to look like?
Projects
None yet
Development

No branches or pull requests