Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create/Find a repo that uses every python syntax + Add to Primer #2407

Open
cooperlees opened this issue Jul 28, 2021 · 7 comments
Open

Create/Find a repo that uses every python syntax + Add to Primer #2407

cooperlees opened this issue Jul 28, 2021 · 7 comments
Assignees
Labels
C: maintenance Related to project maintenance, e.g. CI, testing, policy changes, releases T: bug Something isn't working

Comments

@cooperlees
Copy link
Collaborator

It would be nice to have a repo cpython developers add in their new syntax sugar every time the latest betas add the new support. That way we can have black run over that code super early and let us know what we fail on with more time before it becomes main stream.

I wonder if we could try format just the cpython test suite .py files for this.

@cooperlees cooperlees added the T: bug Something isn't working label Jul 28, 2021
@cooperlees cooperlees self-assigned this Jul 28, 2021
@JelleZijlstra
Copy link
Collaborator

I think the cpython test suite is our best bet for this. I think there's a few files in there that have syntax errors on purpose so we need to be a little careful, but we can just exclude those files.

@cooperlees cooperlees added the C: maintenance Related to project maintenance, e.g. CI, testing, policy changes, releases label Aug 15, 2021
@cooperlees
Copy link
Collaborator Author

cooperlees commented Aug 15, 2021

So I spent a few hours getting most of cpython passing black. We have 16 files that fail with HEAD in main today. I also wanted to make sure it was not WAY to slow for CI. It seems reasonable, but lets see on a lot less CPU GitHub action container how long this takes.

Test Checkout

  • See how Long python takes with --depth 1 on my cable (GitHub actions should be faster)
time git clone --depth 1 https://github.com/python/cpython.git
Cloning into 'cpython'...

real	0m12.558s
user	0m1.547s
sys	0m0.829s
  • This is fine to add to CI ...

Test Runtimes

The following times were captured via testing out on my 8 Core Apple M1 16gb of ram laptop.

Run with esp:

time /tmp/tb/bin/black --experimental-string-processing --check . 2>&1 | tee /tmp/black_cpython_esp
...
Oh no! 💥 💔 💥
1859 files would be reformatted, 148 files would be left unchanged, 16 files would fail to reformat.

real	4m8.563s
user	16m21.735s
sys	0m6.000s

Run without string processing:

time /tmp/tb/bin/black --check . 2>&1 | tee /tmp/black_cpython
Oh no! 💥 💔 💥
1861 files would be reformatted, 149 files would be left unchanged, 13 files would fail to reformat.

real	4m11.308s
user	11m49.160s
sys	0m4.762s
  • See no real time difference here with and without --experimental-string-processing

Total Error Files: 16

  • All files paste: https://pastebin.com/AHPQG848
  • Without --experimental-string-processing we have only 13 files error
    --experimental-string-processing causes error on these 3 files:
    • Lib/test/test_base64.py
    • Lib/test/test_tokenize.py
    • Lib/test/test_xml_etree.py

PR coming with all failing files excluded for now. We will need to dig into why each file fails ...

cooperlees added a commit that referenced this issue Aug 15, 2021
- cpython tests is probably the best repo for black to test on as the stdlib's unittests should use all syntax
  - Limit to running in recent versions of the python runtime - e.g. today >= 3.9
    - This allows us to parse more syntax
- Exclude all failing files for now
  - Definately have bugs to explore there - Refer to #2407 for more details there
  - Some test files on purpose have syntax errors, so we will never be able to parse them
- Add new black command arguments logging in debug mode; very handy for seeing how CLI arguments are formatted

cython now succeeds ignoring 16 files:
```
Oh no! 💥 💔 💥
1859 files would be reformatted, 148 files would be left unchanged.
```

Testing
- Ran locally with and without string processing - Very little runtime difference BUT 3 more failed files
```
time /tmp/tb/bin/black --experimental-string-processing --check . 2>&1 | tee /tmp/black_cpython_esp
...
Oh no! 💥 💔 💥
1859 files would be reformatted, 148 files would be left unchanged, 16 files would fail to reformat.

real	4m8.563s
user	16m21.735s
sys	0m6.000s
```
- Add unittest for new covienence config file flattening that allows long arguments to be broke up into an array/list of strings

Addresses #2407
ichard26 added a commit that referenced this issue Aug 24, 2021
* Add CPython repository into primer runs

- CPython tests is probably the best repo for black to test on as the stdlib's unittests should use all syntax
  - Limit to running in recent versions of the python runtime - e.g. today >= 3.9
    - This allows us to parse more syntax
- Exclude all failing files for now
  - Definitely have bugs to explore there - Refer to #2407 for more details there
  - Some test files on purpose have syntax errors, so we will never be able to parse them
- Add new black command arguments logging in debug mode; very handy for seeing how CLI arguments are formatted

CPython now succeeds ignoring 16 files:
```
Oh no! 💥 💔 💥
1859 files would be reformatted, 148 files would be left unchanged.
```

Testing
- Ran locally with and without string processing - Very little runtime difference BUT 3 more failed files
```
time /tmp/tb/bin/black --experimental-string-processing --check . 2>&1 | tee /tmp/black_cpython_esp
...
Oh no! 💥 💔 💥
1859 files would be reformatted, 148 files would be left unchanged, 16 files would fail to reformat.

real	4m8.563s
user	16m21.735s
sys	0m6.000s
```
- Add unittest for new covienence config file flattening that allows long arguments to be broke up into an array/list of strings

Addresses #2407

---

Commit history before merge:

* Add new `timeout_seconds` support into primer.json
- If present, will set forked process limit to that value in seconds
- Otherwise, stay with default 10 minutes (600 seconds)

* Add new "base_path" concept to black-primer
- Rather than start at the repo root start at a configured path within the repository
  - e.g. for cpython only run black on `Lib`

* Disable by default - It's too much for GitHub Actions. But let's leave config for others to use
* Minor tweak to _flatten_cli_args

Co-authored-by: Richard Si <63936253+ichard26@users.noreply.github.com>
@ichard26
Copy link
Collaborator

I did some analysis on the crashing files, here's the summary:

Unavoidable crashes (5/16):

  • Lib/test/bad_coding.py: unknown encoding: uft-8
  • Lib/test/bad_coding2.py: encoding problem: utf-8
  • Lib/test/badsyntax_3131.py: Cannot parse: 2:0: € = 2
  • Lib/test/badsyntax_pep3120.py: invalid or missing encoding declaration
  • Tools/c-analyzer/c_parser/parser/_delim.py: cannot use --safe with this file; failed to parse source file. AST error message: f-string: empty expression not allowed (<unknown>, line 37)

... these all have syntax errors (majority on purpose) and we can't do anything about that ¯_(ツ)_/¯

Pre-existing issues (3/16):

  • Lib/test/test_grammar.py: Cannot parse: 1841:22: manager() as x
  • Lib/test/test_patma.py: Cannot parse: 30:14: match x:
  • Lib/traceback.py: Cannot parse: 26:24: print(item, file=file, end="") - this one is confusing but the error's true cause is the match statement at line 576. The odd error message is due to us trying the Python 2 grammar last and therefore we emit its SyntaxError ... this is basically black 19.10b0 errors on the wrong line #1263 but with blib2to3 instead of ast / typed-ast.

... these use unsupported syntax -- parenthesized with statements and pattern matching respectively

New issues (5/16):

  • Lib/test/test_named_expressions.py: Cannot parse: 331:21: element = a[b:=0] - looks like we're going to need to somehow backport python/cpython@cae6018 to blib2to3
  • Lib/test/test_exceptions.py: maximum recursion depth exceeded -- I minimized the erroring code to this madness, but this is still a bit scary
  • Lib/test/test_base64.py: INTERNAL ERROR: Black produced invalid code on pass 1: invalid syntax (<unknown>, line 596). -- looks to be an edge case where the splitting logic forgot to add a comma. Minimized to
    tests = {
        b"""@:E_WAS,RgBkhF"D/O92EH6,BF`qtRH$VbC6UX@47n?3D92&&T:Jand;c"""
            b"""Hat='/U/0JP==1c70M3&r-I,;<FN.OZ`-3]oSW/g+A(H[P""":
            b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234'
            b'56789!@#0^&*();:<>,. []{}',
        b'DJpY:@:Wn_DJ(RS': b'no padding..',
    }
  • Lib/test/test_tokenize.py: INTERNAL ERROR: Black produced code that is not equivalent to the source on pass 1. -- honestly not sure what's wrong here but here's it minimized to:
    class TestSuite:
        def test_tokenize(self):
            self.check_tokenize(r'"a\
    de\
    fg"', """\
        STRING     '"a\\\\\\nde\\\\\\nfg"\' (1, 0) (3, 3)
        """)
            self.check_tokenize(r'u"a\
    de"', """\
        STRING     'u"a\\\\\\nde"\'  (1, 0) (2, 3)
        """)
            self.check_tokenize(r'rb"a\
    d"', """\
        STRING     'rb"a\\\\\\nd"\'  (1, 0) (2, 2)
        """)
            self.check_tokenize(r'"""a\
    b"""', """\
        STRING     '\"\""a\\\\\\nb\"\""' (1, 0) (2, 4)
        """)
            self.check_tokenize(r'u"""a\
    b"""', """\
        STRING     'u\"\""a\\\\\\nb\"\""' (1, 0) (2, 4)
        """)
            self.check_tokenize(r'rb"""a\
    b\
    c"""', """\
        STRING     'rb"\""a\\\\\\nb\\\\\\nc"\""' (1, 0) (3, 4)
        """)
            self.check_tokenize(r'f"abc\
    def"', """\
        STRING     'f"abc\\\\\\ndef"' (1, 0) (2, 4)
        """)
            self.check_tokenize(r'Rf"abc\
    def"', """\
        STRING     'Rf"abc\\\\\\ndef"' (1, 0) (2, 4)
        """)
            self.check_tokenize(r'"""a\
    b"""', """\
        STRING     '\"\""a\\\\\\nb\"\""' (1, 0) (2, 4)
          """)
            self.check_tokenize(r'u"""a\
    b"""', """\
        STRING     'u\"\""a\\\\\\nb\"\""' (1, 0) (2, 4)
        """)
            self.check_tokenize(r'rb"""a\
    b\
    c"""', """\
        STRING     'rb"\""a\\\\\\nb\\\\\\nc"\""' (1, 0) (3, 4)
        """)
            self.check_tokenize(r'f"abc\
    def"', """\
        STRING     'f"abc\\\\\\ndef"' (1, 0) (2, 4)
        """)
            self.check_tokenize(r'Rf"abc\
    def"', """\
        STRING     'Rf"abc\\\\\\ndef"' (1, 0) (2, 4)
        """)
  • Lib/test/test_xml_etree.py: INTERNAL ERROR: Black produced invalid code on pass 1: f-string: unmatched '(' (<unknown>, line 4589) -- appears like the code meant to prevent splits within f-string expressions isn't quite right, minimized to:
    class C14NTest(unittest.TestCase):
        def test_xml_c14n2(self):
            for input_file, output_files in tests.items():
                for output_file, config in output_files:
                    config_descr = ','.join(
                        f"{name}={value or ','.join(c.tag.split('}')[-1] for c in children)}"
                        for name, (value, children) in sorted(config.items())
                    )

The "we shouldn't worry about it" crashes (3/16):

  • Lib/lib2to3/tests/data/different_encoding.py: cannot use --safe with this file; failed to parse source file. AST error message: Missing parentheses in call to 'print'. Did you mean print(u'ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ')? (<unknown>, line 3)
  • Lib/lib2to3/tests/data/false_encoding.py: cannot use --safe with this file; failed to parse source file. AST error message: Missing parentheses in call to 'print'. Did you mean print('#coding=0')? (<unknown>, line 2)
  • Lib/lib2to3/tests/data/py2_test_grammar.py: cannot use --safe with this file; failed to parse source file. AST error message: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers (<unknown>, line 31)

... these are just the result of not having the typed-ast dependency which is needed to parse Python two code :p

@cooperlees
Copy link
Collaborator Author

zOMG - Legendary stuff here. I'll try add comments next to the files we can't / shouldn't try to remove from the exceptions.

Then we'd love all the help we can making this pass on all other files.

@JelleZijlstra
Copy link
Collaborator

I tried to figure out what is going on with Lib/test/test_tokenize.py but couldn't reproduce it. I used the version of the file from current CPython main; both black 21.7b0 and black main format it fine.

@JelleZijlstra
Copy link
Collaborator

Also I think we should consider the fact that we crash on that function with 800 locals to be a feature.

@ichard26
Copy link
Collaborator

ichard26 commented Aug 26, 2021

Small update: black can now parse Lib/test/test_named_expressions.py thanks to #2447. I can't test locally that it format perfectly fine since I don't have 3.10 available locally (the safety checks' parsing support depend on the runtime version) but passing --fast it formats without crashing :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: maintenance Related to project maintenance, e.g. CI, testing, policy changes, releases T: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants