Create/Find a repo that uses every python syntax + Add to Primer #2407

cooperlees · 2021-07-28T14:32:37Z

It would be nice to have a repo cpython developers add in their new syntax sugar every time the latest betas add the new support. That way we can have black run over that code super early and let us know what we fail on with more time before it becomes main stream.

I wonder if we could try format just the cpython test suite .py files for this.

JelleZijlstra · 2021-07-28T16:32:48Z

I think the cpython test suite is our best bet for this. I think there's a few files in there that have syntax errors on purpose so we need to be a little careful, but we can just exclude those files.

cooperlees · 2021-08-15T22:44:12Z

So I spent a few hours getting most of cpython passing black. We have 16 files that fail with HEAD in main today. I also wanted to make sure it was not WAY to slow for CI. It seems reasonable, but lets see on a lot less CPU GitHub action container how long this takes.

Test Checkout

See how Long python takes with --depth 1 on my cable (GitHub actions should be faster)

time git clone --depth 1 https://github.com/python/cpython.git
Cloning into 'cpython'...

real	0m12.558s
user	0m1.547s
sys	0m0.829s

This is fine to add to CI ...

Test Runtimes

The following times were captured via testing out on my 8 Core Apple M1 16gb of ram laptop.

Run with esp:

time /tmp/tb/bin/black --experimental-string-processing --check . 2>&1 | tee /tmp/black_cpython_esp
...
Oh no! 💥 💔 💥
1859 files would be reformatted, 148 files would be left unchanged, 16 files would fail to reformat.

real	4m8.563s
user	16m21.735s
sys	0m6.000s

Run without string processing:

time /tmp/tb/bin/black --check . 2>&1 | tee /tmp/black_cpython
Oh no! 💥 💔 💥
1861 files would be reformatted, 149 files would be left unchanged, 13 files would fail to reformat.

real	4m11.308s
user	11m49.160s
sys	0m4.762s

See no real time difference here with and without --experimental-string-processing

Total Error Files: 16

All files paste: https://pastebin.com/AHPQG848
Without --experimental-string-processing we have only 13 files error
--experimental-string-processing causes error on these 3 files:
- Lib/test/test_base64.py
- Lib/test/test_tokenize.py
- Lib/test/test_xml_etree.py

PR coming with all failing files excluded for now. We will need to dig into why each file fails ...

- cpython tests is probably the best repo for black to test on as the stdlib's unittests should use all syntax - Limit to running in recent versions of the python runtime - e.g. today >= 3.9 - This allows us to parse more syntax - Exclude all failing files for now - Definately have bugs to explore there - Refer to #2407 for more details there - Some test files on purpose have syntax errors, so we will never be able to parse them - Add new black command arguments logging in debug mode; very handy for seeing how CLI arguments are formatted cython now succeeds ignoring 16 files: ``` Oh no! 💥 💔 💥 1859 files would be reformatted, 148 files would be left unchanged. ``` Testing - Ran locally with and without string processing - Very little runtime difference BUT 3 more failed files ``` time /tmp/tb/bin/black --experimental-string-processing --check . 2>&1 | tee /tmp/black_cpython_esp ... Oh no! 💥 💔 💥 1859 files would be reformatted, 148 files would be left unchanged, 16 files would fail to reformat. real 4m8.563s user 16m21.735s sys 0m6.000s ``` - Add unittest for new covienence config file flattening that allows long arguments to be broke up into an array/list of strings Addresses #2407

* Add CPython repository into primer runs - CPython tests is probably the best repo for black to test on as the stdlib's unittests should use all syntax - Limit to running in recent versions of the python runtime - e.g. today >= 3.9 - This allows us to parse more syntax - Exclude all failing files for now - Definitely have bugs to explore there - Refer to #2407 for more details there - Some test files on purpose have syntax errors, so we will never be able to parse them - Add new black command arguments logging in debug mode; very handy for seeing how CLI arguments are formatted CPython now succeeds ignoring 16 files: ``` Oh no! 💥 💔 💥 1859 files would be reformatted, 148 files would be left unchanged. ``` Testing - Ran locally with and without string processing - Very little runtime difference BUT 3 more failed files ``` time /tmp/tb/bin/black --experimental-string-processing --check . 2>&1 | tee /tmp/black_cpython_esp ... Oh no! 💥 💔 💥 1859 files would be reformatted, 148 files would be left unchanged, 16 files would fail to reformat. real 4m8.563s user 16m21.735s sys 0m6.000s ``` - Add unittest for new covienence config file flattening that allows long arguments to be broke up into an array/list of strings Addresses #2407 --- Commit history before merge: * Add new `timeout_seconds` support into primer.json - If present, will set forked process limit to that value in seconds - Otherwise, stay with default 10 minutes (600 seconds) * Add new "base_path" concept to black-primer - Rather than start at the repo root start at a configured path within the repository - e.g. for cpython only run black on `Lib` * Disable by default - It's too much for GitHub Actions. But let's leave config for others to use * Minor tweak to _flatten_cli_args Co-authored-by: Richard Si <63936253+ichard26@users.noreply.github.com>

ichard26 · 2021-08-26T03:17:27Z

I did some analysis on the crashing files, here's the summary:

Unavoidable crashes (5/16):

Lib/test/bad_coding.py: unknown encoding: uft-8
Lib/test/bad_coding2.py: encoding problem: utf-8
Lib/test/badsyntax_3131.py: Cannot parse: 2:0: € = 2
Lib/test/badsyntax_pep3120.py: invalid or missing encoding declaration
Tools/c-analyzer/c_parser/parser/_delim.py: cannot use --safe with this file; failed to parse source file. AST error message: f-string: empty expression not allowed (<unknown>, line 37)

... these all have syntax errors (majority on purpose) and we can't do anything about that ¯_(ツ)_/¯

Pre-existing issues (3/16):

Lib/test/test_grammar.py: Cannot parse: 1841:22: manager() as x
Lib/test/test_patma.py: Cannot parse: 30:14: match x:
Lib/traceback.py: Cannot parse: 26:24: print(item, file=file, end="") - this one is confusing but the error's true cause is the match statement at line 576. The odd error message is due to us trying the Python 2 grammar last and therefore we emit its SyntaxError ... this is basically black 19.10b0 errors on the wrong line #1263 but with blib2to3 instead of ast / typed-ast.

... these use unsupported syntax -- parenthesized with statements and pattern matching respectively

New issues (5/16):

Lib/test/test_named_expressions.py: Cannot parse: 331:21: element = a[b:=0] - looks like we're going to need to somehow backport python/cpython@cae6018 to blib2to3
Lib/test/test_exceptions.py: maximum recursion depth exceeded -- I minimized the erroring code to this madness, but this is still a bit scary

Lib/test/test_base64.py: INTERNAL ERROR: Black produced invalid code on pass 1: invalid syntax (<unknown>, line 596). -- looks to be an edge case where the splitting logic forgot to add a comma. Minimized to

tests = {
    b"""@:E_WAS,RgBkhF"D/O92EH6,BF`qtRH$VbC6UX@47n?3D92&&T:Jand;c"""
        b"""Hat='/U/0JP==1c70M3&r-I,;<FN.OZ`-3]oSW/g+A(H[P""":
        b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234'
        b'56789!@#0^&*();:<>,. []{}',
    b'DJpY:@:Wn_DJ(RS': b'no padding..',
}

Lib/test/test_tokenize.py: INTERNAL ERROR: Black produced code that is not equivalent to the source on pass 1. -- honestly not sure what's wrong here but here's it minimized to:

class TestSuite:
    def test_tokenize(self):
        self.check_tokenize(r'"a\
de\
fg"', """\
    STRING     '"a\\\\\\nde\\\\\\nfg"\' (1, 0) (3, 3)
    """)
        self.check_tokenize(r'u"a\
de"', """\
    STRING     'u"a\\\\\\nde"\'  (1, 0) (2, 3)
    """)
        self.check_tokenize(r'rb"a\
d"', """\
    STRING     'rb"a\\\\\\nd"\'  (1, 0) (2, 2)
    """)
        self.check_tokenize(r'"""a\
b"""', """\
    STRING     '\"\""a\\\\\\nb\"\""' (1, 0) (2, 4)
    """)
        self.check_tokenize(r'u"""a\
b"""', """\
    STRING     'u\"\""a\\\\\\nb\"\""' (1, 0) (2, 4)
    """)
        self.check_tokenize(r'rb"""a\
b\
c"""', """\
    STRING     'rb"\""a\\\\\\nb\\\\\\nc"\""' (1, 0) (3, 4)
    """)
        self.check_tokenize(r'f"abc\
def"', """\
    STRING     'f"abc\\\\\\ndef"' (1, 0) (2, 4)
    """)
        self.check_tokenize(r'Rf"abc\
def"', """\
    STRING     'Rf"abc\\\\\\ndef"' (1, 0) (2, 4)
    """)
        self.check_tokenize(r'"""a\
b"""', """\
    STRING     '\"\""a\\\\\\nb\"\""' (1, 0) (2, 4)
      """)
        self.check_tokenize(r'u"""a\
b"""', """\
    STRING     'u\"\""a\\\\\\nb\"\""' (1, 0) (2, 4)
    """)
        self.check_tokenize(r'rb"""a\
b\
c"""', """\
    STRING     'rb"\""a\\\\\\nb\\\\\\nc"\""' (1, 0) (3, 4)
    """)
        self.check_tokenize(r'f"abc\
def"', """\
    STRING     'f"abc\\\\\\ndef"' (1, 0) (2, 4)
    """)
        self.check_tokenize(r'Rf"abc\
def"', """\
    STRING     'Rf"abc\\\\\\ndef"' (1, 0) (2, 4)
    """)

Lib/test/test_xml_etree.py: INTERNAL ERROR: Black produced invalid code on pass 1: f-string: unmatched '(' (<unknown>, line 4589)

-- appears like the code meant to prevent splits within f-string expressions isn't quite right, minimized to:

class C14NTest(unittest.TestCase):
    def test_xml_c14n2(self):
        for input_file, output_files in tests.items():
            for output_file, config in output_files:
                config_descr = ','.join(
                    f"{name}={value or ','.join(c.tag.split('}')[-1] for c in children)}"
                    for name, (value, children) in sorted(config.items())
                )

The "we shouldn't worry about it" crashes (3/16):

Lib/lib2to3/tests/data/different_encoding.py: cannot use --safe with this file; failed to parse source file. AST error message: Missing parentheses in call to 'print'. Did you mean print(u'ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ')? (<unknown>, line 3)
Lib/lib2to3/tests/data/false_encoding.py: cannot use --safe with this file; failed to parse source file. AST error message: Missing parentheses in call to 'print'. Did you mean print('#coding=0')? (<unknown>, line 2)
Lib/lib2to3/tests/data/py2_test_grammar.py: cannot use --safe with this file; failed to parse source file. AST error message: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers (<unknown>, line 31)

... these are just the result of not having the typed-ast dependency which is needed to parse Python two code :p

cooperlees · 2021-08-26T03:25:36Z

zOMG - Legendary stuff here. I'll try add comments next to the files we can't / shouldn't try to remove from the exceptions.

Then we'd love all the help we can making this pass on all other files.

JelleZijlstra · 2021-08-26T04:24:28Z

I tried to figure out what is going on with Lib/test/test_tokenize.py but couldn't reproduce it. I used the version of the file from current CPython main; both black 21.7b0 and black main format it fine.

JelleZijlstra · 2021-08-26T04:28:35Z

Also I think we should consider the fact that we crash on that function with 800 locals to be a feature.

ichard26 · 2021-08-26T21:02:27Z

Small update: black can now parse Lib/test/test_named_expressions.py thanks to #2447. I can't test locally that it format perfectly fine since I don't have 3.10 available locally (the safety checks' parsing support depend on the runtime version) but passing --fast it formats without crashing :)

cooperlees added the T: bug Something isn't working label Jul 28, 2021

cooperlees self-assigned this Jul 28, 2021

cooperlees added the C: maintenance Related to project maintenance, e.g. CI, testing, policy changes, releases label Aug 15, 2021

cooperlees mentioned this issue Aug 15, 2021

Add cpython Lib/ repository config into primer config - Disabled #2429

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create/Find a repo that uses every python syntax + Add to Primer #2407

Create/Find a repo that uses every python syntax + Add to Primer #2407

cooperlees commented Jul 28, 2021

JelleZijlstra commented Jul 28, 2021

cooperlees commented Aug 15, 2021 •

edited

ichard26 commented Aug 26, 2021

cooperlees commented Aug 26, 2021

JelleZijlstra commented Aug 26, 2021

JelleZijlstra commented Aug 26, 2021

ichard26 commented Aug 26, 2021 •

edited

Create/Find a repo that uses every python syntax + Add to Primer #2407

Create/Find a repo that uses every python syntax + Add to Primer #2407

Comments

cooperlees commented Jul 28, 2021

JelleZijlstra commented Jul 28, 2021

cooperlees commented Aug 15, 2021 • edited

Test Checkout

Test Runtimes

ichard26 commented Aug 26, 2021

cooperlees commented Aug 26, 2021

JelleZijlstra commented Aug 26, 2021

JelleZijlstra commented Aug 26, 2021

ichard26 commented Aug 26, 2021 • edited

cooperlees commented Aug 15, 2021 •

edited

ichard26 commented Aug 26, 2021 •

edited