
Implementing mypyc support pt. 2 #2431

Merged
merged 28 commits on Nov 16, 2021

Conversation

@ichard26 (Collaborator) commented on Aug 17, 2021

It's finally happening 🎉. The initial work was done several years ago in GH-1009 by @msullivan and is now being followed through by yours truly. If this work doesn't find its way to PyPI eventually, I'll be disappointed 🙂

Performance

I wrote an entire report which can be read here: https://gist.github.com/ichard26/b996ccf410422b44fcd80fb158e05b0d, but the TL;DR is:

  • Formatting with safety checks: 1.93x faster
  • Formatting without safety checks: 2.11x faster
  • Blib2to3 parsing: 1.81x faster
  • Import (import black): 1.16x faster

Windows numbers will come later. Unfortunately I probably won't be able to get macOS numbers (I don't own a machine running macOS), but I'll see what I can do.

Stability

I've been running the test suite during the wheel builds, and as of the latest commit on this branch it's all green. It should be noted, though, that the arm64 wheels and the arm portion of the universal2 wheels are completely untested since we don't have access to any M1 CI resources :(

Locally I've run diff-shades (Jelle, it's basically mypy-primer but for Black) against the CPython 3.8 manylinux wheel and found no difference between interpreted and compiled Black (across the same project suite used by black-primer + blackbench 😉). I plan to also test Windows with diff-shades in the coming days. I'll also get Cooper to test Intel macOS + M1 macOS for me, but I need to come up with better instructions before that.

Anyhow, please don't expect this to be fully tested. I plan to ask the wider community to test the experimental compiled wheels (and push any additional fixes) after this PR is merged.

Review notes

This PR contains a lot of things including QOL improvements, optimizations, compatibility changes, hacks, etc. While the diff isn't that painful, following the commit history should give you some context on why certain changes were made.

It should be noted that I plan to handle the merge of this PR myself since this project involves a lot of moving parts and I'd like to maintain control for the time being. Thanks!


Related issue: GH-366
Related repo: https://github.com/ichard26/black-mypyc-wheels

And if you're some passerby who really wants to try the experimental wheels, check out the workflows here: https://github.com/ichard26/black-mypyc-wheels/actions.

Pretty much marking a few tests as straight incompatible or marking
certain functions with monkeypatching support.

Also address the lack of 3.8 or higher support in src/black/parsing.py
caused by my original changes.

Finally tweak the compile configuration.
Typing the ast3 / ast27 variables as Any breaks mypyc. More details in
the comment added in this commit.

Many thanks go to Jelle for helping out; with their insight I was
able to find a workaround.
Before, the .whl was 1.3M and now it is 1.2M.
The `platform=linux` pin was causing mypy (and therefore mypyc) to believe
any `if sys.platform == "win32":` branch would never be taken. That led
to `Reached allegedly unreachable code!` crashes because of the safety
checks mypyc adds.

There's not a strong reason for pinning the platform in `mypy.ini` so
with the agreement of Jelle, it's gone now!
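
To make the failure mode concrete, here is a minimal sketch (the function below is hypothetical, not code from this PR) of the kind of branch involved:

import sys

def cache_dir_name() -> str:
    # Hypothetical function. With `platform = linux` pinned in mypy.ini, mypy
    # considers this branch unreachable, so mypyc compiles a "Reached allegedly
    # unreachable code!" check into it instead of the real body, and compiled
    # Black crashes as soon as the branch actually runs on Windows.
    if sys.platform == "win32":
        return "AppData"
    return ".cache"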

I also disabled CliRunner's exception catching feature because, while it
does lead to nicer test failures when an exception goes unhandled, it
also removes traceback information, which makes debugging a pain.

P.S. You can blame Windows Update for the slowness of this bugfix :p
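
For context, the feature is controlled by a keyword argument on Click's test runner; a rough sketch of what such a test can look like (the test body is illustrative, not one of Black's actual tests):

from click.testing import CliRunner

import black

def test_code_option() -> None:
    runner = CliRunner()
    # catch_exceptions=False lets unhandled exceptions propagate with their
    # full traceback instead of being swallowed into result.exception.
    result = runner.invoke(black.main, ["--code", "print('hello' )"], catch_exceptions=False)
    assert result.exit_code == 0
    assert '"hello"' in result.output
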
Unreachable code has either led to mypyc being unable to analyze or
compile Black, or has led to runtime failures (like the one the previous
commit fixed - although mypy wouldn't be able to catch it).
Hardcoding the targets isn't great since we will probably forget to
add paths to the list as new files are created.

Inspired by mypy's setup for mypyc.
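
A rough sketch of what discovering the compilation targets in setup.py can look like; the glob pattern and the exclusion below are illustrative, not Black's exact configuration:

import glob

from mypyc.build import mypycify
from setuptools import setup

# Pick up targets automatically so new modules don't have to be added by hand;
# skip files known to be incompatible with mypyc (illustrative exclusion).
excluded = {"src/black/__main__.py"}
targets = sorted(p for p in glob.glob("src/*/*.py") if p not in excluded)

setup(
    ext_modules=mypycify(targets, opt_level="3"),
)
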
- early binding; plus
- reordering of if checks to reduce work / hit the happy case more
  often; plus
- a few more tiny mypyc-specific tweaks
Honestly at this point, I was tired of optimizing and only attempted
basic and safe optimizations. I hope this actually makes a performance
impact :p
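
As a generic illustration of the "hit the happy case more often" reordering mentioned above (a hypothetical function, not code from this PR):

from typing import List

def is_dotted_name(tokens: List[str]) -> bool:
    # Cheap checks that settle most calls come first; the more expensive scan
    # only runs for the uncommon inputs that get past them.
    if not tokens:           # most common case, O(1)
        return False
    if len(tokens) > 64:     # also O(1), rejects oversized input early
        return False
    return all(t.isidentifier() or t == "." for t in tokens)
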
Mostly just dataclasses nonsense + more pain from the TOML bare CR
thing ... ugh
diff-shades depends on reformat_many being monkeypatchable so it can
calculate files and black.Mode without copying and pasting a bunch
of parsing, file discovery, and configuration code.

On the other hand, I've been working on blackbench and upcoming version
21.8a1 is now compatible with the driver convert changes.
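
To make the monkeypatching requirement concrete, here is a sketch of the kind of thing diff-shades-style tooling can do: patch `black.reformat_many` (which therefore has to stay patchable once compiled) to capture the file set and `black.Mode` that Black's own CLI discovery computed. The replacement's signature and the invocation below are illustrative, not diff-shades' actual code:

from unittest import mock

import black

captured = {}

def fake_reformat_many(sources, fast, write_back, mode, report):
    # Record what Black's CLI would have formatted and with which Mode,
    # reusing its config parsing and file discovery instead of copying it.
    captured["sources"] = set(sources)
    captured["mode"] = mode

with mock.patch.object(black, "reformat_many", fake_reformat_many):
    # standalone_mode=False keeps Click from calling sys.exit() afterwards.
    black.main(["."], standalone_mode=False)

print(f"{len(captured['sources'])} files would be formatted with {captured['mode']}")
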
@ichard26 added the "C: performance" (Black is too slow. Or too fast.) and "skip news" (Pull requests that don't need a changelog entry.) labels on Aug 17, 2021
@cooperlees (Collaborator) commented:

Nice work. FWIW, I'm totally down for black-primer to be refactored to do what it does today as well as what diff-shades does, to cut down on the number of tools we all use everywhere. I guess diff-shades-like behavior could be more easily added to our CI too if we incorporated it into the black repo.

mypy.ini (outdated review thread, resolved)
@JelleZijlstra (Collaborator) left a comment:

Thanks for all your work on this! I'm really excited to see this PR.

src/black/brackets.py (outdated review thread, resolved)
src/black/linegen.py (outdated review thread, resolved)
src/black/parsing.py (review thread, resolved)
@@ -142,6 +152,11 @@ def normalize_string_prefix(s: str, remove_u_prefix: bool = False) -> str:
return f"{new_prefix}{match.group(2)}"


@lru_cache(maxsize=256)
Collaborator:
Does this really help? re.compile already caches things internally I believe.

@ichard26 (Collaborator, Author) replied on Aug 18, 2021:

Looking at the source and testing locally, I can confirm that both re and regex cache compiled patterns. Somehow I still observe an 8-9% speedup on that strings list literal file. Running this test script:

import time
from functools import lru_cache

import re
import regex

_cached_regex_compile = lru_cache(regex.compile)

# This is *roughly* how many times regex.compile was called when
# formatting strings-list.py
loops = 410
pattern = r"\s*\t+\s*(\S)"

# Warm up the internal caches so the loops below measure cache hits only.
re.compile(pattern)
regex.compile(pattern)

t0 = time.perf_counter()
for i in range(1, loops + 1):
    re.compile(pattern)
elapsed = time.perf_counter() - t0
print(f"{loops} cached compiles with re: {elapsed * 1000:.1f}ms")

t0 = time.perf_counter()
for i in range(1, loops + 1):
    regex.compile(pattern)
elapsed = time.perf_counter() - t0
print(f"{loops} cached compiles with regex: {elapsed * 1000:.1f}ms")

t0 = time.perf_counter()
for i in range(1, loops + 1):
    _cached_regex_compile(pattern)
elapsed = time.perf_counter() - t0
print(f"{loops} cached compiles with regex + lru_cache: {elapsed * 1000:.1f}ms")

I get:

  • 410 cached compiles with re: 0.4ms
  • 410 cached compiles with regex: 1.6ms
  • 410 cached compiles with regex + lru_cache: 0.1ms

So it seems like this is the source of the slight speedup, but oddly enough this only accounts for ~half of the 8-9% speedup I saw for fmt-strings-list. Maybe the fact that this script only repeatedly compiles one regex is why the delta isn't 1:1. For real-life cases this speedup is probably minimal, so I wouldn't mind reverting this to be less confusing :)

Collaborator:

Interesting, thanks for the extra information. I looked at re.compile and it calls a helper function (`_compile`) and does an isinstance check before looking at the cache, and the cache key is the tuple `(type(pattern), pattern, flags)`. I suppose that adds up to more overhead than whatever lru_cache does internally.

- I don't actually have to go on a rant about tomli and bare CR it
  turns out.
- Subclassing KeyError isn't actually *that* important for one
  custom exception.
@ichard26 ichard26 marked this pull request as ready for review October 28, 2021 22:09
@ichard26 (Collaborator, Author) commented:
Still broken since the test suite is still dependent on CWD being something specific :(

https://github.com/ichard26/black-mypyc-wheels/runs/4040890632?check_suite_focus=true#step:5:811

gonna fix this later

nipunn1313 added a commit to nipunn1313/black that referenced this pull request Oct 29, 2021
@nipunn1313 mentioned this pull request on Oct 29, 2021
@ichard26 (Collaborator, Author) commented:

Whoops this isn't PyPy compatible ATM, will fix.
