Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significantly speedup ESP on large expressions that contain many strings #3467

Merged
merged 8 commits into from Dec 23, 2022

Conversation

yilei
Copy link
Contributor

@yilei yilei commented Dec 20, 2022

Description

Previously, when ESP 1) merges strings in StringMerger; 2) strips parens in StringParenStripper, it finds and does one transform for one string group. Each transform will create a new line with 1) one group of strings merged; 2) one group of strings' parens stripped. This new line is then re-checked by those transformers. This is O(n^2) complexity.

Since these transformers won't cause line breaks, the same transforms can be done at one pass, and it only creates one new line. The new approach is O(n).

Tested on the example from #3340 (not compiled, since I'm failing to use cibuildwheel to compile with mypyc on my machines):

  1. Before this change, it took ~111 seconds to finish after I increased the recursionlimit to 1M
  2. After this change, it took ~3.7s
  3. In the stable style without ESP, this file takes ~2.5s. The increase is now sane.

Fixes #3340, since this no longer triggers recursion limit errors.

Hopefully this also solves #2314

Checklist - did you ...

  • Add an entry in CHANGES.md if necessary?
  • Add / update tests if necessary?
  • Add new / update outdated documentation?

@github-actions
Copy link

github-actions bot commented Dec 20, 2022

diff-shades reports zero changes comparing this PR (a120a1d) to main (a44dc3d).


What is this? | Workflow run | diff-shades documentation

Copy link
Collaborator

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is an exciting improvement! I'll need to study the code some more and see how it affects perf on my codebase where I previously saw terrible performance.

src/black/trans.py Outdated Show resolved Hide resolved
@yilei
Copy link
Contributor Author

yilei commented Dec 20, 2022

I ran this over our codebase, and fixed a bug.

This is the outer loop, previously string_merge and string_paren_wrap are generating one transformed new line per string group in the large expression. With this PR, they now generate a single new line for all the transformed string groups.

@JelleZijlstra
Copy link
Collaborator

Thanks for the report!

Still seeing somewhat bad performance on an internal file with a large expression (nested calls, dictionaries, lots of strings). I think I reported #2314 based on a similar file. On this PR branch:

$ git checkout .; rm -rf ~/.cache/black/
Updated 57 paths from the index
$ time black --preview path/to/file.py
reformatted path/to/file.py

All done! ✨ 🍰 ✨
1 file reformatted.

real    1m33.226s
user    1m32.556s
sys     0m0.468s
$ git checkout .; rm -rf ~/.cache/black/
Updated 1 path from the index
$ time black path/to/file.py
All done! ✨ 🍰 ✨
1 file left unchanged.

real    0m19.135s
user    0m18.740s
sys     0m0.176s
$ black --version
black, 22.12.1.dev27+ga120a1d (compiled: no)
Python (CPython) 3.9.14

However, on 22.12.0 (compiled), the same file in preview mode takes:

real    8m7.575s
user    7m57.408s
sys     0m9.836s

@JelleZijlstra
Copy link
Collaborator

So this PR clearly makes things better, but the performance penalty on ESP is still big enough (probably; I haven't tried with mypyc) that we should have a conversation about whether the tradeoff is acceptable. That can wait though.

for i, leaf in enumerate(LL):
string_indices = []
idx = 0
while is_valid_index(idx):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably best for another PR, but there may be an optimization opportunity here. is_valid_index(idx) is basically equivalent to idx < len(LL), but it does so through some indirection and a nested function. According to Jukka mypyc isn't good at compiling nested functions, so this code may get a good speedup if we just use idx < len(LL).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this (JelleZijlstra@9b1f0b1) but didn't see a significant difference under ESP, though I didn't do rigorous benchmarking.

@JelleZijlstra JelleZijlstra merged commit 3feff21 into psf:main Dec 23, 2022
hugovk pushed a commit to hugovk/black that referenced this pull request Jan 16, 2023
@yilei yilei deleted the recursiondepth branch January 24, 2023 03:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

maximum recursion depth exceeded while calling a Python object
2 participants