Significantly speedup ESP on large expressions that contain many strings #3467

yilei · 2022-12-20T20:06:10Z

Description

Previously, when ESP (1) merged strings in StringMerger or (2) stripped parens in StringParenStripper, it found and transformed only one string group at a time. Each transform created a new line with (1) one group of strings merged or (2) one group of strings' parens stripped. That new line was then re-checked by the same transformers, so a line with n string groups was rebuilt and rescanned n times. This is O(n^2) complexity.

Since these transformers won't cause line breaks, all of the transforms can be done in one pass that creates only one new line. The new approach is O(n).
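To make the complexity argument concrete, here is a toy sketch of the two shapes. It is not Black's code: the `(kind, text)` token model and the function names `merge_first_group`, `merge_repeatedly`, and `merge_one_pass` are assumptions made up for illustration. The first variant rebuilds the whole line once per string group; the second rebuilds it once.

```python
from typing import List, Optional, Tuple

# Toy token model (an assumption for illustration): kind is "str" or "op".
Token = Tuple[str, str]


def merge_first_group(tokens: List[Token]) -> Optional[List[Token]]:
    """Merge only the first run of 2+ adjacent strings; None if there is none."""
    i = 0
    while i < len(tokens):
        if tokens[i][0] != "str":
            i += 1
            continue
        j = i
        while j < len(tokens) and tokens[j][0] == "str":
            j += 1
        if j - i > 1:
            merged = ("str", "".join(text for _, text in tokens[i:j]))
            # A whole new line is built for a single merged group.
            return tokens[:i] + [merged] + tokens[j:]
        i = j
    return None


def merge_repeatedly(tokens: List[Token]) -> Tuple[List[Token], int]:
    """Old shape: one group per transform, then rescan the new line."""
    lines_built = 0
    while True:
        new_tokens = merge_first_group(tokens)
        if new_tokens is None:
            return tokens, lines_built
        tokens = new_tokens
        lines_built += 1


def merge_one_pass(tokens: List[Token]) -> List[Token]:
    """New shape: merge every group in a single scan, building one new line."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        if tokens[i][0] != "str":
            out.append(tokens[i])
            i += 1
            continue
        j = i
        while j < len(tokens) and tokens[j][0] == "str":
            j += 1
        out.append(("str", "".join(text for _, text in tokens[i:j])))
        i = j
    return out
```

Both variants produce the same result. The difference is that `merge_repeatedly` copies and rescans the line once per group, which is where the quadratic cost in this toy model comes from.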

Tested on the example from #3340 (not compiled, since I'm failing to use cibuildwheel to compile with mypyc on my machines):

Before this change, it took ~111 seconds to finish after I increased the recursionlimit to 1M
After this change, it took ~3.7s
In the stable style without ESP, this file takes ~2.5s. The increase is now sane.

Fixes #3340, since this no longer triggers recursion limit errors.

Hopefully this also solves #2314

Checklist - did you ...

Add an entry in CHANGES.md if necessary?
Add / update tests if necessary?
Add new / update outdated documentation?

github-actions · 2022-12-20T20:33:59Z

diff-shades reports zero changes comparing this PR (a120a1d) to main (a44dc3d).

JelleZijlstra

Thanks, this is an exciting improvement! I'll need to study the code some more and see how it affects perf on my codebase where I previously saw terrible performance.

yilei · 2022-12-20T21:16:38Z

I ran this over our codebase, and fixed a bug.

This is the outer loop. Previously, string_merge and string_paren_wrap generated one transformed new line per string group in the large expression. With this PR, they generate a single new line covering all the transformed string groups.

JelleZijlstra · 2022-12-23T19:58:43Z

Thanks for the report!

Still seeing somewhat bad performance on an internal file with a large expression (nested calls, dictionaries, lots of strings). I think I reported #2314 based on a similar file. On this PR branch:

$ git checkout .; rm -rf ~/.cache/black/
Updated 57 paths from the index
$ time black --preview path/to/file.py
reformatted path/to/file.py

All done! ✨ 🍰 ✨
1 file reformatted.

real    1m33.226s
user    1m32.556s
sys     0m0.468s
$ git checkout .; rm -rf ~/.cache/black/
Updated 1 path from the index
$ time black path/to/file.py
All done! ✨ 🍰 ✨
1 file left unchanged.

real    0m19.135s
user    0m18.740s
sys     0m0.176s
$ black --version
black, 22.12.1.dev27+ga120a1d (compiled: no)
Python (CPython) 3.9.14

However, on 22.12.0 (compiled), the same file in preview mode takes:

real    8m7.575s
user    7m57.408s
sys     0m9.836s

JelleZijlstra · 2022-12-23T20:07:27Z

So this PR clearly makes things better, but the performance penalty on ESP is still big enough (probably; I haven't tried with mypyc) that we should have a conversation about whether the tradeoff is acceptable. That can wait though.

JelleZijlstra · 2022-12-23T20:11:14Z

src/black/trans.py

-        for i, leaf in enumerate(LL):
+        string_indices = []
+        idx = 0
+        while is_valid_index(idx):


Probably best for another PR, but there may be an optimization opportunity here. is_valid_index(idx) is basically equivalent to idx < len(LL), but it gets there through some indirection and a nested function. According to Jukka, mypyc isn't good at compiling nested functions, so this code may get a good speedup if we just use idx < len(LL).

I tried this (JelleZijlstra@9b1f0b1) but didn't see a significant difference under ESP, though I didn't do rigorous benchmarking.
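A minimal sketch of the two loop shapes being compared. The helper `make_is_valid_index` and both scanning functions are hypothetical stand-ins written for illustration, not the code in src/black/trans.py, and the quote-prefix check is only a placeholder for "this leaf is a string". The sketch shows that the two conditions are interchangeable for a forward scan; it makes no claim about the speed of either form under mypyc.

```python
from typing import Callable, List, Sequence


def make_is_valid_index(seq: Sequence[object]) -> Callable[[int], bool]:
    """Hypothetical stand-in: return a nested function that bounds-checks idx."""

    def is_valid_index(idx: int) -> bool:
        return 0 <= idx < len(seq)

    return is_valid_index


def string_indices_with_closure(LL: List[str]) -> List[int]:
    """Scan LL using the nested-function bounds check."""
    is_valid_index = make_is_valid_index(LL)
    indices: List[int] = []
    idx = 0
    while is_valid_index(idx):
        if LL[idx].startswith(('"', "'")):  # placeholder "is a string" test
            indices.append(idx)
        idx += 1
    return indices


def string_indices_direct(LL: List[str]) -> List[int]:
    """Same scan, with the bounds check written inline."""
    indices: List[int] = []
    idx = 0
    while idx < len(LL):
        if LL[idx].startswith(('"', "'")):  # placeholder "is a string" test
            indices.append(idx)
        idx += 1
    return indices
```

Because idx starts at 0 and only increases, the `0 <= idx` half of the check never matters here, which is why the inline comparison is a drop-in replacement in this kind of loop.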

yilei added 3 commits December 20, 2022 10:46

Initial version to speed up ESP by performing multiple StringParenStripper & StringMerge in one pass.

ba0c2de

Fix a bug where some leaves were dropped when stripping parens.

b509c2d

Update CHANGES.md

08d5c49

JelleZijlstra reviewed Dec 20, 2022

yilei added 3 commits December 20, 2022 13:08

Fix a bug where an outer string's parens enclose an inner string' parens.

cb3c259

Update wording per review.

b1d348e

Oops, format code.

8a848d8

yilei added 2 commits December 20, 2022 14:13

Merge branch 'main' into recursiondepth

23be9b3

Fix the merge bug.

a120a1d

JelleZijlstra approved these changes Dec 23, 2022

JelleZijlstra merged commit 3feff21 into psf:main Dec 23, 2022

JelleZijlstra mentioned this pull request Dec 23, 2022

string_processing: Performance regression #2314

Open

hugovk pushed a commit to hugovk/black that referenced this pull request Jan 16, 2023

Significantly speedup ESP on large expressions that contain many strings (psf#3467)

b45f234

yilei deleted the recursiondepth branch January 24, 2023 03:02
