fix performance issue due to search in tokens #210

bagerard · 2021-05-20T08:59:53Z

Fixes #176

I looked into this, and the issue was in the following block:

comment_in_line = any(
    token_type == tokenize.COMMENT
    for token_type, _, _, _, _ in self._tokens)

self._tokens actually contains the tokens from the beginning of the file up to the physical line currently yielded by flake8. This means self._tokens gets bigger every time flake8 invokes this plugin on the next line, so each any() scan over it gets more expensive: a single scan can walk the whole list, which makes the total work grow roughly quadratically with the file length. The large file is > 12000 lines, so that makes a lot of tokens.
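To make the growth concrete, here is a minimal sketch that only counts token inspections; it is a model of the two access patterns, not the plugin's actual code. The helper names count_inspections_per_line and count_inspections_once are hypothetical, and the worst case is assumed (no comment anywhere, so any() never short-circuits). A line is approximated by a NEWLINE token.

```python
import io
import tokenize


def count_inspections_per_line(source: str) -> int:
    """Model of the old pattern: at the end of every line, re-scan all tokens
    seen so far from the start of the file, looking for a COMMENT."""
    inspections = 0
    seen = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        seen.append(tok)
        if tok.type == tokenize.NEWLINE:  # approximate "end of a line"
            for prev in seen:  # the whole, ever-growing prefix is scanned
                inspections += 1
                if prev.type == tokenize.COMMENT:
                    break  # mimics any() short-circuiting
    return inspections


def count_inspections_once(source: str) -> int:
    """Model of the new pattern: scan the file's token stream a single time."""
    inspections = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        inspections += 1
        if tok.type == tokenize.COMMENT:
            break
    return inspections
```

Doubling a comment-free file (for example "x = 1\n" * 100 versus * 200) roughly quadruples the first count but only doubles the second, which is the quadratic versus linear difference described above.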

I switched the code to process the whole file at once instead of working line by line; that way, it only has to search for the comment token once.
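A minimal sketch of the whole-file idea, assuming the source is available as a string: tokenize once, record where the comments are, and let an empty result mean the file can be skipped entirely. The helper comment_lines is hypothetical and only illustrates the single pass; it does not reproduce eradicate's detection of commented-out code.

```python
import io
import tokenize
from typing import Dict


def comment_lines(source: str) -> Dict[int, str]:
    """Tokenize the whole file once and map line number -> comment text.

    An empty dict means the file has no comments, so a checker could
    skip all further work for it.
    """
    comments: Dict[int, str] = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, _col = tok.start
            comments[row] = tok.string  # the comment text, without the newline
    return comments
```

For example, comment_lines("x = 1\n# y = 2\n") maps line 2 to "# y = 2", while comment_lines("x = 1\n") is empty.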

On the long Python file attached to #176, this brings the processing time down from 30 sec to just 2 sec 🚀

sobolevn

Hi! Thanks a lot for your work!

Let's run the CI and check it out. 👍

tests/test_comments.py

flake8_eradicate.py

sobolevn

Looks like you've accidentally deleted pyproject.toml

bagerard · 2021-05-21T08:52:48Z

omg... sorry about that. I'm using pip, not poetry, to install it locally, so I had to remove those files temporarily...
These manually triggered pipelines are really killing productivity; let's hope GitHub finds another way to address the cryptomining issues.

bagerard · 2021-05-21T10:29:58Z

Alright, now it should be good. I've set it up with poetry locally.
I added an extra piece of code to skip the # -*- coding: utf-8 -*- line in the file (so that it doesn't waste time processing the file if that is the only tokenize.COMMENT in it), and I added a test for that.
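A minimal sketch of what such a skip could look like; this is not the code from the PR (which was later removed). It assumes the PEP 263 encoding-declaration pattern and treats a comment as an encoding declaration only when its whole physical line matches and it sits on line 1 or 2. As a simplification, it does not check that line 1 is blank or a comment when the declaration is on line 2. The names CODING_RE, is_coding_comment and has_real_comments are hypothetical.

```python
import io
import re
import tokenize

# Encoding-declaration pattern from PEP 263; it only counts on line 1 or 2.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def is_coding_comment(tok: tokenize.TokenInfo) -> bool:
    """True if this COMMENT token is an encoding declaration."""
    return tok.start[0] <= 2 and bool(CODING_RE.match(tok.line))


def has_real_comments(source: str) -> bool:
    """True if the file has at least one comment that is not an encoding line."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and not is_coding_comment(tok):
            return True
    return False
```

A file whose only comment is # -*- coding: utf-8 -*- returns False here, so a checker could skip it without further processing.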

bagerard · 2021-05-24T20:05:11Z

Hi, please re-run the pipeline when you have a chance ;)

sobolevn · 2021-05-24T20:17:50Z

Done!

codecov · 2021-05-24T20:18:27Z

Codecov Report

Merging #210 (5f6b875) into master (6a5aab0) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #210   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            1         1           
  Lines           41        42    +1     
  Branches         5         7    +2     
=========================================
+ Hits            41        42    +1

Impacted Files	Coverage Δ
flake8_eradicate.py	`100.00% <100.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6a5aab0...5f6b875.

flake8_eradicate.py

bagerard · 2021-05-25T20:09:27Z

Note that consuming flake8's file_tokens made it slightly less trivial (and less efficient) to ignore the '# -*- coding:' encoding comment, so I ended up removing that optimization. It was not a major improvement anyway, and '# -*- coding:' comments are a thing of the past, so it may be better not to clutter the code with that (e.g. pyupgrade removes them by default: https://github.com/asottile/pyupgrade#-coding--comment).
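A minimal sketch of a checker that works from an already-tokenized file, the kind of list flake8 hands to plugins as file_tokens (per the "Use flake8 builtin file_tokens" commit). The class CommentChecker, its run() output shape and the "comment found" message are hypothetical placeholders; the sketch does not import flake8 and does not reproduce flake8-eradicate's real error codes or detection logic.

```python
import io
import tokenize
from typing import Iterator, Sequence, Tuple


class CommentChecker:
    """Hypothetical whole-file checker fed a precomputed token list."""

    def __init__(self, file_tokens: Sequence[tokenize.TokenInfo]) -> None:
        # The tokens are computed once for the whole file, not per line.
        self._file_tokens = file_tokens

    def _contains_comments(self) -> bool:
        # A single linear pass over the file's tokens.
        return any(tok.type == tokenize.COMMENT for tok in self._file_tokens)

    def run(self) -> Iterator[Tuple[int, int, str]]:
        if not self._contains_comments():
            return  # fast path: no comments, nothing to check
        for tok in self._file_tokens:
            if tok.type == tokenize.COMMENT:
                row, col = tok.start
                # Placeholder: a real plugin would decide here whether the
                # comment is commented-out code before reporting it.
                yield row, col, "comment found"
```

Usage: CommentChecker(list(tokenize.generate_tokens(io.StringIO(src).readline))).run() yields one tuple per comment. Since the list is shared rather than re-tokenized, filtering out an encoding line now has to happen inside this loop, which is the extra complexity mentioned above.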

bagerard · 2021-06-02T09:04:54Z

Would you mind running the pipeline on it?

bagerard · 2021-06-08T13:58:00Z

Let me know if there is anything else I should do, or if it's fine as it is.

sobolevn · 2021-06-08T15:10:46Z

@bagerard I will return to this PR sometime soon! Please, stay tuned 🙂

sobolevn · 2021-06-22T08:55:03Z

I finally got the time to work on this, sorry for the long wait!

bagerard · 2021-06-22T10:12:27Z

CHANGELOG.md

+
+### Features
+
+- Imrpoves performance on long files #210


typo here :)

Fixed! Thanks!

sobolevn · 2021-06-22T10:13:39Z

@bagerard thanks a lot for your work! 👍

fix performance issue due to search in tokens

9877ff5

sobolevn reviewed May 20, 2021

tests/test_comments.py (outdated)

fix flake8 warnings and review comment

6893743

sobolevn reviewed May 20, 2021

flake8_eradicate.py (outdated)

sobolevn reviewed May 21, 2021

bagerard force-pushed the fix_performance_issue branch 2 times, most recently from 77c34db to 40cd990 on May 21, 2021 08:51

bagerard force-pushed the fix_performance_issue branch from 40cd990 to 7e34ae4 on May 21, 2021 10:25

fix from review.

5a46fde

bagerard force-pushed the fix_performance_issue branch from 7e34ae4 to 5a46fde on May 21, 2021 10:28

more flake8 fix

4483ce2

sobolevn reviewed May 25, 2021

flake8_eradicate.py (outdated)

Use flake8 builtin file_tokens

6a3ccfc

bagerard force-pushed the fix_performance_issue branch from 50f30fa to 6a3ccfc on May 25, 2021 20:05

Refactoring

5f6b875

bagerard commented Jun 22, 2021

sobolevn merged commit b073917 into wemake-services:master Jun 22, 2021

bagerard deleted the fix_performance_issue branch June 22, 2021 11:25
