Performance suggestion: do not run unselected plugins/checks #751

Open
asottile opened this issue Apr 3, 2021 · 4 comments

@asottile
Member

asottile commented Apr 3, 2021

In GitLab by @hugovk on Jun 5, 2020, 01:45

Please read this brief portion of documentation before going any further: http://flake8.pycqa.org/en/latest/internal/contributing.html#filing-a-bug

Please describe how you installed Flake8

$ pip install -U flake8
$ brew install flake8
# etc.

Please provide the exact, unmodified output of flake8 --bug-report

{
  "dependencies": [],
  "platform": {
    "python_implementation": "CPython",
    "python_version": "3.8.3",
    "system": "Darwin"
  },
  "plugins": [
    {
      "is_local": false,
      "plugin": "flake8_2020",
      "version": "1.6.0"
    },
    {
      "is_local": false,
      "plugin": "mccabe",
      "version": "0.6.1"
    },
    {
      "is_local": false,
      "plugin": "pycodestyle",
      "version": "2.6.0"
    },
    {
      "is_local": false,
      "plugin": "pyflakes",
      "version": "2.2.0"
    }
  ],
  "version": "3.8.2"
}

Please describe the problem or feature

I noticed that Flake8 takes the same time to run with --select as without. As shown using -vv verbosity, it runs all the plugins and checks regardless of --select, and only reports the selected ones afterwards.

Flake8 can sometimes take a long time to run on large codebases. If it were possible to run only the selected checks, that would save a lot of time, CPU and power.

Would it be possible to run only the selected checks/plugins, rather than running them all anyway and discarding that work when reporting?
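
The difference between the behaviour described above and the requested one can be sketched with a toy model. This is not Flake8's actual code: the `CHECKS` registry and the functions `run_then_filter` and `filter_then_run` are hypothetical names, and each check is assumed to declare exactly which codes it can emit, which is not guaranteed for real plugins.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Check = Callable[[str], List[str]]

# Hypothetical registry: check name -> (codes it can emit, check function).
CHECKS: Dict[str, Tuple[Tuple[str, ...], Check]] = {
    "long_line": (("E501",), lambda line: ["E501"] if len(line) > 79 else []),
    "trailing_ws": (("W291",), lambda line: ["W291"] if line != line.rstrip() else []),
}


def is_selected(code: str, select: Sequence[str]) -> bool:
    """True if the code starts with any selected prefix."""
    return any(code.startswith(prefix) for prefix in select)


def run_then_filter(lines: Sequence[str], select: Sequence[str]):
    """Run every check, then discard unselected codes afterwards."""
    found, calls = [], 0
    for line in lines:
        for _codes, check in CHECKS.values():
            found.extend(check(line))
            calls += 1
    return [c for c in found if is_selected(c, select)], calls


def filter_then_run(lines: Sequence[str], select: Sequence[str]):
    """Skip checks that cannot emit a selected code, then run the rest."""
    wanted = [
        check
        for codes, check in CHECKS.values()
        if any(is_selected(c, select) for c in codes)
    ]
    found, calls = [], 0
    for line in lines:
        for check in wanted:
            found.extend(check(line))
            calls += 1
    # A check may emit several codes, so still filter the results.
    return [c for c in found if is_selected(c, select)], calls


if __name__ == "__main__":
    sample = ["x = 1  ", "y" * 100]
    print(run_then_filter(sample, ["W"]))
    print(filter_then_run(sample, ["W"]))
```

Both functions report the same codes; the second simply invokes fewer checks. The saving depends entirely on the runner knowing in advance which codes a check can produce.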


Docs

For reference, my emphasis.

flake8 --help says --select lists the errors and warnings to enable:

  --select errors       Comma-separated list of errors and warnings to enable. For example, ``--select=E4,E51,W234``.
                        (Default: ['E', 'F', 'W', 'C90'])

The docs are a bit more explicit:

Specify the list of error codes you wish Flake8 to report.

https://flake8.pycqa.org/en/latest/user/options.html#cmdoption-flake8-select


Example

An example running on the TensorFlow codebase:

$ time flake8
...
flake8  323.91s user 4.31s system 98% cpu 5:32.78 total
$ time flake8 --select YTT
...
flake8 --select YTT  318.62s user 3.80s system 99% cpu 5:25.51 total

Both take about the same time: around 5m20s of user CPU time (about 5m30s wall clock).

With an ugly hack (I know this mixes plugin names with error codes, but it's just to get a rough idea, and there's other places to skip too):

diff --git a/src/flake8/checker.py b/src/flake8/checker.py
index d993cb9..9ed986d 100644
--- a/src/flake8/checker.py
+++ b/src/flake8/checker.py
@@ -486,6 +486,8 @@ class FileChecker(object):
             return

         for plugin in self.checks["ast_plugins"]:
+            if plugin["name"] != "YTT":
+                continue
             checker = self.run_check(plugin, tree=ast)
             # If the plugin uses a class, call the run method of it, otherwise
             # the call should return something iterable itself

$ time flake8 --select YTT
flake8 --select YTT  276.90s user 3.17s system 98% cpu 4:43.00 total

About 4m40s, roughly 42 seconds (~13%) faster.

@asottile
Member Author

asottile commented Apr 3, 2021

In GitLab by @sigmavirus24 on Jun 5, 2020, 06:01

This would break our verbose output, which tells people how many errors were ignored and not reported. There are also nuanced ways to ignore codes, so skipping checks isn't feasible. Some plugins register just a prefix, and we'd have no way of skipping the check for an individual sub-code, especially depending on how the plugin is written.
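
The prefix problem can be made concrete with a small sketch. It assumes the runner knows only the code prefix a plugin registered; the helper name `may_emit_selected` is hypothetical and not a Flake8 API. Under that assumption a plugin can be skipped only when no selected prefix overlaps its registered prefix. Individual sub-codes cannot be skipped, because the runner cannot see which part of the plugin emits which code. The sketch does not model `--ignore`, `# noqa` comments, or the other ways of excluding codes.

```python
from typing import Sequence


def may_emit_selected(registered_prefix: str, select: Sequence[str]) -> bool:
    """Return True if a plugin registered under this prefix might emit a selected code.

    Two prefixes overlap when either one is a prefix of the other:
    a plugin registered as "E" may emit "E265", and a selection of "C"
    covers a plugin registered as "C90".
    """
    return any(
        sel.startswith(registered_prefix) or registered_prefix.startswith(sel)
        for sel in select
    )


if __name__ == "__main__":
    for prefix in ("E", "F", "C90", "YTT"):
        print(prefix, may_emit_selected(prefix, ["E265"]))
```

With `--select=E265`, a plugin registered only as `E` still has to run in full, so the saving is limited to plugins whose prefix does not overlap the selection at all.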

@asottile
Member Author

asottile commented Apr 3, 2021

In GitLab by @sigmavirus24 on Jun 5, 2020, 15:59

Perhaps a better way to do this is to add a --disable-extensions option, because relying on --select is too fraught.
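
A minimal sketch of what such an option might look like, written with plain `argparse` rather than Flake8's own option machinery. Everything here is an assumption for illustration: `--disable-extensions` is only a proposal in this thread, and `build_parser`, `enabled_plugins`, and the `lint-sketch` program name are hypothetical.

```python
import argparse
from typing import List, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="lint-sketch")
    parser.add_argument(
        "--disable-extensions",
        default="",
        help="Comma-separated plugin names to skip entirely (hypothetical).",
    )
    return parser


def enabled_plugins(installed: Sequence[str], disable_arg: str) -> List[str]:
    """Drop every installed plugin whose name appears in the disable list."""
    disabled = {name.strip() for name in disable_arg.split(",") if name.strip()}
    return [name for name in installed if name not in disabled]


if __name__ == "__main__":
    args = build_parser().parse_args(["--disable-extensions", "mccabe,pycodestyle"])
    installed = ["flake8_2020", "mccabe", "pycodestyle", "pyflakes"]
    print(enabled_plugins(installed, args.disable_extensions))
```

Disabling by plugin name avoids the prefix ambiguity of `--select`, since a whole plugin is either loaded or not.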

@asottile
Member Author

asottile commented Apr 3, 2021

In GitLab by @andersk on Feb 13, 2021, 12:57

pycodestyle can do this and save significant time. So surely Flake8 ought to be able to do it too, at least for some checks including the pycodestyle ones, when verbose output is not requested.

$ git clone https://github.com/zulip/zulip.git

$ cd zulip; rm setup.cfg

$ time pycodestyle -qq --count .
15849

real    0m22.806s
user    0m22.759s
sys     0m0.020s

$ time pycodestyle -qq --select=E265 --count .
4

real    0m9.721s
user    0m9.680s
sys     0m0.030s

$ time flake8 -j1 -qq --count .
15831

real    0m50.552s
user    0m50.281s
sys     0m0.213s

$ time flake8 -j1 -qq --select=E265 --count .
4

real    0m50.434s
user    0m50.177s
sys     0m0.195s
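
For a quick comparison, the wall-clock ("real") figures above can be turned into relative savings. The helper `percent_saved` is just an illustrative name; the numbers are copied from the timings in this comment.

```python
def percent_saved(baseline_s: float, selected_s: float) -> float:
    """Percentage of wall-clock time saved relative to the baseline run."""
    return 100.0 * (baseline_s - selected_s) / baseline_s


# "real" times in seconds from the runs above.
pycodestyle_saved = percent_saved(22.806, 9.721)
flake8_saved = percent_saved(50.552, 50.434)

print(f"pycodestyle --select=E265: {pycodestyle_saved:.1f}% less wall time")
print(f"flake8 --select=E265:      {flake8_saved:.1f}% less wall time")
```

That works out to roughly 57% for pycodestyle against about 0.2% for Flake8 on the same tree.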

@chr1st1ank

This is not only a performance optimization but also a stability improvement. What you don't run can't break. Flake8's plugin discovery can break a CI pipeline whenever dependencies are updated, because a plugin library may change its behaviour or something unexpected may end up on the import path. Being able to specify exactly which plugins run reduces the chance of such surprises.
Examples:

There may be better examples, and these are partly debatable, but the problem class definitely exists in the deep fires of Python's dependency hell.
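
One way to catch such surprises in CI, sketched under stated assumptions: list what is installed in the `flake8.extension` entry-point group and compare it against an explicit allowlist. The helper names `discovered_flake8_plugins` and `unexpected_plugins` and the allowlist contents are hypothetical. The names compared are entry-point names, which are not always identical to package names, so a real allowlist would have to be built from what the environment actually reports.

```python
from importlib import metadata
from typing import Iterable, List, Set


def discovered_flake8_plugins() -> list:
    """Return the entry points installed in the 'flake8.extension' group."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        return list(eps.select(group="flake8.extension"))
    return list(eps.get("flake8.extension", []))  # older versions return a dict


def unexpected_plugins(entry_points: Iterable, allowed: Set[str]) -> List[str]:
    """Names of discovered entry points that are not on the allowlist."""
    return sorted(ep.name for ep in entry_points if ep.name not in allowed)


if __name__ == "__main__":
    allowed = {"F", "C90", "YTT"}  # hypothetical allowlist
    extras = unexpected_plugins(discovered_flake8_plugins(), allowed)
    if extras:
        print("Unexpected flake8 entry points:", ", ".join(extras))
```

This only detects unexpected plugins; it does not stop Flake8 from loading them, which is the capability being requested here.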
