Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of pylintd #5401

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

matusvalo
Copy link
Collaborator

@matusvalo matusvalo commented Nov 25, 2021

Type of Changes

Type
πŸ› Bug fix
βœ“ ✨ New feature
πŸ”¨ Refactoring
πŸ“œ Docs

Description

This PR introduces basic implementation of pylintd - pylint deamon running in the background. It is similar implementation as mypyd [1] or blackd [2]. The implementation is highly inspired by blackd approach. pylintd is implemented as simple http server providing simple API for passing files for linting and returning back the result.

The speedup of mainly gained in following areas:

  1. avoiding python initialization overhead. But this is only gained when http client is not written in python
  2. avoiding pylint initialization overhead.
  3. reusing already cached AST trees of already parsed modules in https://github.com/PyCQA/astroid/blob/47e860f7d5c73d53de77e7c450535e709e5ef99e/astroid/manager.py#L78. This is most prominent speedup when repeating linting of file does not trigger parsing other modules since they remain in cache. Re-linting of the same file is usual use case for editors what is the target area of pylintd.

Examples:

Following examples shows comparison between running pylint and pylintd against 2 large files in pylint. The test was executed as follows:

  1. fresh instance of pylintd was executed
  2. pylint command was executed against the file
  3. pylintd was asked to lint the file
  4. pylintd was asked again to lint the file without restarting pylintd (this shows speedup when astroid manager cache is filled)
$ cat test.sh
curl -v -H "X-File-Name:$1" -H accept:application/X-pylint-parseable localhost:8000

$ wc -l pylint/checkers/classes.py
    2415 pylint/checkers/classes.py

$ time pylint pylint/checkers/classes.py

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)


real	0m5.146s
user	0m4.857s
sys	0m0.210s

$ time bash test.sh pylint/checkers/classes.py          # 1st run

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)


real	0m3.920s
user	0m0.008s
sys	0m0.013s

$ time bash test.sh pylint/checkers/classes.py          # 2nd run

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)


real	0m1.053s
user	0m0.008s
sys	0m0.011s


wc -l pylint/checkers/typecheck.py
    2026 pylint/checkers/typecheck.py

$ time pylint pylint/checkers/typecheck.py

------------------------------------
Your code has been rated at 10.00/10


real	0m5.243s
user	0m4.837s
sys	0m0.222s

$ time bash test.sh pylint/checkers/typecheck.py       # 1st run

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)


real	0m4.171s
user	0m0.008s
sys	0m0.013s

$ time bash test.sh pylint/checkers/typecheck.py       # 2nd run

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)


real	0m1.000s
user	0m0.009s
sys	0m0.013s

[1] https://mypy.readthedocs.io/en/stable/mypy_daemon.html
[2] https://black.readthedocs.io/en/stable/usage_and_configuration/black_as_a_server.html

This PR can partially help closing #4814

@matusvalo
Copy link
Collaborator Author

Of course this PR is not finished. It is more as Proof of concept.

@coveralls
Copy link

coveralls commented Nov 25, 2021

Pull Request Test Coverage Report for Build 1505341505

  • 13 of 13 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.005%) to 93.51%

Totals Coverage Status
Change from base Build 1505246806: 0.005%
Covered Lines: 14006
Relevant Lines: 14978

πŸ’› - Coveralls

@Pierre-Sassoulas Pierre-Sassoulas added Enhancement ✨ Improvement to a component High effort πŸ‹ Difficult solution or problem to solve performance labels Nov 25, 2021
@Pierre-Sassoulas
Copy link
Member

pylint-dev/astroid#792 might be a problem if we want to be able to have an efficient pylintd.

@matusvalo
Copy link
Collaborator Author

pylint-dev/astroid#792 might be a problem if we want to be able to have an efficient pylintd.

I agree but using separate service can enable users for some workarounds like restarting pylintd service on regular basis. (I know not ideal...)

@superbobry
Copy link
Contributor

@Pierre-Sassoulas our experience with running pylint as a daemon suggests that leaks definitely will be a problem. Given that any leaking node retains the full AST, so memory consumption get uncomfortably high very quickly when linting large files.

@matusvalo
Copy link
Collaborator Author

matusvalo commented Nov 26, 2021

I have a question. How we can proceed with this PR (except unittests and fixing the failed CI)? I am willing to continue with this PR but I would like to understand what is missing in order to finalise this PR.

I would like to keep it as simple as possible right now maybe it can be added as experimental feature. Afterwards I can create additional issues/PRs with further improvements of pylintd.

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair I don't know how that would be used or how it would affect pylint's user. I'm not using blackd myself so I don't really know what to expect and how to use a pylint deamon. If needs some automated tests for sure, maybe discussion with potential users ?

It's probably aimed toward those with a big codebase that launch analysis on single file often ? Might be useful for pre-commit too as pre-commit launch multiple pylint in parallel so if we use pylintd we skip the startup time ? Does the pre-commit use case makes sense to you @matusvalo ?

pylint/lint/run.py Outdated Show resolved Hide resolved
if not args:
print(linter.help())
sys.exit(32)

def initialize_jobs(self, linter: PyLinter) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the function created here πŸ‘Œ

Comment on lines +146 to +150
if __name__ == "__main__":
try:
DaemonRun(sys.argv, 8000)
except KeyboardInterrupt:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if __name__ == "__main__":
try:
DaemonRun(sys.argv, 8000)
except KeyboardInterrupt:
pass

I guess it's for manual tests ?

@cdce8p
Copy link
Member

cdce8p commented Nov 27, 2021

I mentioned it in the issue already. I think we should not move forward with it, at this time. Maybe at some point we can pick it up again. #4814 (comment)

To be fair I don't know how that would be used or how it would affect pylint's user. I'm not using blackd myself so I don't really know what to expect and how to use a pylint deamon. If needs some automated tests for sure, maybe discussion with potential users ?

If the startup time isn't too slow, most users (read IDEs, ...) will simply choose the easiest path which is likely launching it direct. They have that implemented already for other tools, no need to change it.

Tbh aside from a few outliers I haven't seen a use case for it yet.

It's probably aimed toward those with a big codebase that launch analysis on single file often ? Might be useful for pre-commit too as pre-commit launch multiple pylint in parallel so if we use pylintd we skip the startup time ? Does the pre-commit use case makes sense to you @matusvalo ?

For pre-commit it would be much better to use pylints own multiprocessing.

@matusvalo
Copy link
Collaborator Author

I agree just partially . Calling pylintd using curl with empty cache gives ~24% speedup - 0m5.146s vs 0m3.920s which I consider significant - see my test above. Regarding usage of pylintd, editors can spawn pylintd and use it's HTTP API instead of calling pylint command.

On the other hand, I think we need also input from developers of editors/plugins whether it make sense or they would use such API.

@Pierre-Sassoulas
Copy link
Member

Regarding usage of pylintd, editors can spawn pylintd and use it's HTTP API instead of calling pylint command.
I think we need also input from developers of editors/plugins whether it make sense or they would use such API.

Seems like the potential user are plugin editor then. A 25% perf gain is huge imo. But we'll have to handle caching properly in astroid or it won't last long though. I think we could move this MR to 3.0 alpha version and communicate with pylint-pycharm and vscode about it in order to see what they need and how it should be implemented. Do you want to lead the communication and specification effort @matusvalo ?

@matusvalo
Copy link
Collaborator Author

Do you want to lead the communication and specification effort @matusvalo ?

Yes, I can take this activity on my shoulders. But unfortunately, I was playing more with the implementation and I found out that the leak is pretty serious (~1 GB allocated memory after 10 runs of linting the same file). Moreover it seems that even the performance decreases over time :-/. I suppose, we need to progress with pylint-dev/astroid#792 in order to have functional pylintd.

Btw. just side not around the usability of the pylintd - I just came across that blackd has already some plugins [1]. This is kind of usage of pylintd I imagine.

[1] https://plugins.jetbrains.com/plugin/14321-blackconnect

@cdce8p
Copy link
Member

cdce8p commented Dec 13, 2021

Do you want to lead the communication and specification effort @matusvalo ?

Yes, I can take this activity on my shoulders. But unfortunately, I was playing more with the implementation and I found out that the leak is pretty serious (~1 GB allocated memory after 10 runs of linting the same file). Moreover it seems that even the performance decreases over time :-/. I suppose, we need to progress with PyCQA/astroid#792 in order to have functional pylintd.

Not sure it's a good idea to start the conversation now. It's not yet clear if or when we can fix astroid. It also isn't a priority at the moment, IMO rightly so. There are other areas that need attention first.

If you never the less want to start the conversation, please make it clear that this is only experimental (at the moment) and still a long way of (maybe even years).

@cdce8p cdce8p marked this pull request as draft December 13, 2021 15:58
@matusvalo
Copy link
Collaborator Author

matusvalo commented Dec 13, 2021

Not sure it's a good idea to start the conversation now.

I agree. I have mentioned it in previous message. At least memory leak should be solved first.

@Pierre-Sassoulas Pierre-Sassoulas added the Blocked 🚧 Blocked by a particular issue label Dec 13, 2021
@jacobtylerwalls
Copy link
Member

Hey @matusvalo just mentioning it might be worth it to get a new benchmark on memory usage after pylint-dev/astroid#1521. If you use MANAGER.clear_cache() between runs, is the memory usage now more reasonable?

@jacobtylerwalls jacobtylerwalls removed the Blocked 🚧 Blocked by a particular issue label May 10, 2022
@matusvalo
Copy link
Collaborator Author

matusvalo commented Sep 5, 2022

@jacobtylerwalls unfortunately even after the fix (testing on main), there is a leak - consider following example:

from pylint.lint import Run
from pylint.lint import pylinter

while True:
    Run(["pylint/checkers/variables.py"], exit=False)
    pylinter.MANAGER.clear_cache()

Following script after ~1 minute allocates around 1 GB of memory and keep raising:

image

Moreover, it removes the performance benefit of avoiding rescanning cached files again and again - hence it yields no or zero perf. improvement.

@jacobtylerwalls
Copy link
Member

What if you explicitly call garbage collection?

@matusvalo
Copy link
Collaborator Author

Unfortunately running gc does not help:

from pylint.lint import Run
from pylint.lint import pylinter
import gc

while True:
    Run(["pylint/checkers/variables.py"], exit=False)
    gc.collect()
    pylinter.MANAGER.clear_cache()

image

@jacobtylerwalls
Copy link
Member

The memory leaks in pylint-dev/astroid#792 have been solved (except for some very small ones, perhaps), but because it relies on clearing the cache to do so, that kind of violates the spirit of this PR, which is to keep the cache.

We probably want something like caching the astroid trees for third-party modules to disk, or having another separate cache for them. We could do that by checking if the third-party module defines a version constant.

@DanielNoord
Copy link
Collaborator

The memory leaks in pylint-dev/astroid#792 have been solved (except for some very small ones, perhaps), but because it relies on clearing the cache to do so, that kind of violates the spirit of this PR, which is to keep the cache.

We probably want something like caching the astroid trees for third-party modules to disk, or having another separate cache for them. We could do that by checking if the third-party module defines a version constant.

Do we know how mypy does this? Because I feel they also know when you have modified third party code yourself, which I think would be a requirement for a good catch miss/hit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ✨ Improvement to a component High effort πŸ‹ Difficult solution or problem to solve performance Work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants