
Add common caching class and global cache flushing function #1242

Closed · keichi wants to merge 16 commits

Conversation

@keichi (Contributor) commented Nov 16, 2021

Description

This PR adds an LRU cache class astroid.cache.LRUCache and modifies the inference cache, inference tip cache, generator cache and @lru_cache to use this new cache. LRUCache is bounded by default and can also be manually flushed by calling astroid.cache.clear_caches() to address the memory leak issue discussed in #792.
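To make the idea concrete, a bounded LRU cache can be built on collections.OrderedDict roughly as follows. This is an illustrative sketch only; the method names and constructor signature are assumptions and need not match the class actually added by this PR.

from collections import OrderedDict


class LRUCache:
    """Bounded least-recently-used mapping (illustrative sketch only)."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        # Mark the entry as most recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        # Evict the least recently used entry once capacity is exceeded.
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def clear(self):
        self._entries.clear()

With an API along these lines, the global astroid.cache.clear_caches() named above would simply call clear() on every such cache instance; that wiring is again an assumption rather than the PR's exact code.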

Type of Changes

Type
πŸ› Bug fix
βœ“ ✨ New feature
βœ“ πŸ”¨ Refactoring
πŸ“œ Docs

Related Issue

Closes #792

@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.9.0 milestone Nov 16, 2021
@Pierre-Sassoulas Pierre-Sassoulas added Enhancement ✨ Improvement to a component Needs review πŸ” Needs to be reviewed by one or multiple more persons labels Nov 16, 2021
@DanielNoord (Collaborator) left a comment

I have not looked at everything yet and I am by no means a caching expert, but I'd like to make an effort to get some of the stale astroid PRs merged.

These are just some things I was thinking about before this could be merged.

I also wonder if we should add any form of tests. We have LruCacheModelTest right now. Does that work in itself for this change?


import wrapt

LRU_CACHE_CAPACITY = 128
Collaborator:

Do we want this to be settable somehow? I can imagine cases where somebody would want the cache to be larger.

Contributor (author):

@DanielNoord I'm not sure what you mean by "set-able"; it is settable. Would you like to add a setter method? Or make it overridable by instances?

Member:

We probably need to document the variable and explain explicitly how to modify it (assign a new value to it, I suppose). We can add the option to the pylint config later, but astroid users will need the info.
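For instance, the documentation could show something like the following snippet. This is an assumption about how the proposed module-level constant would be used; whether a plain reassignment is actually picked up depends on when the caches read it.

import astroid.cache

# Hypothetical usage: raise the default capacity before any caches are built.
astroid.cache.LRU_CACHE_CAPACITY = 1024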

Collaborator:

Preferably we could change this setting via an option, but now that I think of it I don't think we really have a good way to do so.

I'll create a "create option-setting system" project after this gets merged and add this global to the "to be configurable" list.

@keichi (Contributor, author) commented Dec 30, 2021

@DanielNoord Thank you for your review. I am addressing your comments one by one.

I also wonder if we should add any form of tests. We have LruCacheModelTest right now. Does that work in itself for this change?

Sure, I will add tests for LRUCache. LruCacheModelTest tests if @functools.lru_cache is inferred correctly so it does not (directly) test LRUCache.
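Such a test could look roughly like this, assuming the get/put interface and constructor signature sketched earlier; the actual tests added in tests/unittest_cache.py may differ.

from astroid.cache import LRUCache  # import path taken from this PR's description


def test_lru_cache_evicts_least_recently_used():
    cache = LRUCache(capacity=2)  # constructor signature is assumed
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")      # "a" is now the most recently used entry
    cache.put("c", 3)   # should evict "b", the least recently used
    assert cache.get("b") is None
    assert cache.get("a") == 1
    assert cache.get("c") == 3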

@Pierre-Sassoulas (Member) left a comment

This looks pretty good already, thank you @keichi !

@DanielNoord (Collaborator) left a comment

Thanks @keichi! This is becoming a very nice addition.

The tests are also well written.



    def __init__(self):
        self.transforms = collections.defaultdict(list)

-   @lru_cache(maxsize=TRANSFORM_MAX_CACHE_SIZE)
+   @lru_cache_astroid
Collaborator:

How does the 10000 from lru_cache translate to the 128 of LRUCache? Is it similar in size?

Contributor (author):

Nope, the current LRU_CACHE_CAPACITY is arbitrary. Should we change it to 10000?

Collaborator:

I'm not sure... 10000 seems a bit excessive, but the difference from 128 is so large that I wonder if this has unforeseen effects.

Contributor (author):

In principle, a smaller cache size means more recomputation, so it could lower performance. However, when I actually measure how long the unit tests take, I don't see any slowdown:

main (d2a5b3c):
pytest 19.51s user 0.96s system 108% cpu 18.893 total

this branch:
pytest 17.93s user 0.82s system 109% cpu 17.171 total

So at least for the unit tests, I don't see any negative side effects.

Member:

The real problem will appear on a very large code base where the RAM won't be enough to cache everything. The tests are supposed to run fast, so they run on small examples with a few nodes and won't cause caching problems. As this can create a cache entry for every node, and because the number of nodes is close to infinite for a large code base, I think we need to get a sense of the median weight of a node in the cache. Then we should probably limit the default value to a number of cache entries corresponding to a low amount of RAM, like 1 GB, to be safe. (RAM is what is expensive in the cloud, so I suppose that by default jobs run on low-RAM machines.)

Contributor (author):

I can parse a large project and find out what the distribution of node sizes looks like. Are there any projects you usually use for benchmarking?

Member:

We have some projects in pylint's test suite that we call primer tests; django is pretty huge, for example. See https://pylint.pycqa.org/en/latest/development_guide/testing.html#primer-tests
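As a very rough way to get such a number, one could parse a large module and divide the deep size of the tree by the node count, for example with the third-party pympler package. This is a sketch only: pympler is assumed to be installed just for the measurement, the file path is a placeholder, and the result is a crude average rather than the median asked about above.

import astroid
from astroid import nodes
from pympler import asizeof  # third-party; assumed available for this measurement only

with open("some_large_module.py", encoding="utf-8") as source:
    module = astroid.parse(source.read())

all_nodes = list(module.nodes_of_class(nodes.NodeNG))
total_bytes = asizeof.asizeof(module)  # deep size of the whole tree
print(f"{len(all_nodes)} nodes, ~{total_bytes / len(all_nodes):.0f} bytes per node on average")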

@DanielNoord (Collaborator) left a comment

Thanks @keichi

I think I have done all I can with respect to this PR. Sadly, I'm not proficient enough with caching and some of the intricacies of typing generics to confidently say that this is now 100% perfect, though. Hopefully a more experienced developer finds time to look at this soon... 😄

@Pierre-Sassoulas (Member):
Hello @keichi, sorry for taking so long to check this merge request. We currently have flake8-bugbear, which (rightfully, I think) warns us about a potential cache issue on the main branch, because we cache methods that will never be garbage collected. So merging this would solve two problems at once. I rebased the code on the latest main branch, but there's a recursion error in one of the tests. Do you think you would have some time to check the issue in the near future?

@coveralls commented Apr 10, 2022

Pull Request Test Coverage Report for Build 2143911660

  • 75 of 77 (97.4%) changed or added relevant lines in 10 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.03%) to 91.516%

Changes missing coverage (covered lines / changed or added lines / %):
  • astroid/cache.py: 44 / 46 (95.65%)

Totals (Coverage Status):
  • Change from base Build 2111457493: +0.03%
  • Covered Lines: 9126
  • Relevant Lines: 9972

πŸ’› - Coveralls

@Pierre-Sassoulas (Member) left a comment

LGTM, thank you @keichi

@DanielNoord (Collaborator):
I haven't completed my review yet, but I see that two added lines are uncovered. Is there a way to cover these? Or did we already decide not to do so?

@keichi (Contributor, author) commented Apr 11, 2022

@DanielNoord How can I locate those two lines? The Coveralls page does not show the source code for some reason. [screenshot of the Coveralls report]

@DanielNoord (Collaborator):
@DanielNoord How can I locate those two lines? The Coveralls page does not show the source code for some reason.

They are L70 and L83 in astroid/cache.py. Coveralls doesn't show source code when you aren't logged in, to reduce API calls on their "own" account, so they require you to log in or create an account to see the annotated source code.

@jacobtylerwalls (Member) left a comment

Thanks for working on pylint/astroid's performance. It's really important. :)

I'm wondering what the motivation was for reimplementing the basic LRU cache functionality. If it was chiefly to have a central call to flush all caches, we could just have the lru_cache_astroid decorator do that one thing: collect a reference to every function using functools.lru_cache so they can all be flushed.

Speaking of, we have a documented method clear_cache() already, so I'm worried we're expanding the API for no benefit and causing confusion. Better for that to call for c in cached_methods: c.clear_cache(), maybe? Just one flusher?
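A minimal sketch of that registry idea follows, assuming nothing about this PR's code; the names lru_cache_astroid, _cached_functions and clear_caches are used purely for illustration.

import functools

_cached_functions = []


def lru_cache_astroid(maxsize=128):
    """Wrap functools.lru_cache but remember every function it caches."""

    def decorator(func):
        cached = functools.lru_cache(maxsize=maxsize)(func)
        _cached_functions.append(cached)
        return cached

    return decorator


def clear_caches():
    # The single "flusher": clear every registered functools cache.
    for cached in _cached_functions:
        cached.cache_clear()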

The reimplementation looks clean, so it shouldn't be hard to maintain. Still, I'm thinking we should be able to avoid merging it in. Especially if someday there comes a C-version in the stdlib that we would get for free by staying on the functools one.

If I'm missing something, let me know!


πŸ‘€ cache by cache

  • LookupMixin.lookup(): I checked again on django: even a maxsize of 128 still generates 79K hits (instead of 85K), so I'm good with 128. This was the only lru_cache that we left unbounded after #1503 (Update pylint requirement from ~=2.13.4 to ~=2.13.5), so bounding it seems important.
  • TransformVisitor._transform(): we realized in #1503 (Update pylint requirement from ~=2.13.4 to ~=2.13.5) that this was generating 0 hits in typical scenarios and removed caching. So let's be sure not to add it back :-)
  • _cache_normalize_path(): honestly this doesn't look very important, 128 is fine
  • ObjectModel.attributes: same
  • infer_functiondef() and _inference_tip_cached(). These two look like the ballgame. If we already had these factored out to use functools.lru_cache, then we could already be using cache_info() to reason about them. I think that's my final argument in favor of not reimplementing this -- we're kind of losing some tooling by losing cache_info().
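For reference, this is the kind of introspection cache_info() gives for free on a functools-based cache (a toy example, not astroid code):

import functools


@functools.lru_cache(maxsize=128)
def square(x):
    return x * x


for value in (1, 2, 1, 3, 1):
    square(value)

print(square.cache_info())  # CacheInfo(hits=2, misses=3, maxsize=128, currsize=3)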

Happy to hear any thoughts on these directions! And thanks again for pushing this forward, as I said, it's really important.

    def __init__(self):
        self.transforms = collections.defaultdict(list)

-   @lru_cache(maxsize=TRANSFORM_MAX_CACHE_SIZE)
+   @lru_cache_astroid
Member:

We discovered in #1503 (comment) that in normal runs of pylint/astroid, this cache is never even hit in a single CLI run, so we removed it. Maybe an IDE has a use case for caching here, though, I don't know.

@Pierre-Sassoulas (Member):
Regarding the use of functools, we discussed it here: #1242 (comment)

The main reason I couldn't reuse functools.lru_cache is that it holds the cache as a closure variable, and thus there is no way to clear it from outside.

Maybe it's not an issue if we choose a low cache size and it's enough for big projects.

@jacobtylerwalls (Member):
Oh, I was thinking some light refactors might be involved to make them all reachable. I'll try to put together a little proof of concept to see if I'm suggesting something impossible.

@jacobtylerwalls (Member):
Wow, infer_functiondef() is only cached as part of a workaround for an issue very similar to #1490. With a workaround we can remove caching completely and still pass that unit test; see 17a5ee6. We may eventually want a keyword argument set_parent_local or something if we have three places where we have to avoid doing that.

@jacobtylerwalls (Member):
Thanks for pushing this discussion forward. We decided to go with #1521 in order to hew more closely to the current documented way to clear caches. Always eager to hear more proposals for improving performance. Thanks again.

@jacobtylerwalls jacobtylerwalls removed this from the 2.12.0 milestone May 6, 2022
@jacobtylerwalls jacobtylerwalls removed the Needs review πŸ” Needs to be reviewed by one or multiple more persons label May 6, 2022
Labels: Enhancement ✨ Improvement to a component

Successfully merging this pull request may close these issues:
  • Continuously increasing memory usage when pylint is run via its API

6 participants