
Removed mutable default value in _inference_tip_cache #1139

Merged

Conversation

superbobry (Contributor)

Description

The mutable default is problematic when astroid is used as a library,
because it effectively becomes a memory leak, see #792.

This commit moves the cache to the global namespace and adds a public
API entry point to clear it.

Type of Changes

Type
🐛 Bug fix
✨ New feature
🔨 Refactoring
📜 Docs

Related Issue

This PR partially addresses #792. More changes are needed to make memory consumption stable. See discussion in the issue.

_cache: typing.Dict[typing.Tuple[InferFn, NodeNG], typing.Any] = {}


def clear_inference_tip_cache():
    """Clear the inference tips cache."""
    _cache.clear()
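
For a host application that embeds astroid, the intended usage would look roughly like the sketch below. This is a minimal illustration, and it assumes clear_inference_tip_cache ends up re-exported at the astroid package level; depending on where the helper lands, the import may need to point at the defining module instead.

import astroid

# Analyse some code; inference on these nodes may populate the
# module-level inference-tip cache.
module = astroid.parse("import collections\nd = collections.OrderedDict()")
assign = module.body[-1]
inferred = list(assign.value.infer())

# Once the analysis session is over, drop the cached node references so
# the ASTs can be garbage-collected instead of accumulating for the
# lifetime of the process.
astroid.clear_inference_tip_cache()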
superbobry (Contributor, Author)

Alternatively, I can add a single clear_caches() entry point which also clears LRU caches on various methods in astroid (as discussed in #792). Wdyt?

Pierre-Sassoulas (Member)

What do you think about using the @lru_cache(maxsize=?) decorator to handle that? I think it would remove the need for the API. But maybe we'd need a mechanism to set the max size instead?

superbobry (Contributor, Author)

We'd need some extra logic to account for the return value being a generator, right?

Otherwise, lru_cache would do the job. I'd still prefer for us to have an explicit way to clear the cache, because regardless of capacity, references to nodes will keep (potentially expensive) objects around in memory for longer.

Pierre-Sassoulas (Member)

I don't know lru_cache well, and I did not know about the potential problem with generators. I just thought that we might as well use a specialized mechanism to get a sane limit without any human intervention (maybe 8 GB?). (We can also have the API on top of it, by the way.)

superbobry (Contributor, Author)

Sorry if I was unclear. I was referring to the bit of logic in the existing implementation, where the cached function does itertools.tee to make the cached generator "re-usable". We could, of course, eagerly materialize the generator into a list/tuple and cache that instead. Does that sound good?
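
For context, the two caching strategies being discussed look roughly like the following. This is a simplified, self-contained sketch, not astroid's actual decorator: the cache key is reduced to a (function, node) pair, and func stands in for an inference-tip function returning a generator.

import itertools

_cache = {}

def cached_with_tee(func, node):
    # tee-based: the cache holds an iterator, and itertools.tee keeps an
    # internal buffer of every element it has produced, so once a copy is
    # fully consumed the whole result is held in memory anyway.
    if (func, node) not in _cache:
        _cache[func, node] = func(node)
    _cache[func, node], result = itertools.tee(_cache[func, node])
    return result

def cached_with_list(func, node):
    # eager materialization: consume the generator once, store a tuple,
    # and hand out a fresh iterator over it on every call.
    if (func, node) not in _cache:
        _cache[func, node] = tuple(func(node))
    return iter(_cache[func, node])

Both variants keep the full result alive for the lifetime of the cache entry; the tuple variant simply stores it exactly once, which is what the follow-up commit further down switches to.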

Pierre-Sassoulas (Member)

I don't understand the itertools.tee implementation well enough to guess whether it will be better one way or the other. A way to test the speed of an implementation is to use the benchmark test in pylint, doing a local install of astroid and comparing the results. But I think that as long as we set a limit on the cache, it will probably be better than the current situation.

superbobry (Contributor, Author)

I think itertools.tee has to cache the elements somewhere, and logically if one of the resulting iterators is fully consumed, itertools.tee will have to keep a full copy of the original iterable around internally. With this in mind, it seems we can just drop the itertools.tee and always return a list iterator.

As for lru_cache, I'm afraid we cannot use it here because the current implementation only uses the first argument as key (because the other one is context?), and lru_cache does not allow that.
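
To illustrate the constraint: functools.lru_cache hashes every argument of the call, so it cannot reproduce the "key on the node, ignore the context" behaviour directly. A bounded cache with that keying would need a hand-rolled wrapper, roughly like this hypothetical sketch (cache_on_first_arg, its maxsize default, and the FIFO eviction policy are made up for illustration and are not the astroid implementation):

import collections
import functools

def cache_on_first_arg(maxsize=128):
    """Cache a generator-returning function, keyed on its first argument only."""
    def decorator(func):
        cache = collections.OrderedDict()

        @functools.wraps(func)
        def wrapper(node, context=None):
            if node not in cache:
                if len(cache) >= maxsize:
                    cache.popitem(last=False)  # evict the oldest entry (FIFO)
                # Materialize the generator once; later calls get a fresh
                # iterator over the stored tuple.
                cache[node] = tuple(func(node, context))
            return iter(cache[node])

        return wrapper
    return decorator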

This commit is not expected to affect behavior and, if anything, should
improve memory usage, because the result is now materialized only once
(previously it was also stored in full inside itertools.tee).
superbobry force-pushed the inference-tip-mutable-default branch from eeaf655 to 90cbac3 on August 22, 2021
Pierre-Sassoulas (Member) left a comment

Thank you for refactoring this and making it possible to clear the cache. Do you have an opinion on what needs to be done to make clearing the lru_cache automatic? Maybe a decorator that takes a string of all the arguments given to _inference_tip_cached and is cached itself?

superbobry (Contributor, Author)

The answer to that will depend on the exact API for inference tips. Is it right that inference tip functions always take exactly two arguments -- a node and a context? If so, what is context exactly? Does it remain constant during an inference session? Is it fine to hash it via id()?

Pierre-Sassoulas (Member)

Thank you for your answer. I opened #1150 for the next step and further thinking about lru_cache. This is already a lot better than the old way of doing it.

Pierre-Sassoulas merged commit 3d0a025 into pylint-dev:main on Aug 27, 2021
superbobry (Contributor, Author)

@Pierre-Sassoulas thanks for merging the PR! Happy to work on the lru_cache version in a follow-up, if you can give a bit more context on how inference tips are invoked (see the questions in my previous comment).

superbobry deleted the inference-tip-mutable-default branch on August 28, 2021
Pierre-Sassoulas (Member)

Hey, sorry for not answering this sooner. I thought I might as well partially answer: the inference tip always has two arguments (a node and a context). I asked the other active maintainers about the context staying constant during astroid's execution, and it looks like no one knows precisely. So I would have to look at the only source of truth (the CODE 😄) to answer that, but I did not find the time.

superbobry (Contributor, Author)

Thanks Pierre! Happy to dig through the code myself and address #1150 :)
