Don't use hash_funcs to compute an @st.cache function's cache_key
#2331
Conversation
Are you sure we don't want to use `hash_funcs` at all when creating the cache key? Or do we just not want to use it when hashing the module and function name?
Yeah, we never want to use it when creating the `cache_key`. Using `hash_funcs` in the computation of the `cache_key` is what lets two different functions end up sharing a cache.
Seems to make sense to me.
Checked the docs to make sure there's no confusion, and the way I interpret it is that `hash_funcs` describes how to hash the parameters, which means it should apply to `value_key` hashing only and not the `cache_key`.
I originally wondered if there would be a reason to cache across multiple functions, and I answered "probably not". On top of that, we can wrap the shared work in a single cached function if needed (i.e., there's likely a workaround for this incredibly rare scenario).
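A minimal sketch of that workaround, with `functools.lru_cache` standing in for `@st.cache` (the function names here are illustrative, not from the codebase):

```python
from functools import lru_cache

# Instead of trying to make two functions share a cache, route the shared
# expensive work through one cached helper that both of them call.
@lru_cache(maxsize=None)
def _expensive(x):
    return x * x  # pretend this is slow

def area_of_square(side):
    return _expensive(side)

def kinetic_energy(v):
    return 0.5 * _expensive(v)  # reuses the same cache as area_of_square

assert area_of_square(4) == 16
assert kinetic_energy(4) == 8.0
assert _expensive.cache_info().hits == 1  # second call hit the shared cache
```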
I believe when we hash functions we hash both the bytecode and the referenced objects. So it's possible we'd try to hash an object that we don't know how to hash while generating the cache key. Looking at your PR comment below, it seems like we don't want to use `hash_funcs` when generating the cache key at all.
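As a simplified illustration of this point (the `referenced_globals` helper is hypothetical, not a Streamlit API): hashing a function pulls in its bytecode plus the global objects it references, and any of those objects might be something we don't know how to hash.

```python
# Sketch: what "hashing the function" actually involves.
def referenced_globals(func):
    # Names the bytecode references that resolve to module globals.
    return [name for name in func.__code__.co_names if name in func.__globals__]

CONFIG = {"rows": 100}

def load():
    return list(range(CONFIG["rows"]))

assert load.__code__.co_code  # the bytecode part of the hash
assert referenced_globals(load) == ["CONFIG"]  # objects that must also be hashed
```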
Yeah, @jrhone is correct, and I completely overlooked this. When we hash the function itself, it's not just the AST; it's also all objects on the stack that are referenced by the function. We could do a two-phase cache-key generation:

```python
# Include the function's module and qualified name in the hash.
update_hash(
    (func.__module__, func.__qualname__),
    hasher=func_hasher,
    hash_funcs=None,
    hash_reason=HashReason.CACHING_FUNC_BODY,
    hash_source=func,
)

# Include the function's body in the hash. We _do_ pass hash_funcs here,
# because this step will be hashing any objects referenced in the function
# body.
update_hash(
    func,
    hasher=func_hasher,
    hash_funcs=hash_funcs,
    hash_reason=HashReason.CACHING_FUNC_BODY,
    hash_source=func,
)

cache_key = func_hasher.hexdigest()
```

However, this doesn't address a possibly deeper issue, which is that many of our special-cased hashing functions use strings under the hood, so a user-supplied hash_func could still leak into cache-key generation. Maybe we should just disallow this entirely?
I've fixed the issue that @jrhone pointed out, but I'm pulling this and changing it to a draft for now. I think we want to come to a better understanding of the right way forward for "catch-all" hash_funcs (like one that operates on `str`).
Only use `hash_funcs` for computing the `value_key` for a cached function value. Fixes #2328
A longer explanation, from that bug:
There are two parts to cache retrieval for `@st.cache`:

1. Retrieve the decorated function's `MemCache` instance. We use a `(func.__module__, func.__qualname__, func)` tuple to get the `cache_key` that uniquely identifies the function. No two functions (even if they have the same name and body) will share the same `MemCache`.
2. Retrieve the cached value from the function's `MemCache`. This is where we hash the function's arguments to produce the `value_key` for looking up the value within the `MemCache`.

We currently pass `hash_funcs` when computing both `cache_key` and `value_key`. This is generally innocuous, since `cache_key` uses two strings and a function's AST as hash values. However, if you supply a hash_func that operates on string values, you run the risk of having two different functions resolve to the same `cache_key`, and end up unexpectedly sharing a `MemCache` instance.

See this forum issue for an example of this bug in action.
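The failure mode can be sketched like this (a simplified model with an illustrative `make_cache_key`, not Streamlit's real internals):

```python
import hashlib

def make_cache_key(module, qualname, hash_funcs=None):
    # Simplified: the real code hashes (module, qualname, func).
    hasher = hashlib.md5()
    for part in (module, qualname):
        if hash_funcs and str in hash_funcs:
            # The user's str hash_func intercepts our internal strings too.
            part = repr(hash_funcs[str](part))
        hasher.update(part.encode())
    return hasher.hexdigest()

# A user hash_func that collapses all strings (e.g. to "ignore" a string arg).
degenerate = {str: lambda s: 0}

key_a = make_cache_key("app", "load_data", hash_funcs=degenerate)
key_b = make_cache_key("app", "load_model", hash_funcs=degenerate)
assert key_a == key_b  # two different functions, one cache_key!

# With hash_funcs=None, the keys differ as they should.
assert make_cache_key("app", "load_data") != make_cache_key("app", "load_model")
```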
In short, we should not pass `hash_funcs` to the `cache_key` hasher. We never want different functions to share the same `MemCache` instance. (Passing `hash_funcs` here was an oversight - the solution is to just pass `hash_funcs=None`!)
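A sketch of the fixed split (illustrative helpers, not the actual Streamlit code): `hash_funcs` participates only in `value_key` computation, never in `cache_key` computation.

```python
import hashlib
import pickle

def _digest(obj):
    return hashlib.md5(pickle.dumps(obj)).hexdigest()

def make_cache_key(func):
    # hash_funcs is deliberately absent: the function's identity is
    # hashed with the default machinery only.
    return _digest((func.__module__, func.__qualname__))

def make_value_key(args, hash_funcs=None):
    # hash_funcs only customizes how *argument values* are keyed.
    converted = tuple(
        hash_funcs[type(a)](a) if hash_funcs and type(a) in hash_funcs else a
        for a in args
    )
    return _digest(converted)

def f(): pass
def g(): pass

# Different functions always get different cache_keys...
assert make_cache_key(f) != make_cache_key(g)
# ...while hash_funcs still controls argument hashing: with {str: len},
# "abc" and "xyz" are treated as the same cached input.
assert make_value_key(("abc",), {str: len}) == make_value_key(("xyz",), {str: len})
assert make_value_key(("abc",)) != make_value_key(("xyz",))
```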