Thread safety #294

Closed
brucehoff opened this issue Jan 16, 2024 · 3 comments
brucehoff commented Jan 16, 2024

Describe the bug

I need an in-memory cache to be shared across threads. The docs say:

all the decorators in this module are thread-safe by default.

If so, the code below should result in one cache “miss” followed by 999 cache hits. However it prints:

CacheInfo(hits=0, misses=1000, maxsize=1, currsize=0)

It’s like all 1000 threads got to enter the cache at the same time.

Expected result

CacheInfo(hits=999, misses=1, maxsize=1, currsize=1)

Actual result

CacheInfo(hits=0, misses=1000, maxsize=1, currsize=0)

Reproduction steps

import cachetools.func
import threading
from time import sleep

NUM_THREADS = 1000
CACHE_TTL_SECONDS = 3600  # one hour

@cachetools.func.ttl_cache(maxsize=1, ttl=CACHE_TTL_SECONDS)
def get_value():
    sleep(1)
    return 42

class MyThread(threading.Thread):
    def __init__(self, id):
        threading.Thread.__init__(self)
        self.thread_name = str(id)
        self.thread_ID = id

    def run(self):
        get_value()

for i in range(NUM_THREADS):
    MyThread(i).start()

sleep(2)
print(get_value.cache_info())
@GianlucaFicarelli

We can wait for the authors to confirm, but from my understanding the result CacheInfo(hits=0, misses=1000, maxsize=1, currsize=1) is expected (with currsize=1 and not 0), based on this note in the docs:

The lock context manager is used only to guard access to the cache object. The underlying wrapped function will be called outside the with statement, and must be thread-safe by itself.

There are some related discussions in #224

tkem (Owner) commented Jan 23, 2024

@GianlucaFicarelli Thanks for stepping in, and apologies to @brucehoff - I was kind of busy with other (paid) work...

As @GianlucaFicarelli already pointed out, this is not a "safety" issue, but mainly a performance issue. The cache and internal data structures should be kept intact, but in a "thundering herd" scenario like this, calls will bypass the cache until the first result is ready.

There have been several proposals to solve this, but so far these would either have added complexity or would simply serialize calls to the underlying function. In either case, there would have been some performance impact on the general case, i.e. calling a cached function with different arguments simultaneously.

Please note that the same holds for the standard library's functools.lru_cache decorator, for which cachetools.func is meant to be a drop-in replacement.
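The "thundering herd" behavior described above can be demonstrated with the standard library alone. The sketch below is an illustrative addition (the thread count and the Barrier are not from this thread): the Barrier forces all calls to be in flight inside the function body at once, so every call must have missed the cache before the first result is stored.

```python
import threading
from functools import lru_cache

NUM_THREADS = 8

# A Barrier that releases only once all NUM_THREADS callers are inside
# the function body; reaching it proves each call missed the cache.
barrier = threading.Barrier(NUM_THREADS)

@lru_cache(maxsize=1)
def get_value():
    barrier.wait()  # blocks until all threads have entered -> all misses
    return 42

threads = [threading.Thread(target=get_value) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(get_value.cache_info())  # misses == NUM_THREADS, hits == 0
```

All eight calls store the same key, so currsize still ends up at 1; the cache stays intact, only the redundant work is visible in the miss count.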

If this is a common use case, I'd recommend synchronizing/serializing calls to the cached function (get_value() in your example).

tkem closed this as completed Jan 23, 2024
brucehoff (Author) commented Jan 23, 2024

Until/unless the issue is fixed I recommend updating or removing the statement:

all the decorators in this module are thread-safe by default

which is misleading.
