Thread safety #294

Closed
brucehoff opened this issue Jan 16, 2024 · 3 comments
brucehoff commented Jan 16, 2024

Describe the bug

I need an in-memory cache to be shared across threads. The docs say:

all the decorators in this module are thread-safe by default.

If so, the code below should result in one cache “miss” followed by 999 cache hits. However it prints:

CacheInfo(hits=0, misses=1000, maxsize=1, currsize=0)

It’s like all 1000 threads got to enter the cache at the same time.

Expected result

CacheInfo(hits=999, misses=1, maxsize=1, currsize=1)

Actual result

CacheInfo(hits=0, misses=1000, maxsize=1, currsize=0)

Reproduction steps

import cachetools.func
import threading
from time import sleep

NUM_THREADS = 1000
CACHE_TTL_SECONDS = 3600  # one hour

@cachetools.func.ttl_cache(maxsize=1, ttl=CACHE_TTL_SECONDS)
def get_value():
    sleep(1)
    return 42

class MyThread(threading.Thread):
    def __init__(self, id):
        threading.Thread.__init__(self)
        self.thread_name = str(id)
        self.thread_ID = id

    def run(self):
        get_value()

for i in range(NUM_THREADS):
    MyThread(i).start()

sleep(2)
print(get_value.cache_info())
@GianlucaFicarelli

We can wait for the authors to confirm, but from my understanding the result CacheInfo(hits=0, misses=1000, maxsize=1, currsize=1) is expected (with currsize=1 and not 0), based on this note in the docs:

The lock context manager is used only to guard access to the cache object. The underlying wrapped function will be called outside the with statement, and must be thread-safe by itself.

There are some related discussions in #224

tkem (Owner) commented Jan 23, 2024

@GianlucaFicarelli Thanks for stepping in, and apologies to @brucehoff - I was kind of busy with other (paid) work...

As @GianlucaFicarelli already pointed out, this is not a "safety" issue, but mainly a performance issue. The cache and internal data structures should be kept intact, but in a "thundering herd" scenario like this, calls will bypass the cache until the first result is ready.

There have been several proposals to solve this, but so far these would either have added complexity or would simply serialize calls to the underlying function. In either case, there would have been some performance impact on the general case, i.e. calling a cached function with different arguments simultaneously.

Please note that the same holds for the standard library's functools.lru_cache decorator, for which cachetools.func is meant to be a drop-in replacement.
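The "thundering herd" behavior described above can be demonstrated with the standard library alone. The sketch below is an illustrative addition (the thread count and the Barrier are not from this thread): the Barrier forces all calls to be in flight inside the function body at once, so every call must have missed the cache before the first result is stored.

```python
import threading
from functools import lru_cache

NUM_THREADS = 8

# A Barrier that releases only once all NUM_THREADS callers are inside
# the function body; reaching it proves each call missed the cache.
barrier = threading.Barrier(NUM_THREADS)

@lru_cache(maxsize=1)
def get_value():
    barrier.wait()  # blocks until all threads have entered -> all misses
    return 42

threads = [threading.Thread(target=get_value) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(get_value.cache_info())  # misses == NUM_THREADS, hits == 0
```

All eight calls store the same key, so currsize still ends up at 1; the cache stays intact, only the redundant work is visible in the miss count.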

If this is a common use case, I'd recommend synchronizing/serializing calls to the cached function (get_value() in your example).

tkem closed this as completed Jan 23, 2024
brucehoff (Author) commented Jan 23, 2024

Until/unless the issue is fixed I recommend updating or removing the statement:

all the decorators in this module are thread-safe by default

which is misleading.
