Key-level locking which corrects multithreading performance #224
Conversation
First of all, thanks for your interest and for providing detailed analysis and explanation for what you are trying to achieve. I very much appreciate that, especially from a first-time contributor! I thought about something like this when I initially implemented the decorators, but at that time decided the use case (multiple threads requesting the same key concurrently) wasn't common enough to warrant the extra complexity, especially since the standard library's lru_cache doesn't guard against this either. So I decided to follow the standard library's example. Anyway, I'd like to hear more about your particular real-world use case, which apparently triggers this behavior, before making up my mind about this.
On a side note, I've learned myself that many maintainers prefer PRs which consist of a single commit, to keep their git log short and clean. Rebasing and squashing is therefore always a good idea before submitting a PR.
(force-pushed c9f8611 to 04379f7)
Thank you for considering our improvement! I've got some responses to your comments; you do raise some good points:
Whereabouts can I find your review comments? I understand that it's a non-trivial change to get right, but I feel that the work required to make it right is worth it, at least in our case. Given the complexity, however, would you be averse to splitting the @cached decorator into key-level-locked and non-key-level-locked flavors? Perhaps a @cached(key_level_locking=True) argument may be passed, which routes the caller to a separate implementation? I can see that you merged a decent refactor yesterday, so I'm happy to make a change to this effect at the same time as rebasing onto your changes.
Agreed that we shouldn't hide the key-level locking behind the fact that the user has supplied a Lock. Once we decide on how to progress with the previous point I'll get that cleaned up, not a problem.
As do we! It's very easy to read, which may be a positive for splitting the implementation into one with key-level locking and one without.
Here goes: our use case is one where we've got a standard (inherently multithreaded) FastAPI endpoint which fields upwards of 150 requests per second. We put a cachetools ttl_cache in front of the endpoint's data loading. Past the first ~1.5 seconds of a new date being requested, we run perfectly. However, during that first ~1.5 seconds, every single request for a given date (note we get ~150 per second) triggers a number of database queries, which completely crushes our backend and leads to response times of up to 30 seconds as our databases become fully congested. We implemented this key-level locking in cachetools such that subsequent requests for a given date are blocked, meaning that not only is our worst-case response time now ~1.5 seconds (blocked threads re-use the newly computed value), we've also mitigated almost all bursty load on our database.
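For illustration, a minimal sketch of the shape of this setup (the function name and timings are stand-ins, not our production code, which sits behind FastAPI):

import threading
import time

from cachetools import TTLCache, cached

@cached(TTLCache(maxsize=32, ttl=600))
def load_for_date(date):
    time.sleep(1.5)            # ~1.5 s of database queries in the real service
    return {"date": date}

# Without key-level locking, every thread that arrives during the first
# ~1.5 s for a new date runs the queries itself.
threads = [threading.Thread(target=load_for_date, args=("2021-06-01",))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()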
src/cachetools/decorators.py
Outdated
@@ -1,4 +1,5 @@
import functools
from threading import RLock
Use RLock only to lock access to _key_level_locks; cache access should be guarded with the user-provided lock object.
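For concreteness, a hedged sketch of the separation being asked for here (illustrative names and structure, not the PR's actual code; per-key lock cleanup is omitted, see the leak discussion further down):

from functools import wraps
from threading import RLock

def cached_keylocked(cache, key, lock):
    # Sketch only: `lock` is the user-supplied lock guarding the cache;
    # an internal RLock guards nothing but the per-key lock registry.
    def decorator(func):
        meta_lock = RLock()          # internal: protects key_level_locks only
        key_level_locks = {}

        @wraps(func)
        def wrapper(*args, **kwargs):
            k = key(*args, **kwargs)
            with meta_lock:                      # registry access
                klock = key_level_locks.setdefault(k, RLock())
            with klock:                          # per-key serialization
                try:
                    with lock:                   # user-supplied lock
                        return cache[k]
                except KeyError:
                    v = func(*args, **kwargs)
                    with lock:                   # user-supplied lock
                        cache[k] = v
                    return v
        return wrapper
    return decorator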
src/cachetools/decorators.py
Outdated
with _key_level_lock:
    try:
        # In the case that this thread was blocked, retrieve and return the now computed value
        return cache[k]
Cache access is not protected by the user-supplied lock; maybe not MT-safe and not conforming to the documentation.
src/cachetools/decorators.py
Outdated
except KeyError:
    # Otherwise compute it on this thread since the key is not in the cache
    v = func(*args, **kwargs)
    _key_level_locks.pop(k)
multiple threads may call _key_level_locks.pop(k) with different k concurrently
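In other words, the cleanup needs the same guard as the insertion; a sketch (registry_guard is an illustrative name, not from the PR):

from threading import Lock

_key_level_locks = {}
_registry_guard = Lock()   # guards the shared dict itself

def release_key_lock(k):
    # pop() mutates the shared dict, so it must run under the same guard as
    # the setdefault() that inserted the entry; the default argument also
    # tolerates another thread having already popped the same key.
    with _registry_guard:
        _key_level_locks.pop(k, None)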
src/cachetools/decorators.py
Outdated
with _key_level_lock:
    try:
        # In the case that this thread was blocked, retrieve and return the now computed value
        return c[k]
See above
src/cachetools/decorators.py
Outdated
except KeyError:
    # Otherwise compute it on this thread since the key is not in the cache
    v = method(self, *args, **kwargs)
    _key_level_locks.pop(k)
See above
with _key_level_lock:
    try:
        # In the case that this thread was blocked, retrieve and return the now computed value
        return cache[k]
See above
src/cachetools/func.py
Outdated
except KeyError:
    # Otherwise compute it on this thread since the key is not in the cache
    v = func(*args, **kwargs)
    key_level_locks.pop(k)
See above
tests/test_method.py
Outdated
self.assertEqual(cached.get(1), 3)
self.assertEqual(cached.get(1.0), 3)
self.assertEqual(cached.get(2.0), 7)
self.assertEqual(cached.get(1), 2)
AFAICS you should _not_ need to change existing unit tests.
self.assertEqual(cached.get(1), 3)
self.assertEqual(cached.get(1), 5)
AFAICS you should _not_ need to change existing unit tests.
@@ -92,11 +94,11 @@ def __exit__(self, *exc):
self.assertEqual(len(cache), 0)
self.assertEqual(wrapper.__wrapped__, self.func)
self.assertEqual(wrapper(0), 0)
self.assertEqual(Lock.count, 2) |
See above
src/cachetools/decorators.py
Outdated
    return cache[k]
except KeyError:
    # Otherwise compute it on this thread since the key is not in the cache
    v = func(*args, **kwargs)
if func() raises an exception, k is never popped from _key_level_locks => possible memory leak?
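One way to make the cleanup unconditional is a try/finally (a sketch of the wrapper's inner section only; the PR itself later moves the pop into a finally: block, as a later hunk below shows):

try:
    # May raise; on failure no value is cached.
    v = func(*args, **kwargs)
finally:
    # Runs whether or not func() raised, so the per-key lock entry
    # cannot leak on an exception.
    _key_level_locks.pop(k, None)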
@northyorkshire: You should be able to see the review comments now, sorry for the confusion. And yes, I already started some long-postponed (and in retrospect somewhat ill-fated) refactoring when your PR arrived, so sorry for that inconvenience, too. In fact, only the location of the decorator code has changed.
That's okay - it's nothing that we can't deal with :). I'll have a look at addressing the things which you have pointed out now. Nice spot on the memory leak btw! Did you have any more thoughts on how you'd like us to feature-toggle this key-level-locking behavior? Or if you'd even want it toggleable to begin with?
    return cache[k]
except KeyError:
    # Otherwise compute it on this thread since the key is not in the cache
    v = func(*args, **kwargs)
exception may cause memory leak
@northyorkshire: Thanks for explaining your use case - it's always informative to learn how this is being used. Regarding integration of key-level locking in the decorators, however, I'm still somewhat undecided - sure, the current situation is less than optimal and can probably be quite surprising (I guess you spent quite some time to figure out what's going on here). But I'm not sure the added complexity is worth it, especially since this issue hasn't been brought up before. I also don't like the idea of selecting behavior by passing an extra argument to the cached() decorator.
I also toyed a little with your example code, thinking about how this could be solved without any changes to cachetools:

from concurrent.futures import ThreadPoolExecutor
from functools import update_wrapper
from threading import RLock
from time import sleep

from cachetools import TTLCache, cached
from cachetools.keys import hashkey

cache = TTLCache(maxsize=100, ttl=600)
calls = 0

def key_level_locking(func):
    lock = RLock()
    # locks holds [lock, refcount] items
    locks = {}

    def wrapper(*args, **kwargs):
        k = hashkey(*args, **kwargs)
        try:
            with lock:
                klock = locks.setdefault(k, [RLock(), 0])
                klock[1] += 1  # increase reference count
            with klock[0]:
                v = func(*args, **kwargs)
        finally:
            with lock:
                klock = locks[k]
                klock[1] -= 1  # decrease reference count
                if klock[1] == 0:
                    del locks[k]
        return v

    return update_wrapper(wrapper, func)

@key_level_locking
@cached(cache)
def method(*args):
    global calls
    sleep(1)
    calls += 1
    print("Doing something expensive!")
    return args

with ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(method, ['arg'] * 10)

print(calls)

Now this has some disadvantages compared with adding key-level locking to the decorators directly, performance-wise.
However, I find reasoning about correctness and thread-safety easier with smaller building blocks, and the performance issues may be negligible in your case.
Another, much simpler solution that comes to mind would be to somehow pre-seed the cache with the next date before that data is being requested. If that's predictable in some way, this would even improve your response times for the first request(s).
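A sketch of that pre-seeding idea, assuming the cache key is a date and the next date is predictable (all names and timings here are hypothetical):

import datetime
import threading
import time

from cachetools import TTLCache, cached

cache = TTLCache(maxsize=100, ttl=600)

@cached(cache)
def load_for_date(date):
    time.sleep(1.5)                    # stands in for the expensive queries
    return {"date": str(date)}

def prewarm_next_date():
    # Calling the cached function ahead of time populates the cache, so the
    # first real request for tomorrow's date hits a warm entry.
    load_for_date(datetime.date.today() + datetime.timedelta(days=1))

# Re-warm shortly before the TTL lapses; the scheduling policy is up to the app.
threading.Timer(590, prewarm_next_date).start()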
(force-pushed 4be2ab3 to 4e1dc32)
I'm glad that you can appreciate our use case for this! It was indeed a tricky one to track down; we almost never suspect the fault to be with a library which we're using and always try to assume it's with our own code. Are you intending to keep the decorators in __init__.py? Our view is that while we are proposing added complexity, we're also adding / correcting functionality as well. I do like your idea of the key_level_locking decorator:

def key_level_locking(func):
    lock = RLock()
    locks = {}

    def wrapper(*args, **kwargs):
        k = hashkey(*args, **kwargs)
        try:
            with lock:
                klock = locks.setdefault(k, RLock())
            with klock:
                return func(*args, **kwargs)
        finally:
            with lock:
                if k in locks:
                    del locks[k]

    return update_wrapper(wrapper, func)

My reservation with this approach though is that we'd rarely use key_level_locking without cached. If we build this logic into the cached decorator itself, callers get the corrected behavior from a single decorator.
As for pre-warming the cache, our data becomes stale quite quickly (hence our need for a ttl_cache), and there would be a significant amount of data to pre-load if we were to grab all dates, which makes this unfeasible.
Yes, for the foreseeable future.
I think it's necessary when using the decorator "stand-alone", i.e. to guarantee that the underlying function is not called concurrently with the same arguments. What you present looks exactly like my initial attempt, and may be sufficient when combined with a caching decorator for your use case, but I haven't fully thought it through.
I agree, and that's why I sympathize with this PR in principle ;-) I've just become somewhat conservative when it comes to major changes and/or introducing added complexity, mainly for maintenance reasons and backward compatibility (unless I'm convinced it's just a little refactoring that no one will notice... see #225). I'd also have to be convinced that this is 100% thread-safe and does behave as expected under all circumstances, and the fact that the standard library's lru_cache doesn't attempt this doesn't exactly boost my confidence. So it may take a while and a couple of iterations before I feel it's safe to include this, maybe even until the next major version release.
src/cachetools/__init__.py
Outdated
    # Otherwise compute it on this thread since the key is not in the cache
    v = func(*args, **kwargs)
finally:
    _key_level_locks.pop(k)
Unlocked access to _key_level_locks - since k provides its own __hash__ function, dict operations are not thread-safe, not even in CPython with its global interpreter lock...
It does not look like the GIL gets released; however, I've protected this call either way, since it doesn't look like it will give rise to a deadlock (all operations under the global, non-key-level lock are context-managed and finite):
https://github.com/python/cpython/blob/main/Objects/setobject.c#L630
(force-pushed 4e1dc32 to 220bfab)
Fully understandable, and we appreciate that this is quite a major, albeit backend-only, change.
This is very difficult to prove; however, we've been hitting our caches quite aggressively for a few weeks now and have not encountered any problems. My reasoning for thread safety is that, to my eyes, no lock can be acquired in a way that would allow a separate caller to cause a deadlock, since all operations under the non-key-level lock are deterministic and independent of the underlying wrapped function.
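Not a proof, of course, but the kind of stress test we've been running amounts to something like this (illustrative sizes; the assertion only holds with this PR's key-level locking in place):

import time
from concurrent.futures import ThreadPoolExecutor

from cachetools import TTLCache, cached

calls = 0

@cached(TTLCache(maxsize=100, ttl=600))
def expensive(key):
    global calls
    calls += 1                 # serialized per key by the key-level lock
    time.sleep(0.5)
    return key

with ThreadPoolExecutor(max_workers=50) as ex:
    list(ex.map(expensive, ["same-key"] * 500))

# Expected with the fix: exactly one underlying call and no deadlock.
# Without it, many of the 500 calls would run the function body themselves.
assert calls == 1, calls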
That's promising! I'm happy to work with you to correctly feature-toggle this behavior if need be; however, I appreciate your prior concerns around maintaining yet another implementation of the caching decorator.
(force-pushed 220bfab to b632787)
@@ -526,20 +527,33 @@ def wrapper(*args, **kwargs):

else:

_key_level_locks = {}
Maybe better to use defaultdict to not allocate extra RLock?
Interesting! That datatype looks ideal here. Out of curiosity, where is the extra RLock being allocated here?
Not here actually, but a couple of lines below, on _key_level_lock = _key_level_locks.setdefault(k, RLock()): an RLock is always created even when it's not needed. And as that line is under the lock, it's atomic anyway.
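For reference, a sketch of the suggested defaultdict variant, which defers RLock construction to a genuine miss (names are illustrative):

from collections import defaultdict
from threading import Lock, RLock

registry_guard = Lock()                # illustrative name
key_level_locks = defaultdict(RLock)   # RLock() is only called on a miss

def get_key_lock(k):
    # dict.setdefault(k, RLock()) evaluates RLock() on every call, even when
    # k is already present; defaultdict's __missing__ defers construction.
    # Access still happens under the registry guard, as discussed above.
    with registry_guard:
        return key_level_locks[k]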
I always thought passing a lock object would handle this problem. But I was wrong. Thanks for the examples:

import threading
import time
from cachetools import TTLCache, cached
# from memoization import cached
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache, _make_key, update_wrapper, wraps
from collections import defaultdict
from cachetools.keys import hashkey

cache_lock = threading.Lock()

def threadsafe_lru(func):
    # https://noamkremen.github.io/a-simple-threadsafe-caching-decorator.html
    func = lru_cache()(func)
    lock_dict = defaultdict(threading.Lock)

    def _thread_lru(*args, **kwargs):
        key = _make_key(args, kwargs, typed=False)
        with lock_dict[key]:
            return func(*args, **kwargs)

    return _thread_lru

def key_level_locking(key=hashkey):
    def decorator(func):
        # https://github.com/tkem/cachetools/pull/224
        lock = threading.RLock()
        # locks holds [lock, refcount] items
        locks = {}

        def wrapper(*args, **kwargs):
            k = key(*args, **kwargs)
            try:
                with lock:
                    klock = locks.setdefault(k, [threading.RLock(), 0])
                    klock[1] += 1  # increase reference count
                with klock[0]:
                    v = func(*args, **kwargs)
            finally:
                with lock:
                    klock = locks[k]
                    klock[1] -= 1  # decrease reference count
                    if klock[1] == 0:
                        del locks[k]
            return v

        return update_wrapper(wrapper, func)

    return decorator

def get_key(o: int):
    # print("key: ", o)
    # return "xxx"
    return str(o)

cache = TTLCache(maxsize=1024, ttl=25)

# @threadsafe_lru  # no ttl, no custom key
@key_level_locking(key=get_key)
@cached(cache, key=get_key)
# @cached(cache, lock=cache_lock, key=get_key)  # not working
def method(arg: int):
    print(f"Doing something expensive with {arg} ...")
    time.sleep(5)
    return arg

import random

def simple():
    r = random.randrange(1, 100)
    r = 10
    method(r)
    print("...done.")

if False:
    with ThreadPoolExecutor(max_workers=5) as executor:
        print("start execution")
        executor.map(simple)

if True:
    for i in range(10):
        t = threading.Thread(target=simple, name=f"worker-{i}")
        print("start execution")
        t.start()
        time.sleep(1)

# Resources
# https://stackoverflow.com/questions/31771286/python-in-memory-cache-with-time-to-live

Output:
@tkem @northyorkshire This is of interest to us as well for asyncio servers:

# some multi-process server,
# where each process uses asyncio greenthread (single physical thread / event loop) concurrency in each process
@cached(TTLCache(...), key_rw_lock_class=aiorwlock.RWLock)
async def route_handler(...):
    out = await fan_out_async()
    return out

or, worse, at least only block requests at the same route (even if diff keys):

# more global/conservative blocking of all use
@cached(TTLCache(...), lock=threading.Lock())
async def route_handler(...):
    out = await fan_out_async()
    return out

async def route_handler(...):
    out = route_handler_sync_helper(...)
    return out

# no value in adding lock=... as the event loop will be blocked anyway
@cached(TTLCache(...))
def route_handler_sync_helper(...):
    """
    Fully blocks asyncio event loop
    """
    out = fan_out_sync()
    return out
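For what it's worth, per-key locking on a single event loop can be sketched with plain asyncio primitives (key_rw_lock_class above is a proposed parameter, not an existing cachetools API; this sketch also ignores eviction of the lock table):

import asyncio
from collections import defaultdict

from cachetools import TTLCache
from cachetools.keys import hashkey

cache = TTLCache(maxsize=100, ttl=600)
key_locks = defaultdict(asyncio.Lock)   # one event loop, so no thread-safety needed

async def route_handler(*args):
    k = hashkey(*args)
    async with key_locks[k]:            # coroutines for the same key queue here
        try:
            return cache[k]
        except KeyError:
            cache[k] = v = await fan_out_async(*args)
            return v

async def fan_out_async(*args):         # stand-in for the real async fan-out
    await asyncio.sleep(1)
    return args

async def main():
    # Five concurrent requests for the same key -> one fan-out.
    await asyncio.gather(*(route_handler("x") for _ in range(5)))

asyncio.run(main())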
@northyorkshire Sorry for the delay... Due to the added complexity and runtime overhead for simple uses (e.g. the cachetools.func decorators), I've decided not to pursue this further for now.
Hello @northyorkshire! Are you planning to publish this as a separate package?
A cache is generally applied to functions and methods which are either slow or expensive to execute, in order to minimize both caller latency and stress on underlying services.
As it stands today, calling a cachetools cached function multiple times from separate threads with the same key may cause the function body to be evaluated multiple times. This means that a cached, 10-second reference data load may be invoked as many times as there are threads during the first 10 seconds that it's executing, potentially swamping underlying services.

Cachetools today:

For example, setting up a REST (I used FastAPI) server to call the following function per request yields multiple calls even though the function is cached. (Note that each timestamped line represents a call to the FastAPI endpoint.)
This is because @cached only locks the access to the cache, not the generation of the value when the key is not present. Between the first call for a given key and the completion of that (or a subsequent) call, the wrapped function will always be evaluated.
Another, more self-contained example is as follows:
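The snippet itself isn't reproduced in this excerpt; a minimal repro of the same shape (a hedged reconstruction, reusing the test() name referenced below) would be:

import threading
import time

from cachetools import TTLCache, cached

calls = 0

@cached(TTLCache(maxsize=100, ttl=600))
def test():
    global calls
    calls += 1
    time.sleep(10)             # the 10-second reference data load
    return "expensive"

threads = [threading.Thread(target=test) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(calls)   # 5 before the fix (every thread computes), 1 after it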
Cachetools post-fix
After the fixes which I'm proposing, the expensive underlying function is only executed a single time for each unique key.
For the first example:
I have manually added some commentary to the log lines. Note how the first call hits our expensive function, while subsequent calls wait for it to complete.
10 seconds after the first call has come in, all other calls return instantly, since the value is now available.
The request at 13:59:25 took only two seconds to respond, whereas before the bug fix it would not only have taken 10 seconds to respond, it would also have added more stress to the underlying services called from within test().
In this second, self-contained example, note how only one call is logged to the cached function, even though the code is functionally identical to before.
I'll also add that key-level locking still behaves as expected: repeated calls with different keys perform no differently than they did before this bug fix.