
cachedmethod not working as expected with locks #279

Closed
npezolano opened this issue Aug 8, 2023 · 4 comments

npezolano commented Aug 8, 2023

The example below runs, but it doesn't seem to behave as expected: calc gets called multiple times for the same n, when it should be computed once under the lock and then cached. I tried both Lock and RLock and get the same result. The cache itself does work; if I rerun res = obj.multithreaded_run() afterwards, everything is served from the cache. The problem shows up when the function being cached takes up to a few seconds to run and is needed by multiple threads at once: it isn't locked properly, and it gets worse with more threads. This is a follow-up on the previous suggestion in #274.

from multiprocessing.pool import ThreadPool as Pool

from threading import Lock, RLock
import cachetools
import operator
import time
import numpy as np

cores = 4

class DataObj(object):
    def __init__(self, enable_cache=True, **kwargs):
        self.enable_cache = enable_cache
        self.lock = RLock()
        self.cache = cachetools.LRUCache(maxsize=200)

    @cachetools.cachedmethod(operator.attrgetter('cache'), lock=operator.attrgetter('lock'))
    def calc(self, n): # Should only be called once for each n
        print(f'running n : {n}')
        sleeptime = np.random.uniform(0, 1) 
        time.sleep(sleeptime) # Sleep timer to simulate loading some data
        return n

class RunnerObj(object):
    def __init__(self, data_obj=None, window=60, size=100, **kwargs):
        self.data_obj = data_obj
        self.window = window
        self.size = size

    def run(self, data):
        n = 0
        for i in data:
            n += self.data_obj.calc(i)
        return n

    def multithreaded_run(self):
        seq = [i for i in range(self.size)]
        window_size = self.window
        data = []
        # Create example window of data to run
        for i in range(len(seq) - window_size + 1):
            data.append(seq[i: i + window_size])

        pool = Pool(cores)
        res = pool.map(self.run, data)
        return res


obj = RunnerObj(data_obj=DataObj())
res = obj.multithreaded_run()

See screenshot for example: each n should only print once.
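Since the screenshot isn't reproduced here, the lines below are representative output (an illustration, not the actual capture); the same n appears more than once because a second thread enters calc before the first call has finished and populated the cache:

running n : 0
running n : 1
running n : 0
running n : 2
running n : 1
...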

npezolano (Author) commented Aug 9, 2023

7677e50 looks to fix the issues I'm facing above, but it causes a small performance impact.

tkem (Owner) commented Sep 18, 2023

@npezolano: I have to admit that the title of this issue, "not working as expected with locks", is probably accurate, sorry.
However, AFAIK it is the same behavior the standard library's @lru_cache decorator shows, and any PRs trying to fix this have been rejected due to added complexity and runtime overhead. See #224 for a somewhat thorough discussion.
So, sorry, unless someone comes up with some truly ingenious idea, this will be left open.
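A minimal sketch (not from the thread, just an illustration) of the behavior tkem describes: functools.lru_cache keeps its internal bookkeeping consistent under threads, but it does not serialize calls to the wrapped function, so concurrent callers that miss on the same argument each recompute it.

import functools
import time
from multiprocessing.pool import ThreadPool

@functools.lru_cache(maxsize=None)
def slow_identity(n):
    print(f'computing {n}')  # may be printed more than once per n under contention
    time.sleep(0.5)          # slow miss: other threads also miss and recompute
    return n

with ThreadPool(4) as pool:
    pool.map(slow_identity, [0, 0, 1, 1])

Running this typically prints 'computing 0' and 'computing 1' twice, mirroring the duplicate 'running n' lines seen with cachedmethod above.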

tkem (Owner) commented Sep 18, 2023

On a final note: yes, it's an issue, and probably one that should at least be stated in the docs.
However, if this is a real issue for your individual use case, it's probably also an issue for your back-end service, i.e. the duplicate calls would arise even without caching (probably more so), and maybe should be handled there...
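If it does have to be handled on the caller's side, one option is a per-key "compute once" layer in front of the cache. The sketch below is an assumption about how that could look, not cachetools API; OnceCache and get_or_compute are illustrative names. A shared lock guards only the bookkeeping, while a per-key lock is held during the expensive call so other threads wait for the value instead of recomputing it.

import threading
import cachetools

class OnceCache:
    def __init__(self, maxsize=200):
        self.cache = cachetools.LRUCache(maxsize=maxsize)
        self._meta_lock = threading.Lock()  # guards the cache and the lock table only
        self._key_locks = {}                # one lock per in-flight key

    def get_or_compute(self, key, compute):
        with self._meta_lock:
            try:
                return self.cache[key]
            except KeyError:
                key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:                      # only one thread computes this key
            with self._meta_lock:
                try:
                    return self.cache[key]  # another thread finished it first
                except KeyError:
                    pass
            value = compute(key)            # expensive call, no shared lock held
            with self._meta_lock:
                self.cache[key] = value
                self._key_locks.pop(key, None)
            return value

In the original example, DataObj.calc could delegate to something like self.once.get_or_compute(n, ...) instead of using the decorator. The trade-off is the extra lock table and a second cache lookup on every miss.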

npezolano (Author) commented

I'm sure removing thread safety everywhere in the documentation will fix the issue @tkem

tkem closed this as completed Sep 19, 2023