
TTLCache with "absolute" expiration times #219

Closed
xmatthias opened this issue Sep 12, 2021 · 12 comments
Comments

@xmatthias

I need a TTLCache with absolute expiration times.

That is, a cache which expires items at every full hour (for example), or always at midnight.

Currently, TTL only seems to support relative values (200s from now, 1000s from now), but not "until the clock next hits :00", so the insertion time determines how long an item lives:

from cachetools.ttl import TTLCache

# Cache for 1h
c = TTLCache(5, ttl=3600)
c['a'] = 1

In the sample above, item 'a' will live for one hour, so it might expire at 18:30 or 18:34 (exactly one hour after I inserted it). I'd instead like it to expire at 18:00, 19:00, 20:00, independently of the insertion time.

I don't mind creating a subclass of TTLCache for this myself, but TTLCache seems pretty involved/complex and not well documented (what does what), so I'm currently not sure where to start.

I'll potentially need a different timer, but I'm not sure that alone will meet my requirements (unfortunately, the timer is not very well documented either).

I could also work with it by somehow modifying the expiration (all items should expire at the same moment, when the clock hits 11:00, 12:00, ...), so I think calling .expire() after insertion "might" work (although I'm not sure that will cover my needs).

The math for the expiration is pretty easy (to round up to full hours), but I don't see where I should plug it in:

from datetime import datetime

ts = datetime.now().timestamp()
offset = ts % 3600
expire = datetime.fromtimestamp(ts - offset + 3600)  # the next full hour
@XuehaiPan

XuehaiPan commented Sep 12, 2021

You could write a custom timer for this, somewhat like a math.floor operation on the timestamp. For example, a key inserted at 18:15 can be registered at 18:00 using the floor operation; then, after ttl=3600 seconds, the key will expire.

from datetime import datetime, timedelta

from cachetools import TTLCache

def my_timer():
    now = datetime.now()

    # floor to the minute:
    # dt = datetime(now.year, now.month, now.day, now.hour, now.minute)

    # floor to the hour:
    # dt = datetime(now.year, now.month, now.day, now.hour)

    # or even floor to a specific time within the hour:
    target_minute, target_second = 15, 16
    dt = datetime(now.year, now.month, now.day, now.hour,
                  minute=target_minute, second=target_second)
    if dt > now:
        dt -= timedelta(hours=1)
    return dt.timestamp()

cache = TTLCache(5, ttl=3600 - 1e-5, timer=my_timer)  # expire at HH:15:16 every hour

@xmatthias
Author

xmatthias commented Sep 12, 2021

Thanks for the quick answer!

I'm not sure I understand what "timer" is expected to return: the current time, or the time the item will expire?

Also, the above code (I've tried the "floor to the minute" variant) doesn't seem to work properly; the items still expire only after a full additional minute, even though my timer function returns a floored minute.

@XuehaiPan

XuehaiPan commented Sep 12, 2021

It seems that we should add a small number to the ttl. As the source below shows, the expiry check is curr.expire < time, not curr.expire <= time:

while curr is not root and curr.expire < time:
    cache_delitem(self, curr.key)
    del links[curr.key]
    next = curr.next
    curr.unlink()
    curr = next

import time
from datetime import datetime, timedelta
from cachetools.func import ttl_cache

def my_timer():
    now = datetime.now()

    # floor to minute
    dt = datetime(year=now.year, month=now.month, day=now.day, hour=now.hour, minute=now.minute)
    print('Register as {}'.format(dt))
    return dt.timestamp()

@ttl_cache(ttl=(60 - 1E-5), timer=my_timer)  # ttl is slightly smaller than 60
def now():
    return datetime.now()

for i in range(30):
    print('Function output: {}'.format(now()))
    print('Sleep 5 seconds')
    time.sleep(5)
    print()

And I got correct results as expected:

Results (identical cached lines condensed):

Register as 2021-09-13 01:43:00
Function output: 2021-09-13 01:43:44.380631
Sleep 5 seconds

(... the same cached output repeats every 5 seconds until the minute boundary ...)

Register as 2021-09-13 01:44:00
Register as 2021-09-13 01:44:00
Function output: 2021-09-13 01:44:04.416717
Sleep 5 seconds

(... same cached output repeats until the next boundary ...)

Register as 2021-09-13 01:45:00
Register as 2021-09-13 01:45:00
Function output: 2021-09-13 01:45:04.512299
Sleep 5 seconds

(... and so on ...)

Register as 2021-09-13 01:46:00
Register as 2021-09-13 01:46:00
Function output: 2021-09-13 01:46:04.604721
Sleep 5 seconds

@XuehaiPan

It seems that we should add a small number to the ttl.

I have created PR #220 to address this. After that, we could build a new cache like PeriodicCache on top of TTLCache(exclusive=True). Link #114

@xmatthias
Author

I am not sure #220 is really necessary.

My solution is now the following (naming "stolen" from your comment above):

from datetime import datetime, timezone

from cachetools.ttl import TTLCache


class PeriodicCache(TTLCache):
    """
    Special cache that expires at "straight" times.
    A cache with a ttl of 3600 (1h) will expire at every full hour (:00).
    """

    def __init__(self, maxsize, ttl, getsizeof=None):
        def local_timer():
            ts = datetime.now(timezone.utc).timestamp()
            offset = ts % ttl
            return ts - offset

        # Init with a slight ttl offset (see the `expire < time` check above)
        super().__init__(maxsize=maxsize, ttl=ttl - 1e-5, timer=local_timer, getsizeof=getsizeof)

I've set up a few tests for this and it seems to work reliably, as long as a "repeating" pattern is used for ttl (60, 120, 30, 3600, ...). Otherwise the flooring logic results in "shifting" boundaries, which means you can't really predict when an item will expire.
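The flooring done by local_timer can be sanity-checked on its own, independent of wall-clock time (a standalone sketch; the fixed timestamp is arbitrary):

```python
# local_timer floors the current timestamp to a multiple of ttl;
# here the same arithmetic is checked with a fixed timestamp
ttl = 3600
ts = 1_631_458_800.0 + 1234.5      # an instant 1234.5s past an hour boundary
floored = ts - ts % ttl
assert floored == 1_631_458_800.0  # lands exactly on the boundary
assert floored % ttl == 0
assert 0 <= ts - floored < ttl     # never in the future
```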

@XuehaiPan

XuehaiPan commented Sep 14, 2021

For me, I'd prefer to use the exclusive argument to remove the arbitrary 1E-5 offset, and to make the cache more configurable.

import time
from datetime import datetime, timedelta

from .ttl import TTLCache


class PeriodicCache(TTLCache):
    def __init__(self, maxsize, period, reference=0, timer=time.perf_counter, getsizeof=None):
        self.__period = period
        self.__reference = reference
        self.__external_timer = timer

        # `exclusive=True` requires the change proposed in PR #220
        TTLCache.__init__(self, maxsize, ttl=period, timer=self.__periodic_timer,
                          getsizeof=getsizeof, exclusive=True)

    def __periodic_timer(self):
        time = self.__external_timer()
        offset = (time - self.__reference) % self.__period
        return time - offset


cache_full_minute_proc_time = PeriodicCache(128, period=60.0)

cache_full_hour_proc_time = PeriodicCache(128, period=3600.0)

cache_half_hour_proc_time = PeriodicCache(128, period=3600.0, reference=1800.0)

cache_full_minute_utc_time = PeriodicCache(128, period=timedelta(minutes=1),
                                           reference=datetime(year=2000, month=1, day=1),
                                           timer=datetime.now)

cache_full_hour_utc_time = PeriodicCache(128, period=timedelta(hours=1),
                                         reference=datetime(year=2000, month=1, day=1),
                                         timer=datetime.now)

cache_half_hour_utc_time = PeriodicCache(128, period=timedelta(hours=1),
                                         reference=datetime(year=2000, month=1, day=1, hour=0, minute=30),
                                         timer=datetime.now)

@tkem
Owner

tkem commented Sep 16, 2021

@xmatthias: Back to your original request...

I need to have a TTLCache that has absolute expiration times.

It would be interesting to know why you'd need such a thing (knowing the use case might help e.g. find alternative solutions), but no, such a thing does not exist in cachetools.

I don't mind creating a subclass of TTLCache for this myself,

TTLCache is not designed to be subclassed. Cache is the common base class, so if you need some special behavior, I'd recommend deriving a new class from Cache that implements your caching strategy.

I'm not sure i understand what the expectation for "timer" to return is?

Well, the docs (https://cachetools.readthedocs.io/en/stable/#cachetools.TTLCache) could surely be improved in this respect (see also #216), but basically it returns the current time, as used by the cache. This does not necessarily have to be wall-clock time, for example in the TTLCache unit tests a counter is used to better control expiration of items.

I still don't fully understand your use case, but if you want to expire items at every full hour, maybe you could use a timer with 1-hour-resolution, e.g.

timer = lambda: int(time.time() / 3600)

and specify the ttl in hours, too?

@XuehaiPan

XuehaiPan commented Sep 16, 2021

I still don't fully understand your use case, but if you want to expire items at every full hour, maybe you could use a timer with 1-hour-resolution, e.g.

timer = lambda: int(time.time() / 3600)

and specify the ttl in hours, too?

As commented in #219 (comment) and #220, TTLCache will keep an item in the cache when its remaining lifetime reaches exactly 0 (the bound is inclusive).

The actual period (or ttl) in the following case is 2 hours rather than the desired 1 hour:

cache = TTLCache(128, ttl=1, timer=lambda: int(time.time() / 3600))  # period is 2 hours

If we need a 1-hour cache:

cache = TTLCache(128, ttl=0.9, timer=lambda: int(time.time() / 3600))  # period is 1 hour
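A minimal model of that strict comparison makes the behavior easy to see (this does not use cachetools itself; `expired` is a hypothetical stand-in for the check in the source quoted above):

```python
# TTLCache removes an item only when expire < time (strict inequality),
# where expire = insert_time + ttl
def expired(insert_time, ttl, now):
    return insert_time + ttl < now

# with an hour-resolution timer, time advances in integer steps
assert not expired(0, 1, 1)  # ttl=1: still alive after one "hour" (inclusive bound)
assert expired(0, 1, 2)      # only gone after two, so the effective period is 2 hours
assert expired(0, 0.9, 1)    # ttl=0.9: gone after one "hour", as desired
```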

@xmatthias
Author

xmatthias commented Sep 16, 2021

It would be interesting to know why you'd need such a thing (knowing the use case might help e.g. find alternative solutions), but no, such a thing does not exist in cachetools.

My use case is a financial one: think candlestick charts of different resolutions.
For a 1h candle it doesn't matter when the cache was last updated (at xx:15, xx:13, or xx:45); I know the next relevant update time will be at xy:00.

Obviously this applies to 1h candles, 30m candles, 5m, or even 1d candles (a candle being a specified timeframe that elapses, and which is always aligned to "clock time").
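The next candle boundary for any of these timeframes is just the next multiple of the timeframe (a sketch; `next_candle_open` is a made-up helper name, not part of cachetools):

```python
from datetime import datetime, timezone

def next_candle_open(ts: float, timeframe_s: int) -> datetime:
    """Next candle boundary (UTC) after timestamp ts, for a timeframe in seconds."""
    boundary = ts - ts % timeframe_s + timeframe_s
    return datetime.fromtimestamp(boundary, tz=timezone.utc)

ts = datetime(2021, 9, 16, 14, 13, 27, tzinfo=timezone.utc).timestamp()
print(next_candle_open(ts, 300))    # 5m candle  -> 14:15:00
print(next_candle_open(ts, 1800))   # 30m candle -> 14:30:00
print(next_candle_open(ts, 3600))   # 1h candle  -> 15:00:00
```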

timer = lambda: int(time.time() / 3600)

and specify the ttl in hours, too?

Without PR #220, this will result in 2h durations, which is clearly wrong.

TTLCache is not designed to be subclassed. Cache is the common base class, so if you need some special
behavior, I'd recommend deriving a new class from Cache that implements your caching strategy.

Can you highlight / point out some technical reasons (not "because I designed it the other way") not to subclass TTLCache if that serves the purpose best?

Deriving from Cache directly would have me reimplement 90% of the logic currently present in TTLCache, so when subclassing I usually look for the biggest similarity, not for the "highest" level.
(Otherwise we could also state that "object" is designed to be subclassed, so why are you subclassing Cache? The answer to that is simple: because most functionality present there is also needed in the subclass.)

@tkem
Owner

tkem commented Sep 26, 2021

Can you highlight / point out some technical reasons (not "because I designed it the other way")

That's as good a reason as any in my book. Trying to twist and bend something that just wasn't designed with your particular use case in mind is usually more trouble than it's worth. Go and look for an alternative solution!

Deriving from Cache directly would have me reimplement 90% of the logic currently present in TTLCache, so when subclassing I usually look for the biggest similarity, not for the "highest" level.

You'll find enough resources online that explain the issues with this kind of "implementation inheritance", and why you should change your mind about this.

Without the PR #220, this will result in 2h durations - which is clearly wrong.

I agree, the ttl handling is not really intuitive when using small integers. However, as @XuehaiPan pointed out above, this is easy to work around even without PR #220.

Back to your use case: If I understand correctly, you don't even need a TTL, since all cache entries will expire at every full hour? Then why use a TTLCache in the first place? My suggestion would be to use one of the other, basic caches, e.g. LRUCache, record the timestamp of the last cache access, and simply clear the cache if an "updatetime" has occured since. For example (largely untested, using implementation inheritance for exposition only):

import time
import cachetools

class FullHourCache(cachetools.LRUCache):

    def __init__(self, maxsize, getsizeof=None):
        super().__init__(maxsize, getsizeof)
        self.__last_access = self.time()

    def __getitem__(self, key):
        t = self.time()
        if t != self.__last_access:
            self.__last_access = t
            self.clear()
        return super().__getitem__(key)

    def time(self):
        return int(time.time() / 3600)
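The clearing behavior above is easy to exercise with a controllable clock instead of time.time() (a dict-based stand-in for the LRUCache variant, just to show the pattern; `FakeClockCache` is a made-up name):

```python
class FakeClockCache:
    """Minimal model of the FullHourCache clearing logic, with a controllable clock."""

    def __init__(self):
        self._data = {}
        self.clock = 0
        self._last_access = self.time()

    def __setitem__(self, key, value):
        self._data[key] = value

    def __getitem__(self, key):
        t = self.time()
        if t != self._last_access:  # a boundary was crossed since the last access
            self._last_access = t
            self._data.clear()      # drop everything
        return self._data[key]

    def time(self):
        return self.clock

c = FakeClockCache()
c['a'] = 1
assert c['a'] == 1   # same "hour": hit
c.clock = 1          # the hour boundary passes
try:
    c['a']
    raise AssertionError('expected KeyError')
except KeyError:
    pass             # the cache was cleared at the boundary
```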

@xmatthias
Author

That's as good a reason as any in my book. Trying to twist and bend something that just wasn't designed with your particular use case in mind is usually more trouble than it's worth. Go and look for an alternative solution!

You should change your mind about how you handle issues that try to be constructive and improve your project.
Just because you don't like the way a part is used doesn't give you the right to tell me how I have to implement solutions.

To me, this whole answer sounds like a pretty frustrated maintainer.
If you're frustrated / no longer motivated, please try to find another, still motivated maintainer for this project (or archive it if you don't think it'll be of use to anyone in the future), instead of becoming unfriendly and attacking your users, thanks.

You'll find enough resources online that explain the issues with this kind of "implementation inheritance", and why you should change your mind about this.

I understand you didn't intend it to be used this way, but apparently we hit a weak spot (maybe you're suspecting issues in TTLCache's implementation?). That's no reason to get defensive and attack the way people are trying to use your tool (especially when the second part of your answer contains a very similar approach).

I'm sure I can find enough resources online that explain that reimplementing or copy/pasting 90% of good code is an even worse approach.

Back to your use case: If I understand correctly, you don't even need a TTL, since all cache entries will expire at every full hour? Then why use a TTLCache in the first place? My suggestion would be to use one of the other, basic caches, e.g. LRUCache, record the timestamp of the last cache access, and simply clear the cache if an "updatetime" has occured since. For example (largely untested, using implementation inheritance for exposition only):


Now this is very inconsistent and contradicts what you said above.

Why is inheriting from LRUCache fine, but inheriting from TTLCache is not?
Wouldn't your logic about subclassing apply to that as well?

This confirms my suspicion that you're seeing some problems with TTLCache... would you mind highlighting them? I'm using TTLCache extensively (also outside this use case) and would not want to run into issues because of them.

@tkem
Owner

tkem commented Sep 26, 2021

Well, let me assure you that I'm not frustrated, and still happy to maintain this. Which does not mean I'll happily embrace each and every feature/change request or PR that doesn't make my life easier.

That said, I didn't mean to "attack" anybody. AFAICS I just stated that I think your approach is wrong, and I tried to come up with an alternative solution. You're free to use that or whatever you come up with yourself, of course. Just don't waste your and my time any more with this, thank you.

Let me also assure you that you didn't "hit a weak spot" - there are no issues with the TTLCache implementation, at least none I'm aware of.

Finally, use of inheritance in the example above was marked as "for exposition only". I would probably use a separate accessor function, but to keep this short and since you seem to like inheritance, I tried to do you a favor.

Case closed.

Repository owner locked as too heated and limited conversation to collaborators Sep 26, 2021