
get API that does not update eviction heuristics #418

Closed
hshankar opened this issue May 13, 2020 · 20 comments
@hshankar

Is there a get API in Caffeine that does not update the eviction heuristics used by TinyLFU? In my use case, the write operation has to read from the cache, but that ends up skewing the heuristics and keeping a value from being evicted even if I am only writing to it and never reading it. Is there any way for the write thread to do a get() on a key without counting that get as a cache hit?

Details: We are using Caffeine as a separate cache, apart from the DB. However, our usage is somewhat atypical in that when a user reads a key, it typically requires loading N different entities from the DB. In the cache we store an aggregated wrapper containing these N entities; say the cache key is UserId and the cached value is a Map<EntityId, Entity>.

For reads, the app consults the cache, and on a miss it loads from the DB and puts the value in the cache. Writes only update one entity at a time, so a write looks like <UserId, EntityId, Entity>. For a write, we first check whether the cache has an entry for UserId, and if so we do cache.get(userId).put(entityId, entity); if the UserId is not cached, we don't need to update the cache. The problem is that every write does a cache.get(userId), so Caffeine thinks the key is being accessed, but these are write accesses and should not contribute to the entry staying in cache longer. We only want to keep an entry if it is accessed at read time. Is there any way for us to let Caffeine know the difference in access type?
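
For concreteness, a minimal sketch of the two flows (UserId, EntityId, Entity, and loadAllFromDb are stand-ins for our real types and DB layer; the maximum size is arbitrary):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Stand-ins for the real application types.
record UserId(long id) {}
record EntityId(long id) {}
record Entity(String data) {}

class UserEntityCache {
  private final Cache<UserId, Map<EntityId, Entity>> cache =
      Caffeine.newBuilder().maximumSize(80_000).build();

  // Read path: a hit here should count toward the eviction policy.
  Map<EntityId, Entity> read(UserId userId) {
    return cache.get(userId, this::loadAllFromDb);
  }

  // Write path: this lookup also counts as an access, which is the problem.
  void write(UserId userId, EntityId entityId, Entity entity) {
    Map<EntityId, Entity> entities = cache.getIfPresent(userId);
    if (entities != null) {
      entities.put(entityId, entity); // update only if the user is cached
    }
  }

  private Map<EntityId, Entity> loadAllFromDb(UserId userId) {
    return new ConcurrentHashMap<>(); // placeholder for the N-entity DB load
  }
}
```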

@ben-manes
Owner

Not currently, though there have been discussions of what quiet methods would mean for such cases.

We've generally tried to make the eviction policy smarter so it can handle odd workloads. If you captured a trace indicating which accesses for a key hash are reads and which are writes, then we'd have data to justify an API method. So far it's only been speculation, which leaves only questions.

Have you observed a problem, or is this an educated concern?

@ben-manes
Owner

You might also consider using a (UserId, EntityId) key and using getAll to return the aggregate on a miss.
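
For example, a rough sketch of that approach, reusing the stand-in types from the sketch above and assuming a hypothetical bulk loadEntities DB helper:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Each entity is cached under its own (UserId, EntityId) key, and getAll
// assembles the aggregate, loading any absent keys together on a miss.
record CompositeKey(UserId userId, EntityId entityId) {}

class CompositeKeyCache {
  private final Cache<CompositeKey, Entity> cache =
      Caffeine.newBuilder().maximumSize(80_000).build();

  Map<CompositeKey, Entity> readAll(UserId userId, Set<EntityId> entityIds) {
    Set<CompositeKey> keys = entityIds.stream()
        .map(id -> new CompositeKey(userId, id))
        .collect(Collectors.toSet());
    return cache.getAll(keys, this::loadEntities); // one bulk load on a miss
  }

  private Map<CompositeKey, Entity> loadEntities(Set<? extends CompositeKey> keys) {
    Map<CompositeKey, Entity> loaded = new HashMap<>();
    for (CompositeKey key : keys) {
      loaded.put(key, new Entity("...")); // placeholder for the bulk DB read
    }
    return loaded;
  }
}
```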

@hshankar
Author

> If you captured a trace indicating which accesses for a key hash are reads and which are writes, then we'd have data to justify an API method.

What do you mean by a trace here? Do you mean a log proving that a particular key K1 was evicted in favor of K2 even though K2 only had writes and K1 had reads?

I am not sure whether this is a problem for us yet, but I strongly suspect it is: ours is a write-heavy use case, and we are under memory pressure and hence evicting constantly. There is a fair amount of reads as well, but they are very cacheable (most readers are repeat users), and the first load from the DB is expensive, which is why we cache the result. The reads and writes are also somewhat orthogonal: users who write a lot do not read a lot, and users who read a lot do not write much. I know it sounds odd, but unfortunately I am not sure I can reveal more here. The read-heavy users, who need to be in the cache, are probably being penalized by the write-heavy users; a write-heavy user might have read just once, but their entry stays in cache forever because they write a lot.

(UserId, EntityId) as a key also won't work; the Map example was a simplification, and the value is actually a fairly heavy POJO with a lot of fields and nesting.

@ben-manes
Owner

We would run the trace through the simulator and observe the hit rates of different policies. The event's key is a long, so you can obfuscate real data using a simple hash. Since you care about the distinction between reads and writes, the access type could be included in each event. Then we'd be able to write a parser and see how the hit rate compares when we supply all events versus only read events. If you have a real workload, e.g. from production, you could capture a day's worth for us to evaluate against. It really helps us make data-driven decisions when possible.

The prior use cases asking for a quiet getIfPresent were typically scheduled tasks that iterated over the cache (e.g. persisting a snapshot). Those could use one of the asMap view's iterators, which don't update the access metadata. Usually the requests came from users iterating over the keySet and looking up each entry, rather than iterating over the entrySet to get both together. Those were often rooted in the misunderstanding that asMap is an expensive copy rather than a view.
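
For example, a minimal sketch of that snapshot pattern (persist is a hypothetical snapshot writer):

```java
import java.util.Map;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class SnapshotExample {
  // Hypothetical snapshot writer.
  static void persist(String key, String value) {
    System.out.println(key + "=" + value);
  }

  public static void main(String[] args) {
    Cache<String, String> cache = Caffeine.newBuilder().maximumSize(1_000).build();
    cache.put("a", "1");

    // Anti-pattern: each getIfPresent counts as an access in the eviction policy.
    for (String key : cache.asMap().keySet()) {
      persist(key, cache.getIfPresent(key));
    }

    // Preferred: the asMap() view's iterators read entries without updating
    // the access metadata, so the snapshot does not perturb eviction.
    for (Map.Entry<String, String> entry : cache.asMap().entrySet()) {
      persist(entry.getKey(), entry.getValue());
    }
  }
}
```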

A getIfPresentQuietly that reads the value without any metadata update (eviction, expiration, etc.) would be possible. It is just a matter of understanding a use case that justifies it, rather than a misunderstanding of the APIs. We already have a lot of ad hoc methods under cache.policy() for these special scenarios, so we can avoid polluting the core interfaces.

It sounds like that would be all you need, but more complex cases like writes open up more questions (e.g. what exactly would computeQuietly mean?). If the scope creeps to cases where we need to update metadata, it probably becomes too painful to consider. But if a good reason justifies a simple method like getIfPresentQuietly (which just reads the value from the backing map), it could certainly be added without much trouble.

In short, I think the first step is to acquire some trace data to justify by comparing the hit rates. Then we'd make sure the changes solve your problem and cut a new release.

@hshankar
Author

Got it. I am thinking the following CSV format will do:
keyHash,access
where access = get, put, or update:
put means the key was inserted for the first time;
update is a regular write, which we don't want to count as an access.

Let me know if you need evictions as well, though it might be tricky to add a trace log in the eviction listener.
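
For example, a minimal sketch of emitting that format (where these calls sit in the application, and the use of stdout, are assumptions):

```java
// Minimal sketch of emitting the proposed trace format.
final class CacheTrace {
  private CacheTrace() {}

  // access = get | put | update; hashCode obfuscates the real key
  static void log(Object key, String access) {
    System.out.printf("%d,%s%n", key.hashCode(), access);
  }
}

// At the call sites:
//   CacheTrace.log(userId, "get");     // read-path lookup
//   CacheTrace.log(userId, "put");     // first insertion
//   CacheTrace.log(userId, "update");  // regular write
```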

@ben-manes
Owner

ben-manes commented May 13, 2020

That sounds great. We won't need evictions, since we'll run the trace through the simulator to see which evictions occur naturally and their effect on the hit rate. We'll even get a sense of your workload (recency vs. frequency) by seeing how it behaves under all of our other policies (ARC, LIRS, etc.), and adding a new parser is very easy.

@hshankar
Author

hshankar commented May 15, 2020

cache-trace.csv.gz
Attached are the trace logs. The format is:
keyHash,action,path
where action = get/put (cache.get() or cache.put())
and path = read/write (the read flow vs. the write flow)

Ideally we would only consider the rows where path = read for the eviction heuristics. Only about 2.3% of the accesses are on the read path (~42k out of 1.8M total cache accesses).

@ben-manes
Owner

Thanks. The cache size is very tiny, which can cause some odd skew, because most advanced policies maintain a history (to learn from) sized relative to the maximum size. This can make adaptivity harder due to noise, which goes away when using larger sampling periods. How large is your cache normally?

I used a size of 256, which it seems is more than enough. Your trace has a total of 1.8M events.

1. Reads update the policy, populating on a miss. Writes directly insert the value.

   Hit Rate | Hits   | Misses
   92.71 %  | 38,824 | 3,054

2. Reads and writes both update the policy and populate on a miss.

   Hit Rate | Hits    | Misses
   41.33 %  | 743,855 | 1,056,145

3. Reads update the policy, populating on a miss. Writes update the policy if the entry is present and do not insert the value if absent.

   Hit Rate | Hits   | Misses
   92.71 %  | 38,824 | 3,054

With your current usage (2), the hit rate is much worse than with your proposal (1). Ideally, if you could use computeIfPresent, it would avoid polluting the cache on an update and you'd get (3). However, that might not be possible, since you might have the entry but not care to write to it, causing the lookup to be treated as a read and update the policy. I suppose, then, that the only way to achieve the high hit rate is to bypass the normal APIs, as you propose, because there is no other way to capture your semantics.
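
For reference, a sketch of what (3) would look like with computeIfPresent, using the same stand-in types as earlier:

```java
import java.util.Map;
import com.github.benmanes.caffeine.cache.Cache;

// Sketch of option (3): the write path goes through the asMap() view's
// computeIfPresent, so an absent key is never populated by a write.
final class WriteThroughIfPresent {
  static void write(Cache<UserId, Map<EntityId, Entity>> cache,
                    UserId userId, EntityId entityId, Entity entity) {
    cache.asMap().computeIfPresent(userId, (key, entities) -> {
      entities.put(entityId, entity); // update only an already-cached aggregate
      return entities;
    });
  }
}
```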

@hshankar
Author

Here is a larger dump if it helps. It has 100M entries (around 350MB compressed). Typically the cache size I have seen is around 80,000.

@ben-manes
Owner

I had a bug in my hack to the simulator to support your complex scenario.

1. reads use a get; writes always use a get
2. reads use a get; writes use a get only if the entry is present

Large trace (cache size 80,000)

Mode | Hit Rate | Hits       | Misses
1    | 69.47 %  | 69,468,943 | 30,531,057
2    | 95.13 %  | 2,212,211  | 113,275

Small trace (cache size 256)

Mode | Hit Rate | Hits    | Misses
1    | 39.95 %  | 719,087 | 1,080,913
2    | 89.06 %  | 37,297  | 4,581

Since (2) doesn't have to track the majority of requests, it shows a much higher hit rate. This flow is too complex to simulate exactly, but the results do indicate that your feature request would likely help you.

The implementation I'd propose is to add cache.policy().getIfPresentQuietly(key), which would skip updating any policy (eviction, expiration, stats). It would do the minimal work of not returning expired entries, making it equivalent to containsKey except that it returns the value.

I don't know what other "quiet" methods might mean, and the variations could be very painful: for example, don't update expiration but do update the eviction policy, or overwrite a value but don't notify a listener of the old value's removal. Older caching libraries had ad hoc quiet methods, and that really gets into a framework mindset. So I am fine with this one-off but would push strongly against variations, because they are confusing and could impact the internal design negatively.

I'd place this in Policy so as not to promote it on the Cache interface, but rather leave it as an escape hatch. This way those who truly need it, like you, are not blocked, but it is disfavored compared to the other API methods.

@hshankar
Author

That sounds good. Agreed that adding it to the main API would be unclean from an abstraction point of view. "Quietly" might still have a broad interpretation; maybe untrackedGet() would make it clearer that this get is not tracked anywhere and changes no internal state. Still vague, I guess, but it narrows it down a little. Or maybe even untrackedLookup(), to distinguish it further from get().

@ben-manes
Owner

The term "quiet" was used in older caching libraries, Apache JCS and Ehcache v1 & v2. They defined it as:

JCS - Get an item from the cache without affecting its last access time or position.
Ehcache 2 - Gets an element from the cache, without updating Element statistics.

Ehcache added other quiet methods, putQuiet and removeQuiet, which also did not notify the listeners.

Put an element in the cache, without updating statistics, or updating listeners.
Removes an Element from the Cache, without notifying listeners.

JCache's API was designed by Ehcache; it dropped these quiet methods but confusingly added a removeAll that notifies listeners and a clear that does not. It doesn't have a quiet get, put, or remove, though.

So "quiet" has some historic familiarity and isn't a verbose word. I am not sure whether it's better to be vague or explicit. Since this is a second-class API method, perhaps being vague is better than over-specifying?

The other consideration is that in Guava we tried to extend the Collections Framework rather than invent or redefine terms. That has a nicer consistency, but it also means that breaking away feels very harsh and foreign. We differ only in that a map is a passive data structure, so it has a passive get and an active computeIfAbsent. Since a cache is active (e.g. expiration), we inverted this to getIfPresent and get. That's why I'd go with some form of getIfPresent for a more consistent experience.

@Maaartinus

> The implementation I'd propose is to add cache.policy().getIfPresentQuietly(key), which would skip updating any policy (eviction, expiration, stats). It would do the minimal work of not returning expired entries, making it equivalent to containsKey except that it returns the value.
>
> I don't know what other "quiet" methods might mean, and the variations could be very painful: for example, don't update expiration but do update the eviction policy, or overwrite a value but don't notify a listener of the old value's removal.

This makes "quiet" nearly unusable. Since the naming can never be good and the number of possibilities is large, what about approaching it the other way round? Instead of choosing the most important options and finding a name for them, imagine you'd like to provide all of them via a single method, something like .get(key, ONLY_IF_PRESENT, SKIP_LISTENERS, SKIP_POLICY). This is how java.nio.file.Files.move provides its many variations. I'm not sure whether it's doable or too complicated, but it saves you from naming problems. Anyway, it should be useful as a thought experiment at least.

To avoid varargs, there should probably be an options object that is pre-created by the user. As this is just a secondary API, that's a reasonable burden on the users. The options object could also prohibit combinations that make no sense (if such exist).
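
For illustration, a rough sketch of what such a pre-created options object might look like (all of these names are hypothetical, not part of Caffeine's API):

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the options-object idea. Mutually exclusive
// combinations could be rejected in the factory method.
enum Suppression { SKIP_LISTENERS, SKIP_EVICTION, SKIP_EXPIRATION, SKIP_STATS }

final class GetOptions {
  private final Set<Suppression> suppressions;

  private GetOptions(Set<Suppression> suppressions) {
    this.suppressions = EnumSet.copyOf(suppressions);
  }

  // Pre-created once by the user and reused, avoiding per-call varargs allocation.
  static GetOptions of(Suppression first, Suppression... rest) {
    return new GetOptions(EnumSet.of(first, rest));
  }

  boolean contains(Suppression suppression) {
    return suppressions.contains(suppression);
  }
}
```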

@ben-manes
Owner

ben-manes commented May 16, 2020

That would be exactly the right approach if suppressions were not problematic to mix with concurrency. We'd go a step further and offer a suppressed view, e.g.

Cache<K, V> suppressed(Suppression suppression, Suppression... suppressions);

This becomes challenging with concurrency because it directly impacts the design. The cache algorithms are not multi-thread friendly, so we split them apart from the concurrent hash table using a pub-sub approach. Instead of applying the change to both under a lock, after the map operation we publish an event to a queue and trigger a consumer to apply it asynchronously to the policy. These suppression options would need to be propagated in those events so they can be honored when applied to the policy data structures.

Since performance is a core feature, we are very mindful of minimizing allocations, as a high allocation rate directly limits throughput. Almost all of our performance gains over Guava come from how this is done. Back then I started simpler by using a ConcurrentLinkedQueue, which allocates a wrapping node per element. Switching to ring buffers, which store the entry so that no allocation is needed on a read, increased throughput by 25x. Unfortunately that patch was never accepted due to lack of interest.

For a get, the suppressions would need to avoid updating the eviction or expiration policy. These are built on top of linked lists so entries can be cheaply reordered in O(1) time (see LRU), and an option might allow one or both to be skipped. While those algorithms are not concurrent, that doesn't matter thanks to our pub-sub behavior (buffer & replay under a lock). However, we'd need to change our publishing to allocate, which would significantly hurt performance if done in the general case.
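
To illustrate the buffer-and-replay idea, here is a toy sketch (a drastic simplification: the real read buffers are lossy, striped ring buffers, not a blocking queue; all names here are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

final class PolicyBuffer<K> {
  private final BlockingQueue<K> readBuffer = new ArrayBlockingQueue<>(128);
  private final ReentrantLock evictionLock = new ReentrantLock();

  // Publish: record the access without touching the (non-concurrent) policy.
  // If the buffer is full, the event is simply dropped (the policy is lossy).
  void recordRead(K key) {
    readBuffer.offer(key);
    if (readBuffer.remainingCapacity() == 0) {
      drain();
    }
  }

  // Subscribe: replay the buffered events against the policy under one lock.
  void drain() {
    if (evictionLock.tryLock()) {
      try {
        K key;
        while ((key = readBuffer.poll()) != null) {
          reorder(key); // O(1) move on the policy's intrusive linked list
        }
      } finally {
        evictionLock.unlock();
      }
    }
  }

  private void reorder(K key) {
    // placeholder for the LRU / TinyLFU reordering
  }
}
```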

There are hacks, though, if we accept degrading only the suppressed view. The published entry is an abstract type, so we could wrap the original with a delegate that captures these suppressions; when consuming the entry, it would be unwrapped for the intrusive list operations. The performance hit would be isolated, but it would be an invasive change to wrap/unwrap throughout the code, and it might be surprising for users to discover that disabling features reduces performance.

The question then becomes one of power-to-weight ratio, and whether a fully baked set of suppressions is worthwhile. So far it hasn't been: the few asks have been only for getIfPresent and are okay with a very restrictive definition. Until now, those were also solvable without a new API method, or lacked data to sway the argument. For now I'd be open to, but very hesitant about, a richer set of quiet methods; if one were implemented, there would be a very low API cost to letting a redundant getIfPresentQuietly linger while promoting the sibling suppressed-view alternative instead.

@hshankar
Author

Thanks for the historical context! Makes sense to keep it consistent, and it's reassuring to know that older implementations also needed this kind of behavior. I also just noticed there is already a getIfPresentQuietly() method in LocalCache that does this. I can send out a PR to expose it through the Policy interface if that sounds good. Let me know.

@ben-manes
Owner

Thanks, I have it all done with tests locally :)

@ben-manes
Owner

When the snapshot is built, please take a quick look to see if the new method works as desired in your application. It will be available on the Sonatype snapshot repo or via jitpack on master. If it's a good match then I'll cut a release.

V value = cache.policy().getIfPresentQuietly(key);
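
Applied to your use case, the write path might look something like this (a sketch using the same hypothetical UserId/EntityId/Entity stand-ins as earlier; the quiet lookup leaves the eviction, expiration, and stats state untouched):

```java
import java.util.Map;
import com.github.benmanes.caffeine.cache.Cache;

final class QuietWritePath {
  static void write(Cache<UserId, Map<EntityId, Entity>> cache,
                    UserId userId, EntityId entityId, Entity entity) {
    // Reads the value without counting the lookup as an access.
    Map<EntityId, Entity> entities = cache.policy().getIfPresentQuietly(userId);
    if (entities != null) {
      entities.put(entityId, entity); // update only an already-cached aggregate
    }
  }
}
```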

@hshankar
Author

Awesome, thanks for this! I took a look at what I'd need to do to test it without a release version, and it is a substantial amount of work: I'd have to plumb these changes through multiple layers because the application depends on Caffeine transitively. There's also some red tape around pushing non-release versions to production. Given the effectiveness on the trace logs, I am confident it should work for us.

@ben-manes
Owner

Yep, no need to push it to production. I just wanted to ensure you looked at the usage code to make sure it works for you. I'll release tonight or tomorrow.

@ben-manes
Owner

Released in 2.8.3.
