
The order of eviction in weight-limited cache makes it essentially static #583

Closed
Demonofthe2ndkind opened this issue Sep 1, 2021 · 8 comments

Comments

@Demonofthe2ndkind commented Sep 1, 2021

Hi Ben,

First of all, thanks for the excellent cache library. It works great! However, I found that if I switch from size-and-time based eviction to weight-and-time based eviction, the cache starts removing newly added elements, making it effectively static in the absence of TTL-based removal.

Here is how my cache is configured:

 cache = Caffeine.newBuilder()
                    .maximumWeight(250)
                    .weigher((Key key, Data value) -> {
                        System.out.println("Adding key: "+ key.getText() + ", weight: " + (key.getLength() + value.getWeight()));
                        return key.getLength() + value.getWeight();
                    })
                    .expireAfterAccess(3600, TimeUnit.SECONDS)
                    .scheduler(com.github.benmanes.caffeine.cache.Scheduler.systemScheduler())
                    .evictionListener((key, value, cause) -> {
                        System.out.println("Evicting key: " + key.getText());
                    })
                    .build();

and this is the result of a test run:

Adding key: test = 10, weight: 25
Adding key: test = 11, weight: 25
Adding key: test = 12, weight: 25
Adding key: test = 13, weight: 25
Adding key: test = 14, weight: 25
Adding key: test = 15, weight: 25
Adding key: test = 16, weight: 25
Adding key: test = 17, weight: 25
Adding key: test = 18, weight: 25
Adding key: test = 19, weight: 25
Adding key: test = 20, weight: 25
Evicting key: test = 13
Adding key: test = 21, weight: 25
Evicting key: test = 10
Adding key: test = 22, weight: 25
Evicting key: test = 11
Adding key: test = 23, weight: 25
Evicting key: test = 12
Adding key: test = 24, weight: 25
Evicting key: test = 14
Adding key: test = 25, weight: 25
Evicting key: test = 15
Adding key: test = 26, weight: 25
Evicting key: test = 16
Adding key: test = 27, weight: 25
Evicting key: test = 17
Adding key: test = 28, weight: 25
Evicting key: test = 28
Adding key: test = 29, weight: 25
Evicting key: test = 29
Adding key: test = 30, weight: 25
Evicting key: test = 30
Adding key: test = 31, weight: 25
Evicting key: test = 31
Adding key: test = 32, weight: 25
Evicting key: test = 32
Adding key: test = 33, weight: 25
Evicting key: test = 33
Adding key: test = 34, weight: 25
Evicting key: test = 34
Adding key: test = 35, weight: 25
Evicting key: test = 35
Adding key: test = 36, weight: 25
Evicting key: test = 36
Adding key: test = 37, weight: 25
Evicting key: test = 37
Adding key: test = 38, weight: 25
Evicting key: test = 38
Adding key: test = 39, weight: 25
Evicting key: test = 39
Adding key: test = 40, weight: 25
Evicting key: test = 40
Adding key: test = 41, weight: 25
Evicting key: test = 41

Remaining elements in the cache:

test = 18
test = 19
test = 20
test = 21
test = 22
test = 23
test = 24
test = 25
test = 26
test = 27

I am on Caffeine v3.0.3.

I am aware of #513. I tried increasing the max weight, but that only delayed the moment when the cache starts evicting the latest additions.

Any suggestions?

@ben-manes (Owner)

Can you provide a runnable test case? You could simply capture a trace of key hashes. Many of the simulator's traces are in that format, and it would allow us to run the workload against a variety of policies to compare hit rates.

It is true that the cache defaults towards LFU / MRU by rejecting new candidates, but only when the candidate's frequency is less than or equal to the victim's. In a small cache like yours, it can be difficult to capture enough statistics to determine the workload distribution. We do that with a CountMinSketch (historic popularity) and hill climbing (adaptive region sizes) in order to obtain a high hit rate. A cache of only 10 entries might simply stay in an MRU-like configuration because repeated hits are too spread out. In your case of a scan, e.g. a SQL row scan, there is no clear benefit to keeping the new arrival, so the main region is protected from one-hit wonders polluting the cache.
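
Roughly, the admission decision on overflow works like the sketch below (an illustrative example only, not Caffeine's internal code; the frequencies come from the CountMinSketch mentioned above):

// Illustrative sketch of the TinyLFU-style admission check, not Caffeine's internals.
final class AdmissionExample {
    /** Admit the candidate only if it is estimated to be more popular than the victim. */
    static boolean admit(int candidateFrequency, int victimFrequency) {
        // Ties favor the incumbent, which is what rejects one-hit wonders
        // (and, in a pure scan with a 0% hit rate, every new arrival).
        return candidateFrequency > victimFrequency;
    }

    public static void main(String[] args) {
        System.out.println(admit(1, 1)); // false: a brand-new entry loses the tie
        System.out.println(admit(3, 1)); // true: a re-requested entry wins
    }
}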

@Demonofthe2ndkind (Author) commented Sep 1, 2021

Here's a runnable test case:

package com.reproducible.case;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

import java.util.Objects;
import java.util.concurrent.TimeUnit;

public class CacheEvictionIssue {
    private final int MAX_WEIGHT = 250;
    private final int DURATION_SEC = 3600;

    private Cache<Key, Data> cache;

    public static class Key {
        private final String text;

        public Key(String text) {
            this.text = text;
        }

        public String getText() {
            return text;
        }

        public int getLength() {
            return text.length();
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Key key = (Key) o;
            return text.equals(key.text);
        }

        @Override
        public int hashCode() {
            return Objects.hash(text);
        }
    }

    public static class Data {
        public int getWeight() {
            return 16;
        }
    }

    public CacheEvictionIssue() {
        cache = Caffeine.newBuilder()
                .maximumWeight(MAX_WEIGHT)
                .weigher((Key key, Data value) -> {
                    System.out.println("Adding key: "+ key.getText() + ", weight: " + (key.getLength() + value.getWeight()));
                    System.out.flush();
                    return key.getLength() + value.getWeight();
                })
                .expireAfterAccess(DURATION_SEC, TimeUnit.SECONDS)
                .scheduler(com.github.benmanes.caffeine.cache.Scheduler.systemScheduler())
                .evictionListener((key, value, cause) -> {
                    System.out.println("Evicting key: " + key.getText());
                    System.out.flush();
                })
                .build();
    }

    public void add(String text) throws Exception {
        Key key = new Key(text);
        cache.getIfPresent(key);     // record an access (a miss on first insert)
        cache.put(key, new Data());
        Thread.sleep(500);           // the pacing matters; see the note below
    }

    public void test() throws Exception {
        for (int i = 10; i < 42; i++) {
            add("text = " + i);
        }
    }

    public static void main(String... args) throws Exception {
        new CacheEvictionIssue().test();
        Thread.sleep(5000);
    }

}

Note that the delay in add() is crucial; without it the behavior is different.
The output:

Adding key: text = 10, weight: 25
Adding key: text = 11, weight: 25
Adding key: text = 12, weight: 25
Adding key: text = 13, weight: 25
Adding key: text = 14, weight: 25
Adding key: text = 15, weight: 25
Adding key: text = 16, weight: 25
Adding key: text = 17, weight: 25
Adding key: text = 18, weight: 25
Adding key: text = 19, weight: 25
Adding key: text = 20, weight: 25
Evicting key: text = 10
Adding key: text = 21, weight: 25
Evicting key: text = 11
Adding key: text = 22, weight: 25
Evicting key: text = 12
Adding key: text = 23, weight: 25
Evicting key: text = 13
Adding key: text = 24, weight: 25
Evicting key: text = 14
Adding key: text = 25, weight: 25
Evicting key: text = 15
Adding key: text = 26, weight: 25
Evicting key: text = 16
Adding key: text = 27, weight: 25
Evicting key: text = 17
Adding key: text = 28, weight: 25
Evicting key: text = 28
Adding key: text = 29, weight: 25
Evicting key: text = 29
Adding key: text = 30, weight: 25
Evicting key: text = 30
Adding key: text = 31, weight: 25
Evicting key: text = 31
Adding key: text = 32, weight: 25
Evicting key: text = 32
Adding key: text = 33, weight: 25
Evicting key: text = 33
Adding key: text = 34, weight: 25
Evicting key: text = 34
Adding key: text = 35, weight: 25
Evicting key: text = 35
Adding key: text = 36, weight: 25
Evicting key: text = 36
Adding key: text = 37, weight: 25
Evicting key: text = 37
Adding key: text = 38, weight: 25
Evicting key: text = 38
Adding key: text = 39, weight: 25
Evicting key: text = 39
Adding key: text = 40, weight: 25
Evicting key: text = 40
Adding key: text = 41, weight: 25
Evicting key: text = 41

Increasing the cache's maximumWeight just delays the appearance of the issue, but does not eliminate it. For comparison, here is the output when running the same example with maximumSize = 10:

Adding key: text = 10, weight: 25
Adding key: text = 11, weight: 25
Adding key: text = 12, weight: 25
Adding key: text = 13, weight: 25
Adding key: text = 14, weight: 25
Adding key: text = 15, weight: 25
Adding key: text = 16, weight: 25
Adding key: text = 17, weight: 25
Adding key: text = 18, weight: 25
Adding key: text = 19, weight: 25
Adding key: text = 20, weight: 25
Evicting key: text = 10
Adding key: text = 21, weight: 25
Evicting key: text = 11
Adding key: text = 22, weight: 25
Evicting key: text = 12
Adding key: text = 23, weight: 25
Evicting key: text = 13
Adding key: text = 24, weight: 25
Evicting key: text = 14
Adding key: text = 25, weight: 25
Evicting key: text = 24
Adding key: text = 26, weight: 25
Evicting key: text = 25
Adding key: text = 27, weight: 25
Evicting key: text = 26
Adding key: text = 28, weight: 25
Evicting key: text = 27
Adding key: text = 29, weight: 25
Evicting key: text = 28
Adding key: text = 30, weight: 25
Evicting key: text = 29
Adding key: text = 31, weight: 25
Evicting key: text = 30
Adding key: text = 32, weight: 25
Evicting key: text = 31
Adding key: text = 33, weight: 25
Evicting key: text = 32
Adding key: text = 34, weight: 25
Evicting key: text = 33
Adding key: text = 35, weight: 25
Evicting key: text = 34
Adding key: text = 36, weight: 25
Evicting key: text = 35
Adding key: text = 37, weight: 25
Evicting key: text = 36
Adding key: text = 38, weight: 25
Evicting key: text = 37
Adding key: text = 39, weight: 25
Evicting key: text = 38
Adding key: text = 40, weight: 25
Evicting key: text = 39
Adding key: text = 41, weight: 25
Evicting key: text = 40

I would prefer to see FIFO eviction (all other things being equal), as it is better than evicting the newcomers.

The advantage of not evicting new elements is that the contents of the cache follow the moving "hot spot". If we evict newcomers, the cache fills up with no-longer-relevant elements that stay there for the duration of the TTL. As the "hot spot" moves, the cache hit rate will drop.

To make the cache useful in this scenario as well as in the scenario you described, maybe we should have a parameter that sets the eviction preference.

@Demonofthe2ndkind (Author)

I ran a few more tests with both size-based and weight-based eviction, adding 800 elements to a 100-element cache (by size or weight). The cache seems to be heavily LIFO-biased, with a larger bias when it is configured for weight-based eviction.

@ben-manes (Owner) commented Sep 1, 2021

The cache is optimized towards maximizing the hit rate and the algorithms evolve over time. In your test case there is a 0% hit rate (32 misses, 22 evictions). From a policy's perspective evicting any entry is fair, but each policy has a bias. Guava's LRU will keep different content than Caffeine's, but the hit rate is identical.

The cache defaults to a policy configuration that is similar to LIFO. This is because many workloads have "one-hit wonders" which should be discarded quickly. If there are hits then the recent entries will be promoted for retention and the cache will be frequency-biased. As more samples are acquired, the cache may reconfigure itself to prefer holding new arrivals longer if it determines that is more beneficial. That can result in a policy that is roughly LRU. For example, in the workload below the cache adapts as it runs through a sequence of LRU => MRU => LRU traces. When I simulate a weighted workload that is LRU-biased, like Twitter's, I do see the cache reconfigure itself appropriately.

[chart: hit rate as the cache adapts across concatenated LRU => MRU => LRU traces]

We don't make any promises about what our algorithms prefer to keep, only that we will try to maximize the hit rate by taking multiple signals into account. If you have a business need for LRU and don't want to rely on our adaptivity to work that out, then you should use an LRU cache. There may be excellent reasons for that requirement, but it is not something that we promise in our offering.

It sounds like you don't have a requirement, just a different default expectation. A "hot spot" should be captured by frequency, e.g. we do well on a Zipfian trace. I think that if you tried a real workload then you'd see that this cache does quite well. We have many in our simulator and can easily add new ones. From your test, though, I don't think one should have any expectations about the contents when the optimal hit rate is 0%.
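
For reference, if strict LRU ordering is itself the requirement, a minimal LRU cache can be built on the JDK's LinkedHashMap; the sketch below shows that standard idiom (it is not thread-safe and has none of Caffeine's features such as weights, expiration, or statistics):

import java.util.LinkedHashMap;
import java.util.Map;

// A minimal strict-LRU map using only the JDK, shown as the kind of
// alternative suggested above when LRU ordering itself is the requirement.
final class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    SimpleLruCache(int maxEntries) {
        super(16, 0.75f, /* accessOrder= */ true); // iteration order: least to most recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // drop the least recently accessed entry on overflow
    }
}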

@ben-manes (Owner) commented Sep 2, 2021

Here is an example using a CDN object trace from the AdaptSize caching policy, which is a weight-based algorithm. In this case the maximum is set to 5,000,000 and the largest entry is 4,092,928.

Policy       Hit Rate
Mru            8.07%
Fifo          21.53%
Lru           22.59%
Guava         22.72%
AdaptSize     23.62%
Caffeine      24.00%
Gdsf          30.95%

Here you can see that Caffeine does not mimic MRU, which has the worst hit rate. It more closely follows LRU, but improves upon it by a small margin. The gold standard, Greedy-Dual-Size-Frequency, is the clear leader, but it costs O(lg n) per operation. There is a possible improvement that would close this gap, described in a recent paper, but I have not yet implemented it. The paper shows more real-world workloads and that we're already very competitive with the leading options.
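
For context, GDSF's per-entry priority is usually described as clock + frequency × cost / size; the sketch below illustrates that calculation with the cost taken as 1 (it is not code from Caffeine or its simulator):

// Rough sketch of the Greedy-Dual-Size-Frequency priority as commonly described
// in the literature (cost taken as 1); not code from Caffeine or its simulator.
final class GdsfPriorityExample {
    private double clock; // inflated to the priority of the last evicted entry

    /** Priority of an entry: popular, small entries rank higher. */
    double priority(int frequency, int weight) {
        return clock + (double) frequency / weight;
    }

    /**
     * On evicting the lowest-priority entry, advance the clock so long-resident
     * entries age out. Keeping entries ordered by this priority (e.g. in a heap)
     * is where the O(lg n) cost per operation comes from.
     */
    void onEvict(double evictedPriority) {
        clock = evictedPriority;
    }
}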

@Demonofthe2ndkind (Author)

Ben, thank you for the explanation and all the details. My real-world cache is rather small. I am going to run it in a staging environment and see how well the eviction behaves.

@ben-manes (Owner)

Great. Please let me know the results. If you can capture an access trace then we can dissect it. I’ll close for now but reopen if you get some data for us.

Small caches may be difficult for us because there is not much room to adapt. For example, the hill climber changes the ratio by a step size, e.g. 5%, and monitors the impact for a sample period. A 10-entry cache can't be climbed with that step size, and the changes might be too noisy to discern even if it could be. No one has reported a problem yet, but I believe improvements could be made if we can find a real-world trace that we do poorly on.

@ben-manes (Owner)

The cache maintains the item's popularity outside of the working set. This is done by using a 4-bit CountMinSketch (FrequencySketch) that is periodically aged. If the item was prematurely evicted then it will incur another cache miss, each of which causes its score to increase. When evicting we compare a recent arrival to the LRU victim and retain whichever is considered more valuable. This is referred to as the TinyLFU admission policy.
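
The sketch-and-aging idea can be illustrated with a toy count-min sketch (a schematic only; Caffeine's FrequencySketch packs 4-bit counters into longs and uses stronger hashing):

// Toy illustration of the popularity-tracking idea: a count-min sketch with
// small saturating counters that are periodically halved so old popularity
// fades. Schematic only; not Caffeine's FrequencySketch.
final class ToyFrequencySketch {
    private static final int TABLE_SIZE = 1024;
    private final int[][] counters = new int[4][TABLE_SIZE]; // four hash rows
    private int additions;

    void recordAccess(Object key) {
        for (int row = 0; row < counters.length; row++) {
            int index = indexOf(key, row);
            if (counters[row][index] < 15) { // saturate at the 4-bit maximum
                counters[row][index]++;
            }
        }
        if (++additions == 10 * TABLE_SIZE) {
            age(); // periodic aging so that stale popularity decays
        }
    }

    int frequency(Object key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < counters.length; row++) {
            min = Math.min(min, counters[row][indexOf(key, row)]);
        }
        return min;
    }

    private void age() {
        additions = 0;
        for (int[] row : counters) {
            for (int i = 0; i < row.length; i++) {
                row[i] >>>= 1; // halve every counter
            }
        }
    }

    private int indexOf(Object key, int row) {
        int h = key.hashCode() * (31 + row) + row; // naive per-row hash
        return (h ^ (h >>> 16)) & (TABLE_SIZE - 1);
    }
}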

We extend that idea by using an admission window (an LRU) to delay the evaluation of admitting into the main region. The cache starts at a 1% admission window / 99% main region ratio. We then adjust the ratio using a hill climber by observing the hit ratio within the sample period. This allows the cache to reconfigure itself as shown above to maximize the hit rate to the workload's pattern. In the example above it switches between LRU and MRU traces.
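
Schematically, the hill climber behaves like the sketch below (the step size, starting ratio, and names are illustrative; this is not Caffeine's implementation):

// Schematic of the hill-climbing idea: after each sample period, keep moving
// the window/main boundary in the same direction while the hit rate improves,
// and reverse direction when it degrades. Illustrative only.
final class WindowHillClimberSketch {
    private double windowFraction = 0.01; // start at a 1% admission window
    private double step = 0.05;           // e.g. a 5% step per sample period
    private double previousHitRate;

    void onSamplePeriodEnd(double hitRate) {
        if (hitRate < previousHitRate) {
            step = -step; // the last move hurt, so climb in the other direction
        }
        windowFraction = Math.min(1.0, Math.max(0.0, windowFraction + step));
        previousHitRate = hitRate;
    }
}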

If you capture an access trace (log the key's hash and weight, if applicable, on every get request) then we can run it in the simulator so that you can see the hit rate curves across different algorithms. We've run it across dozens of different workload types from public sources in order to try to find an approach that is robustly near optimal (either the best or comparable). The simulator implements the common and advanced policies, so we can be data-driven in our optimizations of the algorithms.
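
One way to capture such a trace is to log the key's hash (and weight) on every lookup, for example with a small recorder like the sketch below; the one-record-per-line text format here is only an assumption, so check the simulator's documentation for the trace formats it actually parses:

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;

// One possible way to capture an access trace for offline analysis: log the
// key's hash (and weight) on every get request. The output format is an
// assumption for illustration, not a format the simulator is known to require.
final class TraceRecorder implements AutoCloseable {
    private final PrintWriter out;

    TraceRecorder(Path file) throws IOException {
        this.out = new PrintWriter(Files.newBufferedWriter(file));
    }

    /** Call on every get request, before consulting the cache. */
    synchronized void record(Object key, int weight) {
        out.println(key.hashCode() + " " + weight);
    }

    @Override
    public void close() {
        out.close();
    }
}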
