
Can the eviction strategy be configured? #247

Open
Millione opened this issue Mar 29, 2023 · 11 comments

@Millione

Right now the eviction strategy is LRU, but I want to use the SampledLFU strategy to suit my usage scenario. How could I get there?

@tatsuya6502
Member

tatsuya6502 commented Mar 30, 2023

Hi. Moka does not support other eviction strategies like SampledLFU, and I have no plan to support them, as the current strategy seems to give good overall performance. But nothing stops someone from implementing other strategies.

To implement it, you will want to start by looking at the admit associated function of sync_base::base_cache::Inner.

/// Performs size-aware admission explained in the paper:
/// [Lightweight Robust Size Aware Cache Management][size-aware-cache-paper]
/// by Gil Einziger, Ohad Eytan, Roy Friedman, Ben Manes.
///
/// [size-aware-cache-paper]: https://arxiv.org/abs/2105.08770
///
/// There are some modifications in this implementation:
/// - To admit to the main space, candidate's frequency must be higher than
///   the aggregated frequencies of the potential victims. (In the paper,
///   `>=` operator is used rather than `>`) The `>` operator will do a better
///   job to prevent the main space from polluting.
/// - When a candidate is rejected, the potential victims will stay at the LRU
///   position of the probation access-order queue. (In the paper, they will be
///   promoted (to the MRU position?) to force the eviction policy to select a
///   different set of victims for the next candidate). We may implement the
///   paper's behavior later?
///
#[inline]
fn admit(
    candidate: &EntrySizeAndFrequency,
    cache: &CacheStore<K, V, S>,
    deqs: &Deques<K>,
    freq: &FrequencySketch,
) -> AdmissionResult<K> {
    const MAX_CONSECUTIVE_RETRIES: usize = 5;
    let mut retries = 0;

    let mut victims = EntrySizeAndFrequency::default();
    let mut victim_nodes = SmallVec::default();
    let mut skipped_nodes = SmallVec::default();

    // Get first potential victim at the LRU position.
    let mut next_victim = deqs.probation.peek_front();

    // Aggregate potential victims.
    while victims.policy_weight < candidate.policy_weight {
        if candidate.freq < victims.freq {
            break;
        }
        if let Some(victim) = next_victim.take() {
            next_victim = victim.next_node();
            let vic_elem = &victim.element;

            if let Some(vic_entry) = cache.get(vic_elem.hash(), |k| k == vic_elem.key()) {
                victims.add_policy_weight(vic_entry.policy_weight());
                victims.add_frequency(freq, vic_elem.hash());
                victim_nodes.push(NonNull::from(victim));
                retries = 0;
            } else {
                // Could not get the victim from the cache (hash map). Skip this node
                // as its ValueEntry might have been invalidated.
                skipped_nodes.push(NonNull::from(victim));

                retries += 1;
                if retries > MAX_CONSECUTIVE_RETRIES {
                    break;
                }
            }
        } else {
            // No more potential victims.
            break;
        }
    }

    // Admit or reject the candidate.
    // TODO: Implement some randomness to mitigate hash DoS attack.
    // See Caffeine's implementation.
    if victims.policy_weight >= candidate.policy_weight && candidate.freq > victims.freq {
        AdmissionResult::Admitted {
            victim_nodes,
            skipped_nodes,
        }
    } else {
        AdmissionResult::Rejected { skipped_nodes }
    }
}

Currently, it gets the LRU entry at lines 1480 and 1488. You will need to change it to sample some entries and then pick the LFU entry among them.

To sample entries and find the LFU, you will do the following (a rough sketch of the sampling loop is shown after the list):

  1. Get the number of segments of the internal concurrent hash table (cht) by calling the num_cht_segments method.
  2. Get the keys of the entries in a cht segment by calling the keys method:

     fn keys(&self, cht_segment: usize) -> Option<Vec<Arc<K>>> {
         // Do `Arc::clone` instead of `Arc::downgrade`. Updating existing entry
         // in the cht with a new value replaces the key in the cht even though the
         // old and new keys are equal. If we return `Weak<K>`, it will not be
         // upgraded later to `Arc<K>` as the key may have been replaced with a new
         // key that equals to the old key.
         self.cache.keys(cht_segment, Arc::clone)
     }

  3. Randomly sample keys and pick a victim entry with the lowest estimated popularity:
     1. Randomly pick a key from the keys.
     2. Get the entry for the key by calling the get_key_value_and_then method of sync_base::base_cache::Inner.
     3. Check the entry's is_admitted().
     4. If it is true:
        1. Increment the number of sampled entries by 1.
        2. Get the estimated popularity by calling the freq.frequency(hash_of_the_key) method.
        3. If the popularity is lower than the current lowest popularity, update the lowest popularity and the victim entry.
     5. If not enough entries have been sampled, repeat from step 3-1.
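
Putting the steps together, here is a minimal, hypothetical sketch of the sampling loop in step 3. The names are made up for illustration, not Moka's actual API: keys stands in for the key list collected via steps 1 and 2, and estimate_freq for a closure wrapping freq.frequency(hash_of_the_key). The entry lookup and is_admitted() checks of steps 3-2 and 3-3 are elided.

use rand::Rng;

/// Sketch of Ristretto-style sampled-LFU victim selection
/// (illustrative only; not Moka's actual API).
fn pick_sampled_lfu_victim<K: Clone>(
    keys: &[K],
    sample_size: usize,
    mut estimate_freq: impl FnMut(&K) -> u32,
) -> Option<K> {
    if keys.is_empty() {
        return None;
    }
    let mut rng = rand::thread_rng();
    let mut victim: Option<(K, u32)> = None;

    for _ in 0..sample_size {
        // Step 3-1: randomly pick a key. (Steps 3-2 and 3-3, the entry
        // lookup and the is_admitted() check, would go here.)
        let key = &keys[rng.gen_range(0..keys.len())];
        // Steps 3-4-2 and 3-4-3: keep the key with the lowest
        // estimated popularity seen so far.
        let freq = estimate_freq(key);
        if victim.as_ref().map_or(true, |(_, lowest)| freq < *lowest) {
            victim = Some((key.clone(), freq));
        }
    }
    victim.map(|(key, _)| key)
}

The sample size would be a tuning knob; sampled-LFU policies typically examine a small fixed number of entries per eviction.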

I hope this helps.

@tatsuya6502 added the question label Mar 30, 2023
@Millione
Author

Really appreciate it; it definitely helps. I will try it.

Also, for background: I have a Go service that uses Ristretto for caching. When I rewrote it in Rust using Moka with the same configuration, the cache hit rate dropped from about 90% to 70%.

@tatsuya6502
Member

I think there is another difference between Ristretto and Moka/Caffeine: Ristretto has a larger frequency sketch than Moka.

https://github.com/dgraph-io/ristretto#features

Admission: TinyLFU - extra performance with little memory overhead (12 bits per counter).

Moka's frequency sketch (a Count-Min sketch) is a direct port from Caffeine, and it has 4 bits per counter.

/// A probabilistic multi-set for estimating the popularity of an element within
/// a time window. The maximum frequency of an element is limited to 15 (4-bits)
/// and an aging process periodically halves the popularity of all elements.
#[derive(Default)]
pub(crate) struct FrequencySketch {

A 12-bit counter can count up to 4095, while a 4-bit counter can count up to 15. You might want to change Moka to use 12-bit counters as well to get a similar cache hit performance to Ristretto.
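
For illustration, here is a toy sketch of how 4-bit counters are typically packed and aged. This is not Moka's actual FrequencySketch (which, following Caffeine, packs the counters into 64-bit words); the TinyCounters name and byte-based layout are made up to show the idea.

/// Two 4-bit saturating counters packed per byte, with the periodic
/// halving ("aging") step mentioned in the doc comment above.
struct TinyCounters {
    table: Vec<u8>,
}

impl TinyCounters {
    fn new(num_counters: usize) -> Self {
        Self { table: vec![0u8; (num_counters + 1) / 2] }
    }

    fn increment(&mut self, i: usize) {
        let shift = (i & 1) * 4;
        // Saturate at 15; wrapping around to 0 would corrupt the
        // popularity estimate.
        if ((self.table[i / 2] >> shift) & 0x0f) < 15 {
            self.table[i / 2] += 1u8 << shift;
        }
    }

    fn get(&self, i: usize) -> u8 {
        (self.table[i / 2] >> ((i & 1) * 4)) & 0x0f
    }

    /// Aging: halve every counter. The 0x77 mask clears the bits
    /// that would otherwise leak between neighboring counters.
    fn halve(&mut self) {
        for b in self.table.iter_mut() {
            *b = (*b >> 1) & 0x77;
        }
    }
}

Widening the counters to 12 or 16 bits would mean changing this packing and the saturation limit, which is what the experiment discussed below tries.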

@Millione
Author

A 12-bit counter can count up to 4095, while a 4-bit counter can count up to 15. You might want to change Moka to use 12-bit counters as well to get a similar cache hit performance to Ristretto.

Thanks, I'd like to try it. Is there any cache hit benchmark for Moka that I can reuse to compare different counter widths? For now, I have implemented 16-bit counters.

@tatsuya6502
Member

Is there any cache hit bench for Moka ...

Yes, there are two.

Mokabench will be easier to run; it is written entirely in Rust. Follow the README and download the trace datasets called S3.lis and DS1.lis into its datasets directory. Then run it with the following commands:

## Run with the S3 dataset.
$ cargo run --release -- --num-clients 1

## Run with the DS1 dataset.
$ cargo run --release -- --num-clients 1 --trace-file ds1

(Do not use the OLTP.lis dataset. It is very small, and the Moka result would be skewed because Moka has a huge write buffer and most of the dataset will fit into it.)

Another benchmarking tool is the Moka Cache Driver for the Caffeine Simulator. It is written in Rust and Java, so it will take more work to run, but a nice thing about it is that you can compare results with other caching strategies. I used it to create the benchmark results on our Wiki.

@Millione
Author

Millione commented Mar 30, 2023

A 12-bit counter can count up to 4095, while a 4-bit counter can count up to 15. You might want to change Moka to use 12-bit counters as well to get a similar cache hit performance to Ristretto.

I tried it, but the result is the same as before: the cache hit rate first goes up to 90% and stays there for a while, then goes down.

commit: feat: change frequency counter bits from 4 to 16

@tatsuya6502
Member

A 12-bit counter can count up to 4095, while a 4-bit counter can count up to 15.

@Millione

Nice catch! dgraph-io/ristretto#335

It seems Ristretto uses 4-bit counters too. The max value is 15, and 0x0f (0b00001111) is a 4-bit mask.

https://github.com/dgraph-io/ristretto/blob/3177d9c9520c37d36b18113be01fea6393f63860/sketch.go#L114-L119

	// Counter value.
	v := (r[i] >> s) & 0x0f
	// Only increment if not max value (overflow wrap is bad for LFU).
	if v < 15 {
		r[i] += 1 << s
	}

@ben-manes

@Millione you might also consider recording a trace (e.g. log a 64-bit hash of each key). Then you can run it against the different simulators that the projects provide and assert their behavior, e.g. whether any of them are not honoring the maximum size or are misreporting the hit rate, and see what other policies report (like optimal). A trace of a few million events is typically pretty good; you want something many multiples of the cache size to ensure the run captures the workload's long-term characteristics.
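
For example, recording could be as simple as the following sketch (names are made up; std's DefaultHasher is stable within a single process run, which is enough for one trace):

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::io::Write;

/// Append a 64-bit hash of each accessed key, one per line, so the
/// trace can later be replayed against the projects' simulators.
fn log_access<K: Hash>(key: &K, out: &mut impl Write) -> std::io::Result<()> {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    writeln!(out, "{}", hasher.finish())
}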

@tatsuya6502
Member

tatsuya6502 commented Jan 24, 2024

Hi @Millione,

Also, the background here is I have a go service using ristretto to cache, when I rewrote it with rust using Moka with the same configuration, the cache hit rate dropped from about 90% to 70%.

FYI, you might be affected by the issue fixed by #348 for v0.12.2. The Sentry team reported to us that #348 fixed a long-standing issue they had been observing, where the cache hit rate would slowly degrade over time.

@Ten0

Ten0 commented Jun 1, 2024

Is this issue solved?
The doc currently mentions TinyLFU as a possible eviction strategy in addition to LRU.

Right now the eviction strategy is LRU, but I want to use SampledLFU

Hi. Moka does not support other eviction strategies like SampledLFU. I do not have a plan to support it/them

[screenshot of the Moka documentation mentioning the TinyLFU policy]

@tatsuya6502
Member

Hi @Ten0

As for the SampledLFU eviction strategy, no, it is not implemented.

Moka's default cache policy remains Caffeine-style TinyLFU, which uses an access-order (LRU) queue for selecting a cache victim. (A cache victim is a candidate to be removed from the cache.) This issue is about adding Ristretto-style TinyLFU, which uses random sampling for selecting a cache victim (thus SampledLFU).

I opened #389. I will evaluate the cache hit performance of Ristretto-style TinyLFU to see if it is worth implementing.

As for the original issue that the user who created this GH issue was having with moka@v0.10.1 (see the following quote), we do not know whether it has been resolved, but it is possible that it has been.

#247 (comment)

the background here is I have a go service using ristretto to cache, when I rewrote it with rust using Moka with the same configuration, the cache hit rate dropped from about 90% to 70%.

We fixed a bug in moka@v0.12.2, which solved a long-standing hit rate issue for another user (below). It is possible that the user who created this GH issue was affected by the same bug.

#348 (comment)

FYI, I believe this fixed a long standing issue we were observing where the cache hit rate would slowly degrade over time after ~1 week of operation.

Updating to 0.12.2, the hit rate has been stable without any degradation for 2+ weeks:
