
Performance issues of Caffeine with time based expiration (vs Guava) #1320

Open
sfc-gh-emammedov opened this issue Nov 15, 2023 · 6 comments

@sfc-gh-emammedov
Hi,

We are considering using Caffeine in one of our projects where Guava is currently used. Before making the switch we wanted to run a performance experiment against both caches.

When building the cache we rely on time-based expiration:

...
    .expireAfterWrite(60L * 60L * 2L, TimeUnit.SECONDS)
    .expireAfterAccess(60L * 60L * 2L, TimeUnit.SECONDS)
.build()

The benchmark (https://github.com/sfc-gh-emammedov/guava-caffeine-performance-comparison/blob/main/CaffeineVsGuava/src/com/example/cache/CaffeineVsGuavaTest.java) has the following knobs (default setting):

  • THREAD_COUNT (10): number of concurrent threads accessing the cache (either reading or writing)
  • UNIQUE_ENTITIES_COUNT (1000): number of unique entities stored in the cache
  • TOTAL_OPERATIONS_PER_THREAD (100): number of operations each thread performs against each unique entity in the cache
  • READ_RATIO (0.9, i.e. 90%): ratio of read threads (the remaining threads are assigned as write threads)
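For illustration, the shape of a harness with those knobs can be sketched as below. This is a hypothetical skeleton, not the actual benchmark (which is in the linked repository); a plain ConcurrentHashMap stands in for the cache under test, and the class and method names are made up for this sketch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical skeleton of the benchmark shape described by the knobs
// above; the real harness lives in the linked repository. A plain
// ConcurrentHashMap stands in for the cache under test.
public class BenchSkeleton {
    public static long run(int threadCount, int uniqueEntities,
                           int opsPerEntity, double readRatio) {
        ConcurrentMap<Integer, Integer> cache = new ConcurrentHashMap<>();
        AtomicLong completedOps = new AtomicLong();
        // The first readRatio fraction of threads only read; the rest write.
        int readerThreads = (int) (threadCount * readRatio);
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        for (int t = 0; t < threadCount; t++) {
            boolean isReader = t < readerThreads;
            pool.execute(() -> {
                for (int round = 0; round < opsPerEntity; round++) {
                    for (int key = 0; key < uniqueEntities; key++) {
                        if (isReader) {
                            cache.get(key);
                        } else {
                            cache.put(key, round);
                        }
                        completedOps.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completedOps.get();
    }
}
```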

During the tests we noticed that although read operations are faster with Caffeine, write operations are slower. Using the IntelliJ profiler, we observed that a significant amount of time was spent on maintenance-related tasks. (In this simplified benchmark it might not be the most expensive operation, but in our internal cache tests the majority of the time was spent in scheduleDrainBuffers and, ultimately, in thread unparking.)
[flamegraph screenshot]

We tried different executors for Caffeine, but that did not help either. We tried the following:

  • Caffeine (Executors.newSingleThreadExecutor())
  • Caffeine (Executors.newFixedThreadPool(10))
  • Caffeine (Runnable::run, i.e. running the maintenance on the calling thread)

We then removed time-based expiration completely, and that made the real difference: Caffeine was far faster than Guava.

Here are the results of the benchmark (durations in nanoseconds):

Guava

  • Avg benchmark dur: 32899479
  • Avg read dur: 965
  • Avg write dur: 300

Caffeine

  • Avg benchmark dur: 52712004
  • Avg read dur: 734
  • Avg write dur: 510

Caffeine without expiration

  • Avg benchmark dur: 16337327
  • Avg read dur: 83
  • Avg write dur: 150

We were wondering whether this is expected behaviour for Caffeine when it is configured with time-based expiration, or whether we are missing some key configuration knob that would make Caffeine performant with time-based expiration.

Thank you!

@ben-manes (Owner)

Thanks for taking the time to benchmark and provide your findings. Here are a few observations,

  1. You should probably use JMH for benchmarks to ensure you do not bias the analysis (see ours as examples).
  2. Enabling both expiration modes doesn't make sense in practice, as here expireAfterWrite is redundant.
  3. You can adjust concurrencyLevel in Guava for higher throughput.
  4. A repeated full scan by all threads is not a realistic distribution; a power law such as Zipf follows a hot-cold pattern.
  5. Caffeine uses a write buffer to schedule maintenance for a batch of work, assuming that it can hide latencies thanks to either a low write rate or writes concentrated on popular entries. When the buffer is full, it stalls writers as backpressure to avoid runaway growth. You are probably forcing this, and since only one thread performs maintenance at a time, it degrades total write throughput.
  6. If using put, then we do have a write tolerance to cope with a flood of expireAfterWrite updates, where updates within 1s are considered close enough to downgrade to a lossy read-buffer event. That optimization isn't present on computes / merge, though it probably could be. If you switch, you might see a large speedup because the write buffer is less stressed.
  7. Guava splits the cache into N segments, which improves throughput at various costs. A write-heavy workload is rare, but if needed we defer that optimization to users, who can decide whether the tradeoff is worthwhile. That is simply striping by the key's hash to choose a cache, e.g. caches[key.hashCode() % caches.length].put(key, value).
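The striping idea in point 7 can be sketched as below. This is a minimal, hypothetical illustration: plain ConcurrentHashMaps stand in for the segments so the sketch is self-contained, but with Caffeine you would build an array of caches the same way and route each key by its hash.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the key-striping idea: split one logical cache into N
// independent segments and route each key by its hash. Plain
// ConcurrentHashMaps stand in for the segments here; with Caffeine you
// would build an array of caches the same way.
public class StripedCache<K, V> {
    private final Map<K, V>[] segments;

    @SuppressWarnings("unchecked")
    public StripedCache(int stripes) {
        segments = (Map<K, V>[]) new Map[stripes];
        for (int i = 0; i < stripes; i++) {
            segments[i] = new ConcurrentHashMap<>();
        }
    }

    private Map<K, V> segmentFor(K key) {
        // Math.floorMod keeps the index non-negative for negative hash codes.
        return segments[Math.floorMod(key.hashCode(), segments.length)];
    }

    public void put(K key, V value) { segmentFor(key).put(key, value); }

    public V get(K key) { return segmentFor(key).get(key); }
}
```

Writes to different segments no longer contend on a single write buffer, at the cost of splitting the eviction policy's view of the workload across segments.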

@ben-manes (Owner)

Using merge is a bit odd since it is a forced write, whereas computeIfAbsent is usually the behavior that you want. That does a read before falling back to a write if the entry is absent or expired, and is what both Guava and Caffeine are optimized for. You can see in your benchmark harness that while merge is less optimized, those more common cases are faster. (Note that JMH should still be strongly preferred.)
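The read-before-write contract described above can be illustrated with a small sketch. ConcurrentHashMap is used as a stand-in so the example is self-contained; the class and method names are invented for this sketch, and the same "mapping function only runs on a miss" behavior is what makes computeIfAbsent-style access cheap for read-mostly workloads.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the read-before-write contract of computeIfAbsent, using
// ConcurrentHashMap as a stand-in: the mapping function only runs on a
// miss, so repeated hits stay on the cheap read path instead of going
// through the write path every time (as merge does).
public class ComputeIfAbsentDemo {
    public static int countLoads(ConcurrentMap<String, String> cache, int calls) {
        AtomicInteger loads = new AtomicInteger();
        for (int i = 0; i < calls; i++) {
            cache.computeIfAbsent("config", key -> {
                loads.incrementAndGet(); // executed only when the key is absent
                return "loaded";
            });
        }
        return loads.get();
    }
}
```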

Merge

Guava

  • Avg benchmark dur: 52921453
  • Avg read dur: 546
  • Avg write dur: 439

Caffeine

  • Avg benchmark dur: 60350082
  • Avg read dur: 362
  • Avg write dur: 553

Caffeine without expiration

  • Avg benchmark dur: 33437848
  • Avg read dur: 51
  • Avg write dur: 314

Put

Guava

  • Avg benchmark dur: 43992562
  • Avg read dur: 516
  • Avg write dur: 338

Caffeine

  • Avg benchmark dur: 26648787
  • Avg read dur: 306
  • Avg write dur: 170

Caffeine without expiration

  • Avg benchmark dur: 13292782
  • Avg read dur: 73
  • Avg write dur: 97

Compute If Absent

Guava

  • Avg benchmark dur: 45823483
  • Avg read dur: 571
  • Avg write dur: 378

Caffeine

  • Avg benchmark dur: 21660199
  • Avg read dur: 484
  • Avg write dur: 155

Caffeine without expiration

  • Avg benchmark dur: 8973124
  • Avg read dur: 130
  • Avg write dur: 62

@sfc-gh-emammedov (Author)

Thank you for the detailed reply!

The reason we are using merge is that multiple threads access and update the cache. Each thread fetches the latest data from the database and puts the up-to-date entity into the cache. The entities are versioned; whenever a thread wants to write an entity into the cache, we need to ensure that the version of the entity is increasing (otherwise we would be writing a stale value into the cache). merge allows us to do that atomically (based on the implementation of LocalCache in Guava, and AFAIU Caffeine provides the same guarantees). I am not sure this could be replicated via the put method without explicit locking.
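The versioned-merge pattern described above can be sketched as follows. The record and helper names are hypothetical; the key point is that ConcurrentMap.merge applies the remapping function atomically, and Caffeine exposes the same ConcurrentMap view through cache.asMap(), so a stale write atomically loses to a newer cached value.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the versioned-entity merge described above. The record and
// helper are hypothetical; the point is that ConcurrentMap.merge applies
// the remapping function atomically, and Caffeine exposes the same
// ConcurrentMap semantics via cache.asMap().
public class VersionedMergeDemo {
    public record Entity(long version, String payload) {}

    public static Entity putIfNewer(ConcurrentMap<String, Entity> cache,
                                    String key, Entity candidate) {
        // Keep whichever entity carries the higher version; a stale
        // candidate atomically loses to the value already in the cache.
        return cache.merge(key, candidate, (current, incoming) ->
            incoming.version() > current.version() ? incoming : current);
    }
}
```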

You mentioned that put has an optimisation to cope with a flood of expireAfterWrite updates. Is it possible to add a similar optimisation for merge?

@ben-manes (Owner)

ben-manes commented Nov 17, 2023

That sounds like a good reason to use merge, thanks for clarifying. Note that if you are not already using computes in Guava, be aware that they have had nasty bugs such as corruption or deadlocks. I helped fix a few, but since it is difficult to get fixes merged there, some items remain open, so consider reviewing their bug list.

I think the optimization could be applied. It was added to resolve a similar benchmark concern (orbit/orbit#144 (comment)). I don't think you'll run into this as a bottleneck in a non-synthetic benchmark since the application and I/O time, as well as the item distribution, will give the cache enough time to flush the write buffer and hide the latency. It's worth doing to alleviate concerns, but it shouldn't be a blocker for you.

@sfc-gh-emammedov (Author)

Got you, makes sense.

Just curious, would you have time to help add an optimisation for merge?

@ben-manes (Owner)

It’s hard to say given a busy week and the holiday season. I might get to it this weekend, or not. I can’t say, tbh.
