Skip to content

Cache Analysis History

masklinn edited this page Mar 30, 2024 · 1 revision

During the development of better caching options, collection and graphing were performed around cache algorithm suggestions in terms of hit rates and performances (on a 2021 M1 Pro macbook pro, on CPython 3.11) in order to get an idea of relative performances and advantages.

This page collects those performed "by hand", although some cache simulations were also done via libCacheSim, mostly by primary author of libCacheSim and S3-Fifo Juncheng Yang upon request for assistance1.

Hit Rates

min2 fifo-lp1 fifo-lp2 clock1 clock2 lru qdlru lfu3 qdlfu qdlpfifo s3fifo sieve
10 10% 1% 1% 1% 1% 1% 3% 3% 3% 3% 3% 3%
20 15% 2% 2% 2% 2% 2% 5% 6% 6% 6% 6% 5%
50 23% 4% 4% 4% 4% 4% 11% 11% 12% 12% 12% 12%
100 30% 8% 8% 8% 8% 7% 18% 18% 18% 19% 19% 19%
200 39% 14% 15% 14% 15% 13% 26% 26% 27% 28% 28% 27%
500 51% 28% 29% 28% 29% 26% 38% 39% 39% 40% 40% 39%
1000 60% 39% 41% 39% 41% 37% 48% 48% 49% 49% 50% 48%
2000 67% 51% 52% 51% 52% 49% 56% 57% 57% 57% 58% 57%
5000 73% 64% 64% 64% 64% 63% 64% 66% 64% 64% 66% 66%

Basic Resolver Performances

basic basic-clearing basic-random basic-fifo basic-fifolp1 basic-fifolp2 basic-clock0 basic-clock1 basic-clock2 basic-lru basic-qdlru basic-lfu basic-qdlfu basic-qdlp basic-s3fifo
10 420 419 418 420 417 415 414 414 415 424 414 413 410 410 412
20 420 423 419 417 415 412 411 409 411 412 399 398 392 390 390
50 420 407 407 417 407 404 402 402 401 404 385 377 365 365 361
100 420 398 392 388 385 384 388 384 383 386 341 339 341 336 338
200 420 391 371 371 363 358 370 362 362 364 311 309 304 305 304
500 420 354 330 326 300 293 326 302 296 310 257 253 253 252 250
1000 420 309 284 283 251 246 283 253 247 261 219 216 216 214 211
2000 420 262 236 238 205 200 237 205 200 213 185 184 185 185 181
5000 420 203 178 180 153 150 180 153 150 158 156 146 162 157 147
---
config:
    xyChart:
        width: 1200
        height: 500
---
xychart-beta
    x-axis "cache size" [10, 20, 50, 100, 200, 500, 1000, 2000, 5000]
    y-axis "average (ms)" 0 --> 500
    line "test" [420, 420, 420, 420, 420, 420, 420, 420, 420]
    line [419, 423, 407, 398, 391, 354, 309, 262, 203]
    line [418, 419, 407, 392, 371, 330, 284, 236, 178]
    line [420, 417, 417, 388, 371, 326, 283, 238, 180]
    line [417, 415, 407, 385, 363, 300, 251, 205, 153]
    line [415, 412, 404, 384, 358, 293, 246, 200, 150]
    line [414, 411, 402, 388, 370, 326, 283, 237, 180]
    line [414, 409, 402, 384, 362, 302, 253, 205, 153]
    line [415, 411, 401, 383, 362, 296, 247, 200, 150]
    line [424, 412, 404, 386, 364, 310, 261, 213, 158]
    line [414, 399, 385, 341, 311, 257, 219, 185, 156]
    line [413, 398, 377, 339, 309, 253, 216, 184, 146]
    line [410, 392, 365, 341, 304, 253, 216, 185, 162]
    line [410, 390, 365, 336, 305, 252, 214, 185, 157]
    line [412, 390, 361, 338, 304, 250, 211, 181, 147]

re2 Resolver Performances

This only checked the effect of a small selection of cache replacement algorithms on an re2 base resolver, in order to see how (possibly unnecessary) caches would impact that.

re2 re2-lru re2-lfu re2-s3fifo
10 32 32 33 34
20 32 33 34 32
50 32 32 31 30
100 32 31 29 28
200 32 29 26 25
500 32 26 22 22
1000 32 22 19 19
2000 32 19 16 16
5000 32 14 14 14

re2 resolver: re2 extraction

Using re2's FilteredRE2 feature allows avoiding the evaluation of an input against every regex in the set, thanks to a prefilter process. However FilteredRE2 only hands out which regexes in the set match the input, data extraction is a secondary step.

This data extraction can be done by compiling the regex via the standard library and applying that, or by accessing the regexes FilteredRE2 has already compiled internally and applying that. The table below checks the two approaches:

re2+re re2-lru+re re2-s3fifo+re re2-sieve+re re2+re2 re2-lru+re2 re2-s3fifo+re2 re2-sieve+re2
10 33 33 32 33 59 62 62 60
20 33 32 31 31 59 62 61 60
50 33 32 30 29 59 60 57 57
100 33 31 27 27 59 60 52 53
200 33 29 25 25 59 56 49 48
500 33 25 22 21 59 49 41 41
1000 33 22 19 19 59 42 37 36
2000 33 18 16 16 59 35 29 30
5000 33 15 14 13 59 27 25 25

While the timings are low compared to the basic regexes, and despite the overhead of re-compiling regexes via the standard library, the performance overhead of using re2 for extraction is a bit much (although compiled regexes are then cached, re-compiling on every extraction has not been benched).

1: Due to me having screwed up the implementation of qd-lp-fifo (the informal algorithm preceding S3) I was hitting very poor hit rates at higher cache sizes, the error was that all checks in the ghost cache essentially returned a miss, so the ghost cache's recovery feature never triggered.

2: Bélády's MIN (or OPTimal) is not a true cache eviction algorithm, it is an oracle which requires knowing the entire key sequence up-front and always makes the best possible decision (including ignoring keys entirely), I understand it has been proved optimal. It is provided as an informational and very theoretical upper bound.

3: A direct implementation of Matani and al's O(1) LFU, with no aging mechanism and thus mostly unsuitable to an online cache even if its potential issues did not show up on the available sample file.