Add separate expirable LRU implementation, with generics #116

Merged
merged 7 commits into hashicorp:master from paskal:separate_expirable_LRU on Aug 7, 2023

Conversation

@paskal (Contributor) commented Nov 20, 2022

The cache is thread-safe. It could be used in place of simplelru.LRU, but it shouldn't be, because it has built-in locks whereas simplelru.LRU doesn't; that absence allows more efficient locking on top of it in the top-level package's cache implementations.

This is similar to what was done in #41. I've tried to keep the new cache as close as possible to the existing one in code style and behaviour; however, the existing simplelru.LRU is not thread-safe, as noted above, so the new cache is not a drop-in replacement.

Another difference I left in place is that size=0 on this type of cache means unlimited size, which makes sense for an expirable cache.

The original work was done in lcw, but it turned out I didn't need it there, and I thought it would be nice to contribute my changes upstream (to this repo).

This PR is a recreation of #68, which was not merged. cc @jefferai @Dentrax for review, as an alternative to #113 (which I'm unlikely to be able to finish while preserving the current package's speed).
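
For context, here is a minimal usage sketch of the cache discussed in this PR; the import path and constructor below assume the API as it ended up after the merge (the expirable package in v2 of this module), not this PR's original simplelru-based layout:

```go
package main

import (
	"fmt"
	"time"

	"github.com/hashicorp/golang-lru/v2/expirable"
)

func main() {
	// Capacity of 10 entries, no eviction callback, 10ms TTL.
	// Passing size=0 would mean "no size limit": entries are only dropped on expiry.
	cache := expirable.NewLRU[string, string](10, nil, 10*time.Millisecond)

	cache.Add("key1", "val1")
	if v, ok := cache.Get("key1"); ok {
		fmt.Println("found:", v)
	}

	time.Sleep(20 * time.Millisecond)
	if _, ok := cache.Get("key1"); !ok {
		fmt.Println("key1 has expired")
	}
}
```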

@Dentrax left a comment:

Thanks! I think this is a better approach than #113. Do you have any new benchmark results, like the ones you already shared here?

PTAL @jefferai

@jefferai (Member) commented Dec 1, 2022

Is this code the same as the changed code in #113?

@paskal (Contributor, Author) commented Dec 2, 2022

It's precisely the same code as in #113, but in a separate file, because making those changes inside the existing simplelru decreases its performance.

@Dentrax commented Dec 19, 2022

Hey @jefferai, do you have any additional feedback or review for this PR?

@jefferai (Member) commented:

No, I'm just busy with personal stuff (moving) on top of work and haven't been able to get to this yet, sorry.

@paskal (Contributor, Author) commented Dec 19, 2022

I'd completely forgotten about this one; I'm addressing your comments in the meantime.

@gdsoumya commented:

Is this PR getting merged anytime soon?

@paskal (Contributor, Author) commented Dec 23, 2022

I have to make a few minor changes before that; I'll try to get them in before the New Year.

@jefferai (Member) commented:

I was hoping to look at combining the two but realistically I don't have time right now. That said there isn't really an issue with having it be separate and trying to merge them later if I see a good way. (Mostly I am just worried that fixes applied to one will be forgotten for the other.)

Please address the comments when you get a chance and I will then do another look through and hopefully get it merged.

@paskal (Contributor, Author) commented Dec 26, 2022

Fixed the benchmarks everywhere to calculate the hit ratio correctly; this affects only the logging output. Benchmarks:

Benchmark2Q_Rand-8    	 4362295	       273.2 ns/op
Benchmark2Q_Freq-8    	 5259523	       221.2 ns/op
BenchmarkARC_Rand-8   	 3586426	       329.8 ns/op
BenchmarkARC_Freq-8   	 4349251	       263.5 ns/op
BenchmarkLRU_Rand-8   	 9076308	       133.0 ns/op
BenchmarkLRU_Freq-8   	 9300483	       128.1 ns/op
BenchmarkExpirableLRU_Rand_NoExpire-8     	 7061056	       170.3 ns/op
BenchmarkExpirableLRU_Freq_NoExpire-8     	 7251532	       165.1 ns/op
BenchmarkExpirableLRU_Rand_WithExpire-8   	 6939873	       172.0 ns/op
BenchmarkExpirableLRU_Freq_WithExpire-8   	 7980356	       143.6 ns/op
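
For reference, the hit-ratio logging mentioned above looks roughly like the sketch below, reporting hits over total lookups; this is an approximation of the repository's benchmark shape, not its exact code:

```go
package lru_test

import (
	"math/rand"
	"testing"

	lru "github.com/hashicorp/golang-lru/v2"
)

func BenchmarkLRU_Rand(b *testing.B) {
	l, err := lru.New[int64, int64](8192)
	if err != nil {
		b.Fatalf("err: %v", err)
	}

	// Random trace of keys, half of which are added and half looked up.
	trace := make([]int64, b.N*2)
	for i := 0; i < b.N*2; i++ {
		trace[i] = rand.Int63() % 32768
	}

	b.ResetTimer()

	var hit, miss int
	for i := 0; i < 2*b.N; i++ {
		if i%2 == 0 {
			l.Add(trace[i], trace[i])
		} else {
			if _, ok := l.Get(trace[i]); ok {
				hit++
			} else {
				miss++
			}
		}
	}
	// The ratio is logged as hits over total lookups, so correcting it only
	// changes the reported number, not the measured performance.
	b.Logf("hit: %d miss: %d ratio: %f", hit, miss, float64(hit)/float64(hit+miss))
}
```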

@jefferai (Member) commented Jan 3, 2023

@paskal Can you please stop force-pushing once reviews have started? It breaks links in emails, it makes it harder to tell what has changed since a previous review, and it often ends up dropping comments from GitHub when it can't figure out where they should apply.

@jefferai (Member) commented Jan 3, 2023

I'm curious about the choice of using a list and iterating through it on a schedule vs. using a heap (e.g. adapting https://golang.google.cn/pkg/container/heap/) and having a timer keyed off of the next expiration (reset on any push if needed). That seems like it would purge closer to the actual timeouts.
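
To illustrate the suggestion, here is a minimal sketch of a container/heap-based expiry index with a single timer keyed off the earliest expiration; expiryItem and expiryHeap are illustrative names, not code from this PR:

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// expiryItem pairs a cache key with its expiration time.
type expiryItem struct {
	key       string
	expiresAt time.Time
}

// expiryHeap is a min-heap ordered by expiresAt, so the next entry to
// expire is always at index 0.
type expiryHeap []expiryItem

func (h expiryHeap) Len() int           { return len(h) }
func (h expiryHeap) Less(i, j int) bool { return h[i].expiresAt.Before(h[j].expiresAt) }
func (h expiryHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *expiryHeap) Push(x any)        { *h = append(*h, x.(expiryItem)) }
func (h *expiryHeap) Pop() any {
	old := *h
	item := old[len(old)-1]
	*h = old[:len(old)-1]
	return item
}

func main() {
	h := &expiryHeap{}
	heap.Init(h)
	heap.Push(h, expiryItem{key: "a", expiresAt: time.Now().Add(50 * time.Millisecond)})
	heap.Push(h, expiryItem{key: "b", expiresAt: time.Now().Add(10 * time.Millisecond)})

	// A single timer keyed off the next expiration; a real cache would
	// Reset it whenever a Push introduces an earlier deadline.
	timer := time.NewTimer(time.Until((*h)[0].expiresAt))
	<-timer.C
	expired := heap.Pop(h).(expiryItem)
	fmt.Println("expired first:", expired.key) // "b"
}
```

Each Pop rebalances the heap at O(log N) cost, which is the tradeoff weighed in the reply below.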

@paskal (Contributor, Author) commented Jan 6, 2023

If I recall correctly, the idea is to break out of checking the list as soon as the first non-expired item is found, so that the cleanup complexity is O(M), where M is the number of cleaned entries. With a heap keyed by expiry date, the cleanup complexity increases to O(M*log(N)), where N is the total number of entries, since we'd have to rebalance the heap each time we pop the oldest item.

However, I see that breaking out of the loop is not actually implemented there, and I have a useless continue; I'll fix that.

While we're on the expiry mechanism: I also see that I don't remove outdated items when I hit them on cache.Get, but I think that's okay, as it shouldn't affect the performance of the existing expiry cleanup mechanism except in very unusual cache usage patterns.

And lastly, in another cache implementation I saw PurgeEvery set to half of the entry TTL; that might be a better solution than the currently hardcoded 5-minute default. What do you think?

@paskal (Contributor, Author) commented Jan 6, 2023

I forgot we're dealing with an LRU here, so the expiry check can't stop at the first non-expired element: items are moved to the front after access with their TTL left intact, which means the list is not ordered by expiration time.

So the cleanup complexity now is plain O(N); with an additional heap it would be O(M*log(N)), and with an additional expiry-ordered list it would be O(M). Currently the cache is at its maximal size just before the cleanup, but with an additional list or heap we could trigger the cleanup in the background as soon as we find an expired entry, so the size would stay more balanced at the cost of more frequent cleanups holding locks. Basically, we're delving into the garbage-collection area of problems :)

Another simple improvement I can add to the current setup is to check for size == 0 and skip the LRU behaviour in that case, so that cleanup would take O(N) for the LRU expirable cache and O(M) for the non-LRU (unbounded) expirable cache.

Please let me know if you want me to alter the solution and how.

@jefferai (Member) commented:

So I don't have super great answers here, because anything that simplifies the mechanism necessarily trades off performance. As you noted, we're into the GC side of things.

One possibility is to actually load a timer for every single expiring entry. This has the benefit that expiration is essentially instant and you don't have to track anything, but it trades off goroutines. Vault does this, but Vault is not necessarily tracking as many objects as you might want to track here.

If we knew for sure that the key is something we could e.g. hash -- or require such a value for expiring entries -- you could use something like a heap but shard entries across some set of these. E.g. hash it, take the first byte as an index into 256 different heaps, and have a background process check each heap periodically.

Another approach is time bucketing -- keep a list or heap or something for all items expiring within some period of time. The challenge here is figuring out the bucketing size -- maybe you have 1000 entries expiring every minute, or maybe 1 entry every 1000 minutes -- but it means you only ever need to check one bucket of items when checking for expiration, and you can just run through the list.
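
As an illustration of the first option above (a timer per expiring entry), here is a minimal sketch using time.AfterFunc; timerCache and its fields are hypothetical names, not code from this PR:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry is a hypothetical cache entry that owns its own expiration timer.
type entry struct {
	value string
	timer *time.Timer
}

// timerCache expires each entry with its own time.AfterFunc callback.
type timerCache struct {
	mu    sync.Mutex
	items map[string]*entry
}

func newTimerCache() *timerCache {
	return &timerCache{items: make(map[string]*entry)}
}

// Add stores the value and arms a timer that deletes it after ttl.
// Expiration is essentially instant, but every live entry carries a runtime
// timer, and each expiry fires in its own goroutine.
func (c *timerCache) Add(key, value string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if old, ok := c.items[key]; ok {
		// A production version would also handle the race where the old
		// timer has already fired and is waiting on the lock.
		old.timer.Stop()
	}
	e := &entry{value: value}
	e.timer = time.AfterFunc(ttl, func() {
		c.mu.Lock()
		defer c.mu.Unlock()
		delete(c.items, key)
	})
	c.items[key] = e
}

// Get returns the value if the key has not expired yet.
func (c *timerCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.items[key]
	if !ok {
		return "", false
	}
	return e.value, true
}

func main() {
	c := newTimerCache()
	c.Add("k", "v", 10*time.Millisecond)
	time.Sleep(20 * time.Millisecond)
	_, ok := c.Get("k")
	fmt.Println("still present:", ok) // false: the timer has already removed it
}
```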

@paskal (Contributor, Author) commented Apr 18, 2023

I've added per-item expiration. The force push is due to rebasing on top of master. Performance:

Benchmark2Q_Rand-8       4531797               323.5 ns/op
Benchmark2Q_Freq-8       5347616               215.2 ns/op
BenchmarkARC_Rand-8      3702835               328.1 ns/op
BenchmarkARC_Freq-8      4809096               267.6 ns/op
BenchmarkLRU_Rand-8      9049192               133.5 ns/op
BenchmarkLRU_Freq-8      9291907               129.5 ns/op
BenchmarkExpirableLRU_Rand_NoExpire-8            3758601               308.4 ns/op
BenchmarkExpirableLRU_Freq_NoExpire-8            3941444               306.3 ns/op
BenchmarkExpirableLRU_Rand_WithExpire-8          3819110               323.2 ns/op
BenchmarkExpirableLRU_Freq_WithExpire-8          3205101               390.2 ns/op

@paskal force-pushed the separate_expirable_LRU branch 2 times, most recently from dd3daea to 067032a, on April 19, 2023.
@jefferai (Member) commented:

I am worried about a goroutine per item. Nothing stops you from having a million items with a million goroutines. Even partitioning it as some libs do can leave you with a very lengthy stack trace if you SIGQUIT. go-cache for instance uses a pool of goroutines to do expiration and even then any stack trace requires filtering out goroutine after goroutine.

@paskal (Contributor, Author) commented Apr 19, 2023

Any other approach I can think of is more complicated to implement, read, comprehend, and troubleshoot. I know a production project with 200-250k goroutines, so the number is not a big problem.

Time buckets mean memory overhead, and even with the more complicated code we'd still end up with a fair number of goroutines. For example, with buckets of size TTL/100, the bucket interval would be 100µs for a 10ms TTL, 1ms for 100ms, 3s for 5m, 36s for 1h, 2m24s for 4h, 7m12s for 12h, and 14m24s for 24h. Entries expiring up to 1% of the TTL off their exact expiration time looks like a reasonable overhead to me.

What do you think?

@jefferai (Member) commented:

I agree the number isn't a problem -- the troubleshooting is. A stack trace with 250k goroutines that have to be filtered out in order to get to the important bits makes any project using it much harder to troubleshoot.

Time bucketing could bring you down to a small number of goroutines -- in fact, you could just use a single one, with a timer that is reset to the expire time of the next bucket. So you could easily do 100 or 1000 buckets. I think the question then is one of philosophy for whoever wants to use this: do you care about items expiring at their expiration or not? I'm not sure if there is a clear answer, although I lean on the idea that you don't -- it's an LRU so items can be evicted at any time anyways. So probably the contract should be that you can be sure that when it's purged the callback is run, but the docs make it clear that you cannot count on it being purged at a specific time.

Thoughts?

@paskal (Contributor, Author) commented Apr 23, 2023

I've reverted the changes that introduced per-item cleanup and instead created 100 buckets, cleaned up every 1/100th of the TTL, clarifying code and comments in the process. I've outlined the tradeoffs in the code itself: we have ~1% memory overhead, and entries may be cleaned up up to 1% earlier than their expiration time. This cleanup interval and overhead make sense for TTLs ranging from 100ms to 10 days.

$ go test -bench=.
Benchmark2Q_Rand-8       4471934               268.0 ns/op
Benchmark2Q_Freq-8       5376478               220.6 ns/op
BenchmarkARC_Rand-8      3617769               326.1 ns/op
BenchmarkARC_Freq-8      4327766               247.9 ns/op
BenchmarkLRU_Rand-8      8963338               134.0 ns/op
BenchmarkLRU_Freq-8      9173343               131.8 ns/op
BenchmarkExpirableLRU_Rand_NoExpire-8            4597234               263.2 ns/op
BenchmarkExpirableLRU_Freq_NoExpire-8            4761190               249.8 ns/op
BenchmarkExpirableLRU_Rand_WithExpire-8          4427887               277.7 ns/op
BenchmarkExpirableLRU_Freq_WithExpire-8          5001783               230.5 ns/op

Because simplelru/list.go was changed, I've re-run the benchmarks on this repo's current master, and the numbers are the same.

I'm satisfied with the current code and the tradeoffs I've made with overhead. Please let me know what you think.
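
For readers following along, here is a rough sketch of the bucketed-expiry idea described above: entries are assigned to one of 100 expiration buckets, and a single background goroutine purges the bucket that comes due every TTL/100. The bucketedCache type and its details are illustrative, not the merged implementation:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const numBuckets = 100

// bucketedCache groups keys into expiration buckets; a single background
// goroutine purges one bucket per tick (tick = TTL / numBuckets).
type bucketedCache struct {
	mu      sync.Mutex
	items   map[string]string
	buckets [numBuckets]map[string]struct{} // keys grouped by expiry bucket
	next    int                             // index of the next bucket to purge
}

func newBucketedCache(ttl time.Duration, done <-chan struct{}) *bucketedCache {
	c := &bucketedCache{items: make(map[string]string)}
	for i := range c.buckets {
		c.buckets[i] = make(map[string]struct{})
	}
	go func() {
		ticker := time.NewTicker(ttl / numBuckets)
		defer ticker.Stop()
		for {
			select {
			case <-done:
				return
			case <-ticker.C:
				c.purgeNextBucket()
			}
		}
	}()
	return c
}

// Add places the key into the bucket that will be purged last, i.e. roughly
// one full TTL from now (up to TTL/numBuckets early, hence the ~1% figure).
// A fuller version would also move re-added keys out of their old bucket.
func (c *bucketedCache) Add(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = value
	newest := (c.next + numBuckets - 1) % numBuckets
	c.buckets[newest][key] = struct{}{}
}

func (c *bucketedCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.items[key]
	return v, ok
}

// purgeNextBucket drops every key in the due bucket, so cleanup cost is
// proportional to the number of expired entries, not the cache size.
func (c *bucketedCache) purgeNextBucket() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for key := range c.buckets[c.next] {
		delete(c.items, key)
	}
	c.buckets[c.next] = make(map[string]struct{})
	c.next = (c.next + 1) % numBuckets
}

func main() {
	done := make(chan struct{})
	defer close(done)
	c := newBucketedCache(100*time.Millisecond, done)
	c.Add("k", "v")
	time.Sleep(150 * time.Millisecond)
	_, ok := c.Get("k")
	fmt.Println("still present:", ok) // false: the bucket holding "k" was purged
}
```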

@jefferai (Member) commented Aug 1, 2023

Nope, just need time to get back to a review. @mgaffney can look as well.

@mgaffney (Member) commented Aug 3, 2023

@paskal can we move the expirable LRU code to its own package? Having simplelru.LRU and expirable.LRU types seems more idiomatic and readable to me than having simplelru.LRU and simplelru.ExpirableLRU types.

Commit messages pushed in the meantime:

That sets the memory overhead to approximately the 1% mark, at the cost of expiring cache entries up to 1% earlier than their TTL.

Previously, all entries were scanned for expiration every purgeEvery interval, which created computational overhead; after this commit, we delete only the entries we want to delete, without checking extra ones.
@paskal (Contributor, Author) commented Aug 5, 2023

@mgaffney, it is done; please take a look at the current version. I had to duplicate list.go, as its unexported methods are used and should not be exposed to users of the library.

@SoMuchForSubtlety (Contributor) commented:

> I had to duplicate list.go, as its unexported methods are used and should not be exposed to users of the library.

It could be moved to an internal package, so it can be shared without being exported.
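
For context on that suggestion: a Go internal directory is importable only by packages rooted at its parent, so a layout roughly like the one below (illustrative of what the suggestion implies, not necessarily this PR's final state) would let both packages share the list without exporting it to users:

```
golang-lru/
    internal/
        list.go          // shared doubly-linked list; importable by simplelru
                         // and expirable, but not by users of the module
    simplelru/
        lru.go
    expirable/
        expirable_lru.go
```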

@paskal requested a review from mgaffney on August 7, 2023.
@mgaffney (Member) left a comment:

@paskal thank you for all your hard work on this!

@mgaffney merged commit 3b3d259 into hashicorp:master on Aug 7, 2023. 2 checks passed.
@paskal deleted the separate_expirable_LRU branch on August 7, 2023.