Introduce in-memory caching package #766

souleb · 2024-04-25T15:17:29Z

This PR will allow caching the authentication credentials retrieved by pkg/oci/auth. It should also enable future usage of the caching underlying mechanism.

Part of: #642

Store design

The store is K/V store that can store arbitrary objects. The store interface is a simplified version of client go store. If an object can be cached there, it should be in our store implementations. This is desirable, because the primary envisioned place for the store usage is during a reconciliation of custom resources retrieved from a shared informer cache.

The keys are generated dynamically with deterministic function. Accepting a function instead of key strings enables user to determine themselves the keys uniqueness constraint.

They store must be thread safe, it should support concurrent programs.

There are 2 main uses cases:

1. cache authentication credentials. A credentials key is written, read and overwritten multiple times by a single goroutine. If we plan to store a credential and refresh it every 6 hours and read every 30 minutes(reconciliation interval), the read/write ratio is 12:1. This is a likely scenario. Authentication credentials are expirable.
1. cache helm indexes. The new Store interface and underlying implementation should allow replacing https://github.com/fluxcd/source-controller/tree/main/internal/cache which is today used to cache helm indexes. A helm index is written, read and overwritten multiple times by multiple goroutines. Usually an index is stored and read multiple times (reconciliation of all dependent custom resources * interval) before being overwritten by a new version. It is read intensive. An index is read from local storage before being cached, if the cache is full, we could have several Custom Resources loading the same index in memory, in order to avoid that, we should evict older keys to make room.

Based on the two scenario above, the store should be optimized for reads.

We also have a scenario that needs keys to expirable and another that need them be evicted based on usage.

Hence two implementations are provided:

An expirable store for use case 1
An LRU cache for use case 2

The store interface is a simplified version of client go store. If an object can be cached there, it should be in our store implementations. We also provide a default way to generate keys, but we allow users to provide their own function. This should enable all needed levels of isolation. Signed-off-by: Soule BA <bah.soule@gmail.com>

cache/cache.go

Signed-off-by: Soule BA <bah.soule@gmail.com>

The cache instrumentation purpose is to be self contained. We want to avoid leaking any of the logic here in consuming code. Signed-off-by: Soule BA <bah.soule@gmail.com>

Signed-off-by: Soule BA <bah.soule@gmail.com>

cache/lru.go

cache/metrics.go

cache/cache.go

errordeveloper · 2024-05-08T11:54:55Z

Thanks for working on this, @souleb!

The store interface is a simplified version of client go store.

It would be handy to have a link in the code. At the moment this is only mentioned in a commit message and PR description.

What may need to be made more explicit is that the sore is persisted to a file.

souleb · 2024-05-18T22:19:44Z

Thanks for working on this, @souleb!

The store interface is a simplified version of client go store.

It would be handy to have a link in the code. At the moment this is only mentioned in a commit message and PR description.

What may need to be made more explicit is that the sore is persisted to a file.

There is a comment in store.go pointing to https://pkg.go.dev/k8s.io/client-go/tools/cache#Store.

Persisting to a file will be made in a follow up PR where we will outline the use case we have for it.

stefanprodan

LGTM

Thanks @souleb 🏅

cache/store.go

darkowlzz · 2024-06-03T17:49:28Z

cache/store.go

+	if key, ok := any(object).(ExplicitKey); ok {
+		return string(key), nil
+	}


If the input is an ExplicitKey already, which is of string type, what's the use case for passing it to IdentifiableObjectKeyFunc to obtain a string again?

When you always use the Get() function and never GetByKey() you might have some objects for which you only have a key. The ExplicitKey solve that.

darkowlzz · 2024-06-03T18:08:56Z

cache/store.go

+// StoreObject is a wrapper for an object with its identifying key.
+// It is used to store objects in a Store.
+type StoreObject[T any] struct {


I may be misunderstanding the usage of generics, please correct me, but this says that this is used to store objects in a Store and I do see it being passed to cache.Add() in the tests. Can't we use this StoreObject data type in the Store interface instead of any? As StoreObject in itself contains any. I'm thinking about it because I see very few usage of StoreObject data type.

cache/cache.go

Signed-off-by: Soule BA <bah.soule@gmail.com>

darkowlzz · 2024-06-04T20:32:10Z

cache/cache.go

+	if opt.registerer == nil {
+		return nil, ErrNoRegisterer
+	}


Doesn't this makes metrics a requirement instead of optional?
In our reconciler metrics, we don't do anything if there's no recorder/collector configured and let the component run, refer https://github.com/fluxcd/pkg/blob/runtime/v0.47.1/runtime/controller/metrics.go#L79 .

It's not optional at the moment. I'll have to refactor how we call the metrics funcs. I'll do that.

darkowlzz · 2024-06-04T21:04:32Z

cache/cache.go

+	c.mu.Unlock()
+	return KeyError{object, ErrFull}


If this returns an error and doesn't add the object in the cache, shouldn't it also have

c.metrics.incCacheRequests(StatusFailure)

?
Same in Update() below.

darkowlzz · 2024-06-04T21:05:46Z

cache/cache.go

+	key, err := c.keyFunc(object)
+	if err != nil {
+		return KeyError{object, err}


Shouldn't this too be recorded as a failed request in metrics?
Same in Update() below.

if the keyFunc is wrong, many actions will result in error due to user provided parameter. This might have a consequence of too many metrics. Also this error is not on the cache side, so I wouldn't record it.

But if the key derivation is part of the cache, the cache receives a request even for bad input. Key derivation failure takes place within the cache.

In the LRU cache, this seems to be implemented as suggested, except for in the Add() method.

darkowlzz · 2024-06-04T21:28:31Z

cache/cache.go

+	if item.expiration > 0 {
+		if item.expiration < time.Now().UnixNano() {
+			c.mu.RUnlock()
+			c.metrics.incCacheRequests("succes")


Suggested change

c.metrics.incCacheRequests("succes")

c.metrics.incCacheRequests(StatusSuccess)

darkowlzz · 2024-06-04T21:34:46Z

cache/cache.go

+	if c.labelsFunc != nil {
+		lvs, err = c.labelsFunc(object, len(c.metrics.getExtraLabels()))
+		if err != nil {
+			return res, false, KeyError{object, err}
+		}
+	}
+	key, err := c.keyFunc(object)
+	if err != nil {
+		return res, false, KeyError{object, err}
+	}


Shouldn't the failures in both of the above be recorded as failed cache request metrics?

darkowlzz · 2024-06-04T21:38:22Z

cache/cache.go

+	if err != nil {
+		return KeyError{object, err}


Failed cache request metric here too?

darkowlzz · 2024-06-04T21:50:35Z

cache/cache.go

+
+// Resize resizes the cache and returns the number of index removed.
+func (c *cache[T]) Resize(size int) int {
+	overflow := len(c.items) - size


Some guard for the overflow value would help prevent panic due to out of range errors. Passing any negative number as the size results in out of bound failure.
Maybe check the overflow value with the total capacity before proceeding.

I'll make size an uint

I'm afraid that a uint will need to be converted to an int for subtracting with the length and the integer overflow will also result in a negative number.
But maybe you can deal with it in some other way.

cache/cache.go

darkowlzz · 2024-06-04T22:22:09Z

cache/cache.go

+// GetExpiration returns the expiration for the given key.
+// Returns zero if the key is not in the cache or the item
+// has already expired.
+func (c *Cache[T]) GetExpiration(object T) (time.Duration, error) {


Looking at the definition of this is making me wonder about the return type.
In terms of the Expirable interface, accepting a duration is easier to set an expiration for now + x duration. But for getting the expiration, if the same duration is returned, it's difficult to know the exact time as the duration was calculated relative to when it was obtained. It would be easier to use if the exact time is obtained for GetExpiration.

Looking at its usage in #776, for example for AWS, the ECR token provides an exact time, but we calculate a relative time and return that. Then use that relative time to set the cache expiration. If there's any delay between obtaining the token and caching it due to any reason, world stops due to GC or the system hangs momentarily, the relative time will be incorrect.

darkowlzz · 2024-06-05T11:57:40Z

cache/cache_test.go

+type testObject struct {
+	metav1.TypeMeta   `json:",inline"`
+	metav1.ObjectMeta `json:"metadata,omitempty"`
+}


How about replacing this with the PartialObjectMetadata from https://github.com/kubernetes/apimachinery/blob/v0.30.1/pkg/apis/meta/v1/types.go#L1496-L1502 ?

darkowlzz · 2024-06-05T16:01:16Z

cache/cache.go

+	if c.closed {
+		return ErrClosed
+	}
+	c.janitor.stop <- true


If no cleanup interval is passed as an option when creating the cache, the janitor goroutine doesn't get started. But upon calling Close() on such a cache, the program hangs because the above code tries to send data to a channel that has not started receiving anything. I encountered this in TestCache test which calls Close() at the very end.
Although there may be no need to call Close in such a situation, we can handle this gracefully by either including a default interval for the cleanup to always run the janitor, or check the interval to find out if the janitor goroutine may be running before running the code above.
Since Delete() depends on the janitor to actually delete the expired items from cache, the janitor seems to be an essential part of the cache for proper functioning. This makes me favor a default cleanup interval approach that ensures that the janitor always runs.

darkowlzz · 2024-06-05T16:34:51Z

cache/cache.go

+}
+
+// Delete an item from the cache. Does nothing if the key is not in the cache.
+// It actually sets the item expiration to now, so that it will at the cleanup.


"... will be deleted at the cleanup" ?

darkowlzz · 2024-06-05T17:42:22Z

cache/cache.go

+	mu sync.RWMutex
+}
+
+var _ Expirable[any] = &Cache[any]{}


It would be good to also add

var _ Store[any] = &Cache[any]{}

similar to the one in lru.go to ensure it satisfies the interface.

darkowlzz · 2024-06-05T21:56:14Z

cache/lru.go

+	// if node is head or tail, do nothing
+	if key == c.head.key || key == c.tail.key {
+		c.metrics.incCacheRequests(StatusSuccess)
+		return nil
+	}


This code is making me question if the design is a bit accidental. Reading the LRU.add(), I'm visualizing the linked-list and the head/tail pointers as

Head and tail don't contain object data. They are like cursors/references pointing to the last and first element of the list. They aren't counted as cache elements. But the last and first nodes also point to these empty head and tail as their previous and next nodes, respectively. The key of head and tail are always empty as they don't contain any object.
The above code checks to see if the node that's requested to be deleted matches with head or tail. Based on my above understanding, this if condition won't be true because of how LRU.add() inserts the objects and organizes the cache.

Am I misunderstanding?

souleb requested a review from a team as a code owner April 25, 2024 15:17

souleb marked this pull request as draft April 25, 2024 15:17

stefanprodan reviewed Apr 26, 2024

View reviewed changes

cache/cache.go Outdated Show resolved Hide resolved

souleb added 5 commits April 28, 2024 22:59

make the store and cache generic

f491bf2

Signed-off-by: Soule BA <bah.soule@gmail.com>

Add a lru cache that implements the store interface

305494d

Signed-off-by: Soule BA <bah.soule@gmail.com>

Add a lru cache that implements the store interface

64822d5

Signed-off-by: Soule BA <bah.soule@gmail.com>

Add instrumentation for the cache implementations

21c60f7

The cache instrumentation purpose is to be self contained. We want to avoid leaking any of the logic here in consuming code. Signed-off-by: Soule BA <bah.soule@gmail.com>

adding concurrency test

1e86776

Signed-off-by: Soule BA <bah.soule@gmail.com>

stefanprodan reviewed Apr 30, 2024

View reviewed changes

cache/lru.go Outdated Show resolved Hide resolved

cache/metrics.go Outdated Show resolved Hide resolved

stefanprodan reviewed May 2, 2024

View reviewed changes

cache/cache.go Outdated Show resolved Hide resolved

souleb force-pushed the enable-cachin-auth-tokens branch from f79844e to f5a5a36 Compare May 3, 2024 13:27

souleb force-pushed the enable-cachin-auth-tokens branch 4 times, most recently from 8eec6c2 to 4ddba8d Compare May 19, 2024 21:24

souleb marked this pull request as ready for review May 19, 2024 21:24

souleb requested review from stefanprodan and darkowlzz May 19, 2024 21:24

souleb force-pushed the enable-cachin-auth-tokens branch from 4ddba8d to a108669 Compare May 19, 2024 21:51

souleb mentioned this pull request May 21, 2024

SOPS AZKV getDefaultAzureCredential triggers API rate limit fluxcd/kustomize-controller#840

Open

stefanprodan changed the title ~~Auth token caching~~ Introduce in-memory caching package May 23, 2024

stefanprodan approved these changes May 23, 2024

View reviewed changes

This was referenced May 27, 2024

[OCI] Cache Login credentials #776

Open

Cache persistence to disk #778

Open

darkowlzz reviewed Jun 3, 2024

View reviewed changes

Add more tests cases

c6ec0fb

Signed-off-by: Soule BA <bah.soule@gmail.com>

souleb force-pushed the enable-cachin-auth-tokens branch from a108669 to c6ec0fb Compare June 4, 2024 08:50

darkowlzz reviewed Jun 4, 2024

View reviewed changes

darkowlzz reviewed Jun 5, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce in-memory caching package #766

Introduce in-memory caching package #766

souleb commented Apr 25, 2024 •

edited by stefanprodan

errordeveloper commented May 8, 2024 •

edited

souleb commented May 18, 2024

stefanprodan left a comment

darkowlzz Jun 3, 2024

souleb Jun 3, 2024

darkowlzz Jun 3, 2024 •

edited

darkowlzz Jun 4, 2024 •

edited

souleb Jun 5, 2024

darkowlzz Jun 4, 2024

darkowlzz Jun 4, 2024

souleb Jun 5, 2024

darkowlzz Jun 5, 2024 •

edited

darkowlzz Jun 5, 2024

darkowlzz Jun 4, 2024

darkowlzz Jun 4, 2024 •

edited

darkowlzz Jun 4, 2024

darkowlzz Jun 4, 2024

souleb Jun 5, 2024

darkowlzz Jun 5, 2024

darkowlzz Jun 4, 2024 •

edited

darkowlzz Jun 5, 2024 •

edited

darkowlzz Jun 5, 2024

darkowlzz Jun 5, 2024

darkowlzz Jun 5, 2024

darkowlzz Jun 5, 2024 •

edited

	c.metrics.incCacheRequests("succes")
	c.metrics.incCacheRequests(StatusSuccess)

Introduce in-memory caching package #766

Are you sure you want to change the base?

Introduce in-memory caching package #766

Conversation

souleb commented Apr 25, 2024 • edited by stefanprodan

Store design

errordeveloper commented May 8, 2024 • edited

souleb commented May 18, 2024

stefanprodan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

darkowlzz Jun 3, 2024 • edited

Choose a reason for hiding this comment

darkowlzz Jun 4, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

darkowlzz Jun 5, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

darkowlzz Jun 4, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

darkowlzz Jun 4, 2024 • edited

Choose a reason for hiding this comment

darkowlzz Jun 5, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

darkowlzz Jun 5, 2024 • edited

Choose a reason for hiding this comment

souleb commented Apr 25, 2024 •

edited by stefanprodan

errordeveloper commented May 8, 2024 •

edited

darkowlzz Jun 3, 2024 •

edited

darkowlzz Jun 4, 2024 •

edited

darkowlzz Jun 5, 2024 •

edited

darkowlzz Jun 4, 2024 •

edited

darkowlzz Jun 4, 2024 •

edited

darkowlzz Jun 5, 2024 •

edited

darkowlzz Jun 5, 2024 •

edited