-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce in-memory caching package #766
base: main
Are you sure you want to change the base?
Conversation
The store interface is a simplified version of client go store. If an object can be cached there, it should be in our store implementations. We also provide a default way to generate keys, but we allow users to provide their own function. This should enable all needed levels of isolation. Signed-off-by: Soule BA <bah.soule@gmail.com>
Signed-off-by: Soule BA <bah.soule@gmail.com>
Signed-off-by: Soule BA <bah.soule@gmail.com>
Signed-off-by: Soule BA <bah.soule@gmail.com>
The cache instrumentation purpose is to be self contained. We want to avoid leaking any of the logic here in consuming code. Signed-off-by: Soule BA <bah.soule@gmail.com>
Signed-off-by: Soule BA <bah.soule@gmail.com>
f79844e
to
f5a5a36
Compare
Thanks for working on this, @souleb!
It would be handy to have a link in the code. At the moment this is only mentioned in a commit message and PR description. What may need to be made more explicit is that the sore is persisted to a file. |
There is a comment in Persisting to a file will be made in a follow up PR where we will outline the use case we have for it. |
8eec6c2
to
4ddba8d
Compare
4ddba8d
to
a108669
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @souleb 🏅
if key, ok := any(object).(ExplicitKey); ok { | ||
return string(key), nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the input is an ExplicitKey
already, which is of string
type, what's the use case for passing it to IdentifiableObjectKeyFunc
to obtain a string
again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you always use the Get()
function and never GetByKey()
you might have some objects for which you only have a key
. The ExplicitKey
solve that.
// StoreObject is a wrapper for an object with its identifying key. | ||
// It is used to store objects in a Store. | ||
type StoreObject[T any] struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be misunderstanding the usage of generics, please correct me, but this says that this is used to store objects in a Store and I do see it being passed to cache.Add()
in the tests. Can't we use this StoreObject
data type in the Store
interface instead of any
? As StoreObject
in itself contains any. I'm thinking about it because I see very few usage of StoreObject
data type.
Signed-off-by: Soule BA <bah.soule@gmail.com>
a108669
to
c6ec0fb
Compare
if opt.registerer == nil { | ||
return nil, ErrNoRegisterer | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this makes metrics a requirement instead of optional?
In our reconciler metrics, we don't do anything if there's no recorder/collector configured and let the component run, refer https://github.com/fluxcd/pkg/blob/runtime/v0.47.1/runtime/controller/metrics.go#L79 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not optional at the moment. I'll have to refactor how we call the metrics funcs. I'll do that.
c.mu.Unlock() | ||
return KeyError{object, ErrFull} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this returns an error and doesn't add the object in the cache, shouldn't it also have
c.metrics.incCacheRequests(StatusFailure)
?
Same in Update()
below.
key, err := c.keyFunc(object) | ||
if err != nil { | ||
return KeyError{object, err} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this too be recorded as a failed request in metrics?
Same in Update()
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the keyFunc is wrong, many actions will result in error due to user provided parameter. This might have a consequence of too many metrics. Also this error is not on the cache side, so I wouldn't record it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if the key derivation is part of the cache, the cache receives a request even for bad input. Key derivation failure takes place within the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the LRU cache, this seems to be implemented as suggested, except for in the Add()
method.
if item.expiration > 0 { | ||
if item.expiration < time.Now().UnixNano() { | ||
c.mu.RUnlock() | ||
c.metrics.incCacheRequests("succes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c.metrics.incCacheRequests("succes") | |
c.metrics.incCacheRequests(StatusSuccess) |
if c.labelsFunc != nil { | ||
lvs, err = c.labelsFunc(object, len(c.metrics.getExtraLabels())) | ||
if err != nil { | ||
return res, false, KeyError{object, err} | ||
} | ||
} | ||
key, err := c.keyFunc(object) | ||
if err != nil { | ||
return res, false, KeyError{object, err} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the failures in both of the above be recorded as failed cache request metrics?
if err != nil { | ||
return KeyError{object, err} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failed cache request metric here too?
|
||
// Resize resizes the cache and returns the number of index removed. | ||
func (c *cache[T]) Resize(size int) int { | ||
overflow := len(c.items) - size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some guard for the overflow value would help prevent panic due to out of range errors. Passing any negative number as the size results in out of bound failure.
Maybe check the overflow value with the total capacity before proceeding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make size an uint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid that a uint will need to be converted to an int for subtracting with the length and the integer overflow will also result in a negative number.
But maybe you can deal with it in some other way.
// GetExpiration returns the expiration for the given key. | ||
// Returns zero if the key is not in the cache or the item | ||
// has already expired. | ||
func (c *Cache[T]) GetExpiration(object T) (time.Duration, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the definition of this is making me wonder about the return type.
In terms of the Expirable
interface, accepting a duration is easier to set an expiration for now + x duration. But for getting the expiration, if the same duration is returned, it's difficult to know the exact time as the duration was calculated relative to when it was obtained. It would be easier to use if the exact time is obtained for GetExpiration
.
Looking at its usage in #776, for example for AWS, the ECR token provides an exact time, but we calculate a relative time and return that. Then use that relative time to set the cache expiration. If there's any delay between obtaining the token and caching it due to any reason, world stops due to GC or the system hangs momentarily, the relative time will be incorrect.
type testObject struct { | ||
metav1.TypeMeta `json:",inline"` | ||
metav1.ObjectMeta `json:"metadata,omitempty"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about replacing this with the PartialObjectMetadata
from https://github.com/kubernetes/apimachinery/blob/v0.30.1/pkg/apis/meta/v1/types.go#L1496-L1502 ?
if c.closed { | ||
return ErrClosed | ||
} | ||
c.janitor.stop <- true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If no cleanup interval is passed as an option when creating the cache, the janitor goroutine doesn't get started. But upon calling Close()
on such a cache, the program hangs because the above code tries to send data to a channel that has not started receiving anything. I encountered this in TestCache
test which calls Close()
at the very end.
Although there may be no need to call Close in such a situation, we can handle this gracefully by either including a default interval for the cleanup to always run the janitor, or check the interval to find out if the janitor goroutine may be running before running the code above.
Since Delete()
depends on the janitor to actually delete the expired items from cache, the janitor seems to be an essential part of the cache for proper functioning. This makes me favor a default cleanup interval approach that ensures that the janitor always runs.
} | ||
|
||
// Delete an item from the cache. Does nothing if the key is not in the cache. | ||
// It actually sets the item expiration to now, so that it will at the cleanup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"... will be deleted at the cleanup" ?
mu sync.RWMutex | ||
} | ||
|
||
var _ Expirable[any] = &Cache[any]{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to also add
var _ Store[any] = &Cache[any]{}
similar to the one in lru.go to ensure it satisfies the interface.
// if node is head or tail, do nothing | ||
if key == c.head.key || key == c.tail.key { | ||
c.metrics.incCacheRequests(StatusSuccess) | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is making me question if the design is a bit accidental. Reading the LRU.add()
, I'm visualizing the linked-list and the head/tail pointers as
Head and tail don't contain object data. They are like cursors/references pointing to the last and first element of the list. They aren't counted as cache elements. But the last and first nodes also point to these empty head and tail as their previous and next nodes, respectively. The key of head and tail are always empty as they don't contain any object.
The above code checks to see if the node that's requested to be deleted matches with head or tail. Based on my above understanding, this if condition won't be true because of how LRU.add()
inserts the objects and organizes the cache.
Am I misunderstanding?
This PR will allow caching the authentication credentials retrieved by
pkg/oci/auth
. It should also enable future usage of the caching underlying mechanism.Part of: #642
Store design
The store is K/V store that can store arbitrary objects. The store interface is a simplified version of client go store. If an object can be cached there, it should be in our store implementations. This is desirable, because the primary envisioned place for the store usage is during a reconciliation of custom resources retrieved from a shared informer cache.
The
keys
are generated dynamically with deterministic function. Accepting a function instead ofkey strings
enables user to determine themselves the keys uniqueness constraint.They store must be thread safe, it should support concurrent programs.
There are 2 main uses cases:
Store
interface and underlying implementation should allow replacinghttps://github.com/fluxcd/source-controller/tree/main/internal/cache
which is today used to cachehelm indexes
. Ahelm index
is written, read and overwritten multiple times by multiple goroutines. Usually an index is stored and read multiple times (reconciliation of all dependent custom resources * interval) before being overwritten by a new version. It is read intensive. An index is read from local storage before being cached, if the cache is full, we could have several Custom Resources loading the same index in memory, in order to avoid that, we should evict older keys to make room.Based on the two scenario above, the store should be optimized for
reads
.We also have a scenario that needs
keys
to expirable and another that need them be evicted based on usage.Hence two implementations are provided: