
[WIP] feat: add cache provider implementation #4169

Draft: wants to merge 6 commits into base: main

Conversation

milosgajdos
Member

@milosgajdos milosgajdos commented Nov 23, 2023

Currently, the only two implementations of cache are redis and inmemory.

This PR introduces cache providers that let you implement your own blobdescriptor cache of choice.

Extended rebase of #2580

EDIT:

⚠️ This changes configuration, but now is the perfect time to do it, before the v3 release ⚠️

cache now has a dedicated configuration struct instead of being part of the storage configuration.
This simplifies the code quite a bit at the expense of sad configuration decoding. I passionately hate the configuration package and IMHO we should rework it almost from the ground up.

Each cache provider must register itself via the init() func so it's loaded on startup and available if the config requires it.

Adding new cache providers, should there be such a need in the future, is now easier.
Currently, we continue to provide only redis and inmemory cache provider implementations.

Cache configuration is now expected to be set with the following options:

  • provider: name of the cache provider
  • params: set of config params that configure the given cache provider

Redis configuration has been moved from the main configuration to the redis cache package because it's only ever been used for configuring caches and nothing else, so it makes no sense to keep it in the global config.

Documentation has been updated accordingly.

Currently, the only two implementations of cache are redis and inmemory.
This commit introduces cache providers that let you implement your own
blobdescriptor cache of choice and plug it in.

Co-authored-by: Gladkov Alexey <agladkov@redhat.com>
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
@milosgajdos milosgajdos marked this pull request as ready for review November 23, 2023 07:18
Comment on lines 22 to 24
if _, exists := cacheProviders[name]; exists {
	return fmt.Errorf("name already registered: %s", name)
}
Collaborator

Is provider.Register intended to be called from init() functions for packages to self-register? If so, why not just panic?

Member Author

Good shout. I think there are a few other cases in this codebase where we should rectify this e.g.

func Register(name string, initFunc InitFunc) error {
	if storageMiddlewares == nil {
		storageMiddlewares = make(map[string]InitFunc)
	}
	if _, exists := storageMiddlewares[name]; exists {
		return fmt.Errorf("name already registered: %s", name)
	}
	storageMiddlewares[name] = initFunc
	return nil
}
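For illustration, the panic-on-duplicate variant being discussed might look like this sketch (names and signatures are hypothetical, not the PR's actual code):

```go
package main

import "fmt"

// InitFunc constructs a blob-descriptor cache from decoded options.
// The exact signature is illustrative only.
type InitFunc func(options map[string]interface{}) (interface{}, error)

var cacheProviders = make(map[string]InitFunc)

// Register is meant to be called from init() of each provider package.
// A duplicate name is a programmer error, so panic rather than return
// an error that no init()-time caller can meaningfully handle.
func Register(name string, initFunc InitFunc) {
	if _, exists := cacheProviders[name]; exists {
		panic(fmt.Sprintf("cache provider already registered: %s", name))
	}
	cacheProviders[name] = initFunc
}

func main() {
	Register("inmemory", func(map[string]interface{}) (interface{}, error) { return nil, nil })
	fmt.Println(len(cacheProviders)) // 1
}
```

Panicking here mirrors how database/sql drivers and image codecs in the standard library treat duplicate registration: it is a build-composition bug, not a runtime condition.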

registry/storage/cache/provider/cacheprovider.go (outdated; resolved)
Collaborator

I wonder how many near-duplicates of this pattern are scattered throughout the codebase. I betcha they could all be replaced with instantiations of a single generic implementation.

type Factory[T any] func(ctx context.Context, options map[string]any) (T, error)

type FactoryRegistry[T any] struct{ ... }
func NewFactoryRegistry[T any]() *FactoryRegistry[T]

func (*FactoryRegistry[T]) Register(name string, factory Factory[T])
func (*FactoryRegistry[T]) Get(ctx context.Context, name string, options map[string]any) (T, error)

var Registry = factoryutil.NewFactoryRegistry[cache.BlobDescriptorCacheProvider]()

Member Author

I think with generics, this codebase could use proper massaging.

milosgajdos and others added 2 commits December 7, 2023 23:17
Co-authored-by: Cory Snider <corhere@gmail.com>
Signed-off-by: Milos Gajdos <milosgajdos83@gmail.com>
This func is meant to be used from init funcs of specific cacheprovider
implementations. If they fail to register we should panic.

Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
corhere
corhere previously approved these changes Dec 7, 2023
@milosgajdos
Member Author

With this PR... I'm kinda wondering whether the redis and inmemory caches should be considered cacheproviders. That's not the case at the moment. I'll mull this over for a bit, but I think if we do this (i.e. agree on getting this in), we should make the redis and inmemory caches cacheproviders.

This commit reworks cache provider setup and configuration.
Cache now has a dedicated configuration structure instead of
being part of the storage configuration. This simplifies the
code quite a bit at the expense of sad configuration decoding.

This commit now allows for adding new cache providers should there be
such a need in the future. Currently we continue to provide only redis and
inmemory cache.

In line with this, the existing cache implementations, inmemory and redis,
have now been "turned" into cache providers, which also simplifies the code.

Cache configuration is now expected to be set with the following options:
provider - name of the cache provider
params - arbitrary set of config params for the given cache provider.

Each cache provider must register itself via init func so it's loaded
on startup and available if the config requires it.

Redis configuration has been removed from the main configuration because
it's only ever been used for configuring caches and nothing else, so it
makes no sense to keep it in the global config.

Documentation has been updated accordingly.

Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
@milosgajdos milosgajdos marked this pull request as draft December 8, 2023 19:44
@milosgajdos
Member Author

milosgajdos commented Dec 8, 2023

@corhere I've reworked the cache provider stuff from the ground up. Would love it if you could spare a minute looking through this PR. I've marked it as a draft for the time being so we can iron out some details, but I'm pretty happy with it as it is.

I've also updated the PR config with all the new info.

Also PTAL @thaJeztah @Jamstah

@milosgajdos milosgajdos added the area/config Related to registry config label Dec 8, 2023
@corhere corhere dismissed their stale review December 8, 2023 20:05

big jump in scope since review

Collaborator

@corhere corhere left a comment

You might be able to convince me regarding the change to redis configuration.

Comment on lines -22 to -30
redis:
  addr: localhost:6379
  pool:
    maxidle: 16
    maxactive: 64
    idletimeout: 300s
  dialtimeout: 10ms
  readtimeout: 10ms
  writetimeout: 10ms
Collaborator

I think it is more future-proof to leave redis as a top-level key. It may only be used for one thing now, but that could change in the future. Redis could be useful for other kinds of middleware, e.g. rate-limiting clients, and could be made to coexist with the blobdescriptor cache keys in the same redis instance. Having separate redis connection configs for each would be repetitive to configure, and would result in multiple connection pools targeting the same server.

Member Author

I was originally thinking along EXACTLY the same lines before I made the change.

The use cases you mention might make some sense in the future, but I'm skeptical: nobody will implement rate limiters in this bloated codebase. Large deployments are usually behind proxies that handle rate limiting before the requests hit the API -- speaking from experience of managing many of these large-volume ones over my career.

Whilst sharing a Redis instance across many use cases makes sense in cute little deployments, it becomes a nuisance and a proper headache when you outgrow your small setup, not to mention whatever security use case your security team barfs at you, etc.

Still, I'm not entirely sold on keeping it as a global, but I'm also not strictly against it either - just mulling over my past experiences.

Sometimes it's better to duplicate things than to create more problems for yourself by prematurely optimising 🙃

Comment on lines +178 to +185
// NewCacheOptions returns new memory cache options.
func NewCacheOptions(size int) map[string]interface{} {
	return map[string]interface{}{
		"params": map[interface{}]interface{}{
			"size": size,
		},
	}
}
Collaborator

The existence and usage of this function tells me that something is upside-down. Specifically, NewBlobDescriptorCacheProvider is not adhering to the Single-Responsibility Principle. It is both parsing user configuration (map[string]any) into a usable representation and constructing a provider. Parsing a map[string]any is only needed for the factory registered with the cache-provider registry; it's just extra steps for the registry client and other internal users who instantiate cache providers directly.

Member Author

Yes, it's one of a few things in this PR that I'm still mulling over. It does not feel right, I agree.

Note that the Azure storage driver also decodes auth config in its constructor: that's where I got some inspiration, but I'm not happy about it, no.

It's also one of the reasons I marked this PR as draft. There are things to be figured out here.

Member Author

Just FYI: this is a helper func; you don't have to use it. There is nothing that prevents you from providing your own map[string]interface{} params, in a similar way to how it's done in the storage drivers.

I was trying to avoid the absolute state of things in this (and other) storage driver config:

func FromParameters(ctx context.Context, parameters map[string]interface{}) (*Driver, error) {
	// Providing no values for these is valid in case the user is authenticating
	// with an IAM on an ec2 instance (in which case the instance credentials will
	// be summoned when GetAuth is called)
	accessKey := parameters["accesskey"]
	if accessKey == nil {
		accessKey = ""
	}
	secretKey := parameters["secretkey"]
	if secretKey == nil {
		secretKey = ""
	}
	regionEndpoint := parameters["regionendpoint"]
	if regionEndpoint == nil {
		regionEndpoint = ""
	}
	forcePathStyleBool := true
	forcePathStyle := parameters["forcepathstyle"]
	switch forcePathStyle := forcePathStyle.(type) {
	case string:
		b, err := strconv.ParseBool(forcePathStyle)
		if err != nil {
			return nil, fmt.Errorf("the forcePathStyle parameter should be a boolean")
		}
		forcePathStyleBool = b
	case bool:
		forcePathStyleBool = forcePathStyle
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the forcePathStyle parameter should be a boolean")
	}
	regionName := parameters["region"]
	if regionName == nil || fmt.Sprint(regionName) == "" {
		return nil, fmt.Errorf("no region parameter provided")
	}
	region := fmt.Sprint(regionName)
	// Don't check the region value if a custom endpoint is provided.
	if regionEndpoint == "" {
		if _, ok := validRegions[region]; !ok {
			return nil, fmt.Errorf("invalid region provided: %v", region)
		}
	}
	bucket := parameters["bucket"]
	if bucket == nil || fmt.Sprint(bucket) == "" {
		return nil, fmt.Errorf("no bucket parameter provided")
	}
	encryptBool := false
	encrypt := parameters["encrypt"]
	switch encrypt := encrypt.(type) {
	case string:
		b, err := strconv.ParseBool(encrypt)
		if err != nil {
			return nil, fmt.Errorf("the encrypt parameter should be a boolean")
		}
		encryptBool = b
	case bool:
		encryptBool = encrypt
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the encrypt parameter should be a boolean")
	}
	secureBool := true
	secure := parameters["secure"]
	switch secure := secure.(type) {
	case string:
		b, err := strconv.ParseBool(secure)
		if err != nil {
			return nil, fmt.Errorf("the secure parameter should be a boolean")
		}
		secureBool = b
	case bool:
		secureBool = secure
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the secure parameter should be a boolean")
	}
	skipVerifyBool := false
	skipVerify := parameters["skipverify"]
	switch skipVerify := skipVerify.(type) {
	case string:
		b, err := strconv.ParseBool(skipVerify)
		if err != nil {
			return nil, fmt.Errorf("the skipVerify parameter should be a boolean")
		}
		skipVerifyBool = b
	case bool:
		skipVerifyBool = skipVerify
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the skipVerify parameter should be a boolean")
	}
	v4Bool := true
	v4auth := parameters["v4auth"]
	switch v4auth := v4auth.(type) {
	case string:
		b, err := strconv.ParseBool(v4auth)
		if err != nil {
			return nil, fmt.Errorf("the v4auth parameter should be a boolean")
		}
		v4Bool = b
	case bool:
		v4Bool = v4auth
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the v4auth parameter should be a boolean")
	}
	keyID := parameters["keyid"]
	if keyID == nil {
		keyID = ""
	}
	chunkSize, err := getParameterAsInt64(parameters, "chunksize", defaultChunkSize, minChunkSize, maxChunkSize)
	if err != nil {
		return nil, err
	}
	multipartCopyChunkSize, err := getParameterAsInt64(parameters, "multipartcopychunksize", defaultMultipartCopyChunkSize, minChunkSize, maxChunkSize)
	if err != nil {
		return nil, err
	}
	multipartCopyMaxConcurrency, err := getParameterAsInt64(parameters, "multipartcopymaxconcurrency", defaultMultipartCopyMaxConcurrency, 1, math.MaxInt64)
	if err != nil {
		return nil, err
	}
	multipartCopyThresholdSize, err := getParameterAsInt64(parameters, "multipartcopythresholdsize", defaultMultipartCopyThresholdSize, 0, maxChunkSize)
	if err != nil {
		return nil, err
	}
	rootDirectory := parameters["rootdirectory"]
	if rootDirectory == nil {
		rootDirectory = ""
	}
	storageClass := s3.StorageClassStandard
	storageClassParam := parameters["storageclass"]
	if storageClassParam != nil {
		storageClassString, ok := storageClassParam.(string)
		if !ok {
			return nil, fmt.Errorf(
				"the storageclass parameter must be one of %v, %v invalid",
				s3StorageClasses,
				storageClassParam,
			)
		}
		// All valid storage class parameters are UPPERCASE, so be a bit more flexible here
		storageClassString = strings.ToUpper(storageClassString)
		if storageClassString != noStorageClass &&
			storageClassString != s3.StorageClassStandard &&
			storageClassString != s3.StorageClassReducedRedundancy &&
			storageClassString != s3.StorageClassStandardIa &&
			storageClassString != s3.StorageClassOnezoneIa &&
			storageClassString != s3.StorageClassIntelligentTiering &&
			storageClassString != s3.StorageClassOutposts &&
			storageClassString != s3.StorageClassGlacierIr {
			return nil, fmt.Errorf(
				"the storageclass parameter must be one of %v, %v invalid",
				s3StorageClasses,
				storageClassParam,
			)
		}
		storageClass = storageClassString
	}
	userAgent := parameters["useragent"]
	if userAgent == nil {
		userAgent = ""
	}
	objectACL := s3.ObjectCannedACLPrivate
	objectACLParam := parameters["objectacl"]
	if objectACLParam != nil {
		objectACLString, ok := objectACLParam.(string)
		if !ok {
			return nil, fmt.Errorf("invalid value for objectacl parameter: %v", objectACLParam)
		}
		if _, ok = validObjectACLs[objectACLString]; !ok {
			return nil, fmt.Errorf("invalid value for objectacl parameter: %v", objectACLParam)
		}
		objectACL = objectACLString
	}
	useDualStackBool := false
	useDualStack := parameters["usedualstack"]
	switch useDualStack := useDualStack.(type) {
	case string:
		b, err := strconv.ParseBool(useDualStack)
		if err != nil {
			return nil, fmt.Errorf("the useDualStack parameter should be a boolean")
		}
		useDualStackBool = b
	case bool:
		useDualStackBool = useDualStack
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the useDualStack parameter should be a boolean")
	}
	mutlipartCombineSmallPart := true
	combine := parameters["multipartcombinesmallpart"]
	switch combine := combine.(type) {
	case string:
		b, err := strconv.ParseBool(combine)
		if err != nil {
			return nil, fmt.Errorf("the multipartcombinesmallpart parameter should be a boolean")
		}
		mutlipartCombineSmallPart = b
	case bool:
		mutlipartCombineSmallPart = combine
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the multipartcombinesmallpart parameter should be a boolean")
	}
	sessionToken := ""
	accelerateBool := false
	accelerate := parameters["accelerate"]
	switch accelerate := accelerate.(type) {
	case string:
		b, err := strconv.ParseBool(accelerate)
		if err != nil {
			return nil, fmt.Errorf("the accelerate parameter should be a boolean")
		}
		accelerateBool = b
	case bool:
		accelerateBool = accelerate
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the accelerate parameter should be a boolean")
	}
	params := DriverParameters{
		fmt.Sprint(accessKey),
		fmt.Sprint(secretKey),
		fmt.Sprint(bucket),
		region,
		fmt.Sprint(regionEndpoint),
		forcePathStyleBool,
		encryptBool,
		fmt.Sprint(keyID),
		secureBool,
		skipVerifyBool,
		v4Bool,
		chunkSize,
		multipartCopyChunkSize,
		multipartCopyMaxConcurrency,
		multipartCopyThresholdSize,
		mutlipartCombineSmallPart,
		fmt.Sprint(rootDirectory),
		storageClass,
		fmt.Sprint(userAgent),
		objectACL,
		fmt.Sprint(sessionToken),
		useDualStackBool,
		accelerateBool,
		getS3LogLevelFromParam(parameters["loglevel"]),
	}
	return New(ctx, params)
}

But maybe I'm trying to be too clever here 🤷‍♂️

Collaborator

Code needs to exist somewhere which transforms a generic map[string]any configuration blob into something the implementation can consume. There's no getting away from the code in the body of s3.FromParameters aside from leveraging metaprogramming (e.g. mapstructure) to do the same thing less verbosely. The only notes I have for s3.FromParameters are:

  1. It would be easier to unit-test if the code to parse parameters was extracted into its own func(map[string]any) (DriverParameters, error)
  2. FromParameters probably does not need to be exported since it is made available through the driver registry. Code which does not need to go through the registry and therefore does not need to be generic over all storage drivers does not need to go through the generic parameter-parsing machinery.
  3. The functional options pattern would make it easier to evolve the configuration and implementation in semver-compatible ways than to export the DriverParameters struct.

Aside from that, I firmly believe that the cache provider implementations should be following the way the s3 storage driver does configuration and construction. Consumers that do not need to be generic over all implementations of an interface should not be forced to use the weakly-typed, generic-over-all-implementations factory interface targeted at humans. Configuration as map[string]any makes runtime errors out of what could have otherwise been caught at compile time.
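The functional options pattern mentioned in point 3 could be sketched roughly like this (all names, fields, and defaults here are illustrative, not the actual driver API):

```go
package main

import "fmt"

// driverConfig stays unexported; callers program against Option
// functions instead of a frozen exported parameters struct.
type driverConfig struct {
	region    string
	chunkSize int64
	encrypt   bool
}

// Option mutates the config; new knobs can be added later without
// breaking existing callers, which keeps the API semver-friendly.
type Option func(*driverConfig)

func WithRegion(r string) Option    { return func(c *driverConfig) { c.region = r } }
func WithChunkSize(n int64) Option  { return func(c *driverConfig) { c.chunkSize = n } }
func WithEncryption(on bool) Option { return func(c *driverConfig) { c.encrypt = on } }

// NewDriver applies defaults first, then the caller's options,
// then validates the result.
func NewDriver(opts ...Option) (*driverConfig, error) {
	c := &driverConfig{region: "us-east-1", chunkSize: 10 << 20}
	for _, opt := range opts {
		opt(c)
	}
	if c.region == "" {
		return nil, fmt.Errorf("no region provided")
	}
	return c, nil
}

func main() {
	d, _ := NewDriver(WithRegion("eu-west-1"), WithEncryption(true))
	fmt.Println(d.region, d.encrypt) // eu-west-1 true
}
```

The map[string]any-parsing factory then becomes a thin translation layer that turns config keys into Option values before calling the strongly-typed constructor.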

Comment on lines -34 to +40
-func NewInMemoryBlobDescriptorCacheProvider(size int) cache.BlobDescriptorCacheProvider {
-	if size <= 0 {
+func NewBlobDescriptorCacheProvider(ctx context.Context, options map[string]interface{}) (cache.BlobDescriptorCacheProvider, error) {
Collaborator

func New(size int) *Cache

func factory(_ context.Context, options map[string]any) (cache.BlobDescriptorCacheProvider, error) {
    /* ... */
    return New(c.Size), nil
}

The names are just for illustration. My point is:

  • return concrete types for internal users who are constructing caches directly, and export the concrete type
  • The factory function "initFunc" does not need to be the same function as the constructor. Even if it would just be a trivial wrapper, it would still be doing something: lifting the concrete cache value into the cache.BlobDescriptorCacheProvider interface type.

Member Author

I'm a big fan of returning concrete values, but by the same token we'd have had to burn down half this codebase, which returns interfaces left, right, and center.

Note that this mostly follows the existing code in this repo for consistency's sake - though some parts of it are a bit divergent now, I agree.

It's even funkier here with the cache descriptors because the return values are actually wrapped in a prometheus wrapper

Collaborator

I'm a big fan of returning concrete values, but by the same token we'd have had to burn down half this codebase, which returns interfaces left, right, and center.

I'm not reviewing the whole codebase, nor am I suggesting that we boil the ocean.

Note that this mostly follows the existing code in this repo for consistency's sake

And I am objecting to following legacy coding conventions in greenfield code for no reason other than "for consistency's sake." We can never evolve our coding conventions or move on from past mistakes with that attitude.

It's even funkier here with the cache descriptors because the return values are actually wrapped in a prometheus wrapper

Ahh! Now we get to the real reason: it would reveal a layering violation in the redis cache implementation. The metrics middleware for caches has low cohesion with the redis cache implementation, therefore they should not be tightly coupled. The middleware should be applied to the cache.Get() return value (read: in NewApp()) so that any cache provider gets instrumented.

Member Author

Some open questions I still have (besides the cohesion point you highlighted -- thanks!):

  • should we move the redis config stanza out of the root of the config -- I'm undecided on this. I feel like having it in the root makes it look like Redis is an "essential" part of the app, which isn't true: we can instead use the in-memory cache or no cache at all. So creating some semantic section for it that highlights its purpose in the wider context of the app could make sense. No idea. I'm just braindumping here.
  • What should the params passed down to the Redis module look like? We've agreed on the config decoder living outside of it. To make the Redis cache APIs easier to consume, should the pool be created inside the constructor from the passed-in params, or should it be created in the app root (which would allow keeping the global pool client in the app and potentially reusing it elsewhere) and then passed in via params to the module (e.g. map["pool"] = pool -- this was actually my original idea, but it didn't feel quite right)? The former would align with some of the code in this project but make it mildly annoying to consume (i.e. the API consumers would have to supply parameters via the mildly type-unsafe map -- any values gonna be anying); the latter would make it easier to consume (i.e. the API consumers could simply pass the pool object down), but it doesn't quite feel right either. The last option could be changing the InitFunc so that it accepts the pool as a param directly.

I'm still mulling over this. And since this PR is still a WIP or more like an exercise in exploration of the future, there is no rush on this. I'd prefer getting this right (or as close to the ideal as possible) than regret it in the future

Collaborator

  • should we move redis config stanza out of the root of the config

I am not too concerned about confusing users by keeping the config stanza in the root. I still think there is some merit to future-proofing the config to afford sharing a single redis client connection pool (which implies a shared database instance) among multiple redis-backed modules within the app. But I don't feel too strongly about it: if in the short-term it would require a complicated architecture to pull it off when there is just a single consumer, I am fully prepared to accept that YAGNI.

(Aside: in a hypothetical future where there is more than one redis-backed module available in the app, root redis config could coexist with per-module overrides. The overridden module would instantiate its own client, inheriting the root redis config values as defaults. E.g.:)

redis:
  addr: localhost:6379
  pool:
    maxidle: 16
    maxactive: 64
    idletimeout: 300s
  dialtimeout: 10ms
  readtimeout: 10ms
  writetimeout: 10ms
cache:
  blobdescriptor:
    provider: redis
    params:
      db: 42
  • What should the params passed down to Redis module look like?

The constructor should continue to accept a *redis.Client directly. How the factory function acquires that client is the big question.

First thing we could do is move the cache-provider registry stuff under an internal subtree to free it from the SemVer compatibility contract. (Maybe the cache implementations, too? With sanely-shaped constructors they might be useful for third parties, but we'd have to think about SemVer compatibility.) Then we'd be free to iterate on the registry and factories without being constrained by past decisions.

In the long term, a shared app-wide redis client (for example) could be made available to the cache-provider factories through some form of dependency injection. A pointer to the app struct could be passed into registry.Get alongside the params map, or perhaps some purpose-built "provider registry" object. In the short term, said dependency injection mechanism could instead provide the top-level redis config to the factory since the redis cache-provider is the only thing which needs the redis client today. With app.redis already being a thing, passing the app struct pointer into the factory may be the path of least resistance as long as we are okay with the factory implementation living in the app package. (I think it makes sense: the factory is necessarily tightly coupled to (at least one subtree of) the app config so it follows logically that it should also have high cohesion with the app package.)
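The inherit-root-defaults idea from the aside above could be sketched as a plain config merge (the struct and field names are invented; a real version would need a way to distinguish "unset" from genuine zero values):

```go
package main

import "fmt"

// RedisConfig mirrors the shape of the root redis stanza;
// the fields are illustrative, not the actual configuration structs.
type RedisConfig struct {
	Addr string
	DB   int
}

// withDefaults fills unset override fields from the root config, so a
// module-level stanza only has to state what differs (e.g. db: 42).
func withDefaults(root, override RedisConfig) RedisConfig {
	out := override
	if out.Addr == "" {
		out.Addr = root.Addr
	}
	if out.DB == 0 {
		out.DB = root.DB
	}
	return out
}

func main() {
	root := RedisConfig{Addr: "localhost:6379"}
	module := withDefaults(root, RedisConfig{DB: 42})
	fmt.Println(module.Addr, module.DB) // localhost:6379 42
}
```

Each redis-backed module would then instantiate its own client only when its merged config differs from the root one.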

// Register is used to register an InitFunc for
// a BlobDescriptorCacheProvider with the given name.
// It's meant to be called from init() function.
func Register(name string, initFunc InitFunc) {
Collaborator

With one slight tweak, it would be possible to register with a bit less boilerplate:

Suggested change
-func Register(name string, initFunc InitFunc) {
+func Register(name string, initFunc InitFunc) struct{} {

package myprovider

import "github.com/distribution/distribution/v3/registry/storage/cache"

var _ = cache.Register("myprovider", initFunc)

(I got the idea from Ginkgo, for better or worse.)

Member Author

Yeah, this is possible, but I'm trying to follow the established conventions in this codebase (see the storage drivers and middleware).

Collaborator

What's so sacred about the established conventions?

Member Author

Nothing in particular, but I prefer keeping code consistent across codebases: it makes navigation, comprehension, onboarding new devs, etc. much easier... I agree I got a bit religious with it in this codebase.

Member

I agree with keeping code consistent; no code is perfect, and having to deal with the same suboptimal convention everywhere is better than having to deal with 5 or 6 practices of varying quality, even if the average quality is better.

If we did want to establish a new convention, we should be deliberate about picking one and moving forward with that single practice, and we should also look towards switching the old code over to that as well.

We don't have to rip everything out all at once, but if the new convention is not compelling enough that we'd see benefit from applying it to the older code, then it may also not be compelling enough to deal with two conventions during the interim period.

Collaborator

Let me get this straight:

  • All existing code has been grandfathered in as "conventional."
    • Except for the S3 storage driver. For some reason, everything it does has been decreed anti-conventional.
  • All new code to which any similarity can be drawn with existing code must be shaped like the existing code.
  • Any changes to how code is shaped must be agreed upon by consensus and must then be ratified by applying it to the entire codebase.
    • Any changes must therefore go through a waterfall process before a single line of new code can be merged which goes against grandfathered conventions.
    • The determination of whether a new convention is compelling enough must be made without having experience coding in the new convention. Experience in coding the new convention cannot be gained without coding in the new convention. New conventions are therefore de facto banned. Whatever conventions were grandfathered in are our coding conventions, now and forever.
  • And given which thread this discussion is being made on, anything which might possibly afford code to be written in a divergent style while still affording the conventional style is also banned

That can't be right.

I am literally just suggesting that the brand new Register function grow a dummy result parameter so that implementations could use a less-noisy syntax for registration. The following would also compile just fine!

func init() {
	cache.Register("myprovider", initFunc)
}

I'm here to push boundaries and shake up existing conventions in order to evolve the codebase. If existing conventions are so sacred that they cannot even be iterated upon in greenfield code, I do not see a future where I remain on as a distribution maintainer.

Member Author

I'm here to push boundaries and shake up existing conventions in order to evolve the codebase. If existing conventions are so sacred that they cannot even be iterated upon in greenfield code, I do not see a future where I remain on as a distribution maintainer.

FYI: I appreciate your feedback very much. In fact, I really enjoy working together/collaborating on this project with you. I had no idea some of my comments would cause this much discussion or maybe even angst (hopefully not!).

This codebase does need a ton of improvements all around and I need to withhold [at least some of my] hot takes (e.g. "set it on fire lol") here as they're not productive. This discussion should probably belong elsewhere though.

FYI2: I'm not against changing the InitFunc - I'm sure we can find a way to make this work 😄

Member

I'm here to push boundaries and shake up existing conventions in order to evolve the codebase. If existing conventions are so sacred that they cannot even be iterated upon in greenfield code, I do not see a future where I remain on as a distribution maintainer.

Please know that I am also intensely frustrated by the way this codebase does things. This project is old enough to have existed before a lot of Go best practices were solidified, and it shows.

Any changes to how code is shaped must be agreed upon by consensus

You need two approvals from two maintainers in different orgs; in your case, that's yourself and one other maintainer.

I tend to be more cautious, and I think it's a good and healthy balance to have someone who wants to shake things up. We need to moderate each other in this way.

and must then be ratified by applying it to the entire codebase.

I have a solution below that might give us both consensus and the capacity to effect change.

The determination of whether a new convention is compelling enough must be made without having experience coding in the new convention. Experience in coding the new convention cannot be gained without coding in the new convention. New conventions are therefore de facto banned. Whatever conventions were grandfathered in are our coding conventions, now and forever.

We all have worked on other projects and seen different conventions applied there and what their results were.
We can reasonably determine if something is going to work out and then decide to commit to it.

If we want to adopt a new convention, we can start with a single PR. However, if there's no barrier to entry, it's easy to end up with a bunch of unrelated, incohesive practices sprinkled throughout the code. Part of being a maintainer on any project is thinking about the impact of changes in the context of existing code.

What I think would help in resolving these conflicts is adopting or implementing a style guide. This way, we can be deliberate about what our practices are, and apply them consistently. This would allow us to stabilize our practices moving forward and justify and encourage creating PRs which make old code conform. Having a style guide should provide the consistency that I want to see, while providing you with the ability to evolve the codebase.

To move forward, I propose:

  • Bucking the convention of returning interfaces, right now, in this PR.
  • Creating a style guide with a single entry in support of this style, in this or a followup PR.
  • Creating an issue to discuss expanding the style guide, for example, if we want to write our own or adopt some existing guide and if so, which one.

milosgajdos and others added 2 commits December 8, 2023 21:14
Co-authored-by: Cory Snider <corhere@gmail.com>
Signed-off-by: Milos Gajdos <milosgajdos83@gmail.com>
Co-authored-by: Cory Snider <corhere@gmail.com>
Signed-off-by: Milos Gajdos <milosgajdos83@gmail.com>
@milosgajdos milosgajdos changed the title feat: add cache provider implementation [WIP] feat: add cache provider implementation Dec 12, 2023