[WIP] feat: add cache provider implementation #4169
base: main
Conversation
Currently the only two implementations of cache are redis and inmemory. This commit introduces cache providers that let you implement your own blobdescriptor cache of choice and plug it in. Co-authored-by: Gladkov Alexey <agladkov@redhat.com> Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
if _, exists := cacheProviders[name]; exists {
    return fmt.Errorf("name already registered: %s", name)
}
Is provider.Register intended to be called from init() functions for packages to self-register? If so, why not just panic?
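If registration only ever happens from init(), a duplicate name is a programming error, so failing fast is reasonable. A minimal sketch of the panicking variant (the InitFunc signature here is a stand-in, not the real one):

```go
package main

import "fmt"

// InitFunc is a stand-in for the provider factory signature (assumed).
type InitFunc func(options map[string]any) (any, error)

var cacheProviders = make(map[string]InitFunc)

// Register panics on duplicate names: since it is called from init()
// functions, a duplicate is a programmer error that should crash at
// startup rather than surface as a runtime error later.
func Register(name string, initFunc InitFunc) {
	if _, exists := cacheProviders[name]; exists {
		panic(fmt.Sprintf("cache provider already registered: %s", name))
	}
	cacheProviders[name] = initFunc
}

func main() {
	Register("inmemory", func(map[string]any) (any, error) { return nil, nil })
	fmt.Println(len(cacheProviders))
}
```

The trade-off is that a misbehaving third-party provider package can crash the whole binary on import, which is usually what you want for configuration-time errors.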
Good shout. I think there are a few other cases in this codebase where we should rectify this e.g.
distribution/registry/storage/driver/middleware/storagemiddleware.go
Lines 18 to 29 in 04e3bda
func Register(name string, initFunc InitFunc) error {
    if storageMiddlewares == nil {
        storageMiddlewares = make(map[string]InitFunc)
    }
    if _, exists := storageMiddlewares[name]; exists {
        return fmt.Errorf("name already registered: %s", name)
    }
    storageMiddlewares[name] = initFunc
    return nil
}
I wonder how many near-duplicates of this pattern are scattered throughout the codebase. I betcha they could all be replaced with instantiations of a single generic implementation.
type FactoryRegistry[T any, F func(context.Context, map[string]any) (T, error)] struct{ ... }
func NewFactoryRegistry[T any, F func(context.Context, map[string]any) (T, error)]() FactoryRegistry[T, F]
func (*FactoryRegistry[T, F]) Register(name string, factory F)
func (*FactoryRegistry[T, F]) Get(ctx context.Context, name string, options map[string]any) (T, error)
var Registry = factoryutil.NewFactoryRegistry[cache.BlobDescriptorCacheProvider]()
I think with generics, this codebase could use proper massaging.
Co-authored-by: Cory Snider <corhere@gmail.com> Signed-off-by: Milos Gajdos <milosgajdos83@gmail.com>
This func is meant to be used from init funcs of specific cache provider implementations. If they fail to register we should panic. Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
With this PR...I am kinda wondering if
This commit reworks cache provider setup and configuration. Cache now has a dedicated configuration structure instead of being part of the storage configuration. This simplifies the code quite a bit at the expense of sad configuration decoding. This commit now allows for adding new cache providers should there be such a need in the future. Currently we continue to provide only redis and inmemory cache. In line with this, the existing cache implementations, inmemory and redis, have now been "turned" into cache providers, which also simplifies the code. Cache configuration is now expected to be set with the following options: provider - name of the cache provider; params - arbitrary set of config params for the given cache provider. Each cache provider must register itself via an init func so it's loaded on startup and available if the config requires it. Redis configuration has been removed from the main configuration because it's only ever been used for configuring caches and nothing else, so it makes no sense to keep it in the global config. Documentation has been updated accordingly. Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
@corhere I've reworked the cache provider stuff from the ground up. Would love it if you could spare a minute looking through this PR. I've marked it as a draft. I've also updated the PR config with all the new info. Also PTAL @thaJeztah @Jamstah
You might be able to convince me regarding the change to redis configuration.
redis:
  addr: localhost:6379
  pool:
    maxidle: 16
    maxactive: 64
    idletimeout: 300s
  dialtimeout: 10ms
  readtimeout: 10ms
  writetimeout: 10ms
I think it is more future-proof to leave redis as a top-level key. It may only be used for one thing now, but that could change in the future. Redis could be useful for other kinds of middleware, e.g. rate-limiting clients, and could be made to coexist with the blobdescriptor cache keys in the same redis instance. Having separate redis connection configs for each would be repetitive to configure, and would result in multiple connection pools targeting the same server.
I was originally thinking along EXACTLY the same lines before I made the change.
The use cases you mention might make some sense in the future, but I'm skeptical: nobody will implement rate limiters in this bloated codebase. Large deployments are usually behind proxies that handle rate limiting before the requests hit the API -- speaking from experience of managing many of these large-volume ones over my career.
Whilst sharing a Redis instance across many use cases makes sense in cute little deployments, it becomes a nuisance and a proper headache when you outgrow your small setup, not to mention whatever security use case your security team barfs at you, etc.
Still, I'm not entirely sold on keeping it as a global, but I'm also not strictly against it either -- just mulling over my past experiences.
Sometimes it's better to duplicate things than create more problems for yourself by prematurely optimising 🙃
// NewCacheOptions returns new memory cache options.
func NewCacheOptions(size int) map[string]interface{} {
    return map[string]interface{}{
        "params": map[interface{}]interface{}{
            "size": size,
        },
    }
}
The existence and usage of this function tells me that something is upside-down. Specifically, NewBlobDescriptorCacheProvider is not adhering to the Single-Responsibility Principle. It is parsing user configuration (map[string]any) into a usable representation, and constructing a provider. Parsing a map[string]any is only needed for the factory registered with the cache-provider registry; it's just extra steps for the registry client and other internal users who instantiate cache providers directly.
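The separation being argued for could look like this sketch (names and the default size are hypothetical): a typed constructor for direct callers, with map parsing confined to the registry-facing factory.

```go
package main

import "fmt"

// Cache is a stand-in for the concrete in-memory cache type.
type Cache struct{ size int }

// New is the strongly-typed constructor for internal callers; no
// map[string]any parsing involved.
func New(size int) *Cache {
	if size <= 0 {
		size = 10000 // hypothetical default
	}
	return &Cache{size: size}
}

// parseSize extracts the size from the untyped config blob. Only the
// registry factory needs this step.
func parseSize(options map[string]any) (int, error) {
	raw, ok := options["size"]
	if !ok {
		return 0, nil // absent means "use the default"
	}
	size, ok := raw.(int)
	if !ok {
		return 0, fmt.Errorf("size must be an integer, got %T", raw)
	}
	return size, nil
}

// factory is what gets registered: parse the config, then construct.
func factory(options map[string]any) (*Cache, error) {
	size, err := parseSize(options)
	if err != nil {
		return nil, err
	}
	return New(size), nil
}

func main() {
	c, err := factory(map[string]any{"size": 512})
	fmt.Println(c.size, err)
}
```

Internal users call New directly with real types; only config-driven instantiation pays the cost of untyped parsing.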
Yes, it's one of a few things in this PR that I'm still mulling over. It does not feel right, I agree.
Note that the Azure storage driver also decodes auth config in its constructor: that's where I got some inspiration, but I'm not happy about it, no.
It's also one of the reasons I marked this PR as draft. There are things to be figured out here.
Just FYI: this is a helper func, you don't have to use it; there is nothing that prevents you from providing your own map[string]interface{} params in a similar way to how it's done in the storage drivers.
I was trying to avoid this (and other storage driver config) absolute state of things
distribution/registry/storage/driver/s3-aws/s3.go
Lines 192 to 472 in 04e3bda
func FromParameters(ctx context.Context, parameters map[string]interface{}) (*Driver, error) {
    // Providing no values for these is valid in case the user is authenticating
    // with an IAM on an ec2 instance (in which case the instance credentials will
    // be summoned when GetAuth is called)
    accessKey := parameters["accesskey"]
    if accessKey == nil {
        accessKey = ""
    }
    secretKey := parameters["secretkey"]
    if secretKey == nil {
        secretKey = ""
    }
    regionEndpoint := parameters["regionendpoint"]
    if regionEndpoint == nil {
        regionEndpoint = ""
    }
    forcePathStyleBool := true
    forcePathStyle := parameters["forcepathstyle"]
    switch forcePathStyle := forcePathStyle.(type) {
    case string:
        b, err := strconv.ParseBool(forcePathStyle)
        if err != nil {
            return nil, fmt.Errorf("the forcePathStyle parameter should be a boolean")
        }
        forcePathStyleBool = b
    case bool:
        forcePathStyleBool = forcePathStyle
    case nil:
        // do nothing
    default:
        return nil, fmt.Errorf("the forcePathStyle parameter should be a boolean")
    }
    regionName := parameters["region"]
    if regionName == nil || fmt.Sprint(regionName) == "" {
        return nil, fmt.Errorf("no region parameter provided")
    }
    region := fmt.Sprint(regionName)
    // Don't check the region value if a custom endpoint is provided.
    if regionEndpoint == "" {
        if _, ok := validRegions[region]; !ok {
            return nil, fmt.Errorf("invalid region provided: %v", region)
        }
    }
    bucket := parameters["bucket"]
    if bucket == nil || fmt.Sprint(bucket) == "" {
        return nil, fmt.Errorf("no bucket parameter provided")
    }
    encryptBool := false
    encrypt := parameters["encrypt"]
    switch encrypt := encrypt.(type) {
    case string:
        b, err := strconv.ParseBool(encrypt)
        if err != nil {
            return nil, fmt.Errorf("the encrypt parameter should be a boolean")
        }
        encryptBool = b
    case bool:
        encryptBool = encrypt
    case nil:
        // do nothing
    default:
        return nil, fmt.Errorf("the encrypt parameter should be a boolean")
    }
    secureBool := true
    secure := parameters["secure"]
    switch secure := secure.(type) {
    case string:
        b, err := strconv.ParseBool(secure)
        if err != nil {
            return nil, fmt.Errorf("the secure parameter should be a boolean")
        }
        secureBool = b
    case bool:
        secureBool = secure
    case nil:
        // do nothing
    default:
        return nil, fmt.Errorf("the secure parameter should be a boolean")
    }
    skipVerifyBool := false
    skipVerify := parameters["skipverify"]
    switch skipVerify := skipVerify.(type) {
    case string:
        b, err := strconv.ParseBool(skipVerify)
        if err != nil {
            return nil, fmt.Errorf("the skipVerify parameter should be a boolean")
        }
        skipVerifyBool = b
    case bool:
        skipVerifyBool = skipVerify
    case nil:
        // do nothing
    default:
        return nil, fmt.Errorf("the skipVerify parameter should be a boolean")
    }
    v4Bool := true
    v4auth := parameters["v4auth"]
    switch v4auth := v4auth.(type) {
    case string:
        b, err := strconv.ParseBool(v4auth)
        if err != nil {
            return nil, fmt.Errorf("the v4auth parameter should be a boolean")
        }
        v4Bool = b
    case bool:
        v4Bool = v4auth
    case nil:
        // do nothing
    default:
        return nil, fmt.Errorf("the v4auth parameter should be a boolean")
    }
    keyID := parameters["keyid"]
    if keyID == nil {
        keyID = ""
    }
    chunkSize, err := getParameterAsInt64(parameters, "chunksize", defaultChunkSize, minChunkSize, maxChunkSize)
    if err != nil {
        return nil, err
    }
    multipartCopyChunkSize, err := getParameterAsInt64(parameters, "multipartcopychunksize", defaultMultipartCopyChunkSize, minChunkSize, maxChunkSize)
    if err != nil {
        return nil, err
    }
    multipartCopyMaxConcurrency, err := getParameterAsInt64(parameters, "multipartcopymaxconcurrency", defaultMultipartCopyMaxConcurrency, 1, math.MaxInt64)
    if err != nil {
        return nil, err
    }
    multipartCopyThresholdSize, err := getParameterAsInt64(parameters, "multipartcopythresholdsize", defaultMultipartCopyThresholdSize, 0, maxChunkSize)
    if err != nil {
        return nil, err
    }
    rootDirectory := parameters["rootdirectory"]
    if rootDirectory == nil {
        rootDirectory = ""
    }
    storageClass := s3.StorageClassStandard
    storageClassParam := parameters["storageclass"]
    if storageClassParam != nil {
        storageClassString, ok := storageClassParam.(string)
        if !ok {
            return nil, fmt.Errorf(
                "the storageclass parameter must be one of %v, %v invalid",
                s3StorageClasses,
                storageClassParam,
            )
        }
        // All valid storage class parameters are UPPERCASE, so be a bit more flexible here
        storageClassString = strings.ToUpper(storageClassString)
        if storageClassString != noStorageClass &&
            storageClassString != s3.StorageClassStandard &&
            storageClassString != s3.StorageClassReducedRedundancy &&
            storageClassString != s3.StorageClassStandardIa &&
            storageClassString != s3.StorageClassOnezoneIa &&
            storageClassString != s3.StorageClassIntelligentTiering &&
            storageClassString != s3.StorageClassOutposts &&
            storageClassString != s3.StorageClassGlacierIr {
            return nil, fmt.Errorf(
                "the storageclass parameter must be one of %v, %v invalid",
                s3StorageClasses,
                storageClassParam,
            )
        }
        storageClass = storageClassString
    }
    userAgent := parameters["useragent"]
    if userAgent == nil {
        userAgent = ""
    }
    objectACL := s3.ObjectCannedACLPrivate
    objectACLParam := parameters["objectacl"]
    if objectACLParam != nil {
        objectACLString, ok := objectACLParam.(string)
        if !ok {
            return nil, fmt.Errorf("invalid value for objectacl parameter: %v", objectACLParam)
        }
        if _, ok = validObjectACLs[objectACLString]; !ok {
            return nil, fmt.Errorf("invalid value for objectacl parameter: %v", objectACLParam)
        }
        objectACL = objectACLString
    }
    useDualStackBool := false
    useDualStack := parameters["usedualstack"]
    switch useDualStack := useDualStack.(type) {
    case string:
        b, err := strconv.ParseBool(useDualStack)
        if err != nil {
            return nil, fmt.Errorf("the useDualStack parameter should be a boolean")
        }
        useDualStackBool = b
    case bool:
        useDualStackBool = useDualStack
    case nil:
        // do nothing
    default:
        return nil, fmt.Errorf("the useDualStack parameter should be a boolean")
    }
    mutlipartCombineSmallPart := true
    combine := parameters["multipartcombinesmallpart"]
    switch combine := combine.(type) {
    case string:
        b, err := strconv.ParseBool(combine)
        if err != nil {
            return nil, fmt.Errorf("the multipartcombinesmallpart parameter should be a boolean")
        }
        mutlipartCombineSmallPart = b
    case bool:
        mutlipartCombineSmallPart = combine
    case nil:
        // do nothing
    default:
        return nil, fmt.Errorf("the multipartcombinesmallpart parameter should be a boolean")
    }
    sessionToken := ""
    accelerateBool := false
    accelerate := parameters["accelerate"]
    switch accelerate := accelerate.(type) {
    case string:
        b, err := strconv.ParseBool(accelerate)
        if err != nil {
            return nil, fmt.Errorf("the accelerate parameter should be a boolean")
        }
        accelerateBool = b
    case bool:
        accelerateBool = accelerate
    case nil:
        // do nothing
    default:
        return nil, fmt.Errorf("the accelerate parameter should be a boolean")
    }
    params := DriverParameters{
        fmt.Sprint(accessKey),
        fmt.Sprint(secretKey),
        fmt.Sprint(bucket),
        region,
        fmt.Sprint(regionEndpoint),
        forcePathStyleBool,
        encryptBool,
        fmt.Sprint(keyID),
        secureBool,
        skipVerifyBool,
        v4Bool,
        chunkSize,
        multipartCopyChunkSize,
        multipartCopyMaxConcurrency,
        multipartCopyThresholdSize,
        mutlipartCombineSmallPart,
        fmt.Sprint(rootDirectory),
        storageClass,
        fmt.Sprint(userAgent),
        objectACL,
        fmt.Sprint(sessionToken),
        useDualStackBool,
        accelerateBool,
        getS3LogLevelFromParam(parameters["loglevel"]),
    }
    return New(ctx, params)
}
But maybe I'm trying to be too clever here 🤷♂️
Code needs to exist somewhere which transforms a generic map[string]any configuration blob into something the implementation can consume. There's no getting away from the code in the body of s3.FromParameters aside from leveraging metaprogramming (e.g. mapstructure) to do the same thing less verbosely. The only notes I have for s3.FromParameters are:
- It would be easier to unit-test if the code to parse parameters was extracted into its own func(map[string]any) (DriverParameters, error).
- FromParameters probably does not need to be exported since it is made available through the driver registry. Code which does not need to go through the registry and therefore does not need to be generic over all storage drivers does not need to go through the generic parameter-parsing machinery.
- The functional options pattern would make it easier to evolve the configuration and implementation in semver-compatible ways than to export the DriverParameters struct.
Aside from that, I firmly believe that the cache provider implementations should be following the way the s3 storage driver does configuration and construction. Consumers that do not need to be generic over all implementations of an interface should not be forced to use the weakly-typed, generic-over-all-implementations factory interface targeted at humans. Configuration as map[string]any makes runtime errors out of what could have otherwise been caught at compile time.
func NewInMemoryBlobDescriptorCacheProvider(size int) cache.BlobDescriptorCacheProvider {
    if size <= 0 {
func NewBlobDescriptorCacheProvider(ctx context.Context, options map[string]interface{}) (cache.BlobDescriptorCacheProvider, error) {
func New(size int) *Cache
func factory(_ context.Context, options map[string]any) (cache.BlobDescriptorCacheProvider, error) {
/* ... */
return New(c.Size), nil
}
The names are just for illustration. My point is:
- Return concrete types for internal users who are constructing caches directly, and export the concrete type.
- The factory function ("initFunc") does not need to be the same function as the constructor. Even if it would just be a trivial wrapper, it would still be doing something: lifting the concrete cache value into the cache.BlobDescriptorCacheProvider interface type.
I'm a big fan of returning concrete values, but by the same token we'd have had to burn down half this codebase, which returns interfaces left, right, and center.
Note that this mostly follows the existing code in this repo for consistency's sake -- though some parts of it are a bit divergent now, I agree.
It's even funkier here with the cache descriptors because the return values are actually wrapped in a prometheus wrapper
I'm a big fan of returning concrete values but by the same token we'd have had to burn down half this codebase which returns interfaces left right and center.
I'm not reviewing the whole codebase, nor am I suggesting that we boil the ocean.
Note that this mostly follows the existing code in this repo for consistency sake
And I am objecting to following legacy coding conventions in greenfield code for no reason other than "for consistency's sake." We can never evolve our coding conventions or move on from past mistakes with that attitude.
It's even funkier here with the cache descriptors because the return values are actually wrapped in a prometheus wrapper
Ahh! Now we get to the real reason: it would reveal a layering violation in the redis cache implementation. The metrics middleware for caches has low cohesion with the redis cache implementation, therefore they should not be tightly coupled. The middleware should be applied to the cache.Get() return value (read: in NewApp()) so that any cache provider gets instrumented.
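The decorator shape being suggested might look like this sketch (interface and names are simplified stand-ins for cache.BlobDescriptorCacheProvider and the prometheus wrapper, not the real types):

```go
package main

import "fmt"

// Provider is a stand-in for cache.BlobDescriptorCacheProvider.
type Provider interface {
	Stat(key string) (string, error)
}

type inmemory struct{ data map[string]string }

func (m *inmemory) Stat(key string) (string, error) {
	v, ok := m.data[key]
	if !ok {
		return "", fmt.Errorf("not found: %s", key)
	}
	return v, nil
}

// instrumented wraps any Provider, counting hits and misses; it knows
// nothing about redis or inmemory specifically.
type instrumented struct {
	inner        Provider
	hits, misses int
}

func (i *instrumented) Stat(key string) (string, error) {
	v, err := i.inner.Stat(key)
	if err != nil {
		i.misses++
		return v, err
	}
	i.hits++
	return v, nil
}

// getCache models the app-level resolution point (e.g. in NewApp):
// whatever provider the registry returns is wrapped exactly once here,
// so no implementation needs to know about metrics.
func getCache(p Provider) Provider {
	return &instrumented{inner: p}
}

func main() {
	c := getCache(&inmemory{data: map[string]string{"a": "1"}})
	v, _ := c.Stat("a")
	fmt.Println(v)
}
```

Because the wrapper lives at the composition root, a third-party provider registered via init() gets instrumented for free.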
Some open questions I still have (besides the cohesion point you highlighted -- thanks!):
- Should we move the redis config stanza out of the root of the config? I'm undecided on this; I feel like having it in the root makes it look like Redis is an "essential" part of the app, which isn't true: we can instead either use the in-memory cache or no cache at all. So creating some semantic section for it that highlights its purpose in the wider context of the app could make sense. No idea. I'm just braindumping here.
- What should the params passed down to the Redis module look like? We've agreed on the config decoder living outside of it. To make it easier to consume the Redis cache APIs, should the pool be created inside the constructor from the passed-in params, or should it be created in the app root (this would allow keeping the global pool client in the app and potentially reusing it elsewhere) and then passed in via params to the module (e.g. map["pool"] = pool -- this was actually my original idea, but it didn't feel quite right)? The former would align with some of the code in this project but make it mildly annoying to consume (i.e. API consumers would have to supply parameters via the mildly type-unsafe map, with the any values needing type assertions); the latter would make it easier to consume (i.e. API consumers could simply pass the pool object down), but it doesn't quite feel right either. The last option could be changing the InitFunc so that it accepts the pool as a param directly.
I'm still mulling over this. And since this PR is still a WIP or more like an exercise in exploration of the future, there is no rush on this. I'd prefer getting this right (or as close to the ideal as possible) than regret it in the future
- should we move redis config stanza out of the root of the config
I am not too concerned about confusing users by keeping the config stanza in the root. I still think there is some merit to future-proofing the config to afford sharing a single redis client connection pool (which implies a shared database instance) among multiple redis-backed modules within the app. But I don't feel too strongly about it: if in the short-term it would require a complicated architecture to pull it off when there is just a single consumer, I am fully prepared to accept that YAGNI.
(Aside: in a hypothetical future where there is more than one redis-backed module available in the app, root redis config could coexist with per-module overrides. The overridden module would instantiate its own client, inheriting the root redis config values as defaults. E.g.:)
redis:
addr: localhost:6379
pool:
maxidle: 16
maxactive: 64
idletimeout: 300s
dialtimeout: 10ms
readtimeout: 10ms
writetimeout: 10ms
cache:
blobdescriptor:
provider: redis
params:
db: 42
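The inherit-as-defaults behaviour in that hypothetical config could be implemented with a simple merge over the decoded structs (types and fields here are illustrative, not the real configuration package):

```go
package main

import "fmt"

// RedisConfig is a stand-in for the decoded root redis stanza; only two
// fields are shown for brevity.
type RedisConfig struct {
	Addr string
	DB   int
}

// merged returns the per-module override with zero-valued fields filled
// in from the root config, so modules inherit root values as defaults
// and only specify what differs.
func merged(root, override RedisConfig) RedisConfig {
	out := override
	if out.Addr == "" {
		out.Addr = root.Addr
	}
	if out.DB == 0 {
		out.DB = root.DB
	}
	return out
}

func main() {
	root := RedisConfig{Addr: "localhost:6379"}
	module := RedisConfig{DB: 42} // e.g. from cache.blobdescriptor.params
	fmt.Println(merged(root, module))
}
```

The module-level client would be constructed from the merged config; modules without an override share the root client and pool.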
- What should the params passed down to Redis module look like?
The constructor should continue to accept a *redis.Client directly. How the factory function acquires that client is the big question.
First thing we could do is move the cache-provider registry stuff under an internal subtree to free it from the SemVer compatibility contract. (Maybe the cache implementations, too? With sanely-shaped constructors they might be useful for third parties, but we'd have to think about SemVer compatibility.) Then we'd be free to iterate on the registry and factories without being constrained by past decisions.
In the long term, a shared app-wide redis client (for example) could be made available to the cache-provider factories through some form of dependency injection. A pointer to the app struct could be passed into registry.Get alongside the params map, or perhaps some purpose-built "provider registry" object. In the short term, said dependency injection mechanism could instead provide the top-level redis config to the factory since the redis cache-provider is the only thing which needs the redis client today. With app.redis already being a thing, passing the app struct pointer into the factory may be the path of least resistance as long as we are okay with the factory implementation living in the app package. (I think it makes sense: the factory is necessarily tightly coupled to (at least one subtree of) the app config so it follows logically that it should also have high cohesion with the app package.)
// Register is used to register an InitFunc for
// a BlobDescriptorCacheProvider with the given name.
// It's meant to be called from init() function.
func Register(name string, initFunc InitFunc) {
With one slight tweak, it would be possible to register with a bit less boilerplate:
func Register(name string, initFunc InitFunc) {
func Register(name string, initFunc InitFunc) struct{} {
package myprovider
import "github.com/distribution/distribution/v3/registry/storage/cache"
var _ = cache.Register("myprovider", initFunc)
(I got the idea from Ginkgo, for better or worse.)
Yeah, this is possible, but I'm trying to follow the established conventions in this codebase (see the storage drivers and middleware).
What's so sacred about the established conventions?
Nothing in particular, but I prefer keeping code consistent across codebases: it makes navigation, comprehension, onboarding new devs, etc. much easier... I got a bit religious with it in this codebase, I agree.
I agree with keeping code consistent: no code is perfect, and having to deal with the same suboptimal convention everywhere is better than having to deal with 5 or 6 practices of varying quality, even if the average quality is better.
If we did want to establish a new convention, we should be deliberate about picking one and moving forward with that single practice, and we should also look towards switching the old code over to that as well.
We don't have to rip everything out all at once, but if the new convention is not compelling enough that we'd see benefit from applying it to the older code, then it may also not be compelling enough to deal with two conventions during the interim period.
Let me get this straight:
- All existing code has been grandfathered in as "conventional."
- Except for the S3 storage driver. For some reason, everything it does has been decreed anti-conventional.
- All new code to which any similarity can be drawn with existing code must be shaped like the existing code.
- Except for the S3 storage driver. New code must be inconsistent with the S3 driver, just to be contrary.
- Any changes to how code is shaped must be agreed upon by consensus and must then be ratified by applying it to the entire codebase.
- Any changes must therefore go through a waterfall process before a single line of new code can be merged which goes against grandfathered conventions.
- The determination of whether a new convention is compelling enough must be made without having experience coding in the new convention. Experience in coding the new convention cannot be gained without coding in the new convention. New conventions are therefore de facto banned. Whatever conventions were grandfathered in are our coding conventions, now and forever.
- And given which thread this discussion is being made on, anything which might possibly afford code to be written in a divergent style while still affording the conventional style is also banned
That can't be right.
I am literally just suggesting that the brand new Register function grow a dummy result parameter so that implementations could use a less-noisy syntax for registration. The following would also compile just fine!
func init() {
cache.Register("myprovider", initFunc)
}
I'm here to push boundaries and shake up existing conventions in order to evolve the codebase. If existing conventions are so sacred that they cannot even be iterated upon in greenfield code, I do not see a future where I remain on as a distribution maintainer.
I'm here to push boundaries and shake up existing conventions in order to evolve the codebase. If existing conventions are so sacred that they cannot even be iterated upon in greenfield code, I do not see a future where I remain on as a distribution maintainer.
FYI: I appreciate your feedback very much. In fact, I really enjoy working together/collaborating on this project with you. I had no idea some of my comments would cause this much discussion or maybe even angst (hopefully not!).
This codebase does need a ton of improvements all around and I need to withhold [at least some of my] hot takes (e.g. "set it on fire one lol") here as they're not productive. This discussion should probably belong elsewhere though.
FYI2: I'm not against changing the InitFunc - I'm sure we can find a way to make this work 😄
I'm here to push boundaries and shake up existing conventions in order to evolve the codebase. If existing conventions are so sacred that they cannot even be iterated upon in greenfield code, I do not see a future where I remain on as a distribution maintainer.
Please know that I am also intensely frustrated by the way this codebase does things. This project is old enough to have existed before a lot of Go best practices were solidified, and it shows.
Any changes to how code is shaped must be agreed upon by consensus
You need two approvals from two maintainers in different orgs, in your case that's yourself and one other maintainer.
I tend to be more cautious, and I think it's a good and healthy balance to have someone who wants to shake things up. We need to moderate each other in this way.
and must then be ratified by applying it to the entire codebase.
I have a solution below that might give us both consensus and the capacity to effect change.
The determination of whether a new convention is compelling enough must be made without having experience coding in the new convention. Experience in coding the new convention cannot be gained without coding in the new convention. New conventions are therefore de facto banned. Whatever conventions were grandfathered in are our coding conventions, now and forever.
We all have worked on other projects and seen different conventions applied there and what their results were.
We can reasonably determine if something is going to work out and then decide to commit to it.
If we want to adopt a new convention, we can start with a single PR. However, if there's no barrier to entry, then it's easy to have a bunch of unrelated incohesive practices sprinkled throughout the code. Part of being a maintainer on any project is thinking about the impact of changes in context of existing code.
What I think would help in resolving these conflicts is adopting or implementing a style guide. This way, we can be deliberate about what our practices are, and apply them consistently. This would allow us to stabilize our practices moving forward and justify and encourage creating PRs which make old code conform. Having a style guide should provide the consistency that I want to see, while providing you with the ability to evolve the codebase.
To move forward, I propose:
- Bucking the convention of returning interfaces, right now, in this PR.
- Creating a style guide with a single entry in support of this style, in this or a followup PR.
- Creating an issue to discuss expanding the style guide, for example, if we want to write our own or adopt some existing guide and if so, which one.
Co-authored-by: Cory Snider <corhere@gmail.com> Signed-off-by: Milos Gajdos <milosgajdos83@gmail.com>
Co-authored-by: Cory Snider <corhere@gmail.com> Signed-off-by: Milos Gajdos <milosgajdos83@gmail.com>
Currently, the only two implementations of cache are redis and inmemory. This PR introduces cache providers that let you implement your own blobdescriptor cache of choice.
Extended rebase of #2580
EDIT:
cache now has a dedicated configuration struct instead of being part of the storage configuration. This simplifies the code quite a bit at the expense of sad configuration decoding. I passionately hate the configuration package and IMHO we should rework it almost from the ground up.
Each cache provider must register itself via an init() func so it's loaded on startup and available if the config requires it. Adding new cache providers, should there be such a need in the future, should be easier. Currently, we continue to provide only redis and inmemory cache provider implementations.
Cache configuration is now expected to be set with the following options:
- provider: name of a cache provider
- params: set of config params that configure the given cache provider
Redis configuration has been moved from the main configuration to the redis cache package because it's only ever been used for configuring caches and nothing else, so it makes no sense to keep it in the global config. Documentation has been updated accordingly.