
[WIP] feat: add cache provider implementation #4169

Draft: wants to merge 6 commits into base: main

Conversation

milosgajdos
Member

@milosgajdos milosgajdos commented Nov 23, 2023

Currently, the only two implementations of cache are redis and inmemory.

This PR introduces cache providers that let you implement your own blobdescriptor cache of choice.

Extended rebase of #2580

EDIT:

⚠️ This changes configuration, but now is the perfect time to do it, before the v3 release ⚠️

cache now has a dedicated configuration struct instead of being part of the storage configuration.
This simplifies the code quite a bit at the expense of sad configuration decoding. I passionately hate the configuration package and IMHO we should rework it almost from the ground up.

Each cache provider must register itself via the init() func so it's loaded on startup and available if the config requires it.

Adding new cache providers, should there be such a need in the future, is now easier.
Currently, we continue to provide only redis and inmemory cache provider implementations.

Cache configuration is now expected to be set with the following options:

  • provider: name of the cache provider
  • params: set of config params that configure the given cache provider

Redis configuration has been moved from the main configuration to the redis cache package because it's only ever been used for configuring caches and nothing else, so it makes no sense to keep it in the global config.

Documentation has been updated accordingly.

Currently, the only two implementations of cache are redis and inmemory.
This commit introduces cache providers that let you implement your own
blobdescriptor cache of choice and plug it in.

Co-authored-by: Gladkov Alexey <agladkov@redhat.com>
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
@milosgajdos milosgajdos marked this pull request as ready for review November 23, 2023 07:18
Comment on lines 22 to 24
if _, exists := cacheProviders[name]; exists {
	return fmt.Errorf("name already registered: %s", name)
}
Collaborator

Is provider.Register intended to be called from init() functions for packages to self-register? If so, why not just panic?

Member Author

Good shout. I think there are a few other cases in this codebase where we should rectify this e.g.

func Register(name string, initFunc InitFunc) error {
	if storageMiddlewares == nil {
		storageMiddlewares = make(map[string]InitFunc)
	}
	if _, exists := storageMiddlewares[name]; exists {
		return fmt.Errorf("name already registered: %s", name)
	}
	storageMiddlewares[name] = initFunc
	return nil
}
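For illustration, the panic-on-duplicate variant being discussed might look like this sketch (names and signatures are hypothetical, not the PR's actual code):

```go
package main

import "fmt"

// InitFunc constructs a blob-descriptor cache from decoded options.
// The exact signature is illustrative only.
type InitFunc func(options map[string]interface{}) (interface{}, error)

var cacheProviders = make(map[string]InitFunc)

// Register is meant to be called from init() of each provider package.
// A duplicate name is a programmer error, so panic rather than return
// an error that no init()-time caller can meaningfully handle.
func Register(name string, initFunc InitFunc) {
	if _, exists := cacheProviders[name]; exists {
		panic(fmt.Sprintf("cache provider already registered: %s", name))
	}
	cacheProviders[name] = initFunc
}

func main() {
	Register("inmemory", func(map[string]interface{}) (interface{}, error) { return nil, nil })
	fmt.Println(len(cacheProviders)) // 1
}
```

Panicking here mirrors how database/sql drivers and image codecs in the standard library treat duplicate registration: it is a build-composition bug, not a runtime condition.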

registry/storage/cache/provider/cacheprovider.go (outdated; resolved)
Collaborator

I wonder how many near-duplicates of this pattern are scattered throughout the codebase. I betcha they could all be replaced with instantiations of a single generic implementation.

type Factory[T any] func(ctx context.Context, options map[string]any) (T, error)

type FactoryRegistry[T any] struct{ ... }
func NewFactoryRegistry[T any]() *FactoryRegistry[T]

func (*FactoryRegistry[T]) Register(name string, factory Factory[T])
func (*FactoryRegistry[T]) Get(ctx context.Context, name string, options map[string]any) (T, error)

var Registry = factoryutil.NewFactoryRegistry[cache.BlobDescriptorCacheProvider]()

Member Author

I think with generics, this codebase could use proper massaging.

milosgajdos and others added 2 commits December 7, 2023 23:17
Co-authored-by: Cory Snider <corhere@gmail.com>
Signed-off-by: Milos Gajdos <milosgajdos83@gmail.com>
This func is meant to be used from init funcs of specific cacheprovider
implementations. If they fail to register we should panic.

Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
corhere
corhere previously approved these changes Dec 7, 2023
@milosgajdos
Member Author

With this PR... I'm kinda wondering whether the redis and inmemory caches should be considered cacheproviders. That's not the case at the moment. I'll mull this over for a bit, but I think if we do this (i.e. agree on getting this in), we should make the redis and inmemory caches cacheproviders.

This commit reworks cache provider setup and configuration.
Cache now has a dedicated configuration structure instead of
being part of the storage configuration. This simplifies the
code quite a bit at the expense of sad configuration decoding.

This commit now allows for adding new cache providers should there be
such a need in the future. Currently we continue to provide only redis and
inmemory cache.

In line with this, the existing cache implementations, inmemory and redis,
have now been "turned" into cache providers, which also simplifies the code.

Cache configuration is now expected to be set with the following options:
provider - name of the cache provider
params - arbitrary set of config params for the given cache provider.

Each cache provider must register itself via init func so it's loaded
on startup and available if the config requires it.

Redis configuration has been removed from the main configuration because
it's only ever been used for configuring caches and nothing else, so it
makes no sense to keep it in the global config.

Documentation has been updated accordingly.

Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
@milosgajdos milosgajdos marked this pull request as draft December 8, 2023 19:44
@milosgajdos
Member Author

milosgajdos commented Dec 8, 2023

@corhere I've reworked the cache provider stuff from the ground up. Would love it if you could spare a minute looking through this PR. I've marked it as a draft for the time being so we can iron out some details, but I'm pretty happy with it as it is.

I've also updated the PR config with all the new info.

Also PTAL @thaJeztah @Jamstah

@milosgajdos milosgajdos added the area/config Related to registry config label Dec 8, 2023
@corhere corhere dismissed their stale review December 8, 2023 20:05

big jump in scope since review

Collaborator

@corhere corhere left a comment

You might be able to convince me regarding the change to redis configuration.

Comment on lines -22 to -30
redis:
  addr: localhost:6379
  pool:
    maxidle: 16
    maxactive: 64
    idletimeout: 300s
  dialtimeout: 10ms
  readtimeout: 10ms
  writetimeout: 10ms
Collaborator

I think it is more future-proof to leave redis as a top-level key. It may only be used for one thing now, but that could change in the future. Redis could be useful for other kinds of middleware, e.g. rate-limiting clients, and could be made to coexist with the blobdescriptor cache keys in the same redis instance. Having separate redis connection configs for each would be repetitive to configure, and would result in multiple connection pools targeting the same server.

Member Author

I was originally thinking along EXACTLY the same lines before I made the change.

The use cases you mention might make some sense in the future, but I'm skeptical: nobody will implement rate limiters in this bloated codebase. Large deployments are usually behind proxies that handle rate limiting before the requests hit the API -- speaking from experience of managing many of these large-volume ones over my career.

Whilst sharing a Redis instance across many use cases makes sense in cute little deployments, it becomes a nuisance and a proper headache when you outgrow your small setup, not to mention whatever security use case your security team barfs at you, etc.

Still, I'm not entirely sold on keeping it as a global, but I'm also not strictly against it either - just mulling over my past experiences.

Sometimes it's better to duplicate things than to create more problems for yourself by prematurely optimising 🙃

Comment on lines +178 to +185
// NewCacheOptions returns new memory cache options.
func NewCacheOptions(size int) map[string]interface{} {
	return map[string]interface{}{
		"params": map[interface{}]interface{}{
			"size": size,
		},
	}
}
Collaborator

The existence and usage of this function tells me that something is upside-down. Specifically, NewBlobDescriptorCacheProvider is not adhering to the Single-Responsibility Principle. It is both parsing user configuration (map[string]any) into a usable representation and constructing a provider. Parsing a map[string]any is only needed for the factory registered with the cache-provider registry; it's just extra steps for the registry client and other internal users who instantiate cache providers directly.

Member Author

Yes, it's one of a few things in this PR that I'm still mulling over. It does not feel right, I agree.

Note that the Azure storage driver also decodes auth config in its constructor: that's where I got some inspiration, but I'm not happy about it, no.

It's also one of the reasons I marked this PR as draft. There are things to be figured out here.

Member Author

Just FYI: this is a helper func; you don't have to use it. There is nothing that prevents you from providing your own map[string]interface{} params, in a similar way to how it's done in the storage drivers.

I was trying to avoid the absolute state of things in this (and other) storage driver config:

func FromParameters(ctx context.Context, parameters map[string]interface{}) (*Driver, error) {
	// Providing no values for these is valid in case the user is authenticating
	// with an IAM on an ec2 instance (in which case the instance credentials will
	// be summoned when GetAuth is called)
	accessKey := parameters["accesskey"]
	if accessKey == nil {
		accessKey = ""
	}
	secretKey := parameters["secretkey"]
	if secretKey == nil {
		secretKey = ""
	}
	regionEndpoint := parameters["regionendpoint"]
	if regionEndpoint == nil {
		regionEndpoint = ""
	}
	forcePathStyleBool := true
	forcePathStyle := parameters["forcepathstyle"]
	switch forcePathStyle := forcePathStyle.(type) {
	case string:
		b, err := strconv.ParseBool(forcePathStyle)
		if err != nil {
			return nil, fmt.Errorf("the forcePathStyle parameter should be a boolean")
		}
		forcePathStyleBool = b
	case bool:
		forcePathStyleBool = forcePathStyle
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the forcePathStyle parameter should be a boolean")
	}
	regionName := parameters["region"]
	if regionName == nil || fmt.Sprint(regionName) == "" {
		return nil, fmt.Errorf("no region parameter provided")
	}
	region := fmt.Sprint(regionName)
	// Don't check the region value if a custom endpoint is provided.
	if regionEndpoint == "" {
		if _, ok := validRegions[region]; !ok {
			return nil, fmt.Errorf("invalid region provided: %v", region)
		}
	}
	bucket := parameters["bucket"]
	if bucket == nil || fmt.Sprint(bucket) == "" {
		return nil, fmt.Errorf("no bucket parameter provided")
	}
	encryptBool := false
	encrypt := parameters["encrypt"]
	switch encrypt := encrypt.(type) {
	case string:
		b, err := strconv.ParseBool(encrypt)
		if err != nil {
			return nil, fmt.Errorf("the encrypt parameter should be a boolean")
		}
		encryptBool = b
	case bool:
		encryptBool = encrypt
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the encrypt parameter should be a boolean")
	}
	secureBool := true
	secure := parameters["secure"]
	switch secure := secure.(type) {
	case string:
		b, err := strconv.ParseBool(secure)
		if err != nil {
			return nil, fmt.Errorf("the secure parameter should be a boolean")
		}
		secureBool = b
	case bool:
		secureBool = secure
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the secure parameter should be a boolean")
	}
	skipVerifyBool := false
	skipVerify := parameters["skipverify"]
	switch skipVerify := skipVerify.(type) {
	case string:
		b, err := strconv.ParseBool(skipVerify)
		if err != nil {
			return nil, fmt.Errorf("the skipVerify parameter should be a boolean")
		}
		skipVerifyBool = b
	case bool:
		skipVerifyBool = skipVerify
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the skipVerify parameter should be a boolean")
	}
	v4Bool := true
	v4auth := parameters["v4auth"]
	switch v4auth := v4auth.(type) {
	case string:
		b, err := strconv.ParseBool(v4auth)
		if err != nil {
			return nil, fmt.Errorf("the v4auth parameter should be a boolean")
		}
		v4Bool = b
	case bool:
		v4Bool = v4auth
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the v4auth parameter should be a boolean")
	}
	keyID := parameters["keyid"]
	if keyID == nil {
		keyID = ""
	}
	chunkSize, err := getParameterAsInt64(parameters, "chunksize", defaultChunkSize, minChunkSize, maxChunkSize)
	if err != nil {
		return nil, err
	}
	multipartCopyChunkSize, err := getParameterAsInt64(parameters, "multipartcopychunksize", defaultMultipartCopyChunkSize, minChunkSize, maxChunkSize)
	if err != nil {
		return nil, err
	}
	multipartCopyMaxConcurrency, err := getParameterAsInt64(parameters, "multipartcopymaxconcurrency", defaultMultipartCopyMaxConcurrency, 1, math.MaxInt64)
	if err != nil {
		return nil, err
	}
	multipartCopyThresholdSize, err := getParameterAsInt64(parameters, "multipartcopythresholdsize", defaultMultipartCopyThresholdSize, 0, maxChunkSize)
	if err != nil {
		return nil, err
	}
	rootDirectory := parameters["rootdirectory"]
	if rootDirectory == nil {
		rootDirectory = ""
	}
	storageClass := s3.StorageClassStandard
	storageClassParam := parameters["storageclass"]
	if storageClassParam != nil {
		storageClassString, ok := storageClassParam.(string)
		if !ok {
			return nil, fmt.Errorf(
				"the storageclass parameter must be one of %v, %v invalid",
				s3StorageClasses,
				storageClassParam,
			)
		}
		// All valid storage class parameters are UPPERCASE, so be a bit more flexible here
		storageClassString = strings.ToUpper(storageClassString)
		if storageClassString != noStorageClass &&
			storageClassString != s3.StorageClassStandard &&
			storageClassString != s3.StorageClassReducedRedundancy &&
			storageClassString != s3.StorageClassStandardIa &&
			storageClassString != s3.StorageClassOnezoneIa &&
			storageClassString != s3.StorageClassIntelligentTiering &&
			storageClassString != s3.StorageClassOutposts &&
			storageClassString != s3.StorageClassGlacierIr {
			return nil, fmt.Errorf(
				"the storageclass parameter must be one of %v, %v invalid",
				s3StorageClasses,
				storageClassParam,
			)
		}
		storageClass = storageClassString
	}
	userAgent := parameters["useragent"]
	if userAgent == nil {
		userAgent = ""
	}
	objectACL := s3.ObjectCannedACLPrivate
	objectACLParam := parameters["objectacl"]
	if objectACLParam != nil {
		objectACLString, ok := objectACLParam.(string)
		if !ok {
			return nil, fmt.Errorf("invalid value for objectacl parameter: %v", objectACLParam)
		}
		if _, ok = validObjectACLs[objectACLString]; !ok {
			return nil, fmt.Errorf("invalid value for objectacl parameter: %v", objectACLParam)
		}
		objectACL = objectACLString
	}
	useDualStackBool := false
	useDualStack := parameters["usedualstack"]
	switch useDualStack := useDualStack.(type) {
	case string:
		b, err := strconv.ParseBool(useDualStack)
		if err != nil {
			return nil, fmt.Errorf("the useDualStack parameter should be a boolean")
		}
		useDualStackBool = b
	case bool:
		useDualStackBool = useDualStack
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the useDualStack parameter should be a boolean")
	}
	mutlipartCombineSmallPart := true
	combine := parameters["multipartcombinesmallpart"]
	switch combine := combine.(type) {
	case string:
		b, err := strconv.ParseBool(combine)
		if err != nil {
			return nil, fmt.Errorf("the multipartcombinesmallpart parameter should be a boolean")
		}
		mutlipartCombineSmallPart = b
	case bool:
		mutlipartCombineSmallPart = combine
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the multipartcombinesmallpart parameter should be a boolean")
	}
	sessionToken := ""
	accelerateBool := false
	accelerate := parameters["accelerate"]
	switch accelerate := accelerate.(type) {
	case string:
		b, err := strconv.ParseBool(accelerate)
		if err != nil {
			return nil, fmt.Errorf("the accelerate parameter should be a boolean")
		}
		accelerateBool = b
	case bool:
		accelerateBool = accelerate
	case nil:
		// do nothing
	default:
		return nil, fmt.Errorf("the accelerate parameter should be a boolean")
	}
	params := DriverParameters{
		fmt.Sprint(accessKey),
		fmt.Sprint(secretKey),
		fmt.Sprint(bucket),
		region,
		fmt.Sprint(regionEndpoint),
		forcePathStyleBool,
		encryptBool,
		fmt.Sprint(keyID),
		secureBool,
		skipVerifyBool,
		v4Bool,
		chunkSize,
		multipartCopyChunkSize,
		multipartCopyMaxConcurrency,
		multipartCopyThresholdSize,
		mutlipartCombineSmallPart,
		fmt.Sprint(rootDirectory),
		storageClass,
		fmt.Sprint(userAgent),
		objectACL,
		fmt.Sprint(sessionToken),
		useDualStackBool,
		accelerateBool,
		getS3LogLevelFromParam(parameters["loglevel"]),
	}
	return New(ctx, params)
}

But maybe I'm trying to be too clever here 🤷‍♂️

Collaborator

Code needs to exist somewhere which transforms a generic map[string]any configuration blob into something the implementation can consume. There's no getting away from the code in the body of s3.FromParameters aside from leveraging metaprogramming (e.g. mapstructure) to do the same thing less verbosely. The only notes I have for s3.FromParameters are:

  1. It would be easier to unit-test if the code to parse parameters was extracted into its own func(map[string]any) (DriverParameters, error)
  2. FromParameters probably does not need to be exported since it is made available through the driver registry. Code which does not need to go through the registry and therefore does not need to be generic over all storage drivers does not need to go through the generic parameter-parsing machinery.
  3. The functional options pattern would make it easier to evolve the configuration and implementation in semver-compatible ways than to export the DriverParameters struct.

Aside from that, I firmly believe that the cache provider implementations should be following the way the s3 storage driver does configuration and construction. Consumers that do not need to be generic over all implementations of an interface should not be forced to use the weakly-typed, generic-over-all-implementations factory interface targeted at humans. Configuration as map[string]any makes runtime errors out of what could have otherwise been caught at compile time.
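The functional options pattern mentioned in point 3 could be sketched roughly like this (all names, fields, and defaults here are illustrative, not the actual driver API):

```go
package main

import "fmt"

// driverConfig stays unexported; callers program against Option
// functions instead of a frozen exported parameters struct.
type driverConfig struct {
	region    string
	chunkSize int64
	encrypt   bool
}

// Option mutates the config; new knobs can be added later without
// breaking existing callers, which keeps the API semver-friendly.
type Option func(*driverConfig)

func WithRegion(r string) Option    { return func(c *driverConfig) { c.region = r } }
func WithChunkSize(n int64) Option  { return func(c *driverConfig) { c.chunkSize = n } }
func WithEncryption(on bool) Option { return func(c *driverConfig) { c.encrypt = on } }

// NewDriver applies defaults first, then the caller's options,
// then validates the result.
func NewDriver(opts ...Option) (*driverConfig, error) {
	c := &driverConfig{region: "us-east-1", chunkSize: 10 << 20}
	for _, opt := range opts {
		opt(c)
	}
	if c.region == "" {
		return nil, fmt.Errorf("no region provided")
	}
	return c, nil
}

func main() {
	d, _ := NewDriver(WithRegion("eu-west-1"), WithEncryption(true))
	fmt.Println(d.region, d.encrypt) // eu-west-1 true
}
```

The map[string]any-parsing factory then becomes a thin translation layer that turns config keys into Option values before calling the strongly-typed constructor.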

Comment on lines -34 to +40
-func NewInMemoryBlobDescriptorCacheProvider(size int) cache.BlobDescriptorCacheProvider {
-	if size <= 0 {
+func NewBlobDescriptorCacheProvider(ctx context.Context, options map[string]interface{}) (cache.BlobDescriptorCacheProvider, error) {
Collaborator

func New(size int) *Cache

func factory(_ context.Context, options map[string]any) (cache.BlobDescriptorCacheProvider, error) {
    /* ... */
    return New(c.Size), nil
}

The names are just for illustration. My point is:

  • return concrete types for internal users who are constructing caches directly, and export the concrete type
  • The factory function "initFunc" does not need to be the same function as the constructor. Even if it would just be a trivial wrapper, it would still be doing something: lifting the concrete cache value into the cache.BlobDescriptorCacheProvider interface type.

Member Author

I'm a big fan of returning concrete values, but by the same token we'd have had to burn down half this codebase, which returns interfaces left, right, and center.

Note that this mostly follows the existing code in this repo for consistency's sake - though some parts of it are a bit divergent now, I agree.

It's even funkier here with the cache descriptors because the return values are actually wrapped in a prometheus wrapper

Collaborator

I'm a big fan of returning concrete values, but by the same token we'd have had to burn down half this codebase, which returns interfaces left, right, and center.

I'm not reviewing the whole codebase, nor am I suggesting that we boil the ocean.

Note that this mostly follows the existing code in this repo for consistency's sake

And I am objecting to following legacy coding conventions in greenfield code for no reason other than "for consistency's sake." We can never evolve our coding conventions or move on from past mistakes with that attitude.

It's even funkier here with the cache descriptors because the return values are actually wrapped in a prometheus wrapper

Ahh! Now we get to the real reason: it would reveal a layering violation in the redis cache implementation. The metrics middleware for caches has low cohesion with the redis cache implementation, therefore they should not be tightly coupled. The middleware should be applied to the cache.Get() return value (read: in NewApp()) so that any cache provider gets instrumented.

Member Author

Some open questions I still have (besides the cohesion point you highlighted -- thanks!):

  • should we move the redis config stanza out of the root of the config -- I'm undecided on this. I feel like having it in the root makes it look like Redis is an "essential" part of the app, which isn't true: we can instead use the in-memory cache or no cache at all. So creating some semantic section for it that highlights its purpose in the wider context of the app could make sense. No idea. I'm just braindumping here.
  • What should the params passed down to the Redis module look like? We've agreed on the config decoder living outside of it. To make the Redis cache APIs easier to consume, should the pool be created inside the constructor from the passed-in params, or should it be created in the app root (which would allow keeping the global pool client in the app and potentially reusing it elsewhere) and then passed in via params to the module (e.g. map["pool"] = pool -- this was actually my original idea, but it didn't feel quite right)? The former would align with some of the code in this project but make it mildly annoying to consume (i.e. the API consumers would have to supply parameters via the mildly type-unsafe map -- any values gonna be anying); the latter would make it easier to consume (i.e. the API consumers could simply pass the pool object down), but it doesn't quite feel right either. The last option could be changing the InitFunc so that it accepts the pool as a param directly.

I'm still mulling over this. And since this PR is still a WIP or more like an exercise in exploration of the future, there is no rush on this. I'd prefer getting this right (or as close to the ideal as possible) than regret it in the future

Collaborator

  • should we move redis config stanza out of the root of the config

I am not too concerned about confusing users by keeping the config stanza in the root. I still think there is some merit to future-proofing the config to afford sharing a single redis client connection pool (which implies a shared database instance) among multiple redis-backed modules within the app. But I don't feel too strongly about it: if in the short-term it would require a complicated architecture to pull it off when there is just a single consumer, I am fully prepared to accept that YAGNI.

(Aside: in a hypothetical future where there is more than one redis-backed module available in the app, root redis config could coexist with per-module overrides. The overridden module would instantiate its own client, inheriting the root redis config values as defaults. E.g.:)

redis:
  addr: localhost:6379
  pool:
    maxidle: 16
    maxactive: 64
    idletimeout: 300s
  dialtimeout: 10ms
  readtimeout: 10ms
  writetimeout: 10ms
cache:
  blobdescriptor:
    provider: redis
    params:
      db: 42
  • What should the params passed down to Redis module look like?

The constructor should continue to accept a *redis.Client directly. How the factory function acquires that client is the big question.

First thing we could do is move the cache-provider registry stuff under an internal subtree to free it from the SemVer compatibility contract. (Maybe the cache implementations, too? With sanely-shaped constructors they might be useful for third parties, but we'd have to think about SemVer compatibility.) Then we'd be free to iterate on the registry and factories without being constrained by past decisions.

In the long term, a shared app-wide redis client (for example) could be made available to the cache-provider factories through some form of dependency injection. A pointer to the app struct could be passed into registry.Get alongside the params map, or perhaps some purpose-built "provider registry" object. In the short term, said dependency injection mechanism could instead provide the top-level redis config to the factory since the redis cache-provider is the only thing which needs the redis client today. With app.redis already being a thing, passing the app struct pointer into the factory may be the path of least resistance as long as we are okay with the factory implementation living in the app package. (I think it makes sense: the factory is necessarily tightly coupled to (at least one subtree of) the app config so it follows logically that it should also have high cohesion with the app package.)
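The inherit-root-defaults idea from the aside above could be sketched as a plain config merge (the struct and field names are invented; a real version would need a way to distinguish "unset" from genuine zero values):

```go
package main

import "fmt"

// RedisConfig mirrors the shape of the root redis stanza;
// the fields are illustrative, not the actual configuration structs.
type RedisConfig struct {
	Addr string
	DB   int
}

// withDefaults fills unset override fields from the root config, so a
// module-level stanza only has to state what differs (e.g. db: 42).
func withDefaults(root, override RedisConfig) RedisConfig {
	out := override
	if out.Addr == "" {
		out.Addr = root.Addr
	}
	if out.DB == 0 {
		out.DB = root.DB
	}
	return out
}

func main() {
	root := RedisConfig{Addr: "localhost:6379"}
	module := withDefaults(root, RedisConfig{DB: 42})
	fmt.Println(module.Addr, module.DB) // localhost:6379 42
}
```

Each redis-backed module would then instantiate its own client only when its merged config differs from the root one.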

// Register is used to register an InitFunc for
// a BlobDescriptorCacheProvider with the given name.
// It's meant to be called from init() function.
func Register(name string, initFunc InitFunc) {
Collaborator

With one slight tweak, it would be possible to register with a bit less boilerplate:

Suggested change
-func Register(name string, initFunc InitFunc) {
+func Register(name string, initFunc InitFunc) struct{} {

package myprovider

import "github.com/distribution/distribution/v3/registry/storage/cache"

var _ = cache.Register("myprovider", initFunc)

(I got the idea from Ginkgo, for better or worse.)

Member Author

Yeah, this is possible, but I'm trying to follow the established conventions in this codebase (see the storage drivers and middleware).

Collaborator

What's so sacred about the established conventions?

Member Author

Nothing in particular, but I prefer keeping code consistent across codebases: it makes navigation, comprehension, onboarding new devs, etc. much easier... I agree I got a bit religious with it in this codebase.

Member

I agree with keeping code consistent; no code is perfect, and having to deal with the same suboptimal convention everywhere is better than having to deal with 5 or 6 practices of varying quality, even if the average quality is better.

If we did want to establish a new convention, we should be deliberate about picking one and moving forward with that single practice, and we should also look towards switching the old code over to that as well.

We don't have to rip everything out all at once, but if the new convention is not compelling enough that we'd see benefit from applying it to the older code, then it may also not be compelling enough to deal with two conventions during the interim period.

Collaborator

Let me get this straight:

  • All existing code has been grandfathered in as "conventional."
    • Except for the S3 storage driver. For some reason, everything it does has been decreed anti-conventional.
  • All new code to which any similarity can be drawn with existing code must be shaped like the existing code.
  • Any changes to how code is shaped must be agreed upon by consensus and must then be ratified by applying it to the entire codebase.
    • Any changes must therefore go through a waterfall process before a single line of new code can be merged which goes against grandfathered conventions.
    • The determination of whether a new convention is compelling enough must be made without having experience coding in the new convention. Experience in coding the new convention cannot be gained without coding in the new convention. New conventions are therefore de facto banned. Whatever conventions were grandfathered in are our coding conventions, now and forever.
  • And given which thread this discussion is being made on, anything which might possibly afford code to be written in a divergent style while still affording the conventional style is also banned

That can't be right.

I am literally just suggesting that the brand new Register function grow a dummy result parameter so that implementations could use a less-noisy syntax for registration. The following would also compile just fine!

func init() {
	cache.Register("myprovider", initFunc)
}

I'm here to push boundaries and shake up existing conventions in order to evolve the codebase. If existing conventions are so sacred that they cannot even be iterated upon in greenfield code, I do not see a future where I remain on as a distribution maintainer.

Member Author

I'm here to push boundaries and shake up existing conventions in order to evolve the codebase. If existing conventions are so sacred that they cannot even be iterated upon in greenfield code, I do not see a future where I remain on as a distribution maintainer.

FYI: I appreciate your feedback very much. In fact, I really enjoy working together/collaborating on this project with you. I had no idea some of my comments would cause this much discussion or maybe even angst (hopefully not!).

This codebase does need a ton of improvements all around and I need to withhold [at least some of my] hot takes (e.g. "set it on fire lol") here as they're not productive. This discussion should probably belong elsewhere though.

FYI2: I'm not against changing the InitFunc - I'm sure we can find a way to make this work 😄

Member

I'm here to push boundaries and shake up existing conventions in order to evolve the codebase. If existing conventions are so sacred that they cannot even be iterated upon in greenfield code, I do not see a future where I remain on as a distribution maintainer.

Please know that I am also intensely frustrated by the way this codebase does things. This project is old enough to have existed before a lot of Go best practices were solidified, and it shows.

Any changes to how code is shaped must be agreed upon by consensus

You need two approvals from two maintainers in different orgs; in your case, that's yourself and one other maintainer.

I tend to be more cautious, and I think it's a good and healthy balance to have someone who wants to shake things up. We need to moderate each other in this way.

and must then be ratified by applying it to the entire codebase.

I have a solution below that might give us both consensus and the capacity to effect change.

The determination of whether a new convention is compelling enough must be made without having experience coding in the new convention. Experience in coding the new convention cannot be gained without coding in the new convention. New conventions are therefore de facto banned. Whatever conventions were grandfathered in are our coding conventions, now and forever.

We all have worked on other projects and seen different conventions applied there and what their results were.
We can reasonably determine if something is going to work out and then decide to commit to it.

If we want to adopt a new convention, we can start with a single PR. However, if there's no barrier to entry, it's easy to end up with a bunch of unrelated, incohesive practices sprinkled throughout the code. Part of being a maintainer on any project is thinking about the impact of changes in the context of existing code.

What I think would help in resolving these conflicts is adopting or implementing a style guide. This way, we can be deliberate about what our practices are, and apply them consistently. This would allow us to stabilize our practices moving forward and justify and encourage creating PRs which make old code conform. Having a style guide should provide the consistency that I want to see, while providing you with the ability to evolve the codebase.

To move forward, I propose:

  • Bucking the convention of returning interfaces, right now, in this PR.
  • Creating a style guide with a single entry in support of this style, in this or a followup PR.
  • Creating an issue to discuss expanding the style guide, for example, if we want to write our own or adopt some existing guide and if so, which one.

milosgajdos and others added 2 commits December 8, 2023 21:14
Co-authored-by: Cory Snider <corhere@gmail.com>
Signed-off-by: Milos Gajdos <milosgajdos83@gmail.com>
Co-authored-by: Cory Snider <corhere@gmail.com>
Signed-off-by: Milos Gajdos <milosgajdos83@gmail.com>
@milosgajdos milosgajdos changed the title feat: add cache provider implementation [WIP] feat: add cache provider implementation Dec 12, 2023