Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds: Outlier Detection configuration in Cluster Resolver Balancer #5371

Merged
merged 6 commits into from Jun 17, 2022

Conversation

zasweq
Copy link
Contributor

@zasweq zasweq commented May 19, 2022

This PR adds branching logic for constructing the Priority LB's config. In the case that Outlier Detection is configured on with the corresponding Environment variable, the top level balancer's configuration for each priority will be of type Outlier Detection with a Cluster Impl child rather than a Cluster Impl. This had to include the Outlier Detection's balancer to pass Cluster Resolver tests, as the Outlier Detection policy needed to be registered.

Branched off #5370.

RELEASE NOTES: None

@zasweq zasweq added this to the 1.47 Release milestone May 19, 2022
@zasweq zasweq force-pushed the od-config-cluster-resolver branch from 9c7e762 to f01bc93 Compare May 23, 2022 19:24
@zasweq zasweq modified the milestones: 1.47 Release, 1.48 Release May 23, 2022
@zasweq zasweq requested a review from dfawley May 24, 2022 17:32
@dfawley dfawley assigned zasweq and unassigned dfawley May 31, 2022
@dfawley
Copy link
Member

dfawley commented May 31, 2022

Please resolve the conflict and reassign, thanks!

@zasweq zasweq force-pushed the od-config-cluster-resolver branch from f01bc93 to 72138c7 Compare May 31, 2022 21:39
type outlierDetectionBalancer struct {
}

func (b *outlierDetectionBalancer) UpdateClientConnState(s balancer.ClientConnState) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return nil from Build instead of having this type.

Or don't implement balancer.Builder fully yet and call bb.ParseConfig directly in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the second option, it broke since this needs to be registered in config validation at a higher level. Thus, I kept implementing balancer.Builder and returned nil from build instead of the type, and deleted this type.

Comment on lines 44 to 51
bb := balancer.Get(Name)
if bb == nil {
t.Fatalf("balancer.Get(%q) returned nil", Name)
}
parser, ok := bb.(balancer.ConfigParser)
if !ok {
t.Fatalf("balancer %q does not implement the ConfigParser interface", Name)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the above: parser := bb{}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. Switched.


tests := []struct {
name string
input json.RawMessage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this a string so you can remove all the json.RawMessage literals (cast before passing to ParseConfig).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched.

name string
input json.RawMessage
wantCfg serviceconfig.LoadBalancingConfig
wantErr bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a boolean, let's check for a string to be a substring of the error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. This actually lead me to discover that I was actually getting a parsing error not a validation error (see the 100 below, it shouldn't error - and it's also the wrong field name haha).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, another thing I noticed is that I never check if there is a child/the child is a clusterimpl/the child config is valid. These are assumptions I made for my od balancer, which if broken would cause nil panic. What should I do about this? Should I add a validation? Defensively program in my Outlier Detection PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't ever be called otherwise since we prepare the config ourselves but still.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added check for child being there (validating child was already there implicitly from unmarshalJSON if present)

Comment on lines 137 to +138
retConfig.Priorities = append(retConfig.Priorities, names...)
retAddrs = append(retAddrs, addrs...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these be moved up above the if envconfig, and the two if envconfigs merged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, great catch. Switched.

Comment on lines 164 to +165
retConfig.Priorities = append(retConfig.Priorities, name)
retAddrs = append(retAddrs, addrs...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to one if as well.

}
}
return retConfig, retAddrs, nil
}

func convertClusterImplMapToOutlierDetection(ciCfgs map[string]*clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) map[string]*outlierdetection.LBConfig {
odCfgs := make(map[string]*outlierdetection.LBConfig)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make(..., len(ciCfgs))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched. Why is this faster/better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-allocating is faster so it doesn't need to grow as you add entries if you go over the default initial size. Or if you have fewer entries than the default initial size, then you save memory.


func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg outlierdetection.LBConfig) *outlierdetection.LBConfig {
odCfgRet := odCfg
odCfgRet.ChildPolicy = &internalserviceconfig.BalancerConfig{Name: clusterimpl.Name, Config: &ciCfg} // This can panic if odCfg is nil. This shouldn't be nil though, as per CDS balancer. I can add check if you want.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass *clusterimpl.LBConfig then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get how whether clusterimpl.LBConfig relates to whether this panics or not. It panics on odCfgReturn.ChildPolicy. I needed to make the function arguments value types to make new memory and not write over heap memory used elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see how it panics, callsite at 161. Pointer to nil.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it. Let me know if you like solution.

Copy link
Contributor Author

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments boss man :D!

Comment on lines 137 to +138
retConfig.Priorities = append(retConfig.Priorities, names...)
retAddrs = append(retAddrs, addrs...)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, great catch. Switched.

Comment on lines 164 to +165
retConfig.Priorities = append(retConfig.Priorities, name)
retAddrs = append(retAddrs, addrs...)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to one if as well.

}
}
return retConfig, retAddrs, nil
}

func convertClusterImplMapToOutlierDetection(ciCfgs map[string]*clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) map[string]*outlierdetection.LBConfig {
odCfgs := make(map[string]*outlierdetection.LBConfig)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched. Why is this faster/better?


func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg outlierdetection.LBConfig) *outlierdetection.LBConfig {
odCfgRet := odCfg
odCfgRet.ChildPolicy = &internalserviceconfig.BalancerConfig{Name: clusterimpl.Name, Config: &ciCfg} // This can panic if odCfg is nil. This shouldn't be nil though, as per CDS balancer. I can add check if you want.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get how whether clusterimpl.LBConfig relates to whether this panics or not. It panics on odCfgReturn.ChildPolicy. I needed to make the function arguments value types to make new memory and not write over heap memory used elsewhere.


tests := []struct {
name string
input json.RawMessage
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched.

name string
input json.RawMessage
wantCfg serviceconfig.LoadBalancingConfig
wantErr bool
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. This actually lead me to discover that I was actually getting a parsing error not a validation error (see the 100 below, it shouldn't error - and it's also the wrong field name haha).

name string
input json.RawMessage
wantCfg serviceconfig.LoadBalancingConfig
wantErr bool
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, another thing I noticed is that I never check if there is a child/the child is a clusterimpl/the child config is valid. These are assumptions I made for my od balancer, which if broken would cause nil panic. What should I do about this? Should I add a validation? Defensively program in my Outlier Detection PR?

Comment on lines 44 to 51
bb := balancer.Get(Name)
if bb == nil {
t.Fatalf("balancer.Get(%q) returned nil", Name)
}
parser, ok := bb.(balancer.ConfigParser)
if !ok {
t.Fatalf("balancer %q does not implement the ConfigParser interface", Name)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. Switched.

type outlierDetectionBalancer struct {
}

func (b *outlierDetectionBalancer) UpdateClientConnState(s balancer.ClientConnState) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the second option, it broke since this needs to be registered in config validation at a higher level. Thus, I kept implementing balancer.Builder and returned nil from build instead of the type, and deleted this type.

name string
input json.RawMessage
wantCfg serviceconfig.LoadBalancingConfig
wantErr bool
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't ever be called otherwise since we prepare the config ourselves but still.

@zasweq zasweq assigned dfawley and unassigned zasweq Jun 6, 2022
@dfawley dfawley assigned zasweq and unassigned dfawley Jun 7, 2022
Copy link
Contributor Author

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments :D!


func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg outlierdetection.LBConfig) *outlierdetection.LBConfig {
odCfgRet := odCfg
odCfgRet.ChildPolicy = &internalserviceconfig.BalancerConfig{Name: clusterimpl.Name, Config: &ciCfg} // This can panic if odCfg is nil. This shouldn't be nil though, as per CDS balancer. I can add check if you want.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see how it panics, callsite at 161. Pointer to nil.


func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg outlierdetection.LBConfig) *outlierdetection.LBConfig {
odCfgRet := odCfg
odCfgRet.ChildPolicy = &internalserviceconfig.BalancerConfig{Name: clusterimpl.Name, Config: &ciCfg} // This can panic if odCfg is nil. This shouldn't be nil though, as per CDS balancer. I can add check if you want.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it. Let me know if you like solution.

name string
input json.RawMessage
wantCfg serviceconfig.LoadBalancingConfig
wantErr bool
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added check for child being there (validating child was already there implicitly from unmarshalJSON if present)

@zasweq zasweq assigned dfawley and unassigned zasweq Jun 7, 2022
odCfgRet := odCfg
odCfgRet.ChildPolicy = &internalserviceconfig.BalancerConfig{Name: clusterimpl.Name, Config: &ciCfg} // This can panic if odCfg is nil. This shouldn't be nil though, as per CDS balancer. I can add check if you want.
return &odCfgRet
func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) *outlierdetection.LBConfig {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not pass ciCfg by pointer? Generally structs should be passed by pointer unless there's a good reason not to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See: go/go-style/decisions#pass-values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to passing by pointer.

@@ -90,6 +91,11 @@ func (bb) ParseConfig(s json.RawMessage) (serviceconfig.LoadBalancingConfig, err
if lbCfg.FailurePercentageEjection != nil && lbCfg.FailurePercentageEjection.EnforcementPercentage > 100 {
return nil, fmt.Errorf("LBConfig.FailurePercentageEjection.EnforcementPercentage = %v; must be <= 100", lbCfg.FailurePercentageEjection.EnforcementPercentage)
}

if lbCfg.ChildPolicy == nil {
return nil, errors.New("LBConfig.ChildPolicy needs to be present")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/needs to/must/?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) *outlierdetection.LBConfig {
odCfgRet := &outlierdetection.LBConfig{}
if odCfg != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it valid for this to be nil? Why? For the JSON config, can we make the field a value instead of a pointer so it's never nil? Then pass by value to avoid the explicit copy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not valid. This config is prepared by the cluster resolver, so we're guaranteed for this not to be nil, but this handles the nil panic I discussed earlier with you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had the discovery mechanism hold onto a value, and pass it by value around. I kept this function returning a pointer though, as that is the value of internalserviceconfig.BalancerConfig that I populate.

@dfawley dfawley assigned zasweq and unassigned dfawley Jun 8, 2022
odCfgRet := odCfg
odCfgRet.ChildPolicy = &internalserviceconfig.BalancerConfig{Name: clusterimpl.Name, Config: &ciCfg} // This can panic if odCfg is nil. This shouldn't be nil though, as per CDS balancer. I can add check if you want.
return &odCfgRet
func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) *outlierdetection.LBConfig {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See: go/go-style/decisions#pass-values

xds/internal/balancer/outlierdetection/balancer_test.go Outdated Show resolved Hide resolved
Copy link
Contributor Author

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments! I'll wait to rebase my full PR since I don't want to force push in case one of you is in the middle of a review on the full one. Maybe I should wait until this is merged to rebase. The balancer is all orthogonal.

xds/internal/balancer/outlierdetection/balancer.go Outdated Show resolved Hide resolved
xds/internal/balancer/outlierdetection/balancer.go Outdated Show resolved Hide resolved
@@ -90,6 +91,11 @@ func (bb) ParseConfig(s json.RawMessage) (serviceconfig.LoadBalancingConfig, err
if lbCfg.FailurePercentageEjection != nil && lbCfg.FailurePercentageEjection.EnforcementPercentage > 100 {
return nil, fmt.Errorf("LBConfig.FailurePercentageEjection.EnforcementPercentage = %v; must be <= 100", lbCfg.FailurePercentageEjection.EnforcementPercentage)
}

if lbCfg.ChildPolicy == nil {
return nil, errors.New("LBConfig.ChildPolicy needs to be present")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

xds/internal/balancer/outlierdetection/balancer.go Outdated Show resolved Hide resolved
xds/internal/balancer/outlierdetection/balancer.go Outdated Show resolved Hide resolved

func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) *outlierdetection.LBConfig {
odCfgRet := &outlierdetection.LBConfig{}
if odCfg != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not valid. This config is prepared by the cluster resolver, so we're guaranteed for this not to be nil, but this handles the nil panic I discussed earlier with you.

odCfgRet := odCfg
odCfgRet.ChildPolicy = &internalserviceconfig.BalancerConfig{Name: clusterimpl.Name, Config: &ciCfg} // This can panic if odCfg is nil. This shouldn't be nil though, as per CDS balancer. I can add check if you want.
return &odCfgRet
func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) *outlierdetection.LBConfig {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to passing by pointer.


func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) *outlierdetection.LBConfig {
odCfgRet := &outlierdetection.LBConfig{}
if odCfg != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had the discovery mechanism hold onto a value, and pass it by value around. I kept this function returning a pointer though, as that is the value of internalserviceconfig.BalancerConfig that I populate.

@zasweq zasweq assigned easwars and dfawley and unassigned zasweq Jun 9, 2022
return err
}
gotCCS := ccs.(balancer.ClientConnState)
if diff := cmp.Diff(gotCCS, wantCCS, cmpopts.IgnoreFields(resolver.State{}, "Addresses", "ServiceConfig", "Attributes")); diff != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this work instead?

if diff := cmp.Diff(gotCCS, wantCCS, cmpopts.IgnoreFields(balancer.ClientConnState{}, "ResolverState")); diff != "" {
...
}

Also, why do we have to ignore the resolver state part of the update? Just so that we dont have to specify it in the wantCCS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that works :). I just commented this out to test since I forgot since it's been so long, and yeah super verbose resolver state I would have to match in wantCCS.

@easwars easwars assigned zasweq and unassigned easwars and dfawley Jun 15, 2022
Copy link
Contributor Author

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment :D!

return err
}
gotCCS := ccs.(balancer.ClientConnState)
if diff := cmp.Diff(gotCCS, wantCCS, cmpopts.IgnoreFields(resolver.State{}, "Addresses", "ServiceConfig", "Attributes")); diff != "" {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that works :). I just commented this out to test since I forgot since it's been so long, and yeah super verbose resolver state I would have to match in wantCCS.

@zasweq zasweq merged commit 29d9970 into grpc:master Jun 17, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants