xds/resolver: Add support for cluster specifier plugins #4987

Merged
merged 7 commits into grpc:master, Dec 6, 2021

Conversation

@zasweq (Contributor) commented Nov 15, 2021

This PR adds support for cluster specifier plugins in the xds resolver/config selector, as per the RLS in xDS design. The RLS cluster specifier plugin itself is still left to implement.

RELEASE NOTES: None

```
@@ -121,7 +133,11 @@ type routeCluster struct {

type route struct {
	m *xdsresource.CompositeMatcher // converted from route matchers

	// Exactly one of clusterSpecifierPlugin or clusters will be set.
	clusterSpecifierPlugin string
```
Contributor:

Can we put the CSP name into clusters instead so we don't need another field?

Contributor Author:

I see what you're saying (persist as little state as possible), but I much prefer keeping this as a separate field. Adding the CSP to clusters would conflate an individual CSP name with a WRR type, which can hold many cluster names and chooses one randomly. This keeps the branching logic (two fields in state, two separate ways of persistence/handling) that started in the xdsclient.

Contributor:

It's not about "persisting state", it's about reducing & simplifying code and data structures. We use the same code paths already for when WRR is used vs. when a single cluster is specified directly, so why not fold CSP into it as well? In gRPC, they are equivalent: CSPs are the same as ordinary clusters from the name resolver and cluster manager LB policy's perspectives.

Contributor Author:

Kept the branching in newConfigSelector (for the prefix + hardcoded weight-1 logic, which was in unmarshal_rds for a singular cluster), but now it's one codepath in SelectConfig and one piece of state, clusters.
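For reference, the unified shape the author describes can be sketched roughly as follows. These are simplified stand-in types, not the actual grpc-go definitions, and the prefixes follow the constants settled on later in this review:

```go
package main

import "fmt"

// routeCluster and route are simplified stand-ins for the types discussed
// in this thread; the real grpc-go definitions carry more fields.
type routeCluster struct {
	name string
}

type route struct {
	// clusters holds every possible target for the route: either the set of
	// weighted clusters, or a single entry for a cluster specifier plugin
	// (added with a hardcoded weight of 1 in newConfigSelector), so that
	// SelectConfig needs only one codepath and one piece of state.
	clusters []routeCluster
}

// newRoute folds both cases into the single clusters field.
func newRoute(cspName string, clusterNames []string) route {
	var rt route
	if cspName != "" {
		rt.clusters = []routeCluster{{name: "cluster_specifier_plugin:" + cspName}}
		return rt
	}
	for _, c := range clusterNames {
		rt.clusters = append(rt.clusters, routeCluster{name: "cluster:" + c})
	}
	return rt
}

func main() {
	fmt.Println(newRoute("rls", nil).clusters[0].name)        // cluster_specifier_plugin:rls
	fmt.Println(newRoute("", []string{"a"}).clusters[0].name) // cluster:a
}
```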

(resolved comment thread on xds/internal/resolver/watch_service.go)
@zasweq (Contributor Author) left a comment:

Thanks for the comments :D!

@zasweq (Contributor Author) left a comment:

Thanks for the comments! I like the simplicity push. Easwar mentioned you are good at "finding simple solutions to problems".

```
if rt.ClusterSpecifierPlugin != "" {
	clusters.Add(&routeCluster{
		name: "cluster:" + rt.ClusterSpecifierPlugin,
```
Contributor:

Shouldn't this be "cluster_specifier_plugin:" + rt.ClusterSpecifierPlugin?

Please use a local to hold that instead of repeating it 4 times. Also, global consts for "cluster:" and "cluster_specifier_plugin:" would be a good idea to avoid any chance of a typo in one usage.

Contributor Author:

Yeah, good catch. Added global consts and also a local var for both branches.
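The fix described here, global constants plus a local for the computed key, can be sketched like this (the constant and function names mirror the snippets in this review; the real grpc-go identifiers may differ):

```go
package main

import "fmt"

// Global prefix constants, so "cluster:" and "cluster_specifier_plugin:"
// are never retyped at a use site (the typo this thread caught).
const (
	clusterPrefix                = "cluster:"
	clusterSpecifierPluginPrefix = "cluster_specifier_plugin:"
)

// prefixedName computes the activeClusters map key once, in one place,
// instead of rebuilding the concatenation at each of the four call sites.
func prefixedName(cspName, clusterName string) string {
	if cspName != "" {
		return clusterSpecifierPluginPrefix + cspName
	}
	return clusterPrefix + clusterName
}

func main() {
	fmt.Println(prefixedName("rls-plugin", "")) // cluster_specifier_plugin:rls-plugin
	fmt.Println(prefixedName("", "backend"))    // cluster:backend
}
```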

Contributor Author:

Also, the comment isn't showing up on this PR, but I refactored the cluster initialization into another function. It made it much cleaner.

Comment on lines 380 to 385:

```
ci := r.activeClusters["cluster_specifier_plugin:" + rt.ClusterSpecifierPlugin]
if ci == nil {
	ci = &clusterInfo{refCount: 0}
	r.activeClusters["cluster_specifier_plugin:" + rt.ClusterSpecifierPlugin] = ci
}
cs.clusters["cluster_specifier_plugin:" + rt.ClusterSpecifierPlugin] = ci
```
Contributor:

Can this bit be shared with the below section? Maybe factored into another function?


@zasweq added this to the 1.43 release milestone Nov 17, 2021
@zasweq changed the title from "Draft: Added support for cluster specifier plugins in xds resolver" to "xds: Added support for cluster specifier plugins in xds resolver" Nov 17, 2021
```
@@ -179,7 +181,7 @@ func (b *cdsBalancer) handleClientConnUpdate(update *ccUpdate) {
 		b.handleErrorFromUpdate(err, true)
 		return
 	}
-	b.clusterHandler.updateRootCluster(update.clusterName)
+	b.clusterHandler.updateRootCluster(strings.TrimPrefix(update.clusterName, clusterPrefix))
```
Contributor:

Should this be stripped by the xds cluster manager LB policy instead? That is what I was expecting we'd do... I'm not sure how other languages are implementing this, either, since it apparently wasn't specified in the design.

Contributor Author:

Discussed in person; decided it would be best to strip this in the name resolver, while constructing the Cluster Manager LB Config itself. (Kept the cluster: prefix for the name of the cluster, but stripped it for the underlying cds policy.)
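The decision above, prefix kept as the cluster manager child key but stripped for the cds policy, can be sketched as follows. The constant and type names are illustrative stand-ins taken from the snippets in this thread:

```go
package main

import (
	"fmt"
	"strings"
)

// clusterPrefix and cdsBalancerConfig are stand-ins named after the
// snippets in this review, not the exact grpc-go definitions.
const clusterPrefix = "cluster:"

type cdsBalancerConfig struct {
	Cluster string
}

// childConfigFor strips the "cluster:" prefix while building the cds child
// config, as decided above: the cluster manager keeps the prefixed name as
// the child's key, but the underlying cds policy sees the bare cluster name.
func childConfigFor(prefixedName string) cdsBalancerConfig {
	return cdsBalancerConfig{Cluster: strings.TrimPrefix(prefixedName, clusterPrefix)}
}

func main() {
	fmt.Println(childConfigFor("cluster:backend").Cluster) // backend
}
```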

```
ChildPolicy: newBalancerConfig(cdsName, cdsBalancerConfig{Cluster: cluster}),
// Look into cluster specifier plugins, which haven't had any prefix attached to their cluster specifier plugin names,
// to determine the LB Config if the cluster is a CSP.
cspCfg, ok := clusterSpecifierPlugins[strings.TrimPrefix(cluster, clusterSpecifierPluginPrefix)]
```
Contributor:

So... if a CSP is called "cluster:foo" and a cluster named "foo" exists, then "cluster:foo" will be found in the CSP map, even though that isn't what was intended.

Should we be looking at whether the cluster from activeClusters's key StartsWith clusterSpecifierPluginPrefix instead here?

Contributor Author:

Simply persisted the config in clusterInfo, which solves this correctness issue.

```
// Look into cluster specifier plugins, which haven't had any prefix attached to their cluster specifier plugin names,
// to determine the LB Config if the cluster is a CSP.
cspCfg, ok := clusterSpecifierPlugins[strings.TrimPrefix(cluster, clusterSpecifierPluginPrefix)]
if ok {
```
Contributor:

Are we handling this case correctly (I think we're not):

  1. First xDS update contains a cluster specifier plugin for RLS
  2. RPC starts that selects that CSP
  3. New xDS update removes that CSP
  4. When generating this JSON here, we need to maintain the CSP config data and keep the "cluster" it references around until (2) is done

Contributor Author:

Ah, I found a solution that solves this problem and the comment about name collisions in the serviceConfigJSON branch: I simply added the CSP balancer configuration (if the cluster is a CSP) to the cluster info. This reuses all the plumbing around active clusters, and also solves the correctness issue in the cluster specifier branch, since you can branch on clusterInfo.cspCfg != nil.
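The idea the author lands on can be sketched as follows; clusterInfo and the helper are simplified illustrations, not the actual grpc-go code:

```go
package main

import "fmt"

// clusterInfo is a simplified stand-in: alongside the ref count it now
// carries the CSP balancer configuration (nil for ordinary clusters). The
// service-config generator can then branch on cspCfg != nil instead of
// matching name prefixes (avoiding the "cluster:foo" CSP-name collision),
// and a CSP's config lives as long as its refCount keeps the entry alive,
// even after an xDS update removes the CSP.
type clusterInfo struct {
	refCount int
	cspCfg   map[string]interface{} // CSP balancer config, nil otherwise
}

// childPolicyFor is a hypothetical helper showing the branch.
func childPolicyFor(name string, ci *clusterInfo) string {
	if ci.cspCfg != nil {
		return fmt.Sprintf("csp child policy for %s", name)
	}
	return fmt.Sprintf("cds child policy for %s", name)
}

func main() {
	csp := &clusterInfo{refCount: 1, cspCfg: map[string]interface{}{"arbitrary_field": "x"}}
	fmt.Println(childPolicyFor("cluster_specifier_plugin:rls", csp))
	fmt.Println(childPolicyFor("cluster:backend", &clusterInfo{refCount: 1}))
}
```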

@dfawley assigned zasweq and unassigned dfawley Nov 23, 2021
@zasweq (Contributor Author) left a comment:

Thanks for the comments :D! Adding a test case for the situation you outlined.


@zasweq assigned dfawley and unassigned zasweq Nov 23, 2021
@dfawley changed the title from "xds: Added support for cluster specifier plugins in xds resolver" to "xds/resolver: Add support for cluster specifier plugins" Nov 24, 2021
@dfawley (Contributor) commented Nov 24, 2021

As mentioned offline, this is blocked on a decision of how to handle CSP config changes.

```
	}
} else {
	children[cluster] = xdsChildConfig{
		ChildPolicy: newBalancerConfig(cdsName, cdsBalancerConfig{Cluster: strings.TrimPrefix(cluster, clusterPrefix)}),
```
Contributor:

Can we store this in the activeClusters map, too, instead of creating it here? That would give us a more unified flow through here. I.e.:

```
type clusterInfo struct {
	...
	cfg balancerConfig // either the csp config or the cds cluster config
}
```

And this loop becomes simply:

```
for cluster, ci := range activeClusters {
	children[cluster] = xdsChildConfig{ChildPolicy: ci.cfg}
}
```

(Or store an xdsChildConfig in clusterInfo, even.)

Contributor Author:

Thought about this for a while, chose to persist the whole child config.

```
// initializeCluster initializes entries in cs.clusters map, creating entries in
// r.activeClusters as necessary. Any created entries will be set to zero as
// they will be incremented by incRefs.
func (r *xdsResolver) initializeCluster(clusterName string, cs *configSelector) {
```
Contributor:

I think this wants to be a method on configSelector instead. It can access the xdsResolver that way, too, for adding to activeClusters. You can also pass it a balancerConfig to store in the clusterInfo, as mentioned above.

Contributor Author:

Ah, great idea. These two comments make it much cleaner. Switched.
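Putting the two suggestions together, a rough sketch of initializeCluster as a configSelector method that also persists the child config (simplified stand-in types; cfg is a string here purely for illustration):

```go
package main

import "fmt"

// Simplified stand-ins for the resolver types in this thread; cfg stands in
// for the persisted child config (CSP or cds) suggested in the review.
type clusterInfo struct {
	refCount int
	cfg      string
}

type xdsResolver struct {
	activeClusters map[string]*clusterInfo
}

type configSelector struct {
	r        *xdsResolver
	clusters map[string]*clusterInfo
}

// initializeCluster as a configSelector method: it reaches the resolver via
// cs.r, creates the activeClusters entry if needed (refCount starts at zero
// and is bumped later by incRefs), and stores the child config alongside it.
func (cs *configSelector) initializeCluster(clusterName, cfg string) {
	ci := cs.r.activeClusters[clusterName]
	if ci == nil {
		ci = &clusterInfo{refCount: 0}
		cs.r.activeClusters[clusterName] = ci
	}
	ci.cfg = cfg
	cs.clusters[clusterName] = ci
}

func main() {
	r := &xdsResolver{activeClusters: make(map[string]*clusterInfo)}
	cs := &configSelector{r: r, clusters: make(map[string]*clusterInfo)}
	cs.initializeCluster("cluster:backend", "cds config for backend")
	fmt.Println(r.activeClusters["cluster:backend"].cfg) // cds config for backend
}
```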

```
@@ -35,6 +36,9 @@ import (

type serviceUpdate struct {
	// virtualHost contains routes and other configuration to route RPCs.
	virtualHost *xdsresource.VirtualHost
	// clusterSpecifierPlugins contain the configurations for any cluster
```
Contributor:

nit: containS (the ...Plugins map is singular)

Contributor Author:

Ah, I see, switched.

```
	if !reflect.DeepEqual(picks, want) {
		t.Errorf("picked clusters = %v; want %v", picks, want)
	}
}

func init() {
```
Contributor:

Can you move this & related tests to a new file? This one is already too big.

Maybe cluster_specifier_plugin_test.go?

Contributor Author:

Switched.

```
	ArbitraryField string `json:"arbitrary_field"`
}

func TestXDSResolverClusterSpecifierPlugin(t *testing.T) {
```
Contributor:

A quick 1-liner description of what this tests would be helpful. E.g. "// Tests that cluster specifier plugins produce the correct service config." or something. SG.

Contributor Author:

Added a comment with what you had, and also explained picking a cluster as well.

```
		t.Fatal("want: ", cmp.Diff(nil, wantSCParsed3.Config))
	}
}
```

Contributor:

Please add another test that covers updates to a CSP configuration - it should change in the output service config.

Contributor Author:

Added. I'm assuming this is for the missing functionality for the graceful switch with the same LB policy.

```
"cluster_specifier_plugin:cspA":{
"childPolicy":[{"csp_experimental":{"arbitrary_field":"anythingA"}}]
},
"cluster_specifier_plugin:cspB":{
```
Contributor:

The formatting here looks slightly off.

Contributor Author:

I just checked the other JSON with multiple children in xds_resolver_test.go, and it looks the same? In what way does this look off? Leaving as is for now.

```
wantJSON3 := `{"loadBalancingConfig":[{
"xds_cluster_manager_experimental":{
"children":{
"cluster_specifier_plugin:cspB":{
```
Contributor:

Formatting again

Contributor Author:

See comment above.

@dfawley assigned zasweq and unassigned dfawley Nov 30, 2021
@zasweq (Contributor Author) left a comment:

Thanks for the comments :D!


@zasweq assigned dfawley and unassigned zasweq Nov 30, 2021
@dfawley (Contributor) left a comment:

LGTM modulo the formatting nit.

```
"cluster_specifier_plugin:cspA":{
"childPolicy":[{"csp_experimental":{"arbitrary_field":"anythingA"}}]
},
"cluster_specifier_plugin:cspB":{
```
Contributor:

New comment from old comment thread, since the code was moved:

This should line up with "cluster_specifier_plugin:cspA" (outdent 1 tab). And the same situation below @ line 356.

It's not a correctness thing, it's just formatting.

@dfawley assigned zasweq and unassigned dfawley Dec 3, 2021
@zasweq merged commit 3786ae1 into grpc:master Dec 6, 2021
@github-actions bot locked as resolved and limited conversation to collaborators Jun 7, 2022