Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds: Implement circuit breaking support. #4050

Merged
merged 32 commits into from Dec 8, 2020
Merged

xds: Implement circuit breaking support. #4050

merged 32 commits into from Dec 8, 2020

Conversation

GarrettGutierrez1
Copy link
Contributor

xds: Implement circuit breaking support.

@GarrettGutierrez1 GarrettGutierrez1 added this to the 1.34 Release milestone Nov 19, 2020
@GarrettGutierrez1 GarrettGutierrez1 added the Type: Feature New features or improvements in behavior label Nov 19, 2020
@@ -285,6 +285,15 @@ func (b *cdsBalancer) handleSecurityConfig(config *xdsclient.SecurityConfig) err
return nil
}

func getCircuitBreaking(update xdsclient.ClusterUpdate) (circuitBreaking bool, maxRequests uint32) {
for _, threshold := range update.Thresholds {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move this for loop into xds client where we read the CDS response. So in ClusterUpdate, we only need the MaxRequests int.

I don't think we have plans to support the other priorities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this change. This logic was moved into the client package, when it creates the ClusterUpdate from the CDS response.

@@ -42,6 +42,10 @@ type EDSConfig struct {
// will be disabled. If set to the empty string, load reporting will
// be sent to the same server that we obtained CDS data from.
LrsLoadReportingServerName *string
// If circuit breaking is enabled.
CircuitBreaking bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this bool.

You can make MaxRequests a pointer, and nil would mean circuit breaking is unset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got rid of this bool, MaxRequests is a pointer now.

@@ -389,6 +391,18 @@ func (edsImpl *edsBalancerImpl) handleSubConnStateChange(sc balancer.SubConn, s
}
}

// updateConfig handles changes to the circuit breaking configuration.
func (edsImpl *edsBalancerImpl) updateConfig(edsConfig *EDSConfig) {
if edsImpl.counter == nil || edsImpl.counter.ServiceName != edsConfig.EDSServiceName {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's protect this with an env variable GRPC_XDS_EXPERIMENTAL_CIRCUIT_BREAKING. So if this is set to false, edsImpl.counter is never set (is always nil).

Add the env variable like this: https://github.com/grpc/grpc-go/blob/master/xds/internal/env/env.go#L41

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created the environment variable.

if edsImpl.counter == nil || edsImpl.counter.ServiceName != edsConfig.EDSServiceName {
edsImpl.counter = &client.ServiceRequestsCounter{ServiceName: edsConfig.EDSServiceName}
}
edsImpl.counter.UpdateService(edsConfig.CircuitBreaking, edsConfig.MaxRequests)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't update the service of this count. You should update the counter.

There should be a map[string]counter in the client package. And you set edsImpl.counter to the corresponding value. When service name changes, you read the new value, and set edsImpl.counter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change.

xds/internal/balancer/edsbalancer/eds_impl.go Show resolved Hide resolved
// If circuit breaking is enabled.
CircuitBreaking bool
// Max requests when circuit breaking.
MaxRequests uint32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a second thought, we might not need this field.

The reason we need this is to update the upper bound in the counter.
There are two approaches:

  1. keep this field, and store it inside the eds.dropPicker. So the global counter doesn't keep the upper bound, but each picker keeps it. And we update the picker when this value changes.
  2. remove this field (so there's no change to the EDSConfig). Inside the CDS balancer, when it receives the ClusterUpdate, it updates the upper of the corresponding counter.

Now I'm a bit in favor of 2. Maybe give it a try. (Keep them as separate commits though, in case we regret later.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented 1 in this branch and pushed it. I'll implement 2 in a separate branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/GarrettGutierrez1/grpc-go/tree/xds-circuit-breaking-no-edsconfig-change

This branch implements 2. I do think it is simpler (less files changed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched it up, this branch now implements 2, and the following link implements 1.

https://github.com/GarrettGutierrez1/grpc-go/tree/xds-circuit-breaking-moved-max-requests-tracking


type servicesRequestsCounter struct {
mu sync.Mutex
services map[string]serviceInfo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A global map[string]counter would work. Each dropPicker would only need the counter, not the map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change, now the dropPicker gets the counter.

}

type serviceInfo struct {
circuitBreaking bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this bool.
If circuit breaking is not enabled, this struct wouldn't exist, the drop picker will see a nil counter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this bool.

if !ok {
return fmt.Errorf("service name %v not identified", c.ServiceName)
}
sInfo.numRequests++
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to not lock and ++. Use atomic instead.
Lock adds too much contention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a check of numRequests (to make sure it isn't > maxRequests) and then after that it is updated using ++, can I still use an atomic if I am doing 2 dependent operations on it?

// circuitBreakersFromCluster extracts the circuit breakers configuration from
// the received cluster resource. Returns nil if no CircuitBreakers or no
// Thresholds in CircuitBreakers.
func circuitBreakersFromCluster(cluster *v3clusterpb.Cluster) ([]CircuitBreakerThreshold, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to return error, right. It's always nil.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change.

@@ -113,6 +113,8 @@ type edsBalancerImplInterface interface {
handleSubConnStateChange(sc balancer.SubConn, state connectivity.State)
// updateState handle a balancer state update from the priority.
updateState(priority priorityType, s balancer.State)
// updateServiceRequestsCounter handles an update to the service name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updates the service requests counter to the one for the given service name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change.

@@ -131,6 +132,10 @@ func (f *fakeEDSBalancer) handleEDSResponse(edsResp xdsclient.EndpointsUpdate) {

func (f *fakeEDSBalancer) updateState(priority priorityType, s balancer.State) {}

func (f *fakeEDSBalancer) updateServiceRequestsCounter(serviceName string) {
f.serviceName.Send(serviceName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should that be verified? I'm guessing added to one of the waitFor functions below?

Yes. And call it to verify in the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change. Wasn't 100% sure what test you are talking about, added the check to TestConfigChildPolicyUpdate because it is the only test that invokes the policy update and has a fakeEDSBalancer.

defer src.mu.Unlock()
c, ok := src.services[serviceName]
if !ok {
c = &ServiceRequestsCounter{ServiceName: serviceName, maxRequests: 1024}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define a const for 1024, and use the const here and following.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change.

name: "does-not-exceed-max-requests",
maxRequests: 1024,
numRequests: 1024,
errorExpected: false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace errorExpected with expectedSuccesses and expectedErrors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change.

},
}

func counterTestInit() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this? Is this to reset the counters? If so, call it resetServiceRequestsCounter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change.

}

// SetMaxRequests updates the max requests for a service's counter.
func SetMaxRequests(serviceName string, maxRequests *uint32) *ServiceRequestsCounter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function doesn't need the return value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change.

Copy link
Contributor

@menghanl menghanl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. 2 nits.

xds/internal/balancer/edsbalancer/eds_test.go Show resolved Hide resolved
t.Error("no error when error expected")
}
if loadedError != nil {
fmt.Println(loadedError.(error))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants