Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ringhash: handle config updates properly #5557

Merged
merged 5 commits into from Aug 4, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
52 changes: 31 additions & 21 deletions xds/internal/balancer/ringhash/ringhash.go
Expand Up @@ -259,38 +259,48 @@ func (b *ringhashBalancer) updateAddresses(addrs []resolver.Address) bool {

func (b *ringhashBalancer) UpdateClientConnState(s balancer.ClientConnState) error {
b.logger.Infof("Received update from resolver, balancer config: %+v", pretty.ToJSON(s.BalancerConfig))
if b.config == nil {
newConfig, ok := s.BalancerConfig.(*LBConfig)
if !ok {
return fmt.Errorf("unexpected balancer config with type: %T", s.BalancerConfig)
}
b.config = newConfig
newConfig, ok := s.BalancerConfig.(*LBConfig)
if !ok {
return fmt.Errorf("unexpected balancer config with type: %T", s.BalancerConfig)
}

// Successful resolution; clear resolver error and ensure we return nil.
b.resolverErr = nil
// If resolver state contains no addresses, return an error so ClientConn
// will trigger re-resolve. Also records this as an resolver error, so when
// the overall state turns transient failure, the error message will have
// the zero address information.
if len(s.ResolverState.Addresses) == 0 {
b.ResolverError(errors.New("produced zero addresses"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct. This happens before b.subConns gets written to. Thus, line 303 won't hit guaranteed. Say you get a first update with 3 addresses, you make 3 subconns. Then, the next UpdateClientConnState comes in with 0 addresses. Then, this conditional hits, the SubConn len is still 3, and doens't set to connectivity.TransientFailure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. I moved it to be right before we go about constructing the ring. So, if we end up with zero addresses, we save the effort of making a new ring.

return balancer.ErrBadResolverState
}

// If the ring configuration has changed, we need to regenerate the ring and
// send a new picker.
var regenerateRing bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In correspondence to your comment on my Outlier Detection PR:
regenerateRing := b.updateAddresses(s.ResolverState.Addresses)
if b.config == nil || b.config.MinRingSize != newConfig.MinRingSize || b.config.MaxRingSize != newConfig.MaxRingSize {
regenerateRing = true
}
b.config = newConfig

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

if b.config == nil || b.config.MinRingSize != newConfig.MinRingSize || b.config.MaxRingSize != newConfig.MaxRingSize {
regenerateRing = true
}
b.config = newConfig

// If addresses were updated, whether it resulted in SubConn
// creation/deletion, or just weight update, we need to regenerate the ring
// and send a new picker.
if b.updateAddresses(s.ResolverState.Addresses) {
// If addresses were updated, no matter whether it resulted in SubConn
// creation/deletion, or just weight update, we will need to regenerate
// the ring.
var err error
b.ring, err = newRing(b.subConns, b.config.MinRingSize, b.config.MaxRingSize)
regenerateRing = true
}

if regenerateRing {
ring, err := newRing(b.subConns, b.config.MinRingSize, b.config.MaxRingSize)
Copy link
Contributor

@zasweq zasweq Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes logic, and I don't know if this is correct. I still think it needs to write to b.ring. Otherwise it would persist a built ring map from the previous unrelated good update? Then subsequent ResolverError calls (line 318) and UpdateSubConnState() calls (line 369) call into regeneratePicker which reads b.ring, which would not be in the correct state and "synchronized" with the UpdateClientConnState that caused the SubConn error from the address list. Please triage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier, when we were assigning to b.ring here and if err != nil, we were still calling ResolverError, but not sending a new picker. So, we would have still ended up using the old ring (from the old picker) to service RPCs. So, in that sense, i feel that the behavior has not changed. But either ways, I'm not sure if our existing behavior was right. So, I have posted on our chat room, let's see what comes out of it. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, based on the discussion in the chat room, I've change our ring creation routine newRing() to not return an error.

if err != nil {
b.ResolverError(fmt.Errorf("ringhash failed to make a new ring: %v", err))
return balancer.ErrBadResolverState
}
b.ring = ring
b.regeneratePicker()
b.cc.UpdateState(balancer.State{ConnectivityState: b.state, Picker: b.picker})
}

// If resolver state contains no addresses, return an error so ClientConn
// will trigger re-resolve. Also records this as an resolver error, so when
// the overall state turns transient failure, the error message will have
// the zero address information.
if len(s.ResolverState.Addresses) == 0 {
b.ResolverError(errors.New("produced zero addresses"))
return balancer.ErrBadResolverState
}
// Successful resolution; clear resolver error and return nil.
b.resolverErr = nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't know if this breaks it, but this doesn't clear it if hits line 295, whereas in master it did. Although it feels appropriate here, but please triage if this breaks anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one looks totally fine to me.

return nil
}

Expand Down
39 changes: 35 additions & 4 deletions xds/internal/balancer/ringhash/ringhash_test.go
Expand Up @@ -108,6 +108,37 @@ func Test(t *testing.T) {
grpctest.RunSubTests(t, s{})
}

// TestUpdateClientConnState_NewRingSize tests the scenario where the ringhash
// LB policy receives new configuration which specifies new values for the ring
// min and max sizes. The test verifies that a new ring is created and a new
// picker is pushed on the channel.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we're not testing whether the picker is pushed on "the channel". We're testing whether the new picker gets forwarded to this balancer's client conn, which in this case sends on the channel for verification purposes. Would prefer rewording "is pushed on the channel" to "sent to the ClientConn".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

func (s) TestUpdateClientConnState_NewRingSize(t *testing.T) {
addrs := []resolver.Address{{Addr: testBackendAddrStrs[0]}}
cc, b, p0 := setupTest(t, addrs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I know it's p0 elsewhere, but for some reason p1 and p2 make more sense in my mind than p0 and p1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

ring0 := p0.(*picker).ring
if ringSize := len(ring0.items); ringSize < 1 || ringSize > 10 {
t.Fatalf("Ring created with size %d, want between [%d, %d]", ringSize, 1, 10)
}

if err := b.UpdateClientConnState(balancer.ClientConnState{
ResolverState: resolver.State{Addresses: addrs},
BalancerConfig: &LBConfig{MinRingSize: 20, MaxRingSize: 100},
}); err != nil {
t.Fatalf("UpdateClientConnState returned err: %v", err)
}

var ring1 *ring
select {
case <-time.After(defaultTestTimeout):
t.Fatal("Timeout when waiting for a picker update after a configuration update")
case p1 := <-cc.NewPickerCh:
ring1 = p1.(*picker).ring
}
if ringSize := len(ring1.items); ringSize < 20 || ringSize > 100 {
t.Fatalf("Ring created with size %d, want between [%d, %d]", ringSize, 20, 100)
}
}

func (s) TestOneSubConn(t *testing.T) {
wantAddr1 := resolver.Address{Addr: testBackendAddrStrs[0]}
cc, b, p0 := setupTest(t, []resolver.Address{wantAddr1})
Expand Down Expand Up @@ -320,7 +351,7 @@ func (s) TestAddrWeightChange(t *testing.T) {

if err := b.UpdateClientConnState(balancer.ClientConnState{
ResolverState: resolver.State{Addresses: wantAddrs},
BalancerConfig: nil,
BalancerConfig: testConfig,
}); err != nil {
t.Fatalf("UpdateClientConnState returned err: %v", err)
}
Expand All @@ -336,7 +367,7 @@ func (s) TestAddrWeightChange(t *testing.T) {
{Addr: testBackendAddrStrs[0]},
{Addr: testBackendAddrStrs[1]},
}},
BalancerConfig: nil,
BalancerConfig: testConfig,
}); err != nil {
t.Fatalf("UpdateClientConnState returned err: %v", err)
}
Expand All @@ -359,7 +390,7 @@ func (s) TestAddrWeightChange(t *testing.T) {
resolver.Address{Addr: testBackendAddrStrs[1]},
weightedroundrobin.AddrInfo{Weight: 2}),
}},
BalancerConfig: nil,
BalancerConfig: testConfig,
}); err != nil {
t.Fatalf("UpdateClientConnState returned err: %v", err)
}
Expand Down Expand Up @@ -505,7 +536,7 @@ func (s) TestAddrBalancerAttributesChange(t *testing.T) {
addrs2 := []resolver.Address{internal.SetLocalityID(resolver.Address{Addr: testBackendAddrStrs[0]}, internal.LocalityID{Region: "americas"})}
if err := b.UpdateClientConnState(balancer.ClientConnState{
ResolverState: resolver.State{Addresses: addrs2},
BalancerConfig: nil,
BalancerConfig: testConfig,
}); err != nil {
t.Fatalf("UpdateClientConnState returned err: %v", err)
}
Expand Down