New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ringhash: handle config updates properly #5557
Changes from 1 commit
217e9ae
1814745
1a9db40
fbed058
6a45d8e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -259,38 +259,48 @@ func (b *ringhashBalancer) updateAddresses(addrs []resolver.Address) bool { | |
|
||
func (b *ringhashBalancer) UpdateClientConnState(s balancer.ClientConnState) error { | ||
b.logger.Infof("Received update from resolver, balancer config: %+v", pretty.ToJSON(s.BalancerConfig)) | ||
if b.config == nil { | ||
newConfig, ok := s.BalancerConfig.(*LBConfig) | ||
if !ok { | ||
return fmt.Errorf("unexpected balancer config with type: %T", s.BalancerConfig) | ||
} | ||
b.config = newConfig | ||
newConfig, ok := s.BalancerConfig.(*LBConfig) | ||
if !ok { | ||
return fmt.Errorf("unexpected balancer config with type: %T", s.BalancerConfig) | ||
} | ||
|
||
// Successful resolution; clear resolver error and ensure we return nil. | ||
b.resolverErr = nil | ||
// If resolver state contains no addresses, return an error so ClientConn | ||
// will trigger re-resolve. Also records this as an resolver error, so when | ||
// the overall state turns transient failure, the error message will have | ||
// the zero address information. | ||
if len(s.ResolverState.Addresses) == 0 { | ||
b.ResolverError(errors.New("produced zero addresses")) | ||
return balancer.ErrBadResolverState | ||
} | ||
|
||
// If the ring configuration has changed, we need to regenerate the ring and | ||
// send a new picker. | ||
var regenerateRing bool | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In correspondence to your comment on my Outlier Detection PR: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. Thanks. |
||
if b.config == nil || b.config.MinRingSize != newConfig.MinRingSize || b.config.MaxRingSize != newConfig.MaxRingSize { | ||
regenerateRing = true | ||
} | ||
b.config = newConfig | ||
|
||
// If addresses were updated, whether it resulted in SubConn | ||
// creation/deletion, or just weight update, we need to regenerate the ring | ||
// and send a new picker. | ||
if b.updateAddresses(s.ResolverState.Addresses) { | ||
// If addresses were updated, no matter whether it resulted in SubConn | ||
// creation/deletion, or just weight update, we will need to regenerate | ||
// the ring. | ||
var err error | ||
b.ring, err = newRing(b.subConns, b.config.MinRingSize, b.config.MaxRingSize) | ||
regenerateRing = true | ||
} | ||
|
||
if regenerateRing { | ||
ring, err := newRing(b.subConns, b.config.MinRingSize, b.config.MaxRingSize) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This changes logic, and I don't know if this is correct. I still think it needs to write to b.ring. Otherwise it would persist a built ring map from the previous unrelated good update? Then subsequent ResolverError calls (line 318) and UpdateSubConnState() calls (line 369) call into regeneratePicker which reads b.ring, which would not be in the correct state and "synchronized" with the UpdateClientConnState that caused the SubConn error from the address list. Please triage. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Earlier, when we were assigning to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, based on the discussion in the chat room, I've change our ring creation routine |
||
if err != nil { | ||
b.ResolverError(fmt.Errorf("ringhash failed to make a new ring: %v", err)) | ||
return balancer.ErrBadResolverState | ||
} | ||
b.ring = ring | ||
b.regeneratePicker() | ||
b.cc.UpdateState(balancer.State{ConnectivityState: b.state, Picker: b.picker}) | ||
} | ||
|
||
// If resolver state contains no addresses, return an error so ClientConn | ||
// will trigger re-resolve. Also records this as an resolver error, so when | ||
// the overall state turns transient failure, the error message will have | ||
// the zero address information. | ||
if len(s.ResolverState.Addresses) == 0 { | ||
b.ResolverError(errors.New("produced zero addresses")) | ||
return balancer.ErrBadResolverState | ||
} | ||
// Successful resolution; clear resolver error and return nil. | ||
b.resolverErr = nil | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Again, I don't know if this breaks it, but this doesn't clear it if hits line 295, whereas in master it did. Although it feels appropriate here, but please triage if this breaks anything. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This one looks totally fine to me. |
||
return nil | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -108,6 +108,37 @@ func Test(t *testing.T) { | |
grpctest.RunSubTests(t, s{}) | ||
} | ||
|
||
// TestUpdateClientConnState_NewRingSize tests the scenario where the ringhash | ||
// LB policy receives new configuration which specifies new values for the ring | ||
// min and max sizes. The test verifies that a new ring is created and a new | ||
// picker is pushed on the channel. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: we're not testing whether the picker is pushed on "the channel". We're testing whether the new picker gets forwarded to this balancer's client conn, which in this case sends on the channel for verification purposes. Would prefer rewording "is pushed on the channel" to "sent to the ClientConn". There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
func (s) TestUpdateClientConnState_NewRingSize(t *testing.T) { | ||
addrs := []resolver.Address{{Addr: testBackendAddrStrs[0]}} | ||
cc, b, p0 := setupTest(t, addrs) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: I know it's p0 elsewhere, but for some reason p1 and p2 make more sense in my mind than p0 and p1. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
ring0 := p0.(*picker).ring | ||
if ringSize := len(ring0.items); ringSize < 1 || ringSize > 10 { | ||
t.Fatalf("Ring created with size %d, want between [%d, %d]", ringSize, 1, 10) | ||
} | ||
|
||
if err := b.UpdateClientConnState(balancer.ClientConnState{ | ||
ResolverState: resolver.State{Addresses: addrs}, | ||
BalancerConfig: &LBConfig{MinRingSize: 20, MaxRingSize: 100}, | ||
}); err != nil { | ||
t.Fatalf("UpdateClientConnState returned err: %v", err) | ||
} | ||
|
||
var ring1 *ring | ||
select { | ||
case <-time.After(defaultTestTimeout): | ||
t.Fatal("Timeout when waiting for a picker update after a configuration update") | ||
case p1 := <-cc.NewPickerCh: | ||
ring1 = p1.(*picker).ring | ||
} | ||
if ringSize := len(ring1.items); ringSize < 20 || ringSize > 100 { | ||
t.Fatalf("Ring created with size %d, want between [%d, %d]", ringSize, 20, 100) | ||
} | ||
} | ||
|
||
func (s) TestOneSubConn(t *testing.T) { | ||
wantAddr1 := resolver.Address{Addr: testBackendAddrStrs[0]} | ||
cc, b, p0 := setupTest(t, []resolver.Address{wantAddr1}) | ||
|
@@ -320,7 +351,7 @@ func (s) TestAddrWeightChange(t *testing.T) { | |
|
||
if err := b.UpdateClientConnState(balancer.ClientConnState{ | ||
ResolverState: resolver.State{Addresses: wantAddrs}, | ||
BalancerConfig: nil, | ||
BalancerConfig: testConfig, | ||
}); err != nil { | ||
t.Fatalf("UpdateClientConnState returned err: %v", err) | ||
} | ||
|
@@ -336,7 +367,7 @@ func (s) TestAddrWeightChange(t *testing.T) { | |
{Addr: testBackendAddrStrs[0]}, | ||
{Addr: testBackendAddrStrs[1]}, | ||
}}, | ||
BalancerConfig: nil, | ||
BalancerConfig: testConfig, | ||
}); err != nil { | ||
t.Fatalf("UpdateClientConnState returned err: %v", err) | ||
} | ||
|
@@ -359,7 +390,7 @@ func (s) TestAddrWeightChange(t *testing.T) { | |
resolver.Address{Addr: testBackendAddrStrs[1]}, | ||
weightedroundrobin.AddrInfo{Weight: 2}), | ||
}}, | ||
BalancerConfig: nil, | ||
BalancerConfig: testConfig, | ||
}); err != nil { | ||
t.Fatalf("UpdateClientConnState returned err: %v", err) | ||
} | ||
|
@@ -505,7 +536,7 @@ func (s) TestAddrBalancerAttributesChange(t *testing.T) { | |
addrs2 := []resolver.Address{internal.SetLocalityID(resolver.Address{Addr: testBackendAddrStrs[0]}, internal.LocalityID{Region: "americas"})} | ||
if err := b.UpdateClientConnState(balancer.ClientConnState{ | ||
ResolverState: resolver.State{Addresses: addrs2}, | ||
BalancerConfig: nil, | ||
BalancerConfig: testConfig, | ||
}); err != nil { | ||
t.Fatalf("UpdateClientConnState returned err: %v", err) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct. This happens before b.subConns gets written to. Thus, line 303 won't hit guaranteed. Say you get a first update with 3 addresses, you make 3 subconns. Then, the next UpdateClientConnState comes in with 0 addresses. Then, this conditional hits, the SubConn len is still 3, and doens't set to connectivity.TransientFailure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. I moved it to be right before we go about constructing the ring. So, if we end up with zero addresses, we save the effort of making a new ring.