Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grpc: move to TransientFailure in pick_first LB policy when all addresses are removed #5274

Merged
merged 5 commits into from Mar 29, 2022

Conversation

easwars
Copy link
Contributor

@easwars easwars commented Mar 25, 2022

This behavior is in line with what other languages do and what we should have been doing all along.

Summary of changes:

  • In the pick_first LB policy implementation, if a resolver address removes previously provided addresses, delete the existing subConn and move channel to TransientFailure.
  • Get rid of TestEmptyAddrs, which is no longer required because we have tests in test/pickfirst_test.go and test/roundrobin_test.go for this scenario. Also, TestEmptyAddrs was broken anyway.
  • Add a test in test/pibkfirst_test.go to exercise this functionality.

RELEASE NOTES:

  • Change connectivity state to TransientFailure in pick_first LB policy when all addresses are removed

@easwars easwars added the Type: Behavior Change Behavior changes not categorized as bugs label Mar 25, 2022
@easwars easwars added this to the 1.46 Release milestone Mar 25, 2022
pickfirst.go Outdated Show resolved Hide resolved
pickfirst.go Outdated Show resolved Hide resolved
@easwars easwars removed their assignment Mar 25, 2022
pickfirst.go Show resolved Hide resolved
pickfirst.go Outdated
b.ResolverError(errors.New("produced zero addresses"))
return balancer.ErrBadResolverState
}
if b.sc == nil {

if b.subConn == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe invert, early return, and outdent?

if b.subConn != nil {
	b.cc.UpdateAddresses()
	b.subConn.Connect() // do we really need this?
	return
}
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

And yes, we don't have to call Connect here. UpdateAddresses does that for us if the current active connection is not in the new address list.

pickfirst.go Outdated
}
return nil
}

func (b *pickfirstBalancer) UpdateSubConnState(sc balancer.SubConn, s balancer.SubConnState) {
func (b *pickfirstBalancer) UpdateSubConnState(subConn balancer.SubConn, s balancer.SubConnState) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was OK with the parameter name being short, since the scope is more limited. But if you're changing that, let's also change s to scs or subConnState?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Changed to state both here and in UpdateClientConnState.

pickfirst.go Outdated Show resolved Hide resolved
Comment on lines -686 to -688
// TestEmptyAddrs verifies client behavior when a working connection is
// removed. In pick first and round-robin, both will continue using the old
// connections.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure there's still a test for this set of circumstances, but that the RPCs begin failing instead of keep using old connections.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I thought we confirmed RR was removing the subconns, so RPCs should have been failing? But this test is validating the RPCs would still succeed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I thought we confirmed RR was removing the subconns, so RPCs should have been failing? But this test is validating the RPCs would still succeed?

Yes, this test was quite broken because it was using the pick_first clientConn to verify round_robin behaviour:

	// Confirm several new RPCs succeed on round robin.
	for i := 0; i < 10; i++ {
		if _, err := pfclient.UnaryCall(ctx, &testpb.SimpleRequest{}); err != nil {
			t.Fatalf("UnaryCall(_) = _, %v; want _, nil", err)
		}
		time.Sleep(5 * time.Millisecond)
	}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure there's still a test for this set of circumstances, but that the RPCs begin failing instead of keep using old connections.

Yes, the newly added test verifies that the channel moves to TransientFailure when addresses are removed and that a WaitForReady RPC blocks until new addresses are added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this test was quite broken because it was using the pick_first clientConn to verify round_robin behaviour:

Oof

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the newly added test verifies that the channel moves to TransientFailure when addresses are removed and that a WaitForReady RPC blocks until new addresses are added.

But only with pick first -- is there one for RR too elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have a couple of tests for round_robin that I recently cleaned up:
1
2

- clarify comments when pf receives no addresses
- skip calling `Connect` when the address list changes. UpdateAddresses
  takes care of this
- switch conditional to outindent more code
pickfirst.go Outdated
@@ -161,7 +163,7 @@ type picker struct {
err error
}

func (p *picker) Pick(info balancer.PickInfo) (balancer.PickResult, error) {
func (p *picker) Pick(_ balancer.PickInfo) (balancer.PickResult, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: delete parameter name entirely instead of using "_". And below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@dfawley dfawley assigned easwars and unassigned dfawley Mar 29, 2022
@easwars easwars merged commit b6873c0 into grpc:master Mar 29, 2022
@easwars easwars deleted the pf_tf branch March 29, 2022 22:06
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: Behavior Change Behavior changes not categorized as bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants