Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resolver: add State fields to support error handling #2951

Merged
merged 14 commits into from Oct 4, 2019

Conversation

dfawley
Copy link
Member

@dfawley dfawley commented Aug 2, 2019

Fixes #2102, #2856, #2727


This change is Reviewable

Copy link
Contributor

@menghanl menghanl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 19 of 20 files at r1.
Reviewable status: 19 of 20 files reviewed, 11 unresolved discussions (waiting on @dfawley and @menghanl)


balancer_conn_wrappers.go, line 90 at r1 (raw file):

type ccBalancerWrapper struct {
	cc               *ClientConn
	balancerMu       sync.Mutex

Comment that this is not to protect the balancer field, but to synchronize balancer calls.


balancer_conn_wrappers.go, line 170 at r1 (raw file):

func (ccb *ccBalancerWrapper) updateClientConnState(ccs *balancer.ClientConnState) error {
	cbn := ccb.cc.curBalancerName
	ccb.cc.mu.Unlock()

Delete and inline this function.


clientconn.go, line 568 at r1 (raw file):

	if err != nil {
		// May need to apply the initial service config

In case the resolver doesn't support service config, or doesn't provide the service config with the new addresses


clientconn.go, line 581 at r1 (raw file):

	var ret error
	if cc.dopts.disableServiceConfig || s.ServiceConfigGetter == nil {
		cc.maybeApplyDefaultServiceConfig(s.Addresses)

Are we going to apply the same config multiple times?


clientconn.go, line 850 at r1 (raw file):

}

func (cc *ClientConn) applyServiceConfig(sc *ServiceConfig, addrs []resolver.Address) {

Rename this function?


resolver_conn_wrapper.go, line 134 at r1 (raw file):

	ccr.polling = p
	go func() {
		for i := 0; true; i++ {

for i:=0; ; i++


resolver_conn_wrapper.go, line 138 at r1 (raw file):

			t := time.NewTimer(resolverBackoff.Backoff(i))
			select {
			case <-p:

This will leak when cc.Close()


resolver_conn_wrapper.go, line 160 at r1 (raw file):

	}
	ccr.curState = s
	ccr.poll(ccr.cc.updateResolverState(ccr.curState, nil) == balancer.ErrBadResolverState)

For readability

if updateResolverState() == ErrBadResolverState {
    ccr.startPoll()
} else {
    ccr.stopPool()
}

balancer/balancer.go, line 322 at r1 (raw file):

	// exponential backoff until a subsequent call to UpdateClientConnState
	// returns a nil error.  Any other errors are currently ignored.
	UpdateClientConnState(ClientConnState) error

What other error would this return? Do we always poll whenever this returns an error?


balancer/xds/xds.go, line 416 at r1 (raw file):

	case <-x.ctx.Done():
	}
	return nil

This is a TODO...


resolver/manual/manual.go, line 43 at r1 (raw file):

	// Fields actually belong to the resolver.
	CC             resolver.ClientConn

Add a method instead of exporting cc?

Copy link
Member Author

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 19 of 20 files reviewed, 11 unresolved discussions (waiting on @dfawley and @menghanl)


balancer_conn_wrappers.go, line 90 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

Comment that this is not to protect the balancer field, but to synchronize balancer calls.

Done.


balancer_conn_wrappers.go, line 170 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

Delete and inline this function.

I think I may have done something better. PTAL


clientconn.go, line 568 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

In case the resolver doesn't support service config, or doesn't provide the service config with the new addresses

Done.


clientconn.go, line 581 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

Are we going to apply the same config multiple times?

Yes. That should be a nop if the same service config is used and the presence of balancer addresses doesn't change (otherwise we will have to swap the load balancer).


clientconn.go, line 850 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

Rename this function?

Suggestions? "applyServiceConfigAndSwitchBalancersIfNecessary"?


resolver_conn_wrapper.go, line 134 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

for i:=0; ; i++

Done.


resolver_conn_wrapper.go, line 138 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

This will leak when cc.Close()

True. Fixed.


resolver_conn_wrapper.go, line 160 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

For readability

if updateResolverState() == ErrBadResolverState {
    ccr.startPoll()
} else {
    ccr.stopPool()
}

PTAL


balancer/balancer.go, line 322 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

What other error would this return? Do we always poll whenever this returns an error?

Unclear; I was leaving this open for other options. Maybe we'd want a way to trigger a single ResolveNow or to tear down the whole ClientConn or something. If we trigger this behavior on any error, we limit our future capabilities.

Another option: return a struct instead/in addition, and optionally have a boolean field in the struct to indicate the resolver's bad state.


balancer/xds/xds.go, line 416 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

This is a TODO...

Are you sure? xds can handle no addresses, and the LB config validation should fail if there is any necessary info missing.


resolver/manual/manual.go, line 43 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

Add a method instead of exporting cc?

Exporting makes it more "manual". :)

We could also remove UpdateState. But this package is not experimental? Should it be considered inherently experimental in that you need to register it using an experimental API?

menghanl
menghanl previously approved these changes Aug 20, 2019
Copy link
Contributor

@menghanl menghanl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r2.
Reviewable status: 19 of 20 files reviewed, 1 unresolved discussion (waiting on @dfawley)


clientconn.go, line 850 at r1 (raw file):

Previously, dfawley (Doug Fawley) wrote…

Suggestions? "applyServiceConfigAndSwitchBalancersIfNecessary"?

applyServiceConfigAndAddresses?
applyResolverState?


balancer/xds/xds.go, line 416 at r1 (raw file):

Previously, dfawley (Doug Fawley) wrote…

Are you sure? xds can handle no addresses, and the LB config validation should fail if there is any necessary info missing.

Right.

Copy link
Member Author

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 19 of 20 files reviewed, all discussions resolved (waiting on @dfawley)


clientconn.go, line 850 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

applyServiceConfigAndAddresses?
applyResolverState?

We don't actually apply the addresses here, though (i.e., update the balancer). We only set the service config and set the balancer. That unfortunately requires passing the address list to determine if we have backend addresses for grpclb. Otherwise, we ignore the address list.

Having updateResolverState and applyResolverState seems strange. applyServiceConfigAndBalancer? I'll go with that for now.

@dfawley
Copy link
Member Author

dfawley commented Sep 18, 2019

Added tests for ResolveNow polling due to resolver errors and balancer-initiated errors. PTAL

Copy link
Contributor

@menghanl menghanl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 12 files at r3, 4 of 4 files at r4.
Reviewable status: 23 of 24 files reviewed, 3 unresolved discussions (waiting on @dfawley and @menghanl)


balancer_conn_wrappers_test.go, line 97 at r4 (raw file):

	cc, err := Dial(r.Scheme()+":///test.server", WithInsecure(), WithDefaultServiceConfig(fmt.Sprintf(`
{

Keep json in one line?


resolver_conn_wrapper_test.go, line 143 at r4 (raw file):

	}
	defer cc.Close()
	r.CC.ReportError(errors.New("res err"))

Do we still need this test even though we already have a similar one for balancer?


serviceconfig/serviceconfig.go, line 43 at r4 (raw file):

// Getter provides a service config or an error.
type Getter struct {

Naming bikeshedding.
ParseResult?

And I also think a struct with exported fields sounds better now...

Copy link
Contributor

@menghanl menghanl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r5.
Reviewable status: 23 of 24 files reviewed, 4 unresolved discussions (waiting on @dfawley and @menghanl)


balancer_conn_wrappers_test.go, line 78 at r5 (raw file):

		},
	}
	const balName = "BalancerErrorResolverPolling2"

Why 2? Is "BalancerErrorResolverPolling"used?


clientconn.go, line 592 at r5 (raw file):

			cc.applyServiceConfigAndBalancer(sc, s.Addresses)
		} else {
			if cc.balancerWrapper == nil {

What does this mean?

This is not an invalid service config, because error is nil (which means parsing didn't fail)
But it it an invalid service config because the type is wrong?

This should never happen, right?
And when this happens, should we consider the whole thing as invalid, and ignore them?

Copy link
Contributor

@menghanl menghanl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 17 of 17 files at r6, 1 of 1 files at r7.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @dfawley)


clientconn.go, line 592 at r5 (raw file):

Previously, menghanl (Menghan Li) wrote…

What does this mean?

This is not an invalid service config, because error is nil (which means parsing didn't fail)
But it it an invalid service config because the type is wrong?

This should never happen, right?
And when this happens, should we consider the whole thing as invalid, and ignore them?

Actually this is for invalid service config.

This is the first service config/address pair, and the config is invalid. We don't know which balancer to create, even the default might be inappropriate.

However, we need something to fail the RPCs. We can do this with a constErrPicker that always errors on Pick().


serviceconfig/serviceconfig.go, line 44 at r7 (raw file):

// ParseResult contains a service config or an error.
type ParseResult struct {
	Config Config

Make it clear that only one of them will be non-nil.


serviceconfig/serviceconfig.go, line 50 at r7 (raw file):

// NewParseResult returns a ParseResult returning the provided parameter as
// either the Config or Err field, depending upon its type.
func NewParseResult(configOrError interface{}) *ParseResult {

Delete this function?
ClientConn is the only one calling it. And the return type is exported, so there's no way to force anyone to use this function.

Copy link
Member Author

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 21 of 24 files reviewed, 3 unresolved discussions (waiting on @menghanl)


balancer_conn_wrappers_test.go, line 78 at r5 (raw file):

Previously, menghanl (Menghan Li) wrote…

Why 2? Is "BalancerErrorResolverPolling"used?

Done


clientconn.go, line 592 at r5 (raw file):

Previously, menghanl (Menghan Li) wrote…

Actually this is for invalid service config.

This is the first service config/address pair, and the config is invalid. We don't know which balancer to create, even the default might be inappropriate.

However, we need something to fail the RPCs. We can do this with a constErrPicker that always errors on Pick().

Done.


serviceconfig/serviceconfig.go, line 43 at r4 (raw file):

Previously, menghanl (Menghan Li) wrote…

Naming bikeshedding.
ParseResult?

And I also think a struct with exported fields sounds better now...

Done


serviceconfig/serviceconfig.go, line 44 at r7 (raw file):

Previously, menghanl (Menghan Li) wrote…

Make it clear that only one of them will be non-nil.

Done.


serviceconfig/serviceconfig.go, line 50 at r7 (raw file):

Previously, menghanl (Menghan Li) wrote…

Delete this function?
ClientConn is the only one calling it. And the return type is exported, so there's no way to force anyone to use this function.

Done.

Copy link
Contributor

@menghanl menghanl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r8.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @dfawley)


clientconn.go, line 594 at r8 (raw file):

			ret = balancer.ErrBadResolverState
			if cc.balancerWrapper == nil {
				cc.blockingpicker.updatePicker(base.NewErrPicker(status.Errorf(codes.Unavailable, "error parsing service config: %v", s.ServiceConfig.Err)))

s.ServiceConfig.Err can be nil here (if ok is false).

But ok should never be false?


clientconn.go, line 594 at r8 (raw file):

			ret = balancer.ErrBadResolverState
			if cc.balancerWrapper == nil {
				cc.blockingpicker.updatePicker(base.NewErrPicker(status.Errorf(codes.Unavailable, "error parsing service config: %v", s.ServiceConfig.Err)))

And add a test checking the returned error code?

Copy link
Member Author

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @menghanl)


clientconn.go, line 594 at r8 (raw file):

Previously, menghanl (Menghan Li) wrote…

s.ServiceConfig.Err can be nil here (if ok is false).

But ok should never be false?

Should, but I guess you could put any old garbage into ServiceConfig.Config.


clientconn.go, line 594 at r8 (raw file):

Previously, menghanl (Menghan Li) wrote…

And add a test checking the returned error code?

Done. Caught a bug as well, firstResolveEvent is only fired after contacting the balancer, and this path does not contact a balancer. It may be better to inject a balancer instead of just a picker for this (kind of) reason.

@dfawley
Copy link
Member Author

dfawley commented Oct 4, 2019

Updated to fire firstResolveEvent in a more obvious location. Not sure why it was moved in an older change anymore (we believe the logic behind moving it was flawed), and all tests are now passing.

@dfawley
Copy link
Member Author

dfawley commented Oct 4, 2019

New conflict is trivial: something added to internal.go in the same place as this PR added something else. I'll push -f after LGTY.

Copy link
Contributor

@menghanl menghanl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r9, 2 of 2 files at r10.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@dfawley dfawley merged commit ed563a0 into grpc:master Oct 4, 2019
@dfawley dfawley deleted the resolver_error branch October 4, 2019 19:59
@lock lock bot locked as resolved and limited conversation to collaborators Apr 2, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a method for resolver to report error
2 participants