default client balancer only returns one address #1694

trusch · 2017-11-28T08:28:59Z

What version of gRPC are you using?

glide version: ^1.7.2
ref: 5a9f7b4

What version of Go are you using (`go version`)?

1.9.1

What operating system (Linux, Windows, …) and version?

Linux

What did you do?

I created a service in Docker Swarm that serves gRPC requests with endpoint mode dnsrr, so DNS returns multiple A records for that service.
Another service inside the swarm calls it.
Dialing looks like this:

conn, err := grpc.Dial(target, grpc.WithCredentials(...))
if err != nil {
	return nil, err
}
client := btrfaasgrpc.NewFunctionRunnerClient(conn) // project specific

This client is then reused for all RPC invocations.

What did you expect to see?

The calls should be dispatched round-robin to all available replicas of the target service out of the box, as documented in the godoc (round-robin should not need to be registered, because it's the default).

What did you see instead?

Only the first replica is used to serve the requests.
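To make the expected behaviour concrete: with three resolved addresses, a round-robin policy would hand them out in rotation, whereas the observed behaviour corresponds to always using the first one. The sketch below illustrates only the rotation idea with the standard library; the `roundRobin` type and the addresses are hypothetical and this is not how grpc-go implements its balancer.

```go
package main

import (
	"fmt"
	"sync"
)

// roundRobin hands out addresses in rotation. It is an illustrative type,
// not part of grpc-go.
type roundRobin struct {
	mu    sync.Mutex
	addrs []string
	next  int
}

// newRoundRobin copies addrs so later changes by the caller have no effect.
func newRoundRobin(addrs []string) *roundRobin {
	return &roundRobin{addrs: append([]string(nil), addrs...)}
}

// pick returns the next address in rotation, or false if none are known.
func (r *roundRobin) pick() (string, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.addrs) == 0 {
		return "", false
	}
	addr := r.addrs[r.next]
	r.next = (r.next + 1) % len(r.addrs)
	return addr, true
}

func main() {
	// Placeholder replica addresses.
	rr := newRoundRobin([]string{"10.0.0.2:50051", "10.0.0.3:50051", "10.0.0.4:50051"})
	for i := 0; i < 6; i++ {
		addr, _ := rr.pick()
		fmt.Println("request", i, "->", addr)
	}
}
```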

Additional Notes

When using WithBalancer(balancer.RoundRobin(resolver.NewDNSResolver())), I get an error that no addresses are available.

Do I need to set up load balancing manually for the moment?


menghanl · 2017-11-28T19:44:04Z

Did you look at the new balancer package, or at the v1 balancer in the grpc package?

Can you try the new balancer and resolver:

rr := balancer.Get("round_robin")
grpc.Dial("dns:///your.target.name", // "dns:///" specifies the resolver to use
    grpc.WithCredentials(...),
    grpc.WithBalancerBuilder(rr), // use round_robin balancer
)

Note that WithBalancerBuilder is for testing only. I'm planning to add a dial option to set the balancer (#1697).

trusch · 2017-11-29T06:39:18Z

Thanks for that hint, @menghanl! This seems to work now, but unfortunately it doesn't seem to query the DNS server very often. I saw the requery frequency in another package set to 30 minutes. Is this configurable?

edit:
It was not another package; it is the requery frequency of the DNS resolver, and it doesn't seem to be configurable. I think it would be useful to make this configurable. I know that frequent polling is generally a bad idea, but I'm currently in the comfortable position of building a completely stateless system, and I would not like to introduce something big like etcd or ZooKeeper into my stack just for load balancing.

Perhaps a WithResolveNowInterval(time.Duration) option and an independent goroutine in the ClientConn which calls ResolveNow() in a loop when this option is set?

I could write that if it would help. I think it would be a good, small task to get started ;)

Please let me know if I can help @menghanl

menghanl · 2017-11-29T19:48:09Z

The resolve interval is decided by each resolver implementation. There are resolvers that do pushing instead of polling.
So a WithResolveNowInterval(time.Duration) DialOption doesn't look like a good idea IMO.
A possible solution would be to create another DNS resolver with a custom resolve interval, as I mentioned in #1663 (comment).

From your comment in #1388, you mentioned dead connections will still be retried. This can be solved by #1679. The resolver will re-resolve whenever a connection is down. If the dead server was removed in DNS, the re-resolve will notice that and will remove it from ClientConn.

MAX_CONNECTION_AGE plus #1679 would also cause the resolver to re-resolve and discover new servers.

Let me know what you think about this solution.
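The effect of a re-resolve, as described in this comment, is that the client compares the fresh address list with the one it already holds, drops servers that disappeared from DNS, and connects to new ones. A minimal sketch of that comparison, assuming addresses are plain strings, follows; `diffAddrs` is a hypothetical helper and not a grpc-go function.

```go
package main

import (
	"fmt"
	"sort"
)

// diffAddrs reports which addresses appear only in next (added) and which
// appear only in prev (removed). Duplicates in either input are ignored.
func diffAddrs(prev, next []string) (added, removed []string) {
	old := make(map[string]bool, len(prev))
	for _, a := range prev {
		old[a] = true
	}
	cur := make(map[string]bool, len(next))
	for _, a := range next {
		if cur[a] {
			continue // already handled this address
		}
		cur[a] = true
		if !old[a] {
			added = append(added, a)
		}
	}
	for a := range old {
		if !cur[a] {
			removed = append(removed, a)
		}
	}
	// Sort for stable output, since map iteration order is random.
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	// Placeholder addresses: one replica was removed, one was started.
	prev := []string{"10.0.0.2:50051", "10.0.0.3:50051"}
	next := []string{"10.0.0.3:50051", "10.0.0.4:50051"}

	added, removed := diffAddrs(prev, next)
	fmt.Println("added:", added)     // added: [10.0.0.4:50051]
	fmt.Println("removed:", removed) // removed: [10.0.0.2:50051]
}
```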

trusch · 2017-12-01T09:04:43Z

I tried the MAX_CONNECTION_AGE plus #1679 approach, but it doesn't trigger the re-resolving. I don't know whether the MAX_CONNECTION_AGE parameter in the server keepalive parameters of the workers is not respected, or whether the resolver is not invoked when the connection closes normally (without error). I could imagine that this happens. I do not even see SubConn state changes in the logs.

When killing one of the worker pods everything works fine and the resolver returns the new address set.

dfawley · 2017-12-07T21:37:03Z

Have you turned on info logging by importing the glogger package or using the environment variable GRPC_GO_LOG_SEVERITY_LEVEL="INFO"?

If killing the server manually works, however, my guess is MAX_CONNECTION_AGE isn't configured correctly or is not working correctly -- it should kill the connection and appear the same as an error to the client.

menghanl · 2017-12-14T22:34:12Z

Does the max age problem still exist? Did you get more logs for this issue?

trusch · 2017-12-27T10:13:27Z

I tried again today, but cannot reproduce the issue anymore!

menghanl mentioned this issue Nov 28, 2017

Add dial option to set balancer #1697

Merged

trusch mentioned this issue Nov 29, 2017

New balancer and resolver APIs #1388

Closed

8 tasks

dfawley added P2 Type: Bug labels Dec 7, 2017

trusch closed this as completed Dec 27, 2017

lock bot locked as resolved and limited conversation to collaborators Sep 26, 2018
