gRPC client keeps disconnecting and then reconnecting to the server after configuring keep-alive parameters #3093

Closed
lrouter opened this issue Oct 14, 2019 · 12 comments

Comments


lrouter commented Oct 14, 2019

Before configuring keep-alive parameters, everything worked correctly.
After configuring them, the client keeps getting disconnected from the server and then tries to reconnect. This repeats indefinitely.
The client configuration is:

s.datapathManagerConn, err = s.Dial(s.cfg.Manager.DatapathManager,
	grpc.WithInitialConnWindowSize(256*1024),
	grpc.WithKeepaliveParams(keepalive.ClientParameters{
		Time:                20,
		Timeout:             10,
		PermitWithoutStream: true,
	}),
)

The server configuration is:

srv.InitGRPCServer(
	grpc.KeepaliveParams(
		keepalive.ServerParameters{
			Time:    20 * time.Second,
			Timeout: 10 * time.Second,
		},
	),
	grpc.KeepaliveEnforcementPolicy(
		keepalive.EnforcementPolicy{
			MinTime:             10 * time.Second,
			PermitWithoutStream: true,
		},
	),
)

Why?

I traced the packets during the communication and found that after the client sends a PING (flag=0x0) to the server, it immediately shuts down the connection and then creates a new TCP connection.
Could anyone help me? Thanks!

Contributor

easwars commented Oct 15, 2019

I tried running the keepalive example found in https://github.com/grpc/grpc-go/tree/master/examples/features/keepalive with the configuration that you are using, and I do not see any connection disconnects. I only see PING messages going back and forth.

We do have an issue with the keepalive implementation where it sends a PING every [Time + Timeout] period instead of every [Time] period. But I don't think you are affected by that issue.

Could you please make sure there are no other reasons why the connection is terminated.

Member

dfawley commented Oct 15, 2019

  grpc.WithKeepaliveParams(keepalive.ClientParameters{
		Time:                20,
		Timeout:             10,
		PermitWithoutStream: true,
  },

Assuming this is verbatim from your code, Time here is 20ns and Timeout is 10ns. I don't think this is what you want (presumably: 20 * time.Second and 10 * time.Second).

Contributor

easwars commented Oct 15, 2019

Oh yes. I overlooked that. With that verbatim config, I can definitely see the connection being closed. Working as expected, I guess.

Author

lrouter commented Oct 17, 2019

@easwars @dfawley thanks.
I have fixed the time unit error.
I have a question about the gRPC reconnect mechanism. gRPC is based on HTTP/2 streams, and HTTP/2 is based on TCP.
I am using gRPC bidirectional streaming mode without configuring gRPC keep-alive parameters. Does the gRPC library try to keep the TCP connection alive or the HTTP/2 stream alive?

Member

dfawley commented Oct 17, 2019

gRPC keepalive is intended to detect network disconnects (TCP) and help prevent proxies from closing connections (see the reference design).

HTTP/2 streams do not get closed due to idleness in any system I'm aware of.

Glad things are working better with the time units fixed.

dfawley closed this as completed Oct 17, 2019
Author

lrouter commented Oct 18, 2019

Our gRPC stream client is connected to the stream server via LVS.

We rescheduled the connection to another LVS server, and the TCP connection was re-established, but the client always receives an EOF error. Why?

The client and server were connected without keep-alive during the test.

Member

dfawley commented Oct 18, 2019

We rescheduled the connection to another LVS server, and the TCP connection was re-established, but the client always receives an EOF error. Why?

Streams cannot move between connections - is this what you mean? All streams are terminated any time the network connection is lost.

Author

lrouter commented Nov 11, 2019

Streams cannot move between connections - is this what you mean? All streams are terminated any time the network connection is lost.

Based on our test with LVS between the gRPC client and server:
With unary RPCs, the gRPC client does not need to redial the gRPC server.
With bidirectional streaming RPCs, the gRPC client needs to redial the gRPC server. Otherwise the client always encounters an EOF error.
What causes the difference?

Member

dfawley commented Nov 11, 2019

The ClientConn is not a single connection. It is a connection pool with automatic reconnection. An RPC cannot move between connections, but each unary RPC will find a new connection, so you should not see any errors on unary RPCs when connections are gracefully shut down (existing RPCs are allowed to complete) and new ones are created (new RPCs will use the new connection(s)).

Author

lrouter commented Nov 15, 2019

Do you mean that a bidirectional streaming RPC will not find a new connection?

Member

dfawley commented Nov 15, 2019

In-progress streaming RPCs are not migrated between connections (it's unclear how this would be possible given there is no way to transfer the state of the server to another server). New streaming RPCs will work on the same ClientConn.
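In practice this means the application has to start a new streaming RPC when the old one ends, rather than redialing. The following is a rough sketch of that pattern using only the standard library. `Stream`, `OpenFunc`, `Consume`, and `fakeStream` are hypothetical names invented for this illustration; in real code `OpenFunc` would wrap a generated client method called on the existing ClientConn, and the loop would normally back off between restarts.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
)

// Stream is a hypothetical stand-in for the receive side of a
// generated gRPC streaming client.
type Stream interface {
	Recv() (string, error)
}

// OpenFunc is a hypothetical callback that starts a new streaming RPC.
type OpenFunc func(ctx context.Context) (Stream, error)

// Consume reads messages from a stream. When the stream ends with any
// error (including io.EOF), it opens a fresh stream instead of reusing
// the finished one. It returns when opening fails, when ctx is done,
// or after more than maxRestarts restarts.
func Consume(ctx context.Context, open OpenFunc, handle func(string), maxRestarts int) error {
	for restarts := 0; ; restarts++ {
		if restarts > maxRestarts {
			return errors.New("too many stream restarts")
		}
		s, err := open(ctx)
		if err != nil {
			return fmt.Errorf("open stream: %w", err)
		}
		for {
			msg, err := s.Recv()
			if err != nil {
				break // this stream is finished and is never reused
			}
			handle(msg)
		}
		if ctx.Err() != nil {
			return ctx.Err()
		}
	}
}

// fakeStream simulates a stream that delivers its messages and then
// ends with io.EOF, as happens when the underlying connection is lost.
type fakeStream struct{ msgs []string }

func (f *fakeStream) Recv() (string, error) {
	if len(f.msgs) == 0 {
		return "", io.EOF
	}
	m := f.msgs[0]
	f.msgs = f.msgs[1:]
	return m, nil
}

// demo runs Consume against two simulated streams, then fails to open a third.
func demo() ([]string, error) {
	batches := [][]string{{"a", "b"}, {"c"}}
	next := 0
	open := func(ctx context.Context) (Stream, error) {
		if next >= len(batches) {
			return nil, errors.New("server gone")
		}
		s := &fakeStream{msgs: batches[next]}
		next++
		return s, nil
	}
	var got []string
	err := Consume(context.Background(), open, func(m string) { got = append(got, m) }, 5)
	return got, err
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```

Any server-side state tied to the old stream is gone after a restart, so the application has to decide what to resend or resubscribe to on the new stream.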

Author

lrouter commented Nov 18, 2019

Thank you.

@lock lock bot locked as resolved and limited conversation to collaborators May 20, 2020