
How to manage a rogue client that is not closing connections #4105

Closed
pkpfr opened this issue Dec 12, 2020 · 6 comments


pkpfr commented Dec 12, 2020

We have a public gRPC API. We have a client that is consuming our API based on the REST paradigm of creating a connection (channel) for every request. We suspect that they are not closing this channel once the request has been made.

On the server side, everything functions OK for a while, then it seems that something is exhausted. Requests back up on the servers and are not processed - this results in our proxy timing out and sending an unavailable response. Restarting the server fixes the issue.

Unfortunately, it seems that there is no way to monitor what is happening on the server side and prune these connections. We have the following keepalive settings, but they don't appear to have an impact:

```go
grpc.KeepaliveParams(keepalive.ServerParameters{
	MaxConnectionIdle:     time.Minute * 5,  // close a connection with no active RPCs after 5 minutes
	MaxConnectionAge:      time.Minute * 15, // close any connection once it is 15 minutes old
	MaxConnectionAgeGrace: time.Minute * 1,  // allow 1 minute for in-flight RPCs to finish first
	Time:                  time.Second * 60, // ping the client after 60 seconds of inactivity
	Timeout:               time.Second * 10, // close the connection if the ping is not acked within 10 seconds
})
```

Is there any way that we can monitor channel creation and destruction on the server side - if only to prove to the client that their consumption is causing the problems? Verbose logging has not been helpful, as it seems to only log the client activity on the server (i.e. the server consuming pub/sub and logging as a client). I have also looked at channelz, but we use mutual TLS auth and I have been unable to get it to work on our production pods.

We have instructed our client to use a single channel, and if that is not possible, to close the channels that they are creating, but they are a large corporation and move very slowly.

@menghanl
Contributor

> Is there any way that we can monitor channel creation and destruction on the server side

Servers don't see channels. They only see the TCP connections created by the clients. I think you can count the number of connections from each client (by IP).
In the normal case, each channel should only need to create one connection to one backend (unless you intentionally create a TCP pool). If you are getting multiple connections from each client IP, but are only expecting 1, that more or less proves the problem.

To count the number of connections, you can do it by wrapping the listener and overriding Accept(). Or use the stats handler.

> to use a single channel, and if that is not possible, to close the channels that they are creating

You are essentially DoS'ed by your clients, so this would be the right thing to do.
You can try to limit the number of connections each server accepts. But the hard part is, since you cannot tell which client connections are from the stale channels, you won't know which to reject.

@stale

stale bot commented Dec 24, 2020

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@stale stale bot added the stale label Dec 24, 2020
@fm0803

fm0803 commented Dec 29, 2020

I face a similar problem. I set the keepalive policy as above, expecting the server to close a client that does nothing but never closes its connection; however, after 2 hours the client is still alive and can send requests normally.

@stale stale bot removed the stale label Dec 29, 2020
@stale

stale bot commented Jan 4, 2021

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@stale stale bot added the stale label Jan 4, 2021
@menghanl menghanl removed the stale label Jan 7, 2021
@menghanl
Contributor

menghanl commented Jan 7, 2021

Another thing to mention is that Go's implementation is missing the IDLE feature: #1719, #1786

With IDLE, the unused clients would drop the connections. We will try to prioritize this feature.

@dfawley
Member

dfawley commented Mar 25, 2021

Let's track this part of the feature under #4298.

@dfawley dfawley closed this as completed Mar 25, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2021