Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support GRPC Keepalive without calls #258

Closed
dangerousplay opened this issue Feb 5, 2020 · 15 comments
Closed

Support GRPC Keepalive without calls #258

dangerousplay opened this issue Feb 5, 2020 · 15 comments
Labels
E-easy Call for participation: Experience needed to fix: Easy / not much E-help-wanted Call for participation: Help is requested to fix this issue.
Milestone

Comments

@dangerousplay
Copy link

Feature Request

Hello, the GRPC has it's own way of doing keepalive, besides you could use the SO_KEEPALIVE flag on the TCP socket. Currently, the GRPC HTTP 2 keepalive is not implemented. The ping part of the protocol can be found here. Here is the documentation of the GRPC part on this.

The keepalive ping is a way to check if a channel is currently working by sending HTTP2 pings over the transport. It is sent periodically, and if the ping is not acknowledged by the peer within a certain timeout period, the transport is disconnected.

Motivation

Without keepalive, long idle connections will be dropped, especially if you have, for instance, a network load balancer that doesn't know the protocol used.

Proposal

Implement the GRPC oficial keepalive solution using HTTP 2 ping frames. The server needs to schedule those ping commands and handle the ACK (and the client).

GRPC - https://github.com/grpc/grpc/blob/master/doc/keepalive.md
HTTP 2 - https://http2.github.io/http2-spec/#PING

From the GRPC FAC:

FAQ

  • When is the keepalive timer started?
    • The keepalive timer is started when a transport is done connecting (after handshake).
  • What happens when the keepalive timer fires?
    • When the keepalive timer fires, gRPC Core will try to send a keepalive ping on the transport. This ping can be blocked if -
      • there is no active call on that transport and GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS is false.
      • the number of pings already sent on the transport without any data has already exceeded GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA.
      • the time elapsed since the previous ping is less than GRPC_ARG_HTTP2_MIN_SENT_PING_INTERVAL_WITHOUT_DATA_MS.
    • If a keepalive ping is not blocked and is sent on the transport, then the keepalive watchdog timer is started which will close the transport if the ping is not acknowledged before it fires.
  • Why am I receiving a GOAWAY with error code ENHANCE_YOUR_CALM?
    • A server sends a GOAWAY with ENHANCE_YOUR_CALM if the client sends too many misbehaving pings. For example -
      • if a server has GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS set to false and the client sends pings without there being any call in flight.
      • if the client's GRPC_ARG_HTTP2_MIN_SENT_PING_INTERVAL_WITHOUT_DATA_MS setting is lower than the server's GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS.
@dangerousplay dangerousplay changed the title Support GRPC Keepalive on HTTP 2 Support GRPC Keepalive without calls Feb 5, 2020
@LucioFranco
Copy link
Member

Looking into this a bit more it looks like h2 doesn't support sending custom data on the keepalive. https://docs.rs/h2/0.2.1/h2/struct.Ping.html

The other problem here will be that hyper doens't support ping either. So I would think for this custom behavior we would want to have a custom transport. But I don't like the idea of maintaining something that is not hyper. It might be worth it to open an issue for custom pings in the hyper repo.

@Ten0
Copy link

Ten0 commented Feb 18, 2020

+1
This would currently be a blocker for us for moving from grpc-rs.

@LucioFranco
Copy link
Member

I am a bit short on time today but the next steps would be to open the corresponding issues in h2 and hyper around custom pings. The problem I see here that will make this really hard is that we have not figured out how to send a keepalive ping via tower-service. Most likely we will have to offload some of this to hyper.

cc @seanmonstar

@seanmonstar
Copy link
Member

Some more details:

  • h2 allows sending of pings, but doesn't allow customizing the payload (is this required?), and only allows a single outstanding (user) ping at a time.

    I've been experimenting using that ping internally in hyper for BDP flow control, so if also used for keep alive, then h2 may need to grow support for different user ping payloads. Maybe worth an issue to discuss.

  • Besides extra support in h2, hyper would need to expose a way to send pings. If so, ideally it's done in a way that supports HTTP3 in the future (does that have pings)?

    Or, I suppose it could just grow a couple http2_keepalive knobs and do it internally?

@LucioFranco
Copy link
Member

@LucioFranco
Copy link
Member

Also looks like we may need to echo exactly what is pinged to us in the ping frame.

Reference: https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md#ping-frame

@seanmonstar
Copy link
Member

h2 already automatically responds to PING frames received over the wire.

@seanmonstar
Copy link
Member

Relevant PR in hyper providing generic HTTP2 keep-alive support: hyperium/hyper#2151

@seanmonstar
Copy link
Member

hyper v0.13.4 now includes options to set some HTTP2 keep-alive options. So, next would be taking advantage of them in tonic.

@LucioFranco LucioFranco added the E-help-wanted Call for participation: Help is requested to fix this issue. label Mar 23, 2020
@LucioFranco LucioFranco added this to the 0.2 milestone Mar 27, 2020
@LucioFranco LucioFranco added the E-easy Call for participation: Experience needed to fix: Easy / not much label Mar 27, 2020
@LucioFranco
Copy link
Member

This has been done in #307

@dfreese
Copy link
Contributor

dfreese commented May 28, 2021

I was testing this out, and it does seem like there is a slight mismatch from the spec in terms of error codes returned. In hyperium/hyper#2151, KeepAliveTimedOut from h2 seems to get mapped to h2::Reason::INTERNAL_ERROR, but from here:

An expired client initiated PING will cause all calls to be closed with an UNAVAILABLE status. Note that the frequency of PINGs is highly dependent on the network environment, implementations are free to adjust PING frequency based on network and application requirements.

The error I was seeing was:

[status: Internal, message: "h2 protocol error: protocol error: unexpected internal error encountered", details: [], metadata: MetadataMap { headers: {} }]

Does tonic get the information it needs to be able to map that to UNAVAILABLE, or would that require a change in h2?

@LucioFranco
Copy link
Member

@seanmonstar ^ we likely want to forward that?

@seanmonstar
Copy link
Member

We could maybe get hyper's error type to have is_timeout() return true in that case also, and then maybe tonic could look for that? Or maybe tonic doesn't have that tight coupling with hyper.

@dfreese
Copy link
Contributor

dfreese commented Jun 7, 2021

I was able to take a closer look at hyper::Error and how it might integrate with tonic::Status. I'll follow up with that in #629 once I've had a chance to clean up some of the tests. It seems like there is a reasonable way to handle both the is_timeout() case and connection errors that I was seeing in that PR.

geekbrother added a commit to CommE2E/comm that referenced this issue Dec 1, 2022
…Tonic server

Summary:
This diff introduces removing the "pinging" async task from the gRPC server and using the [[ hyperium/tonic#258 | gRPC http/2 keep-alive support ]] of the Tonic.
The full context is located in the [[ https://linear.app/comm/issue/ENG-1842/consider-using-of-grpc-keep-alive-instead-of-ping-messages | ENG-1842 ]] task. [[ https://linear.app/comm/issue/ENG-1842/consider-using-of-grpc-keep-alive-instead-of-ping-messages#comment-e807b7f9 | Research and proof-of-work ]] of the gRPC keep-alive using Tonic.

Related Linear task: [[ https://linear.app/comm/issue/ENG-1842/consider-using-of-grpc-keep-alive-instead-of-ping-messages | ENG-1842 ]]

Test Plan:
1. Service is successfully built.
2. Tonic keep-alive [[ https://linear.app/comm/issue/ENG-1842/consider-using-of-grpc-keep-alive-instead-of-ping-messages#comment-e807b7f9 | "field" tests were successfully passed ]].

Reviewers: jon, marcin, varun, tomek

Reviewed By: jon, varun, tomek

Subscribers: ashoat, tomek, atul

Differential Revision: https://phab.comm.dev/D5713
@fabianbergmark
Copy link

How can the tonic server be configured to allow PERMIT_KEEPALIVE_WITHOUT_CALLS? Currently I get RESOURCE_EXHAUSTED: HTTP/2 error code: ENHANCE_YOUR_CALM (Bandwidth exhausted) when I configure my client with keepAliveWithoutCalls = true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-easy Call for participation: Experience needed to fix: Easy / not much E-help-wanted Call for participation: Help is requested to fix this issue.
Projects
None yet
Development

No branches or pull requests

6 participants