
Duplicated GOAWAY from client on graceful shutdown by grpc-go server #1520

Closed
fonok3 opened this issue Nov 9, 2022 · 7 comments · Fixed by #1534

fonok3 commented Nov 9, 2022

Describe the bug

We use a server implemented with grpc-go. This server may gracefully shut down a connection before shutting down itself. The grpc-go repository documents that the server sends two GOAWAY frames to circumvent an HTTP/2 race condition:

```go
// A client can receive multiple GoAways from the server (see
// grpc/grpc-go#1387). The idea is that the first
// GoAway will be sent with an ID of MaxInt32 and the second GoAway will be
// sent after an RTT delay with the ID of the last stream the server will
// process.
```
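
As a rough illustration of that sequence, here is a sketch written against golang.org/x/net/http2 directly rather than grpc-go's internal transport; the connection handling, RTT measurement, and stream bookkeeping are all simplified placeholders:

```go
// Sketch only: grpc-go drives this from its transport loops.
package shutdown

import (
	"math"
	"net"

	"golang.org/x/net/http2"
)

// goAwayPingData mirrors the payload grpc-go uses to mark its "goaway
// ping" (assumed here for illustration).
var goAwayPingData = [8]byte{1, 6, 1, 8, 0, 3, 3, 7}

func gracefulGoAway(conn net.Conn, lastProcessedStreamID uint32) error {
	framer := http2.NewFramer(conn, conn)

	// First GOAWAY: advertise MaxInt32 so the client stops opening new
	// streams, without cutting off anything already in flight.
	if err := framer.WriteGoAway(math.MaxInt32, http2.ErrCodeNo, nil); err != nil {
		return err
	}

	// PING whose ack tells the server one round trip has elapsed, i.e.
	// the client has definitely seen the first GOAWAY.
	if err := framer.WritePing(false, goAwayPingData); err != nil {
		return err
	}

	// ... the transport's read loop waits for the ping ack here (elided);
	// the ack handler is exactly where the panic in this issue fired ...

	// Second GOAWAY: the real last stream ID the server will process.
	return framer.WriteGoAway(lastProcessedStreamID, http2.ErrCodeNo, nil)
}
```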

The grpc-swift client responds to both GOAWAY frames with GOAWAY frames of its own, which causes the grpc-go server to crash. By contrast, the grpc-go client implementation checks for already-received GOAWAY frames and their stream IDs, which prevents it from sending two consecutive GOAWAY frames (Implementation).

Since this behaviour of the grpc-swift client causes other servers to crash, we expect grpc-swift to change it.
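
A minimal sketch of that kind of de-duplication, modelled on the described grpc-go client behaviour rather than copied from either codebase:

```go
package client

import (
	"errors"

	"golang.org/x/net/http2"
)

// goAwayTracker remembers whether a GOAWAY was already seen so the client
// reacts (e.g. replies or starts closing) at most once per connection.
type goAwayTracker struct {
	received bool
	lastID   uint32
}

// onGoAway reports whether the client should respond to this frame.
func (t *goAwayTracker) onGoAway(f *http2.GoAwayFrame) (respond bool, err error) {
	if t.received {
		// A follow-up GOAWAY may only lower the advertised stream ID.
		if f.LastStreamID > t.lastID {
			return false, errors.New("second GOAWAY raised last-stream-ID")
		}
		t.lastID = f.LastStreamID
		return false, nil // already handled the first one: stay quiet
	}
	t.received = true
	t.lastID = f.LastStreamID
	return true, nil // first GOAWAY on this connection: respond once
}
```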

To reproduce

Steps to reproduce the bug you've found:

  1. Initiate a connection.
  2. Make a bidirectional RPC to the server (grpc-go).
  3. Invoke GracefulStop on the server.
  4. The client sends two GOAWAY frames.
  5. The server crashes with a "close of closed channel" panic.

We could reproduce the behaviour using the client from the RouteGuide example and a Go server implemented from the example proto file.

Expected behaviour

The grpc-swift client should send only one GOAWAY frame.

fonok3 added the bug label Nov 9, 2022

Lukasa (Collaborator) commented Nov 9, 2022

Based on your description of the server's behaviour, I think the Go implementation is in the wrong here. There is nothing incorrect in sending multiple GOAWAY frames sequentially, so long as the advertised maximum stream IDs do not increase. There's no clear reason to require that we send only one GOAWAY frame here, especially as the Go implementation itself sends multiple GOAWAY frames.

Lukasa (Collaborator) commented Nov 9, 2022

For further investigation I followed your suggestion: I implemented a trivial example route guide server in Go, used the Swift client against it, and performed a packet capture on localhost. The implementation of the RouteChat RPC in the server is just GracefulStop() followed by for { }, to prevent the stream from being shut down too early.
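
Reconstructed as a sketch (assuming the stock route_guide.proto codegen from the grpc-go examples; not the exact server used), the handler looks roughly like this:

```go
package main

import (
	"net"

	"google.golang.org/grpc"
	pb "google.golang.org/grpc/examples/route_guide/routeguide"
)

type routeGuideServer struct {
	pb.UnimplementedRouteGuideServer
	grpcServer *grpc.Server
}

func (s *routeGuideServer) RouteChat(stream pb.RouteGuide_RouteChatServer) error {
	// Begin a graceful shutdown while this bidirectional stream is open;
	// GracefulStop sends the GOAWAY sequence, then blocks until all
	// in-flight RPCs (including this one) finish.
	s.grpcServer.GracefulStop()
	// Spin so the stream is not shut down too early.
	for {
	}
}

func main() {
	lis, err := net.Listen("tcp", "localhost:50051")
	if err != nil {
		panic(err)
	}
	srv := grpc.NewServer()
	pb.RegisterRouteGuideServer(srv, &routeGuideServer{grpcServer: srv})
	_ = srv.Serve(lis)
}
```

This produces the following Go crash: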

```
panic: close of closed channel

goroutine 22 [running]:
google.golang.org/grpc/internal/transport.(*http2Server).handlePing(0xc00019da00, 0xc0003160c0)
	/Users/lukasa/Documents/Go/pkg/mod/google.golang.org/grpc@v1.50.1/internal/transport/http2_server.go:847 +0xd1
google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams(0xc00019da00, 0xc000258b40, 0x1518288)
	/Users/lukasa/Documents/Go/pkg/mod/google.golang.org/grpc@v1.50.1/internal/transport/http2_server.go:655 +0x4f2
google.golang.org/grpc.(*Server).serveStreams(0xc0002501e0, 0x158b5f8, 0xc00019da00)
	/Users/lukasa/Documents/Go/pkg/mod/google.golang.org/grpc@v1.50.1/server.go:949 +0x103
google.golang.org/grpc.(*Server).handleRawConn.func1(0xc0002501e0, 0x158b5f8, 0xc00019da00, 0xc00019ada0, 0xa)
	/Users/lukasa/Documents/Go/pkg/mod/google.golang.org/grpc@v1.50.1/server.go:891 +0x3f
created by google.golang.org/grpc.(*Server).handleRawConn
	/Users/lukasa/Documents/Go/pkg/mod/google.golang.org/grpc@v1.50.1/server.go:890 +0x190
```

Note that the crash in this instance is about receiving the PING, not about the GOAWAY frame.

Can you provide the Go traceback for your crash?

zasweq commented Dec 9, 2022

Got the fix for the closed channel: grpc/grpc-go#5854. However, the gRPC-Go team is still working on identifying why two goaway ping acks were received on the Go server in the first place: either (a) we send out two goaway pings, or (b) gRPC Swift somehow sends out two ping acks for one ping. Does your implementation send ping acks after the two GOAWAY responses (I remember reading something about this), or do we send out two pings? I'm triaging this further now. Can you provide your Go example server? Also, would you mind investigating the frames read and written on the client side? We suspect the Swift client is sending two goaway ping acks, and it would be useful to see the flow. I'll work on this concurrently.
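
(Aside on the panic class: closing an already-closed Go channel panics, so a drain channel that more than one event may complete needs a close-once guard. A minimal sketch of that pattern with hypothetical names; this is illustrative only, not the actual grpc/grpc-go#5854 change:)

```go
package transport

import "sync"

// drainSignal guards a channel that several events may try to complete;
// closing a Go channel twice panics, so the close is funnelled through
// a sync.Once.
type drainSignal struct {
	once sync.Once
	ch   chan struct{}
}

func newDrainSignal() *drainSignal {
	return &drainSignal{ch: make(chan struct{})}
}

// fire is safe to call for every goaway ping ack, even duplicates.
func (d *drainSignal) fire() {
	d.once.Do(func() { close(d.ch) })
}

// done lets waiters select on the signal.
func (d *drainSignal) done() <-chan struct{} { return d.ch }
```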

Lukasa (Collaborator) commented Dec 12, 2022

I'm afraid that I can't provide my Go example server easily due to a legal limitation. If you can give me a Go server you're happy for me to use as a reference I can do the rest of the investigation here.

zasweq commented Dec 12, 2022

Thanks for the reply. No worries at all, but do you think you could log the frames received and sent in your system to help figure out what's going on? This is where the framer reads frames (you can use the %+v formatting directive on the whole frame for clean logs): https://github.com/grpc/grpc-go/blob/master/internal/transport/http2_server.go#L608. And https://github.com/grpc/grpc-go/blob/master/internal/transport/http2_server.go#L1342 is the only place the grpc-go http2Server sends a goaway ping, so you can log there too. I just want to determine whether the double send of goaway ping acks comes from the Swift client (which is what we suspect) or from us sending a double goaway ping. Logging at these two places will show which side has the problem.
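
The shape of that logging change, sketched against the x/net/http2 framer that grpc-go wraps (the loop below is a simplified stand-in, not grpc-go's actual HandleStreams; only the log line is the point):

```go
package transport

import (
	"log"

	"golang.org/x/net/http2"
)

// logFrames is a simplified stand-in for the read loop in
// http2_server.go's HandleStreams; the real code dispatches each frame
// by type after reading it.
func logFrames(framer *http2.Framer) error {
	for {
		frame, err := framer.ReadFrame()
		if err != nil {
			return err
		}
		// %+v prints the frame header plus type-specific fields, enough
		// to spot a duplicate GOAWAY or ping ack in the logs.
		log.Printf("read frame: %+v", frame)
	}
}
```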

dfawley (Member) commented Dec 15, 2022

Ping on this issue. We have fixed the panic in gRPC-Go, but I'm worried it could be papering over another issue. The only explanation we have at the moment is that gRPC-swift is sending out multiple ping acks for a single ping, which if true should be fixed, and if not would mean we need to do further investigation on our side.

As far as debugging, on the Go server side, you should be able to simply set GODEBUG=http2debug=2 in the environment to get a list of all the http/2 frames the server receives.

And for the Go server to test with, I was thinking you could use the RouteChat modification mentioned in your earlier comment: #1520 (comment)

glbrntt (Collaborator) commented Dec 16, 2022

> The only explanation we have at the moment is that gRPC-swift is sending out multiple ping acks for a single ping, which if true should be fixed, and if not would mean we need to do further investigation on our side.

gRPC Swift is emitting multiple acks. We'll fix this.

glbrntt added a commit to glbrntt/grpc-swift that referenced this issue Dec 16, 2022
Motivation:

gRPC Swift is emitting two acks per ping. NIOHTTP2 is emitting one and
the keepalive handler is emitting the other.

Modifications:

- Don't emit ping acks from the keep alive handler; just let the H2
  handler do it.

Result:

- No unnecessary ping acks are emitted.
- Resolves grpc#1520
glbrntt added a commit that referenced this issue Dec 19, 2022 (same commit message as above; Resolves #1520)
WendellXY pushed a commit to sundayfun/grpc-swift that referenced this issue Aug 24, 2023 (same commit message)
pinlin168 pushed a commit to sundayfun/grpc-swift that referenced this issue Aug 24, 2023 (same commit message)