Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting 'Error: 13 INTERNAL: No message received' when sending/receiving JSON message that are 60 - 80 kb #2707

Closed
bongleeEveri opened this issue Apr 2, 2024 · 17 comments

Comments

@bongleeEveri
Copy link

bongleeEveri commented Apr 2, 2024

Problem description

When sending JSON string that are 60 - 80 kb long, we are seeing intermittent 'Error: 13 INTERNAL: No message received' on the gRPC client server. Based on our investigation it is happening due to gRPC client receiving the Ok status call before it receives the message. When we added in the below setTimeout call the issue stopped happening.

image

Below is the exact error message
Error: Error: 13 INTERNAL: No message received at callErrorFromStatus (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\call.js:31:19) at Object.onReceiveStatus (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\client.js:180:80) at Object.onReceiveStatus (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\client-interceptors.js:360:141) at Object.onReceiveStatus (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\client-interceptors.js:323:181) at C:\grpc-test\node_modules\@grpc\grpc-js\build\src\resolving-call.js:99:78 at process.processTicksAndRejections (node:internal/process/task_queues:77:11) for call at at ServiceClientImpl.makeUnaryRequest (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\client.js:160:32) at ServiceClientImpl.testMethod (C:\grpc-test\node_modules\@grpc\grpc-js\build\src\make-client.js:105:19) at C:\grpc-test\dist\client.js:25:16 at step (C:\grpc-test\node_modules\tslib\tslib.js:195:27) at Object.next (C:\grpc-test\node_modules\tslib\tslib.js:176:57) at C:\grpc-test\node_modules\tslib\tslib.js:169:75 at new Promise (<anonymous>) at Object.__awaiter (C:\grpc-test\node_modules\tslib\tslib.js:165:16) at C:\grpc-test\dist\client.js:22:60 at Layer.handle [as handle_request] (C:\grpc-test\node_modules\express\lib\router\layer.js:95:5) { code: 13, details: 'No message received', metadata: Metadata { internalRepr: Map(0) {}, options: {} } }

Reproduction steps

We setup gRPC client and gRPC server on two different servers and sent JSON string with 7k line count over the network. When we did this the above issue was reproducible.

Environment

  • OS Name Microsoft Windows Server 2019 Standard
  • Node version 20.11.1
  • gRPC-js@1.10.4

Additional context

The issue started happening when we migrated over to grpc-js from deprecated grpc package.

@murgatroid99
Copy link
Member

I published a change in version 1.10.6 that I think will fix this. Please try it out.

@bongleeEveri
Copy link
Author

@murgatroid99 Thank you for a quick response. Unfortunately the change did not seem to work. We are still receiving the OK status on the gRPC client side before we receive the data.

@murgatroid99
Copy link
Member

Can you share the code you are using to reproduce this error?

@jkomoda
Copy link

jkomoda commented Apr 3, 2024

grpc-test-server-client.zip
grpc-test-server-client-updated.zip

Here's the client/server code

@jkomoda
Copy link

jkomoda commented Apr 3, 2024

sorry I have updated the file above @murgatroid99

@murgatroid99
Copy link
Member

That code is configured to communicate on localhost. Do you ever actually reproduce this error on localhost?

@jkomoda
Copy link

jkomoda commented Apr 3, 2024

when both client/server are hosted locally, nope. Only when server is hosted on a remote server

@murgatroid99
Copy link
Member

OK, while I'm figuring out how to reproduce that myself, can you run you test server with the environment variables GRPC_TRACE=all, GRPC_VERBOSITY=DEBUG, and NODE_DEBUG=http2 and share the output from a run that reproduces this error?

@jkomoda
Copy link

jkomoda commented Apr 3, 2024

D 2024-04-03T18:33:15.742Z | v1.10.6 296 | server | (1) Attempting to bind 0.0.0.0:4006
D 2024-04-03T18:33:15.750Z | v1.10.6 296 | server | (1) Successfully bound 0.0.0.0:4006
HTTP2 296: Http2Session server: received a connection
HTTP2 296: Http2Session server: setting up session handle
HTTP2 296: Http2Session server: sending settings
HTTP2 296: Http2Session server: submitting settings
HTTP2 296: Http2Session server: created
D 2024-04-03T18:33:23.105Z | v1.10.6 296 | server | (1) Connection established by client 10.104.53.166:63364
(node:296) Warning: Setting the NODE_DEBUG environment variable to 'http2' can expose sensitive data (such as passwords, tokens and authentication headers) in the resulting log.
(Use node --trace-warnings ... to show where the warning was created)
HTTP2 296: Http2Stream 1 [Http2Session server]: headers received
D 2024-04-03T18:33:23.139Z | v1.10.6 296 | server | (1) Received call to method /db.Proxy/testMethod at address null
D 2024-04-03T18:33:23.141Z | v1.10.6 296 | server_call | Request to /db.Proxy/testMethod received headers {"grpc-accept-encoding":["identity,deflate,gzip"],"accept-encoding":["identity"
],"user-agent":["grpc-node-js/1.10.6"],"content-type":["application/grpc"],"te":["trailers"]}
D 2024-04-03T18:33:23.142Z | v1.10.6 296 | server_call | Request to /db.Proxy/testMethod start called
D 2024-04-03T18:33:23.142Z | v1.10.6 296 | server_call | Request to /db.Proxy/testMethod startRead called
D 2024-04-03T18:33:23.143Z | v1.10.6 296 | server_call | Request to /db.Proxy/testMethod received data frame of size 5
D 2024-04-03T18:33:23.148Z | v1.10.6 296 | server_call | Request to /db.Proxy/testMethod startRead called
HTTP2 296: Http2Session server: settings received
HTTP2 296: Http2Stream 1 [Http2Session server]: initiating response
D 2024-04-03T18:33:23.157Z | v1.10.6 296 | server_call | Request to /db.Proxy/testMethod sent data frame of size 78650
D 2024-04-03T18:33:23.163Z | v1.10.6 296 | server_call | Request to method /db.Proxy/testMethod ended with status code: OK details: OK
HTTP2 296: Http2Stream 1 [Http2Session server]: shutting down writable on _final
HTTP2 296: Http2Stream 1 [Http2Session server]: sending trailers
HTTP2 296: Http2Stream 1 [Http2Session server]: closed with code 0, closed false, readable false
HTTP2 296: Http2Stream 1 [Http2Session server]: destroying stream
D 2024-04-03T18:33:23.168Z | v1.10.6 296 | server_call | Request to method /db.Proxy/testMethod stream closed with rstCode 0

@jkomoda
Copy link

jkomoda commented Apr 3, 2024

Here is a comparison of the logs when running it successfully locally. Main difference is this line here -
D 2024-04-03T18:54:53.755Z | v1.10.6 39072 | server_call | Request to /db.Proxy/testMethod sent data frame of size 356904

D 2024-04-03T18:54:46.687Z | v1.10.6 39072 | server | (1) Server constructed
D 2024-04-03T18:54:46.689Z | v1.10.6 39072 | server | (1) bindAsync port=0.0.0.0:4006
D 2024-04-03T18:54:46.690Z | v1.10.6 39072 | dns_resolver | Resolver constructed for target dns:0.0.0.0:4006
D 2024-04-03T18:54:46.691Z | v1.10.6 39072 | dns_resolver | Returning IP address for target dns:0.0.0.0:4006
Service started at port 4006
D 2024-04-03T18:54:46.695Z | v1.10.6 39072 | server | (1) Attempting to bind 0.0.0.0:4006
D 2024-04-03T18:54:46.698Z | v1.10.6 39072 | server | (1) Successfully bound 0.0.0.0:4006
HTTP2 39072: Http2Session server: received a connection
HTTP2 39072: Http2Session server: setting up session handle
HTTP2 39072: Http2Session server: sending settings
HTTP2 39072: Http2Session server: submitting settings
HTTP2 39072: Http2Session server: created
D 2024-04-03T18:54:53.737Z | v1.10.6 39072 | server | (1) Connection established by client 127.0.0.1:63924
(node:39072) Warning: Setting the NODE_DEBUG environment variable to 'http2' can expose sensitive data (such as passwords, tokens and authentication headers) in
the resulting log.
(Use node --trace-warnings ... to show where the warning was created)
HTTP2 39072: Http2Stream 1 [Http2Session server]: headers received
D 2024-04-03T18:54:53.743Z | v1.10.6 39072 | server | (1) Received call to method /db.Proxy/testMethod at address null
D 2024-04-03T18:54:53.743Z | v1.10.6 39072 | server_call | Request to /db.Proxy/testMethod received headers {"grpc-accept-encoding":["identity,deflate,gzip"],"a
ccept-encoding":["identity"],"user-agent":["grpc-node-js/1.10.6"],"content-type":["application/grpc"],"te":["trailers"]}
D 2024-04-03T18:54:53.744Z | v1.10.6 39072 | server_call | Request to /db.Proxy/testMethod start called
D 2024-04-03T18:54:53.744Z | v1.10.6 39072 | server_call | Request to /db.Proxy/testMethod startRead called
D 2024-04-03T18:54:53.744Z | v1.10.6 39072 | server_call | Request to /db.Proxy/testMethod received data frame of size 5
D 2024-04-03T18:54:53.746Z | v1.10.6 39072 | server_call | Request to /db.Proxy/testMethod startRead called
HTTP2 39072: Http2Session server: settings received
HTTP2 39072: Http2Stream 1 [Http2Session server]: initiating response
D 2024-04-03T18:54:53.755Z | v1.10.6 39072 | server_call | Request to /db.Proxy/testMethod sent data frame of size 356904
D 2024-04-03T18:54:53.763Z | v1.10.6 39072 | server_call | Request to method /db.Proxy/testMethod ended with status code: OK details: OK
HTTP2 39072: Http2Stream 1 [Http2Session server]: shutting down writable on _final
HTTP2 39072: Http2Stream 1 [Http2Session server]: sending trailers
HTTP2 39072: Http2Stream 1 [Http2Session server]: closed with code 0, closed false, readable false
HTTP2 39072: Http2Stream 1 [Http2Session server]: destroying stream
D 2024-04-03T18:54:53.766Z | v1.10.6 39072 | server_call | Request to method /db.Proxy/testMethod stream closed with rstCode 0

@murgatroid99
Copy link
Member

That's not the difference. The first log has the line D 2024-04-03T18:33:23.157Z | v1.10.6 296 | server_call | Request to /db.Proxy/testMethod sent data frame of size 78650, which is the same operation. I don't see any substantial difference between the two logs.

@murgatroid99
Copy link
Member

I suspect that this is a bug in Node. The gRPC code in question waits for the write callback and the wantTrailers event before trying to send the status. It shouldn't be possible for the status to be sent before the message.

@jkomoda
Copy link

jkomoda commented Apr 3, 2024

Ok this wasn't occurring with the deprecated grpc pkg. Also, we noticed that grpc-js sends response in multiple chunks over TCP. The legacy grpc sends the response in a single chunk over RSL protocol. You think this possibly be related to the issue?

@jkomoda
Copy link

jkomoda commented Apr 3, 2024

Btw were you able to reproduce? Also, do you think the 150ms timeout is a viable solution?

@murgatroid99
Copy link
Member

Ok this wasn't occurring with the deprecated grpc pkg

That package didn't use the Node networking stack. If this is a bug in Node, that difference would be expected.

Also, we noticed that grpc-js sends response in multiple chunks over TCP

This is how HTTP/2 works. Data frames have a maximum size, and gRPC messages that are larger than that size are divided into multiple data frames.

The legacy grpc sends the response in a single chunk over RSL protocol

I have no idea what this is referring to. Both libraries use the same protocol (gRPC over HTTP/2), which includes the same chunking rules. I have never even heard of the RSL protocol.

Also, do you think the 150ms timeout is a viable solution?

No. All of the operations involved in a gRPC stream are deterministic and sequential (to the extent that we care about), so it should be possible to get it right consistently without fudging the timing. There is no reason to believe that any particular timeout will consistently prevent this error, and 150ms will substantially harm performance in every other case.

@murgatroid99
Copy link
Member

Btw were you able to reproduce?

I don't yet have the setup to reproduce this. However, I did realize that we have an existing test that involves sending a large payload (~300KB), so I'd like to understand why that didn't already catch this error

  1. Our test uses TLS. Can you still reproduce the error when using TLS?
  2. Our test runs on Linux. Have you encountered this error on any platform other than Windows?

@jkomoda
Copy link

jkomoda commented Apr 4, 2024

Ok so we tested it on TLS and it works! Also, it works on Linux with non-TLS. Maybe adding tests for non-TLS will catch this error in the future.

Thanks for all of the help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants