Unhandled ExecuteBatchError leaves gRPC AsyncIO API in a permanently degraded state #31570
Comments
@XuanWang-Amos Ping on this.
I'm currently investigating this issue. It looks like the error is on the server side; I'll need some time to dig into it.
Just a note to say that I also encountered this recently with 1.51.1 and Python 3.9.
The error message
To everyone who has a similar issue: please set two environment flags when starting gRPC and paste the logs so we can help debug further:
In the meantime, I'll create a PR to enhance the error message so we have more information at the Python layer too.
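This copy of the thread does not show which two flags were requested. gRPC's core library does read the `GRPC_VERBOSITY` and `GRPC_TRACE` environment variables, so the following is a minimal sketch assuming those are the pair meant. The helper name `enable_grpc_debug_logging` and the tracer list `api,http` are illustrative choices, not taken from the thread.

```python
import os
from typing import MutableMapping, Optional


def enable_grpc_debug_logging(
    env: Optional[MutableMapping[str, str]] = None,
    tracers: str = "api,http",  # assumed tracer list, not from the thread
) -> MutableMapping[str, str]:
    """Set gRPC core debug variables unless the caller already set them.

    Hypothetical helper: GRPC_VERBOSITY and GRPC_TRACE are real gRPC core
    variables, but whether they are the flags requested above is an assumption.
    """
    if env is None:
        env = os.environ
    # setdefault keeps any value already provided by the shell or pod spec.
    env.setdefault("GRPC_VERBOSITY", "DEBUG")
    env.setdefault("GRPC_TRACE", tracers)
    return env


# Call this before `import grpc` so the values are in place when the
# core library initializes, e.g.:
#   enable_grpc_debug_logging()
#   import grpc
```

Setting the same variables in the container spec or shell avoids touching application code, which may be preferable for a one-off debugging session.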
@XuanWang-Amos I was able to repro with the extra debug turned on, please see the attached log. I'm pretty sure this is related to the client closing/abandoning the call early.
I can confirm that we have this issue with grpc-web when the client reloads the web page while the stream is running.
Thanks for the log. It looks like the client received RST_STREAM with error code 8:
RST_STREAM with error code 8 should be mapped to CANCELLED when sent by a server. In our code, we handle it by throwing an
We'll discuss internally how we should proceed from here.
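For readers unfamiliar with the mapping mentioned above, here is a small sketch of how the gRPC-over-HTTP/2 protocol document translates HTTP/2 error codes into gRPC status codes. It is a plain lookup table, not grpcio's actual code, and the function name `status_for_rst_stream` is made up for illustration. Status codes are returned as strings so the snippet runs without grpcio installed.

```python
# HTTP/2 error codes that the gRPC-over-HTTP/2 protocol document maps to
# something other than INTERNAL.
_SPECIAL_CASES = {
    0x7: "UNAVAILABLE",         # REFUSED_STREAM
    0x8: "CANCELLED",           # CANCEL
    0xB: "RESOURCE_EXHAUSTED",  # ENHANCE_YOUR_CALM
    0xC: "PERMISSION_DENIED",   # INADEQUATE_SECURITY
}


def status_for_rst_stream(error_code: int) -> str:
    """Return the gRPC status name for an HTTP/2 RST_STREAM error code.

    Simplification: STREAM_CLOSED (0x5) has no mapping in the protocol
    document, and unknown codes are not covered there; both fall back to
    INTERNAL in this sketch.
    """
    return _SPECIAL_CASES.get(error_code, "INTERNAL")


print(status_for_rst_stream(8))
```

Under this table, the code 8 seen in the log corresponds to an ordinary cancellation rather than an internal failure, which is consistent with the reports of clients abandoning calls early.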
@XuanWang-Amos - Any updates on this? We are also facing a similar issue on our end, as mentioned in the description.
A PR was merged so that we'll no longer throw
But it's unclear whether that will also fix the degraded performance issue. I'm adding
Looks like it's not happening anymore, so I'm closing this issue now. Again, feel free to comment here if the performance issue still exists.
What version of gRPC and what language are you using?
grpcio version 1.47.0
Python version 3.8.10
What operating system (Linux, Windows,...) and version?
Docker image: python:3.8-slim-bullseye
What runtime / compiler are you using (e.g. python version or version of gcc)
Docker container is running on Google Kubernetes Engine, Version 1.22.15-gke.100, in zone europe-west4-a.
What did you do?
We have a gRPC API deployed on Kubernetes, which uses the gRPC AsyncIO API and defines two simple RPCs: one that serves around 200 requests per minute per replica, and another that is only called by a readiness probe, once every 5 seconds per replica. Two replicas are deployed.
The API was running fine for about a year. But recently we had an incident in which both replicas logged the same error message at almost the same time:
What did you expect to see?
I would have expected one of these two things to happen:
Either the exception is propagated by grpcio to the surrounding Python code, so that it can decide how to handle the exception (e.g. don't catch it, and let Kubernetes restart the pod as a result),
or the exception is handled by grpcio, and the API continues serving requests as before, with the same response times as before.
What did you see instead?
Anything else we should know about your project / environment?
Unfortunately, I can't give detailed instructions on how to reproduce the exact circumstances of this bug, since it happened more or less randomly for us, after the API had already been running fine for around a year. However, the bug occurred almost simultaneously in both replicas of the API. This leads us to believe that some infrastructure-related issue triggered the exception.
Possibly related to #31527 or #31043.
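The first expectation in this report (surface the failure so the process can stop and Kubernetes can restart the pod) can be approximated at the application level with asyncio's loop-wide exception handler. This is a minimal sketch that assumes the error reaches that handler, which the thread does not confirm. The names `install_fail_fast_handler` and `on_fatal` are hypothetical.

```python
import asyncio
from typing import Callable, Optional


def install_fail_fast_handler(
    loop: asyncio.AbstractEventLoop,
    on_fatal: Callable[[Optional[BaseException]], None],
) -> None:
    """Route exceptions that nothing else handled to `on_fatal`.

    Hypothetical helper: asyncio calls the loop exception handler for
    errors raised in callbacks and for task exceptions that were never
    retrieved. Whether the grpcio error in this report arrives here is
    an assumption.
    """

    def handler(loop: asyncio.AbstractEventLoop, context: dict) -> None:
        # Keep asyncio's normal logging, then let the application decide.
        loop.default_exception_handler(context)
        on_fatal(context.get("exception"))

    loop.set_exception_handler(handler)


# Possible usage inside the server's main coroutine (illustrative only):
#   install_fail_fast_handler(asyncio.get_running_loop(),
#                             lambda exc: os._exit(1))
# Exiting with a non-zero code lets the orchestrator restart the pod.
```

A gentler `on_fatal` could instead flip the readiness probe to "not ready" and trigger a graceful shutdown. Which of the two is appropriate depends on how much in-flight work the service can afford to drop.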