Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retry: Status should indicate more details when retries are enabled and an RPC fails #7023

Open
ukai opened this issue Mar 6, 2024 · 6 comments
Labels
P2 Type: Feature New features or improvements in behavior

Comments

@ukai
Copy link

ukai commented Mar 6, 2024

NOTE: if you are reporting is a potential security vulnerability or a crash,
please follow our CVE process at
https://github.com/grpc/proposal/blob/master/P4-grpc-cve-process.md instead of
filing an issue here.

Please see the FAQ in our main README.md, then answer the questions below
before submitting your issue.

What version of gRPC are you using?

v1.61.0

What version of Go are you using (go version)?

go 1.21.7

What operating system (Linux, Windows, …) and version?

Linux

What did you do?

If possible, provide a recipe for reproducing the error.

We set service config like this
https://chromium.googlesource.com/infra/infra/+/86b0e81fba4005098b1bec9ca40018fc3dacd523/go/src/infra/build/siso/reapi/reapi.go#99
with https://github.com/bazelbuild/remote-apis-sdks/tree/master/go/pkg/balancer

What did you expect to see?

We expect grpc retries for unavailable error.

What did you see instead?

Seems not retries for error

rpc error: code = Unavailable desc = error reading from server: read tcp 10.0.11.175:47388->64.233.183.95:443: read: connection reset by peer

https://ci.chromium.org/ui/p/chromium/builders/build/build-perf-android-siso/1597/overview

@ukai ukai added the Type: Bug label Mar 6, 2024
@dfawley dfawley self-assigned this Mar 13, 2024
@dfawley
Copy link
Member

dfawley commented Mar 13, 2024

@ukai,

Why do you believe retries are not occurring? Can you provide logs with debugging enabled?

https://github.com/grpc/grpc-go#how-to-turn-on-logging

@ukai
Copy link
Author

ukai commented Mar 14, 2024

I think it's not feasible to get logs as it is very rare case (one a day or so). log volume already too big, so I think we can't get verbose log anymore.

I'm not sure gRPC retries (but return error due to max retires or so), or not.
Can gRPC return error indicating that N-times retried or so?

@dfawley
Copy link
Member

dfawley commented Mar 14, 2024

That's not a bad idea. That should involve modifying the error returned from the RPC here:

return false, err

We should also find out what Java/C++ are doing for this case (and the other cases where RPCs are not retried around there).

@dfawley dfawley changed the title no retry for "code = Unavailable desc = error reading from server: read tcp 10.0.11.175:47388->64.233.183.95:443: read: connection reset by peer" ? retry: Status should indicate more details when retries are enabled and an RPC fails Mar 14, 2024
@dfawley dfawley added the P2 label Mar 14, 2024
@dfawley
Copy link
Member

dfawley commented Mar 14, 2024

After talking with Java/Node teams, they don't do anything like this, either. We'd have to be very careful to make it clear what the server produced. There's no precedent in grpc-go for using wrapped errors, and this might be able to use something like that.

Copy link

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@dfawley
Copy link
Member

dfawley commented Mar 21, 2024

(Sorry, I should have removed the label.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 Type: Feature New features or improvements in behavior
Projects
None yet
Development

No branches or pull requests

2 participants