Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(transport): Correctly map hyper errors #629

Merged
merged 3 commits into from Oct 20, 2021

Conversation

dfreese
Copy link
Contributor

@dfreese dfreese commented May 4, 2021

This came up when testing retry logic when using the
tonic::transport::Error. Specifically, we would run into Unknown errors
during the following sequence:

  • unplug ethernet
  • create client with connect_lazy
  • unary call -> Unknown error, representing a tonic::transport::Error

This doesn't seem consistent with the C++ gRPC behavior, and makes
it difficult to know when to retry. hyper::Error exposes some information
about the connection that we were seeing, so this specifically handles
the cases from hyper::Error that seemed to be not covered by lower level
errors. Based on the discussion here
this also included the is_timeout() flag, which indicates a keep_alive
failure.

Fixes #628

@dfreese dfreese marked this pull request as draft May 4, 2021 22:28
@davidpdrsn
Copy link
Member

I'm curious what the underlying error is actually causing what you're seeing. Its most likely h2::Error but would be nice to know what its reason is. Perhaps you try to add some more. Maybe some printlns here would show the exact error.

We already map h2::Reason::REFUSED_STREAM to Code::Unavailable here according to https://github.com/grpc/grpc/blob/3977c30/doc/PROTOCOL-HTTP2.md#errors.

transport::Error can also happen for a bunch of different reasons not all of which feel correct to map to unavailable. An invalid user agent for example.

@dfreese
Copy link
Contributor Author

dfreese commented May 8, 2021

That's fair. Let me see if I can come up with a clear test case to demonstrate what I was seeing.

@riking
Copy link

riking commented Jul 7, 2021

#612 potentially related where GOAWAY would cause new streams to fail with a NO_ERROR error

@tdyas
Copy link
Contributor

tdyas commented Jul 7, 2021

Its most likely h2::Error but would be nice to know what its reason is.

FYI #612 was merged recently and makes the underlying h2::Error available as the source of the Status via the std::error::Error::source method.

@dfreese dfreese changed the title Map tonic::transport::Error in tonic::Status::Unavailable Map hyper::Error in tonic::Status::Unavailable Aug 8, 2021
@dfreese dfreese marked this pull request as ready for review August 8, 2021 19:08
@dfreese
Copy link
Contributor Author

dfreese commented Aug 8, 2021

I've updated this to handle hyper::Error, which I think is more representative of the underlying problem. Since there isn't a way to create hyper errors, this is covered by integration, not unit, tests. I updated the connection integration test to check the updated error.

I haven't created a good integration test for the keep_alive behavior yet (which is the is_timeout() branch of the check added here). I wanted to check about the overall approach first before diving into that.

@dfreese
Copy link
Contributor Author

dfreese commented Aug 8, 2021

@tdyas, it looks like #612 changed how the h2 error was being mapped to an error. tonic is no longer checking for an h2 error on every level, and then mapping it to the appropriate gRPC error code. It looks like it now only check in the top-level case. I think I agree from @riking's perspective that the user should only be required to check the gRPC status in order to know how to appropriately handle the gRPC error.

@dfreese
Copy link
Contributor Author

dfreese commented Aug 8, 2021

@davidpdrsn It looks like the connection.rs integration test case was reasonable enough to validate what I was looking at. In that case, what I'm seeing in terms of an error chain is:

  1. tonic transport error
  2. hyper error (is_connect() == true)
  3. std::io::Error (Connection Refused [os error 111])

There's no underlying h2::Error

@riking
Copy link

riking commented Aug 9, 2021

I think I agree from @riking's perspective that the user should only be required to check the gRPC status in order to know how to appropriately handle the gRPC error.

And in particular, correctly returning UNAVAILABLE is necessary for automatic retry machinery to function.

@LucioFranco LucioFranco changed the title Map hyper::Error in tonic::Status::Unavailable fix(transport): Correctly map hyper errors Oct 20, 2021
@LucioFranco LucioFranco merged commit 4947b07 into hyperium:master Oct 20, 2021
rnarubin added a commit to rnarubin/ya-gcp that referenced this pull request Jan 10, 2022
This brings in a new version of the gRPC library Tonic. This may help
fix some bugs[1] in status codes.

This also necessitates an upgrade to the Prost protobuf lib.
Proto-generated sources have been updated, though it appears the only
differences is some doc formatting

[1]: hyperium/tonic#629
rnarubin added a commit to rnarubin/ya-gcp that referenced this pull request Jan 10, 2022
This brings in a new version of the gRPC library Tonic. This may help
fix some bugs[1] in status codes.

This also necessitates an upgrade to the Prost protobuf lib.
Proto-generated sources have been updated, though it appears the only
differences is some doc formatting

[1]: hyperium/tonic#629
rnarubin added a commit to rnarubin/ya-gcp that referenced this pull request Jan 10, 2022
This brings in a new version of the gRPC library Tonic. This may help
fix some bugs[1] in status codes.

This also necessitates an upgrade to Prost 0.9. Protobuf-generated
sources have been updated, though it appears the only differences is
some doc formatting

[1]: hyperium/tonic#629
rnarubin added a commit to standard-ai/ya-gcp that referenced this pull request Jan 10, 2022
This brings in a new version of the gRPC library Tonic. This may help
fix some bugs[1] in status codes.

This also necessitates an upgrade to Prost 0.9. Protobuf-generated
sources have been updated, though it appears the only differences is
some doc formatting

[1]: hyperium/tonic#629
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tonic Transport Errors not mapped to Unavailable
5 participants