Skip to content
This repository has been archived by the owner on Apr 16, 2020. It is now read-only.

Make HTTP a hidden implementation detail, don't expose it in the public API #90

Closed
per-gron opened this issue Oct 25, 2018 · 26 comments
Closed

Comments

@per-gron
Copy link
Collaborator

Right now, tower-grpc exposes to its users that it uses HTTP in a bunch of places. An example is the tower_grpc::Error::Grpc enum variant, which exposes the http::HeaderMap type from hyperium/http. Another example is the tower_grpc::Response type, which has into_http and from_http methods to convert to and from http::Response values.

I think the tower-grpc API should be changed so that no HTTP specific types are visible to its users. There are both philosophical, practical and experience based reasons for this:

Reasons

Philosophical argument

gRPC is designed as a protocol that is agnostic to its underlying transport. The gRPC over HTTP2 specification document begins with "This document serves as a detailed description for an implementation of gRPC carried over HTTP2 framing."

It can be argued that it's unlikely that users of tower-grpc are going to ever want any other underlying transport, but we don't really know that. In addition, and this is perhaps the stronger reason, gRPC is a useful and easy to reason about subset of HTTP2, which on its own is very complex. By exposing raw HTTP2 constructs to users of tower-grpc, users are unnecessarily exposed to complexities of HTTP2, for little to no benefit.

Practical argument

The direct exposure of HTTP primitives, at least without helpers which don't exist today, makes it difficult or impossible to properly implement some of the gRPC language interop tests.

special_status_message is about sending an error response with Unicode text, which gRPC over HTTP2 requires to be percent encoded. With tower-grpc's current design, this percent encoding has to be done by end user code, which is very error prone and not necessary. All other gRPC libraries handle this automatically. Even without the complication of the percent encoding, requiring the user to know that you should check the "grpc-message" HTTP header to get the error message is a very unintuitive API.

Similarly, for the custom_metadata interop test, the user of the API has to manually do base64 encoding and decoding of custom metadata fields with a "-bin" suffix, which gRPC over HTTP2 requires that the gRPC library does for the user. In addition, tower-grpc does currently not support trailing custom metadata at all because the tower_grpc::Response is just a wrapper around http::Response, which happens to not store trailers (the http library seems to make it possible to read them it's just not stored in that struct).

Placing the burden of percent encoding error messages, base64 encoding some headers, and even just requiring the knowledge that gRPC custom metadata is sent over headers, and what the header name for error messages is, is just not nice. I would even argue that, at least without helper functions, it violates the spec.

Experience based argument

To my knowledge, every widely used gRPC library in existence completely hides HTTP from their users. Custom metadata is transferred with headers but exposed under the name "custom metadata", etc. Also, the data type for storing an error response (the equivalent of tower_grpc::Error) stores only the status code, the error message and the binary error details, not arbitrary HTTP header data. See for example the C++ API: grpc::Status

Because all of the widely used gRPC libraries get away with not exposing low level HTTP2 details to their users, I think it's highly likely that the same will be true for tower-grpc. In fact, just being more similar to the other gRPC libraries is on its own a good reason for this because of the reduced learning curve. (There are ways that tower-grpc is different from other gRPC libraries because it's in Rust, but I don't think this is one of those.)

Proposed changes

  1. Simplify tower_grpc::Error to only store the information that other gRPC libaries store. Don't store HTTP stuff, and don't store prost object or anything else. gRPC specifies how to map HTTP errors and other kinds of errors into this format. Although not strongly typed the gRPC Status type is powerful enough to encode any kind of error. As discussed in Refine Error type #6, doing this would make the API more friendly to use as well because at the moment a server side error is not the same type as a client side error, which causes pain when servers need to propagate errors.

  2. Similarly, hide HTTP types from the tower_grpc::Request and tower_grpc::Response types. Only expose fields in terms of gRPC. This would make the API more similar to the other gRPC implementations, make it easier to understand because users won't have to know the details of the gRPC/HTTP2 mapping, and make it possible to make a more friendly API for custom metadata.

@per-gron
Copy link
Collaborator Author

Ping @carllerche, tower-grpc maintainer, and @jtattermusch, creator of the C# gRPC, who might be able to provide insight and/or say I'm wrong.

@carllerche
Copy link
Member

We should survey what other libs (Go, C++, C#, ...) do in this case. If none of the primary libs provide access to the HTTP types, then it seems fine to encapsulate.

That said, the decision to encapsulate HTTP is separate from providing an API for metadata. We should wrap the HeaderMap with a MetadataMap and provide metadata read / write APIs.

Thoughts?

@per-gron
Copy link
Collaborator Author

Encapsulating HeaderMap in a MetadataMap sounds like a reasonable first step. For that we can provide means to convert to/from a HeaderMap.

For tower_grpc::Response it may not make sense to store both the current http::Response and a MetadataMap because then those two objects can have conflicting values for headers. So there it's probably better to break it down and store a MetadataMap and whatever http::Response contains except headers.

@carllerche
Copy link
Member

tower_grpc::Response does not need to contain the http::Response, it can stick to what it needs.

@per-gron
Copy link
Collaborator Author

Did some research on other gRPC libraries:

For grpc-core (C/C++) I'm quite sure that no underlying HTTP types are exposed. I have used it myself and know that API fairly well. For the other ones I read the API docs and could not find any explicit exposure of HTTP except for when servers and clients are fist instantiated. Java specifically can be used with either Netty, okhttp or an in-memory backend so it really can't expose the underlying HTTP library to the user since there isn't a single one. Go has some methods with "Header" in their names but the docs describe them as means of getting custom metadata, not HTTP headers directly.

Details:

Go gRPC:

Server side: Handlers get context.Context and the protobuf object.
Client side: Calls take a context.Context and the protobuf object and are methods on objects of generated types.


The Context has:

  • https://godoc.org/context#Context
  • Deadline
  • Cancellation detection
  • Untyped “Value” map which contains metadata but also potentially other things. I can’t find any reference to HTTP specific things here but it’s hard to track what this actually has just from docs.

The API docs for Go gRPC https://godoc.org/google.golang.org/grpc does not mention HTTP2 except for:


  • an experimental method that allows serving gRPC from a Go standard library HTTP server object,
  • Two methods that allow injecting “Codec”s, for example a JSON parser for parsing when content-type is application/grpc+json. HTTP is mentioned in the docs because it’s connected to the content-type header.

C# gRPC API docs are at https://grpc.io/grpc/csharp/api/Grpc.Core.html

Server side: Handler methods get the protobuf object and a ServerCallContext
Client side: Calls take the protobuf object and are methods on objects of generated “Client” types.

ServerCallContext https://grpc.io/grpc/csharp/api/Grpc.Core.ServerCallContext.html has:

So there are HTTP related things but I cannot find any direct HTTP references in the docs there.

Java gRPC docs are at https://grpc.io/grpc-java/javadoc/

Server side: Handlers get the request proto message and an object to reply with.
Client side: Calls take the protobuf object and are methods on objects of generated “Client” types.

gRPC Java has multiple transports implemented: The main Netty based one, an Android client-only implementation based on okhttp and io.grpc.inprocess which is notable because it doesn’t use HTTP.

The Java gRPC library does not expose any Netty or okhttp primitives to the user of the library.

@carllerche
Copy link
Member

Ok, I would be fine hiding HeaderMap in favor of a MetadataMap.

To avoid having to dup the map, I would have:

struct MetadataMap {
    headers: http::HeaderMap,
}

Then do the metadata conversion on the fly.

@seanmonstar
Copy link
Contributor

seanmonstar commented Oct 29, 2018

Wow, thanks for the in-depth report, awesome! I also think it's fine to start to hide the HTTPness here.

An alternative (which I'm not saying is better, just that we should consider) to having a MetadataMap is to just have setters and getters on the Request and Response types themselves.

@carllerche
Copy link
Member

@seanmonstar it seems like it would be useful to be able to pass the metadata map around, if it is on Request / Response then fns that take it would need to be generic over the body?

@per-gron
Copy link
Collaborator Author

I see now that the HeaderMap interface is pretty big. It takes quite a bit of wrapping to wrap all the things, including the iters and all other little details.

I agree with @carllerche in that I think there is merit in being able to pass around a MetadataMap without also passing around the Request/Response.

I've started working on an implementation based on

struct MetadataMap {
    headers: http::HeaderMap,
}

@carllerche
Copy link
Member

@per-gron It's totally fine to start incrementally and not expose the entire interface initially.

@per-gron
Copy link
Collaborator Author

HeaderMap enforces an invariant that header names and values don't contain characters that are illegal for HTTP. For MetadataMap the set of legal bytes in its values depends on the header name. I'm still trying to figure out how to put that into code.

@seanmonstar
Copy link
Contributor

It looks like the variants are:

  • Header name: ASCII value
  • Header name + -bin: base64 encoded value.

A possible API: have Ascii and Binary types, kinda like HeaderValue, and set_ascii(name, ascii) and set_bin(name, binary).

@per-gron
Copy link
Collaborator Author

@seanmonstar Fyi non-"-bin" fields can have non-ASCII too. The gRPC language interop test suite tests support for sending UTF-8 there, and HeaderValue in the http library allows more than ASCII. It also allows less than ASCII because e.g. newlines are disallowed. gRPC docs ambiguously call these values "string" metadata values.

Here is a potential approach for handling "-bin" fields: MetadataMap would have two sets of accessor methods: One that operates on MetadataStringValues (behaves like HeaderValue) and one that operates on MetadataBinaryValues (allows arbitrary byte strings). Because of covariance/contravariance you would get a quirky interface where you can't always read your writes:

Operation Methods that return MetadataStringValue Methods that return MetadataBinaryValue
Read Ignores or does not allow "-bin" keys Allows all keys
Write Allows all keys Does not allow non-"-bin" keys
Enumerate Ignores "-bin" keys Allows all keys

Another option is to have two separate sets of accessor methods with zero overlap. Then, to enumerate all metadata the user needs to enumerate string and binary entries separately. A variant of this option is to also have separate MetadataStringKey and MetadataBinaryKey types that can only be constructed with the right "-bin" suffix (or lack of it). Then the key names can be constructed once and individual metadata accesses don't need to handle name errors.

A third option which is more similar to what other gRPC libs to but that maybe feels un-Rusty is to just ditch type safety and allow all entries to contain binary data and then fail when converting it to a HTTP HeaderMap object.

@seanmonstar @carllerche do you have a preference for what to do here? To me it seems like the second option of segregating the map into effectively two separate parts is a nice way to go because that is probably how people will use it in practice anyway.

@carllerche
Copy link
Member

@per-gron I am kind of leaning towards just doing runtime checks. Something like:

pub struct MetadataValue {
    is_bin: bool,
    data: HeaderValue,
}

Then have separate constructors:

impl MetadataValue {
    fn string(s: &str) -> MetadataValue { }

    fn bytes(s: &[u8]) -> MetadataValue { }

    fn as_str(&self) -> Option<&str> { }
}

Some questions:

If string values support UTF-8 values, how are they encoded in headers? Perhaps the binary repr is a valid header value?

Can there be both a string and -bin entry for the same name? For example, can the metadata map contain both "foo" and "foo-bin" keys?

@per-gron
Copy link
Collaborator Author

per-gron commented Nov 2, 2018

string metadata values support all characters that HTTP headers can have, which I think is something like bytes >32 and != 127. gRPC doesn't encode them, they are just raw HTTP headers.

As far as I can tell, nothing prevents having "foo" and "foo-bin". Other gRPC libraries require users to explicitly call the metadata keys "-bin" or non-"-bin".

@jtattermusch
Copy link

Hi there, I'm the maintainer of gRPC C# and contributor for other gRPC languages. I just wanted to confirm what @per-gron is saying:

  • gRPC "metadata" should be agnostic of the protocol and therefore http2 header / trailer types should not be exposed directly to the user. AFAIK all gRPC languages behave that way (C# definitely doesn't expose any HTTP-related types, it uses its own Metadata and Metadata.Entry types, that resemble HTTP2 headers and trailers, but are completely separate types otherwise).
  • There is some additional functionality for gRPC initial and trailing metadata (such as the binary vs non-binary metadata rules mentioned earlier, upcasing and downcasing logic etc.), which also will be easier to implement with types that are independent from rust http2 library.

to summarize:

  • there are "initial metadata" for requests, and "initial metadata" and "trailing metadata" for responses. they are basically multi-maps (single "header" can have multiple values, which is the same as in http2). they either support "text values" or "binary values" (based on the name suffix). all of this is necessary to be able to pass our interoperability test suite.

btw, if there's a draft for the tower-grpc public API, gRPC team can help reviewing it.

@carllerche
Copy link
Member

@jtattermusch Thanks for the thoughts, that makes sense.

As per an API draft, tower-grpc's API is experimental. At some point (soon hopefully), the API will be nailed down a bit more. I will ping you at that point 👍

per-gron added a commit to per-gron/tower-grpc that referenced this issue Nov 8, 2018
As per the discussion in tower-rs#90, it wraps http::HeaderMap to abstract
away the fact that tower-grpc is HTTP based. At this time this class
is a pure wrapper that does not actually implement any of the gRPC
metadata specific stuff (such as binary fields), that will come later.
@per-gron
Copy link
Collaborator Author

per-gron commented Nov 8, 2018

Phew, the metadata map interface turned out to be a lot bigger than I expected.

I uploaded the PR hyperium/http#278 against hyperium http to be able to fully wrap it.

I also uploaded #92 as a large step towards hiding HTTP types from Tower's public interface. This is an important step towards being able to handle binary metadata fields properly, although it's just a refactoring so far.

@per-gron
Copy link
Collaborator Author

per-gron commented Nov 8, 2018

@jtattermusch can you provide links to docs or something about gRPC specific upcasing/downcasing rules? I can't find much about it. tower-grpc currently uses hyperium http's header handling which is very careful about http2 conformance, including case sensitivity.

@jtattermusch
Copy link

@per-gron per the gRPC spec, all metadata keys need to be lower-case only: https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md (see "Header-Name"). There rest of the behavior comes from http2 spec: see summary here: https://blog.yaakov.online/http-2-header-casing/ (all header names must be converted to lower cases before sent on the wire)

@per-gron
Copy link
Collaborator Author

@jtattermusch thanks for the info! Hyperium already enforces that all header names are lower case, so I think we're good on that point.

@per-gron
Copy link
Collaborator Author

@carllerche Working on making MetadataMap actually automatically encode binary entries. At the moment I'm hacking on a type system based approach rather than your suggested runtime check approach, because 1) with the unsafe transmutes I am not able to store that additional flag with MetadataValue, 2) it seems tricky to enforce the invariants at all times if users start taking MetadataValues from Ascii fields and placing them in entries with -bin keys and vice versa.

@carllerche
Copy link
Member

@per-gron I see what you mean. Could you post a sketch of the type system strategy? I would hate for you to spent a ton of time on a path and we opt to go a different route.

Also, what is more common? String values or binary values?

@per-gron
Copy link
Collaborator Author

@carllerche I don't know if string or binary values are more common. My guess is that string values are more common and binary only used when you need to store richer data, in practice protobuf messages.

I have a wip branch at https://github.com/per-gron/tower-grpc/blob/request-metadata-bin/src/metadata_map.rs in case you want to see the gory details but perhaps that can wait until you actually review that upcoming PR. Quick summary of the approach is:

Have a trait ValueEncoding with two implementations: Ascii and Binary. Those don't really do much on their own but are used with PhantomData in keys and values, so a key will be either MetadataKey<Ascii> or MetadataKey<Binary>, and there is also MetadataValue<Ascii> and MetadataValue<Binary>. The end goal with those is to make it so that all binary metadata values that the user of MetadataMap can get access to are of type MetadataValue<Binary> and the accessor methods on that type will do base64 decoding when it's read and encoding when it's written (so actual in-memory storage of binary entries will be base64 encoded).

To get this, you have get, get_all etc methods on MetadataMap for ascii values and separate get_bin, get_all_bin etc for binary values. get_bin doesn't work for ascii values and vice versa.

There's a lot of details in the implementation of this approach that I'm a bit unsure about. For example, I don't really like that HeaderMap and consequently MetadataMap are a bit panic-prone when you pass strings to insert and append. Also, instead of having MetadataKey<Ascii> vs MetadataKey<Binary> there could be only one type for both, but with more runtime checks instead.

@carllerche
Copy link
Member

@per-gron Thanks for the summary. I will have to spend some time thinking about the problem before being able to provide additional thoughts...

Regarding HeaderMap panicking, here should definitely be variants for everything on HeaderMap that do not panic.

per-gron added a commit that referenced this issue Dec 5, 2018
* Rename Request::headers to Request::metadata

* Introduce tower_grpc::metadata::MetadataMap type

As per the discussion in #90, it wraps http::HeaderMap to abstract
away the fact that tower-grpc is HTTP based. At this time this class
is a pure wrapper that does not actually implement any of the gRPC
metadata specific stuff (such as binary fields), that will come later.

* Address some review comments

* Use unsafe transmutes to make MetadataMap <=> HeaderMap conversion faster

* Address review comments

* Make the interop tests build

* Say why some code is commented out

* Use unsafe transmutes for returning &MetadataKey from MetadataMap

This makes the API more self-symmetrical and more similar to HeaderMap.

* Use newly exported IterMut and ValuesMut from http to resolve TODOs

* Add #[repr(transparent)] annotations to types that rely on that behavior
per-gron added a commit to per-gron/tower-grpc that referenced this issue Dec 5, 2018
As per the discussion in tower-rs#90, it wraps http::HeaderMap to abstract
away the fact that tower-grpc is HTTP based. At this time this class
is a pure wrapper that does not actually implement any of the gRPC
metadata specific stuff (such as binary fields), that will come later.
per-gron added a commit to per-gron/tower-grpc that referenced this issue Dec 5, 2018
As per discussion in tower-rs#90.

This commit also fixes a bug that made tower-grpc provide the status code
in the error message field.
per-gron added a commit that referenced this issue Dec 6, 2018
* Don't expose HTTP HeaderMap in tower_grpc::Error enum

As per discussion in #90.

This commit also fixes a bug that made tower-grpc provide the status code
in the error message field.

* Perform percent decoding on error messages

As per gRPC spec. This commit also adds the special_status_message
client interop test to validate this behavior.
@per-gron
Copy link
Collaborator Author

per-gron commented Mar 9, 2019

@seanmonstar I think all the changes proposed in this ticket are done by now. Please reopen if I'm missing something.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants