A proper `Body` trait #1438

seanmonstar · 2018-02-08T22:21:16Z

Currently (v0.11), hyper allows using any impl Stream<Item=impl AsRef<[u8]>, Error=hyper::Error> as the send body for requests and response. That should likely be changed, since it currently prevents using new features from HTTP2. While we can upgrade the current receive stream to have inherent methods to do new things, there is no way for hyper to use them with send streams. Here's some new features we'd like to support:

HTTP trailers: while defined in HTTP/1.1, they were rarely ever used, but HTTP/2 makes them easy to use, and new things are using them, like gRPC.
Push promises: while a design isn't settled yet, it's possible for a push promise to arrive after the headers, and a user may wish to be able to send new push promises after having gotten part way though a body.
Flow control? To start with, hyper will manage flow control internally, but we may want to add the ability for users to control that themselves. If so, they would be new methods on the trait with default implementations.
Other new frame types that get added to HTTP/2.
Being able to adjust the reset code sent before dropping the stream.

To accomplish this, hyper probably needs to define it's own trait, so that new features can be added later.

The `Body` trait

/// Replaces the `Stream<Item=impl AsRef<[u8]>, Error=hyper::Error>`.
///
/// This trait represents a streaming body of a `Request` or `Response`.
pub trait Body {
    /// A buffer of bytes representing a single chunk of a body.
    type Data: AsRef<[u8]> + 'static;
    
    /// The `User` error can be read more about here: https://github.com/hyperium/hyper/issues/1431
    type Error: Into<hyper::error::User>;
    
    /// Poll for a stream of `Data` buffers.
    fn poll_data(&mut self) -> Poll<Option<Self::Data>>, Self::Error>;
    
    /// Poll for an optional *single* `HeaderMap` of trailers. This should
    /// only be called after `poll_data` has ended.
    ///
    /// Note: Trailers aren't currently used for HTTP/1, only for HTTP/2.
    fn poll_trailers(&mut self) -> Poll<Option<HeaderMap>, Self::Error> {
        Ok(Async::Ready(None))
    }
    
    /// A hint that the `Body` is complete, and doesn't need to be polled more.
    ///
    /// This can be useful to determine if the there is any body or trailers
    /// without having to poll. An empty `Body` could return `true` and hyper
    /// would be able to know that only the headers need to be sent. Or, it can
    /// also be checked after each `poll_data` call, to allow hyper to try to end
    /// the underlying stream with the last chunk, instead of needing to send an
    /// extra `DATA` frame just to mark the stream as finished.
    ///
    /// As a hint, it is used to try to optimize, and thus is OK for a default
    /// implementation to return `false`.
    fn is_end_stream(&self) -> bool {
        false
    }
    
    /// Return a length of the total bytes that will be streamed, if known.
    ///
    /// If an exact size of bytes is known, this would allow hyper to send a
    /// `Content-Length` header automatically, not needing to fall back to
    /// `Transfer-Encoding: chunked`.
    ///
    /// This does not need to be kept updated after polls, it will only be called
    /// once to create the headers.
    fn content_length(&self) -> Option<u64> {
        None
    }
    
    /// A callback if there is an error trying to *write* data or trailers
    /// that have been polled from this `Body`.
    ///
    /// This can be things like declaring a certain size in `size_hint`,
    /// or having manually set a `Content-Length` header, and the polled
    /// data happens to be bigger. This error wouldn't necessarily close
    /// the request/response message, but could be useful to know.
    /// 
    /// Additionally, after returning a `Response<impl Body>` in a server
    /// context, there isn't any other mechanism to learn of errors that
    /// may occur trying to flush the body's data.
    ///
    /// The default implementation is to just log the error.
    fn on_error(&mut self, err: hyper::Error) {
        debug!("Body::on_error: {}", err);
    }
    
    // fn into_stream?
    //
    // This could be a simple wrapper that turns any `Body` into a
    // `Stream<Item=Self::Data, Error=Self::Error>`. Besides allowing
    // a user to easily convert a body into a `Stream` when something wants
    // one, it could do so correctly, by making sure that once `poll_data`
    // has reached the end, `poll_trailers` is also called, to allow any
    // `Body` implementation to know when it was finished properly.
    //
    // Alternatively, instead of being on the trait itself, this could
    // just be a type in the `hyper::body` module, `IntoStream<impl Body>`,
    // and it can be imported directly and a constructor used...
}

Some questions about the design that I still have:

Does the signature of poll_trailers() look like it is a stream of trailers? It's supposed to be like a future, only yielding once, not multiple HeaderMaps. It's probably fine, since Iterator::next returns Option<T>, and that doesn't mean that anything else returning Option<T> is iterator-like. Documentation should be enough.
Does content_length pull its weight?
Should there a trait method into_stream? Being there does mean less imports are needed, but does it conflict with Body trait objects? It could just be IntoStream::new(body)...
Does on_error make sense to have? Additionally, should it also be used to allow sending a reset error on a receive stream? For instance, if you've received the default body in a Request in the server, and after polling a little while, some other part of your app is on fire, should you be able to call body.on_error(User::internal_error()) and hyper should try to send that reset error? Currently, all you can do is drop the body, and hyper will send a CANCEL reset.
An alternative to separate poll_* methods is to use a single method, and return a non-exhaustive enum of the frames that a user can receive. This might reduce complication around knowing what order to call methods in, especially as new events are added, like push promises.
```
  match await body.poll_next()? {
    Some(Frame::Data(chunk)) => {},
    Some(Frame::Trailers(trailers)) => {},
    _ => {
        // new, unknown frame type
        // HTTP/2 requires new frames to be ignorable by
        // those that don't understand them
    }
}
```
Then, when hyper adds push promise support, a user just needs to update their match block to handle them. If they are added in a new method, it might be confusing when to call poll_data vs poll_push_promise, etc.

The default body type

Since the trait name is Body, the v0.11 hyper::Body needs a new name. So far, I haven't thought of any that I feel great about. In Java, you often find interfaces or abstract classes with the good name, and DefaultFoo as the implementation. 🤷‍♂️

/// What `hyper::Body` is in v0.11. Don't love the name...
///
/// This type serves two purposes:
/// - It is the body type that is *received* from hyper, such as in
///   client responses and server requests.
/// - It is a simple to use default for applications to create and
///   give to hyper as a send stream.
pub struct DefaultBody {
    // ...
}

impl Body for DefaultBody {
    type Data = hyper::Chunk;
    type Error = hyper::Error;
    
    // ...
}

impl DefaultBody {
    /// Creates a channel with a sender that can be safely passed to
    /// other tasks or threads. Data can be pushed onto the sender,
    /// and the associated `DefaultBody` receives it.
    pub fn channel() -> (DefaultBodySender, Self) {
        // ...
    }
    
    /// Creates an empty body.
    pub fn empty() -> Self {
        // ...
    }
    
    /// Create a body that yields just a single buffer.
    pub fn once<T>(buf: T) -> Self
    where
        T: Into<Chunk>,
    {
        // ...
    }
    
    /// Wraps any `Stream` as a `hyper::Body`.
    ///
    /// Internally, this boxes the stream, so as to not require
    /// a type parameter on `Self`. If boxing is undesirable,
    /// consider implementing `hyper::Body` on a custom type.
    pub fn wrap_stream<T>(stream: T) -> Self
    where
        T: Stream + 'static,
        T::Item: Into<Chunk>,
        T::Error: Into<hyper::error::User>,
    {
        // ...
    }
}

Some things I'd like to add to the (naming, bleck) DefaultBodySender are:

The ability to send new frames as hyper gains support for them, like push promises.
The ability to make closing the stream an explicit action. Right now, with an mpsc channel, if the sender is in another thread and that that thread panics, the sender is dropped, and the receiver just assumes the stream is finished. hyper understands that as "all good, end the writing". However, this can be bad! If the stream had a Content-Length, the remote will realize the body was interrupted. But if it didn't, the remote will just assume that the ended stream is the full response. This can result in the remote treating partial content as the full content.

I'd like for the default sender to have an option to mark that being dropped without having called close() should signal an error has occurred. This could either be a boolean flag you enable on the sender, or it could be a a new method that returns a guard type to be used in places where panics may happen.

The text was updated successfully, but these errors were encountered:

sfackler · 2018-02-09T04:41:46Z

I think content_length might be a better name than size_hint - its semantics and intended use case is pretty different than that of an iterator's size_hint. It'd probably also be good to note that it only needs to be correct if called before any data is streamed, but you don't need to keep it up to date with the amount of data remaining.

I like the idea of merging the data poll method and trailers poll method into one that essentially acts like a Stream<Item = FrameItem>.

seanmonstar · 2018-02-09T23:11:52Z

Good idea about content_length, I'll update the issue text.

I really like being able to just call one method like poll_next and matching an enum, it's much simpler and elegant. But, my worry is that it's impossible to deprecate enum variants, where as methods can be deprecated but still supported.

sfackler · 2018-02-09T23:14:42Z

Why can't you deprecate the variants?

seanmonstar · 2018-02-09T23:16:31Z

You can add a deprecated attribute, but you can't ever stop passing the data in that variant and instead pass a new one, or you break code at runtime.

sfackler · 2018-02-09T23:33:46Z

It seems like there will always need to be a variant for body data and one for trailers though, right?

seanmonstar · 2018-02-09T23:58:34Z

That's correct, and they probably wouldn't ever need to change But, the enum might need to grow new generic parameters, which could also be problematic. For instance, say we start with this:

#[non_exhaustive]
enum Frame<B> {
    Data(B),
    Trailers(HeaderMap),
}

But then want to add a push promise variant. I don't yet now what the quite looks like. Should it be the Request plus a future of the response? Maybe something like:

#[non_exhaustive]
enum Frame<B, PPF> {
    Data(B),
    Trailers(HeaderMap),
    PushPromise(Request<NoBody>, PPF),
}

Adding that generic would be a breaking change. So then, do we not allow passing a future? Or do we require boxing it? Or maybe we require a specific PushPromiseFuture that has an associated sender?

I'm not sure. And, if we later decide that the current design sucks, we're completely stuck with it. If we deprecate the variant and try a PushPromise2 that is a million times better, then we can't actually use it or anyone who was using the previous variant is busted. Maybe I over think this stuff too much.

sfackler · 2018-02-12T04:55:33Z

But wouldn't adding a type parameter break back compat anyway by adding a new associated type?

Thinking more about it, should content_length even be a method on Body? It can always just be set explicitly in the headers, right?

seanmonstar · 2018-02-12T17:43:17Z

But wouldn't adding a type parameter break back compat anyway by adding a new associated type?

So, I think it might be possible to add those without breaking, thanks to defaults allowed on associated types.

Should content_length even be a method on Body? It can always just be set explicitly in the headers, right?

That's what I meant when asked if pulled its weight. The reason I added it was because I wanted to add the ability to hyper to be able to determine length vs chunked automatically. It's especially helpful for HTTP/1.1 clients, since many servers arbitrarily decided that don't love a client sending a chunked body, and so it'd be less surprising for users that did Request::new(DefaultBody::once("hello world")).

Even in cases where servers don't have client chunked encoding, some endpoints want to validate a content length before accepting a body that is too big...

This dedicated `Entity` trait replaces the previous `Stream<Item=impl AsRef<[u8]>, Error=hyper::Error>`. This allows for several improvements immediately, and prepares for HTTP2 support. - The `Entity::is_end_stream` makes up for change away from `Option<Body>`, which was previously used to know if the body should be empty. Since `Request` and `Response` now require a body to be set, this method can be used to tell hyper that the body is actually empty. It also provides the possibility of slight optimizations when polling for data, by allowing to check `is_end_stream` before polling again. This can allow a consumer to know that a body stream has ended without polling for `None` afterwards. - The `Entity::content_length` method allows a body to automatically declare a size, in case a user doesn't set a `Content-Length` or `Transfer-Encoding` header. - It's now possible to send and receive trailers, though this will be for HTTP2 connections only. By being a trait owned by hyper, new methods can be added later as new features are wanted (with default implementations). The `hyper::Body` type now implements `Entity` instead of `Stream`, provides a better channel option, and is easier to use with custom streams via `Body::wrap_stream`. BREAKING CHANGE: All code that was assuming the body was a `Stream` must be adjusted to use an `Entity` instead. Using `hyper::Body` as a `Stream` can call `Body::into_stream` to get a stream wrapper. Passing a custom `impl Stream` will need to either implement `Entity`, or as an easier option, switch to `Body::wrap_stream`. `Body::pair` has been replaced with `Body::channel`, which returns a `hyper::body::Sender` instead of a `futures::sync::mpsc::Sender`. Closes #1438

seanmonstar · 2018-03-14T20:01:03Z

I settled on the trait name being Entity, and keeping Body as the default implementation. I've merged an implementation in the 0.12.x branch in 73a3fad

This dedicated `Entity` trait replaces the previous `Stream<Item=impl AsRef<[u8]>, Error=hyper::Error>`. This allows for several improvements immediately, and prepares for HTTP2 support. - The `Entity::is_end_stream` makes up for change away from `Option<Body>`, which was previously used to know if the body should be empty. Since `Request` and `Response` now require a body to be set, this method can be used to tell hyper that the body is actually empty. It also provides the possibility of slight optimizations when polling for data, by allowing to check `is_end_stream` before polling again. This can allow a consumer to know that a body stream has ended without polling for `None` afterwards. - The `Entity::content_length` method allows a body to automatically declare a size, in case a user doesn't set a `Content-Length` or `Transfer-Encoding` header. - It's now possible to send and receive trailers, though this will be for HTTP2 connections only. By being a trait owned by hyper, new methods can be added later as new features are wanted (with default implementations). The `hyper::Body` type now implements `Entity` instead of `Stream`, provides a better channel option, and is easier to use with custom streams via `Body::wrap_stream`. BREAKING CHANGE: All code that was assuming the body was a `Stream` must be adjusted to use an `Entity` instead. Using `hyper::Body` as a `Stream` can call `Body::into_stream` to get a stream wrapper. Passing a custom `impl Stream` will need to either implement `Entity`, or as an easier option, switch to `Body::wrap_stream`. `Body::pair` has been replaced with `Body::channel`, which returns a `hyper::body::Sender` instead of a `futures::sync::mpsc::Sender`. Closes #1438

seanmonstar added this to the 0.12 milestone Feb 8, 2018

seanmonstar closed this as completed Mar 14, 2018

scottlamb mentioned this issue Mar 20, 2018

Create Body from Stream #1468

Closed

seanmonstar added A-body Area: body streaming. C-feature Category: feature. This is adding a new feature. labels May 7, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A proper `Body` trait #1438

A proper `Body` trait #1438

seanmonstar commented Feb 8, 2018 •

edited

sfackler commented Feb 9, 2018

seanmonstar commented Feb 9, 2018

sfackler commented Feb 9, 2018

seanmonstar commented Feb 9, 2018

sfackler commented Feb 9, 2018

seanmonstar commented Feb 9, 2018 •

edited

sfackler commented Feb 12, 2018

seanmonstar commented Feb 12, 2018

seanmonstar commented Mar 14, 2018

A proper Body trait #1438

A proper Body trait #1438

Comments

seanmonstar commented Feb 8, 2018 • edited

The Body trait

The default body type

sfackler commented Feb 9, 2018

seanmonstar commented Feb 9, 2018

sfackler commented Feb 9, 2018

seanmonstar commented Feb 9, 2018

sfackler commented Feb 9, 2018

seanmonstar commented Feb 9, 2018 • edited

sfackler commented Feb 12, 2018

seanmonstar commented Feb 12, 2018

seanmonstar commented Mar 14, 2018

A proper `Body` trait #1438

A proper `Body` trait #1438

seanmonstar commented Feb 8, 2018 •

edited

The `Body` trait

seanmonstar commented Feb 9, 2018 •

edited