Get rid of hyper by default #15

tailhook · 2016-02-10T17:25:43Z

As discussed in #8

It looks like removing hyper in header parsing added about 10% of performance (65k rps vs 72k rps on my laptop on hello world). For certain cases (for example proxies, including fastcgi or similar ones) it makes the interface even simpler.

It's easy to apply hyper on top of rotor-http, and it should not make any noticeable overhead because we don't allocate any additional structures for our own handling of headers.

The work as of v0.5.0 halfway done: input headers (in both client and server) are passed as a slice of httparse::Header but the output ones (including status/error codes) are used from hyper.

The text was updated successfully, but these errors were encountered:

pyfisch · 2016-02-10T18:14:54Z

Some random notes:

Why does is_transfer_encoding(val: &str) (and similiar functions) not use std::ascii::AsciiExt::eq_ignore_ascii_case(&self, other: &Self). For me the implementation looks equally fast if it is not faster.
In is_chunked function you check for newlines. If I understand httparse correctly it does check for \n and \r in header values and blocks them.
HTTP/2.0 allows the user-agent to send a scheme different from HTTP and HTTPS. So probably the Head struct should use a &str here. Exposing the headers as an array is good but probably most uses will still need to convert it to a HashMap.
Some consumers of rotor will only need client or server but not both, so they should be probably separate crates and a shared crate (cargo makes it possible to host them within the same repo). On the other hand rotor-http is much smaller than eg. hyper so this is less of an issue. (hyper is also not split)

briansmith · 2016-02-10T18:17:54Z

Some consumers of rotor will only need client or server but not both, so they should be probably separate crates and a shared crate (cargo makes it possible to host them within the same repo). On the other hand rotor-http is much smaller than eg. hyper so this is less of an issue. (hyper is also not split)

I don't think this kind of split would be a net positive. If needed, in the future there could be "client" and "server" default features that can be disabled by users that don't want to build the subset of the functionality that they don't use.

tailhook · 2016-02-10T19:29:47Z

Why does is_transfer_encoding(val: &str) (and similiar functions) not use std::ascii::AsciiExt::eq_ignore_ascii_case(&self, other: &Self). For me the implementation looks equally fast if it is not faster.

Wow, sorry, I have somehow missed them. I looked in hyper and they allocate an UniCase string there. Just tried to be faster :) Maybe there is also some .trim() function for bytes that I'm missing?

In is_chunked function you check for newlines. If I understand httparse correctly it does check for \n and \r in header values and blocks them.

Hm, I used to think that line folding is implemented in httparse, but it's optional https://tools.ietf.org/html/rfc7230#section-3.2.4:

A server that receives an obs-fold in a request message that is not
within a message/http container MUST either reject the message by
sending a 400 (Bad Request), preferably with a representation
explaining that obsolete line folding is unacceptable, or replace
each received obs-fold with one or more SP octets prior to
interpreting the field value or forwarding the message downstream.

Maybe I can remove it.

HTTP/2.0 allows the user-agent to send a scheme different from HTTP and HTTPS. So probably the Head struct should use a &str here. Exposing the headers as an array is good but probably most uses will still need to convert it to a HashMap.

Sounds reasonable.

Some consumers of rotor will only need client or server but not both, so they should be probably separate crates and a shared crate (cargo makes it possible to host them within the same repo). On the other hand rotor-http is much smaller than eg. hyper so this is less of an issue. (hyper is also not split)

Well, the library is small enough. Currently, there are no additional dependencies in either of them. Also, rust removes unused code in resulting binary, so should not be a problem. Also, about one-third of the size of the codebase is currently used in both client and server, and this part doesn't have good public interface.

However, these would probably be external libraries:

request routing for server
simplified interface for handling only buffered requests (like most apps will do anyway)
DNS support for client
connection pooling for client

pyfisch · 2016-02-10T20:27:32Z

Linefolding is indeed implemented in httparse. I just mean you do not need to check for \n and \r in rotor-http.

There is no .trim() for bytes.

tailhook · 2016-02-10T20:31:25Z

Linefolding is indeed implemented in httparse. I just mean you do not need to check for \n and \r in rotor-http.

Well, then I'm not sure what you mean, I believe that the header:

Transfer-encoding: whatever,
  chunked

.. will be received as bytes b"whatever,\r\n chunked". right?

pyfisch · 2016-02-10T20:52:29Z

I tried it out with the request GET /index.html HTTP/1.1\r\nHost: foobar.com\r\nTransfer-Encoding: whatever,\r\n chunked\r\n\r\n and it did not work because of the folded-line. It turns out that httparse does not support obs-fold at all. For the given request there is an error message returned.

tailhook · 2016-02-10T21:48:12Z

It turns out that httparse does not support obs-fold at all.

Yes. That's my conclusion too. I'm not sure this is a deliberate decision or just omission. I probably should test whether headers are work in the wild (with todays clients and servers), just to make sure

And another missed comment:

Exposing the headers as an array is good but probably most uses will still need to convert it to a HashMap.

This needs a CaseInsensitiveMultiDict, which is quite complex and ugly data structure. Probably I would keep it for higher level of abstration. In fact I see three cases:

Propagation of headers into some kind of backend protocol where all headers are just re-encoded and never actually looked up (say we want a fastcgi so just encode everything as HTTP_HEADER_NAME)
Use in application where many headers would be just put into fixed fields in some Request structure rather than used as strings
Headers used in routing, like deciding which backend to use or which backend protocol to choose.

Only the case (3) requires a lookup. And it usually requires only handful of headers (probably only Host). So it may be faster to just scan them once again similarly to what is done to determine request length. There are only ~10 headers in each request and they should be in CPU cache in the headers_received handler. So I don't think it may be a bottleneck.

pyfisch · 2016-03-08T17:41:14Z

This can be closed, I think.

tailhook mentioned this issue Feb 10, 2016

effective request uri #8

Closed

This was referenced Feb 13, 2016

remove need for typed hyper headers #17

Merged

Remove hyper #21

Closed

tailhook closed this as completed Mar 8, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Get rid of hyper by default #15

Get rid of hyper by default #15

tailhook commented Feb 10, 2016

pyfisch commented Feb 10, 2016

briansmith commented Feb 10, 2016

tailhook commented Feb 10, 2016

pyfisch commented Feb 10, 2016

tailhook commented Feb 10, 2016

pyfisch commented Feb 10, 2016

tailhook commented Feb 10, 2016

pyfisch commented Mar 8, 2016

Get rid of hyper by default #15

Get rid of hyper by default #15

Comments

tailhook commented Feb 10, 2016

pyfisch commented Feb 10, 2016

briansmith commented Feb 10, 2016

tailhook commented Feb 10, 2016

pyfisch commented Feb 10, 2016

tailhook commented Feb 10, 2016

pyfisch commented Feb 10, 2016

tailhook commented Feb 10, 2016

pyfisch commented Mar 8, 2016