Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of hyper by default #15

Closed
tailhook opened this issue Feb 10, 2016 · 8 comments
Closed

Get rid of hyper by default #15

tailhook opened this issue Feb 10, 2016 · 8 comments

Comments

@tailhook
Copy link
Owner

As discussed in #8

It looks like removing hyper in header parsing added about 10% of performance (65k rps vs 72k rps on my laptop on hello world). For certain cases (for example proxies, including fastcgi or similar ones) it makes the interface even simpler.

It's easy to apply hyper on top of rotor-http, and it should not make any noticeable overhead because we don't allocate any additional structures for our own handling of headers.

The work as of v0.5.0 halfway done: input headers (in both client and server) are passed as a slice of httparse::Header but the output ones (including status/error codes) are used from hyper.

@pyfisch
Copy link
Contributor

pyfisch commented Feb 10, 2016

Some random notes:

  • Why does is_transfer_encoding(val: &str) (and similiar functions) not use std::ascii::AsciiExt::eq_ignore_ascii_case(&self, other: &Self). For me the implementation looks equally fast if it is not faster.
  • In is_chunked function you check for newlines. If I understand httparse correctly it does check for \n and \r in header values and blocks them.
  • HTTP/2.0 allows the user-agent to send a scheme different from HTTP and HTTPS. So probably the Head struct should use a &str here. Exposing the headers as an array is good but probably most uses will still need to convert it to a HashMap.
  • Some consumers of rotor will only need client or server but not both, so they should be probably separate crates and a shared crate (cargo makes it possible to host them within the same repo). On the other hand rotor-http is much smaller than eg. hyper so this is less of an issue. (hyper is also not split)

@briansmith
Copy link

Some consumers of rotor will only need client or server but not both, so they should be probably separate crates and a shared crate (cargo makes it possible to host them within the same repo). On the other hand rotor-http is much smaller than eg. hyper so this is less of an issue. (hyper is also not split)

I don't think this kind of split would be a net positive. If needed, in the future there could be "client" and "server" default features that can be disabled by users that don't want to build the subset of the functionality that they don't use.

@tailhook
Copy link
Owner Author

Why does is_transfer_encoding(val: &str) (and similiar functions) not use std::ascii::AsciiExt::eq_ignore_ascii_case(&self, other: &Self). For me the implementation looks equally fast if it is not faster.

Wow, sorry, I have somehow missed them. I looked in hyper and they allocate an UniCase string there. Just tried to be faster :) Maybe there is also some .trim() function for bytes that I'm missing?

In is_chunked function you check for newlines. If I understand httparse correctly it does check for \n and \r in header values and blocks them.

Hm, I used to think that line folding is implemented in httparse, but it's optional https://tools.ietf.org/html/rfc7230#section-3.2.4:

A server that receives an obs-fold in a request message that is not
within a message/http container MUST either reject the message by
sending a 400 (Bad Request), preferably with a representation
explaining that obsolete line folding is unacceptable, or replace
each received obs-fold with one or more SP octets prior to
interpreting the field value or forwarding the message downstream.

Maybe I can remove it.

HTTP/2.0 allows the user-agent to send a scheme different from HTTP and HTTPS. So probably the Head struct should use a &str here. Exposing the headers as an array is good but probably most uses will still need to convert it to a HashMap.

Sounds reasonable.

Some consumers of rotor will only need client or server but not both, so they should be probably separate crates and a shared crate (cargo makes it possible to host them within the same repo). On the other hand rotor-http is much smaller than eg. hyper so this is less of an issue. (hyper is also not split)

Well, the library is small enough. Currently, there are no additional dependencies in either of them. Also, rust removes unused code in resulting binary, so should not be a problem. Also, about one-third of the size of the codebase is currently used in both client and server, and this part doesn't have good public interface.

However, these would probably be external libraries:

  • request routing for server
  • simplified interface for handling only buffered requests (like most apps will do anyway)
  • DNS support for client
  • connection pooling for client

@pyfisch
Copy link
Contributor

pyfisch commented Feb 10, 2016

Linefolding is indeed implemented in httparse. I just mean you do not need to check for \n and \r in rotor-http.

There is no .trim() for bytes.

@tailhook
Copy link
Owner Author

Linefolding is indeed implemented in httparse. I just mean you do not need to check for \n and \r in rotor-http.

Well, then I'm not sure what you mean, I believe that the header:

Transfer-encoding: whatever,
  chunked

.. will be received as bytes b"whatever,\r\n chunked". right?

@pyfisch
Copy link
Contributor

pyfisch commented Feb 10, 2016

I tried it out with the request GET /index.html HTTP/1.1\r\nHost: foobar.com\r\nTransfer-Encoding: whatever,\r\n chunked\r\n\r\n and it did not work because of the folded-line. It turns out that httparse does not support obs-fold at all. For the given request there is an error message returned.

@tailhook
Copy link
Owner Author

It turns out that httparse does not support obs-fold at all.

Yes. That's my conclusion too. I'm not sure this is a deliberate decision or just omission. I probably should test whether headers are work in the wild (with todays clients and servers), just to make sure

And another missed comment:

Exposing the headers as an array is good but probably most uses will still need to convert it to a HashMap.

This needs a CaseInsensitiveMultiDict, which is quite complex and ugly data structure. Probably I would keep it for higher level of abstration. In fact I see three cases:

  1. Propagation of headers into some kind of backend protocol where all headers are just re-encoded and never actually looked up (say we want a fastcgi so just encode everything as HTTP_HEADER_NAME)
  2. Use in application where many headers would be just put into fixed fields in some Request structure rather than used as strings
  3. Headers used in routing, like deciding which backend to use or which backend protocol to choose.

Only the case (3) requires a lookup. And it usually requires only handful of headers (probably only Host). So it may be faster to just scan them once again similarly to what is done to determine request length. There are only ~10 headers in each request and they should be in CPU cache in the headers_received handler. So I don't think it may be a bottleneck.

This was referenced Feb 13, 2016
@pyfisch
Copy link
Contributor

pyfisch commented Mar 8, 2016

This can be closed, I think.

@tailhook tailhook closed this as completed Mar 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants