Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Header order gets rearranged for duplicated header names #2572

Closed
bagder opened this issue Jun 8, 2021 · 6 comments · Fixed by #2798
Closed

Header order gets rearranged for duplicated header names #2572

bagder opened this issue Jun 8, 2021 · 6 comments · Fixed by #2798

Comments

@bagder
Copy link
Contributor

bagder commented Jun 8, 2021

I have a HTTP client test case that receives the following response:

HTTP/1.1 302 eat this!
Date: Tue, 09 Nov 2010 14:49:00 GMT
Server: test-server/fake
Location: this-is-the-first.html
Content-Length: 0
Connection: close
Location: and there's a second one too! / moo.html

The key take-way is that there are two Location: headers. My client downloads the response and then compares that it stored exactly what was sent. The order of the headers as delivered by hyper is however modified from the original order, which makes comparing really hard - and it makes it harder to understand what's going on over the wire when we can't show the same order.

Hyper delivers the headers in this order:

HTTP/1.1 302 eat this!
Date: Tue, 09 Nov 2010 14:49:00 GMT
Server: test-server/fake
Location: this-is-the-first.html
Location: and there's a second one too! / moo.html
Content-Length: 0
Connection: close
@seanmonstar
Copy link
Member

Yes it does, this is a known implementation detail in hyper. Headers are stored in a hashmap, with duplicate values being placed in into the same index, just next to the first value. When iterating, since it doesn't have a unique entry, the value is iterated just after the initial pair.

Spec-wise, this should be fine. The order of values for a specific name are always preserved in relation to each other (since for example, order of Accept: text/plain and then Accept: */* matters), but the order "shouldn't" matter in relation to other headers. I realize this is requesting to keep the behavior as close to similar as possible, in case there are some users who happen to depend on it anyways. But supporting it would require a bit more work in the hashmap to efficiently store the extra ordering information for iteration, and possible slow iteration in general even though most use cases don't need it.

it makes it harder to understand what's going on over the wire when we can't show the same order.

How does it make it harder to understand? The semantic meaning is the same. If someone wanted to always see the exact bytes from the socket, they could be printed from the read callback of a hyper_io. Or a user could just use TCP instead of HTTP, like telnet...

@bagder
Copy link
Contributor Author

bagder commented Jun 9, 2021

Spec-wise, this should be fine

Sure, it is fine HTTP wise, I'm not arguing about that. I do however have an application where I drop in hyper and want to make sure the replacement is as invisible as possible and this is another thing that makes it harder for me to do that. I'm saying that I want the API (be able) to deliver the headers in the same order as they come over the wire.

Imagine users dumping the headers from a server in an automatic fashion to do something with them. After an upgrade of the client, the headers suddenly appears in a new order without the server doing anything different. This will surprise (some) users. (The difference will also put some friction for the curl test suite since the outputs aren't identical anymore, but that's a more minor concern.)

When I speak of "harder to understand" I mean that curl is often used to debug HTTP and server issues. And in servers, there are often seveal different components or "layers" that can provide headers to the response. If there are three different components that does this: A, B and C then in a typical case they could output their headers in a serial fashion. First A's headers, then B's headers and last C's headers. Fire up curl against the server and see all the headers in the response arrive fine.

Enter curl+hyper that now have shuffled the order of the headers. Now curl no longer shows things in the same order they flew over the wire. The headers that came in an order that made sense from the server, and that you as a server admin understand and can map back to A, B and C you can no longer do that because suddenly C's (duplicate) header appears among A's headers. Confusing. And to an end user it also looks rather arbitrary.

@seanmonstar
Copy link
Member

seanmonstar commented Jun 9, 2021

So, this left me thinking about what curl actually wants from hyper, vs what hyper tries to provide for Rust. While it tries to implement transfer semantics of HTTP, it also provides in it's Rust API a conversion from the bytes to more structured data, such as a hashmap of headers. It doesn't seem like curl really wants to use that hashmap, as long as hyper manages internally things like content-length or transfer-encoding.

What if... the C API provided an option to ask for the original header data, as a single unaltered buffer? It'd be cheap in hyper, just a ref count bump on that buffer, and in curl you wouldn't need to re-stitch together the headers into lines, or have a discrepancy of whether it was \r\n or just \n, and the order would what was sent.

It could just be an option set on either hyper_request * or hyper_clientconn_options * (perhaps the conn since you only need to set it once per connection). And then you could fetch it from a hyper_response *:

struct hyper_buf *hyper_response_headers_raw(struct hyper_response *resp);

@bagder
Copy link
Contributor Author

bagder commented Jun 10, 2021

That sounds like pure goodness to me!

@seanmonstar
Copy link
Member

Alright, I opened #2575 to track add that (shouldn't be too hard). I should probably also point out that even when grabbing that buf, you don't need to handle content-length or transfer-encoding yourself, and if there are any other specific headers you want to grab and act on, you can still query for them specifically on the hyper_headers *, instead of needing to parse them yourself.

@bagder
Copy link
Contributor Author

bagder commented Nov 24, 2021

I'm having second thoughts on this route, since using this buffer would basically mean that I have to parse the headers on my own anyway which would go against one of the primary reasons for me to use hyper in the first place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants