Header order gets rearranged for duplicated header names #2572

bagder · 2021-06-08T08:27:37Z

I have a HTTP client test case that receives the following response:

HTTP/1.1 302 eat this!
Date: Tue, 09 Nov 2010 14:49:00 GMT
Server: test-server/fake
Location: this-is-the-first.html
Content-Length: 0
Connection: close
Location: and there's a second one too! / moo.html

The key take-way is that there are two Location: headers. My client downloads the response and then compares that it stored exactly what was sent. The order of the headers as delivered by hyper is however modified from the original order, which makes comparing really hard - and it makes it harder to understand what's going on over the wire when we can't show the same order.

Hyper delivers the headers in this order:

HTTP/1.1 302 eat this!
Date: Tue, 09 Nov 2010 14:49:00 GMT
Server: test-server/fake
Location: this-is-the-first.html
Location: and there's a second one too! / moo.html
Content-Length: 0
Connection: close

The text was updated successfully, but these errors were encountered:

seanmonstar · 2021-06-08T21:13:53Z

Yes it does, this is a known implementation detail in hyper. Headers are stored in a hashmap, with duplicate values being placed in into the same index, just next to the first value. When iterating, since it doesn't have a unique entry, the value is iterated just after the initial pair.

Spec-wise, this should be fine. The order of values for a specific name are always preserved in relation to each other (since for example, order of Accept: text/plain and then Accept: */* matters), but the order "shouldn't" matter in relation to other headers. I realize this is requesting to keep the behavior as close to similar as possible, in case there are some users who happen to depend on it anyways. But supporting it would require a bit more work in the hashmap to efficiently store the extra ordering information for iteration, and possible slow iteration in general even though most use cases don't need it.

it makes it harder to understand what's going on over the wire when we can't show the same order.

How does it make it harder to understand? The semantic meaning is the same. If someone wanted to always see the exact bytes from the socket, they could be printed from the read callback of a hyper_io. Or a user could just use TCP instead of HTTP, like telnet...

bagder · 2021-06-09T08:47:01Z

Spec-wise, this should be fine

Sure, it is fine HTTP wise, I'm not arguing about that. I do however have an application where I drop in hyper and want to make sure the replacement is as invisible as possible and this is another thing that makes it harder for me to do that. I'm saying that I want the API (be able) to deliver the headers in the same order as they come over the wire.

Imagine users dumping the headers from a server in an automatic fashion to do something with them. After an upgrade of the client, the headers suddenly appears in a new order without the server doing anything different. This will surprise (some) users. (The difference will also put some friction for the curl test suite since the outputs aren't identical anymore, but that's a more minor concern.)

When I speak of "harder to understand" I mean that curl is often used to debug HTTP and server issues. And in servers, there are often seveal different components or "layers" that can provide headers to the response. If there are three different components that does this: A, B and C then in a typical case they could output their headers in a serial fashion. First A's headers, then B's headers and last C's headers. Fire up curl against the server and see all the headers in the response arrive fine.

Enter curl+hyper that now have shuffled the order of the headers. Now curl no longer shows things in the same order they flew over the wire. The headers that came in an order that made sense from the server, and that you as a server admin understand and can map back to A, B and C you can no longer do that because suddenly C's (duplicate) header appears among A's headers. Confusing. And to an end user it also looks rather arbitrary.

seanmonstar · 2021-06-09T21:08:03Z

So, this left me thinking about what curl actually wants from hyper, vs what hyper tries to provide for Rust. While it tries to implement transfer semantics of HTTP, it also provides in it's Rust API a conversion from the bytes to more structured data, such as a hashmap of headers. It doesn't seem like curl really wants to use that hashmap, as long as hyper manages internally things like content-length or transfer-encoding.

What if... the C API provided an option to ask for the original header data, as a single unaltered buffer? It'd be cheap in hyper, just a ref count bump on that buffer, and in curl you wouldn't need to re-stitch together the headers into lines, or have a discrepancy of whether it was \r\n or just \n, and the order would what was sent.

It could just be an option set on either hyper_request * or hyper_clientconn_options * (perhaps the conn since you only need to set it once per connection). And then you could fetch it from a hyper_response *:

struct hyper_buf *hyper_response_headers_raw(struct hyper_response *resp);

bagder · 2021-06-10T08:15:58Z

That sounds like pure goodness to me!

seanmonstar · 2021-06-10T21:32:02Z

Alright, I opened #2575 to track add that (shouldn't be too hard). I should probably also point out that even when grabbing that buf, you don't need to handle content-length or transfer-encoding yourself, and if there are any other specific headers you want to grab and act on, you can still query for them specifically on the hyper_headers *, instead of needing to parse them yourself.

bagder · 2021-11-24T08:45:38Z

I'm having second thoughts on this route, since using this buffer would basically mean that I have to parse the headers on my own anyway which would go against one of the primary reasons for me to use hyper in the first place.

seanmonstar closed this as completed Jun 10, 2021

seanmonstar mentioned this issue Mar 17, 2022

Hyper Curl test 580 and 580 Root cause #2780

Closed

seanmonstar linked a pull request Apr 28, 2022 that will close this issue

fix(ffi): Change hyper_headers_foreach to iterate in recieved order #2798

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Header order gets rearranged for duplicated header names #2572

Header order gets rearranged for duplicated header names #2572

bagder commented Jun 8, 2021 •

edited

seanmonstar commented Jun 8, 2021

bagder commented Jun 9, 2021

seanmonstar commented Jun 9, 2021 •

edited

bagder commented Jun 10, 2021

seanmonstar commented Jun 10, 2021

bagder commented Nov 24, 2021

Header order gets rearranged for duplicated header names #2572

Header order gets rearranged for duplicated header names #2572

Comments

bagder commented Jun 8, 2021 • edited

seanmonstar commented Jun 8, 2021

bagder commented Jun 9, 2021

seanmonstar commented Jun 9, 2021 • edited

bagder commented Jun 10, 2021

seanmonstar commented Jun 10, 2021

bagder commented Nov 24, 2021

bagder commented Jun 8, 2021 •

edited

seanmonstar commented Jun 9, 2021 •

edited