Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add auto-detection of HTTP1 vs HTTP2 for inbound http:// connections #128

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

zarqman
Copy link
Contributor

@zarqman zarqman commented Aug 24, 2023

This PR adds Async::HTTP::Protocol::HTTP which auto-detects HTTP1 vs HTTP2 for inbound (server) connections. It is comparable to Async::HTTP::Protocol::HTTPS, just for http:// instead.

It uses HTTP/2 Connection Preface detection, outlined in the HTTP/2 RFC.

This is only for server connections. Protocol::HTTP's client always uses HTTP1 as there is no present facility in the RFC for auto-detection for clients using http://. (There was some language around Upgrade: h2c in earlier RFC versions, but it has since been removed.)

Note on HTTP/1.0

HTTP/1.0 requests have the potential to be too short to be properly detected. This is expected behavior, and is treated no differently than any other incomplete inbound HTTP/1.x request.

The risk is limited to HTTP/1.0 requests without any headers, including no Host header, such as:

GET / HTTP/1.0

It's 6 bytes short, and it's only because the Host header is optional in HTTP/1.0. If Host, any other header (ie: User-Agent), a longer path, or a query parameter is present, then it's good to go.

The only place this might show up is using an ultra-basic monitoring script for health checks. Even most of those will add Host and/or User-Agent, but if not, adding a dummy header or query param will solve it (or using https://).

Any Server needing assured HTTP/1.0 request support should continue to use Protocol::HTTP1 and not switch to this new Protocol::HTTP.

Outstanding considerations

  1. Auto-detection uses peek() and requires converting the IO Socket into an Async::IO::Stream one layer earlier. This PR presently modifies the other Protocol::HTTPxx modules to handle this. It's a rather limited change.

    However, I could see changing IO::Stream.new(...) in each of these to a new IO::Stream.wrap(...) method and then moving the is_a? detection into there.

    Let me know if you'd prefer that and I'll change it here and submit a parallel PR on async-io for that.

  2. I'm unsure whether it's appropriate to change the default value in Async::HTTP::Endpoint#protocol from HTTP1 to HTTP. I have not included that change at this time, but let me know if you'd like it added.

As always, questions and feedback welcomed.

Types of Changes

  • New feature.

Contribution

@zarqman zarqman force-pushed the add-http-version-detection branch 2 times, most recently from 0a8bc3c to 2f51d59 Compare September 5, 2023 17:56
@ioquatix
Copy link
Member

Apologies, but I wanted to update the test suite to sus before considering any further PRs. Now that's done, do you mind rebasing?

Also, I have a question, what is the use case? Since we can't use HTTP/2 plain text in general, ALPN is a requirement which is sufficient.

Regarding the special token, isn't it sufficient just to detect PRI * HTTP/2.0? Surely this could never be a valid HTTP/1.x request after that point.

@zarqman
Copy link
Contributor Author

zarqman commented Sep 11, 2023

Rebased.

h2 over http:// is usable, it just requires prior knowledge by the client. While a browser isn't going to use h2 over http://, internal (or otherwise controlled) infrastructure might. For example, a public-facing load balancer performing TLS offload might use http:// to contact the next layer running async-http (could be falcon or otherwise). Using h2 gains all the h2 benefits like reduced connection counts and better stream isolation. At the same time, it's not unusual for healthcheck monitoring to support only HTTP/1.1 (or even HTTP/1.0). This allows both h1 and h2 to work in parallel.

Another use case is testing (manual or automated) where configuring https is unwanted. Instead of spinning up h1 and h2 separately, both are viable on the same port. I've personally found it more efficient to just add/remove a single flag to curl (--http2-prior-knowledge) to switch back and forth from h1 to h2 rather than have to configure and toggle between separate ports or use TLS.

As for the preface token, great question! The preface I've used comes exactly from the RFC which is quite specific that this is the string to use. I haven't researched the background for why they made it longer than the more simplified version you mention, but I'll assume it was done with reason.

@ioquatix
Copy link
Member

I went to check this for myself.

https://datatracker.ietf.org/doc/html/rfc9113#section-3.4

I found at least one implementation that, for HTTP/1 only checks 'PRI ' as the indicator for the upgrade. Maybe if you have time, can you check some other implementations?

The actual connection preface must be specified during HTTP/2 connection negotiation too: https://github.com/socketry/protocol-http2/blob/d2f68154755dc678360763c9f3c178ad5f304b56/lib/protocol/http2/framer.rb#L36

I think if it's convenient, it's okay to peek PRI * HTTP/2.0\r\n and that's a strong indicator of the upgrade. We can later validate the full string. There is no world in which the above string is a valid HTTP/1 request, in practice.

@zarqman
Copy link
Contributor Author

zarqman commented Sep 12, 2023

Nginx and Apache each appear to look for the full string.

Traefik and Caddy both use go's net/http, which seems to parse the initial request line as if it were HTTP/1. Then, as part of early h1 request handling, it looks for 'PRI', '*', and 'HTTP/2.0' in their respective fields. If found, only then does it hand off to the h2 handler.

Thinking about it more, the recommended preface string was possibly selected because it's not actually a compliant HTTP/1.x request in that it has the appearance of having a body, but would be missing Content-Length.

I have a thought on a potential short circuit for requests with too few bytes, assuming that's the issue that's giving you pause. Let me explore that.

@zarqman
Copy link
Contributor Author

zarqman commented Sep 12, 2023

Okay, that new idea worked out quite well. Now, if the peeked bytes cannot match, it stops looking immediately without waiting for enough bytes to satisfy the entire preface. This solves the HTTP/1.0 short request issue.

@ioquatix ioquatix self-assigned this Sep 12, 2023
@@ -27,7 +28,7 @@ def self.client(peer)
end

def self.server(peer)
stream = IO::Stream.new(peer, sync: true)
stream = peer.is_a?(IO::Stream) ? peer : IO::Stream.new(peer, sync: true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to think a little more about this.

Eventally, Async::IO is going away. I'm not sure if Async::IO::Stream remains or not - probably not. This was for compatibility with shims/Ruby < 3.1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't add a new dependency on Async::IO::Stream, but merely allows for peer to have been turned into a Stream one call earlier in the stack, which is needed for the new h1 vs h2 detection to function.

If all of Stream is just a shim (or ends up that way in a newer Ruby), pulling it out later should be no problem. If Stream is adding additional functionality over standard Ruby, then there's a bit more to consider, but that feels beyond the scope of this PR.

Copy link
Member

@ioquatix ioquatix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mostly okay with this.

I think we can update the protocol to default to HTTP.

@zarqman
Copy link
Contributor Author

zarqman commented Oct 26, 2023

Updated protocol default and **options as requested. Hopefully good to go!

@ioquatix
Copy link
Member

I've rebased this PR, but I think the external tests are failing.

In addition, peek(n) now exists, so we can be more precise about the actual amount of data we want to check. Should we take advantage of it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants