Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When should a server add chunked encoding? #2032

Open
MSP-Greg opened this issue Jan 29, 2023 · 5 comments
Open

When should a server add chunked encoding? #2032

MSP-Greg opened this issue Jan 29, 2023 · 5 comments

Comments

@MSP-Greg
Copy link
Contributor

MSP-Greg commented Jan 29, 2023

I assume I'm missing something, but...

See https://msp-greg.github.io/rails_master/ActionController/Streaming.html, which currently uses Rack::Chunked, and sets Transfer-Encoding: chunked.

When Rails upgrades to Rack 3, Rack::Chunked will be removed, and it's unclear what happens then.

So, how is a server to decide whether it needs to apply chunked info, or whether the app is doing so? Obviously, by querying the body's properties, one may be able to guess, but that may not be correct...

I couldn't find the terms 'transfer-encoding' or 'chunked' anywhere in the spec.

A lot of things are mentioned in RFC 9112 and RFC 9110 regarding when and if particular headers are either required or not allowed.

Given that the app is passing most important headers to the server, who is responsible for their accuracy? One example from RFC 9112 6.1:

A server MUST NOT send a Transfer-Encoding header field in any response with a status code of 1xx (Informational) or 204 (No Content). A server MUST NOT send a Transfer-Encoding header field in any 2xx (Successful) response to a CONNECT request (Section 9.3.6 of [HTTP]).

@ioquatix
Copy link
Member

A server has two options when content-length isn't present (or unknown) on an HTTP/1.x connection:

  1. (HTTP/0.9+)connection: close + data + close.
  2. (HTTP/1.1+) transfer-encoding: chunked + chunked data + zero length chunk.

The advantage of 2 is that the connection is not closed afterwards, and the connection has more integrity as the final zero length chunk communicates the end of the stream, vs closing the connection which could happen if the stream drops out, in other words you don't know if the data has completed successfully or the connection just dropped out.

With respect to streaming, there is no other advantage or disadvantage.

Rack applications should never generate a transfer-encoding header IMHO. Maybe we should add that to the spec (i.e. detect if it is added and fail).

Rack::Chunked is fundemantally broken w.r.t HTTP/2+ as the format is entirely different and happens at the protocol level using data frames. It is impossible for an application to generate these data frames. Chunked encoding should be considered a protocol level concern and should not be something an application (or middleware) is doing. From the point of view of Rack, each{|chunk| send(chunk)} should be considered the level of granularity i.e. each string yielded is a chunk for the purpose of chunked encoding or data frames.

A server should always prefer chunked encoding because (1) it is more robust and (2) it allows connection reuse. Even when using sendfile/splice, you can send a single chunk or a few large chunks and get the similar advantage.

This is different from connection: upgrade where there is no entity body and the upgraded connection no longer has HTTP semantics. transfer-encoding: chunked also does not prevent Rack 3 streaming bodies - one can easily wrap the underlying protocol to write chunked data (or read chunked data) as appropriate - this is what async-http already does for example.

def call(env)
  [200, {}, {|stream| ...}]
end

Depending on the situation, stream.read could be decoding chunks and stream.write can be encoding chunks. If the connection was HTTP/1.0, stream.close may close the underlying connection, while in HTTP/1.1, stream.close may simply send a zero length chunk and return the connection to the server request/response loop for further processing of incoming requests.

Coming back to your original question - it would be entirely plausible for most servers to end up double encoding the chunked response if there is application-side chunked encoding. Chunked encoding has nothing to do with streaming, streaming and chunked encoding are entirely orthogonal. The correct way to achieve this is to use Rack 3 streaming bodies or in Rack 2, partial hijack has very similar semantics, while full hijack is completely broken w.r.t. HTTP compatibility - how would it work with HTTP/2+? You'd have to implement a full HTTP/1 <-> HTTP/2 proxy.

Hope that helps.

@MSP-Greg
Copy link
Contributor Author

Rack applications should never generate a transfer-encoding header IMHO. Maybe we should add that to the spec (i.e. detect if it is added and fail).

Yes, but from a practical standpoint, Rails, with at least versions 4 thru current 7, is doing so in ActionController::Streaming. Also, any app using Rack earlier than 3.1 may be using Rack::Chunked. Both set a transfer-encoding header.

I guess things are somewhat dependent on what Rails does with ActionController::Streaming when it drops Rack::Chunked.

If it removes both 'Transfer-Encoding' and 'Content-Length' and the body responds to each, Puma will consider it an enumerable body, and chunk it...

You've mentioned hijacking (callable and connection: upgrade), and the spec states "In both cases, the application is responsible for closing the hijacked stream."

I have seen code that seems to use a partial hijack for writing an http compliant body. If the app is closing the stream (which would often be the actual socket), doesn't this force an HTTP/1.1 connection to close when it could remain open?

@ioquatix
Copy link
Member

Yes, but from a practical standpoint, Rails, with at least versions 4 thru current 7, is doing so in ActionController::Streaming. Also, any app using Rack earlier than 3.1 may be using Rack::Chunked. Both set a transfer-encoding header.

Practically speaking none of this matters if it's never worked correctly.

I guess things are somewhat dependent on what Rails does with ActionController::Streaming when it drops Rack::Chunked.

Given the assumption that no server ever supported it correctly except perhaps unicorn, I would say that it doesn't matter, since it never really worked correctly in the first place.

If it removes both 'Transfer-Encoding' and 'Content-Length' and the body responds to each, Puma will consider it an enumerable body, and chunk it...

Sure, but this won't work with Rack::ETag and other middleware, which is why you MUST use a streaming body if you expect "streaming".

doesn't this force an HTTP/1.1 connection to close when it could remain open?

Yes, rack.hijack is a terrible interface. Partial hijack in theory can remain open. Calling stream.close during a partial hijack could be interpreted as "The current response is finished" not "Close the current connection". IIRC, that's how it's implemented, depending on HTTP/1 or HTTP/2 in async-http.

@MSP-Greg
Copy link
Contributor Author

I haven't mentioned streaming, why Rails uses the term, not sure. I would call it chunked, in that (as they describe it) determining 'Content-length' is difficult. So, chunked. When I hear the term 'streaming', I think of a flow of data/content that will take a measurable amount of time, as in seconds...

Yes, rack.hijack is a terrible interface.

Maybe I'd say 'interesting to work with'. I assume it's been around a while, and a lot has changed.

@ioquatix
Copy link
Member

I would call it chunked, in that (as they describe it) determining 'Content-length' is difficult.

As I said earlier, you can use connection: close for this just fine, it's slightly less efficient than chunked encoding. It makes not difference semantically speaking, at the application layer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants