
Sinatra streaming broken in Puma 6.0.0 #3000

Closed
dentarg opened this issue Oct 25, 2022 · 29 comments · Fixed by #3004

@dentarg
Member

dentarg commented Oct 25, 2022

See sinatra/sinatra#1832 for more details. In short: this simple app, https://github.com/sinatra/sinatra/blob/v3.0.2/examples/stream.ru, works fine with Puma 5.6.5, but with 6.0.0 it waits until all the sleeps have finished before the response is returned (this is more noticeable if you increase the sleep time in the example).

As noted in sinatra/sinatra#1832 (comment) the example runs as expected in falcon (v0.42.3).
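
For reference, the linked example boils down to roughly this (paraphrased; see the URL above for the exact source). The stream block emits 46 bytes over 1.5 seconds:

    # stream.ru, roughly
    require 'sinatra/base'

    class Stream < Sinatra::Base
      get '/' do
        content_type :txt
        stream do |out|
          out << "It's gonna be legen -\n"
          sleep 0.5
          out << " (wait for it) \n"
          sleep 1
          out << "- dary!\n"
        end
      end
    end

    run Stream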

@dentarg
Member Author

dentarg commented Oct 25, 2022

If I change IO_BUFFER_LEN_MAX to a low value like 1, the streaming works:

puma/lib/puma/request.rb

Lines 24 to 25 in 8159aa4

# max size for io_buffer, force write when exceeded
IO_BUFFER_LEN_MAX = 1_024 * 1_024 * 4

Used here

puma/lib/puma/request.rb

Lines 303 to 325 in 03ed6c8

# for array bodies, flush io_buffer to socket when size is greater than
# IO_BUFFER_LEN_MAX
if chunked
  body.each do |part|
    next if (byte_size = part.bytesize).zero?
    io_buffer.append byte_size.to_s(16), LINE_END, part, LINE_END
    if io_buffer.length > IO_BUFFER_LEN_MAX
      fast_write_str socket, io_buffer.to_s
      io_buffer.reset
    end
  end
  io_buffer.write CLOSE_CHUNKED
else
  body.each do |part|
    next if part.bytesize.zero?
    io_buffer.write part
    if io_buffer.length > IO_BUFFER_LEN_MAX
      fast_write_str socket, io_buffer.to_s
      io_buffer.reset
    end
  end
end
fast_write_str(socket, io_buffer.to_s) unless io_buffer.length.zero?

The buffering was introduced in #2896
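
One way to watch this from the client side (a sketch; it assumes the example app is running on localhost:9292, rackup's default): with 5.6.5 the chunks print about half a second apart, with 6.0.0 they all arrive together at the end.

    require 'net/http'

    Net::HTTP.start('localhost', 9292) do |http|
      http.request(Net::HTTP::Get.new('/')) do |res|
        # read_body with a block yields each body fragment as it arrives
        res.read_body do |chunk|
          puts "#{Time.now.strftime('%H:%M:%S.%L')}  #{chunk.inspect}"
        end
      end
    end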

@MSP-Greg
Member

What about something like 64kB?

@dentarg
Member Author

dentarg commented Oct 26, 2022

Sure, it would probably help some cases. What about making it configurable? Is that a good idea?

I suspect a more reliable solution for Sinatra is to switch to the new streaming body (#2740), which wouldn't be buffered, right? I need to try this myself.

@MSP-Greg
Member

What about making it configurable? Is that a good idea?

Yes. I'm adding a PR with size_to_first_byte as a new DSL option. Not sure what the default should be; 32 kB for now.
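
If it ships as described, usage would presumably look like this in a Puma config file (a hypothetical sketch; the option name comes from the comment above, and the final shape in the PR may differ):

    # config/puma.rb
    size_to_first_byte 32_768  # flush the body buffer to the socket after 32 kB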

nateberkopec linked a pull request Oct 28, 2022 that will close this issue
@nateberkopec
Member

I don't like the idea of closing a bug with a config setting. You shouldn't need to configure Puma to make chunked responses work correctly. Am I reading this wrong?

@dentarg
Member Author

dentarg commented Oct 28, 2022

I don't think you are, and I agree.

Then the buffering should be opt-in? Or removed completely? Or is there another way?

@dentarg
Member Author

dentarg commented Oct 28, 2022

I suspect a more reliable solution for Sinatra is to switch to the new streaming body (#2740), which wouldn't be buffered, right? I need to try this myself.

I tried this (dentarg/testssl.web@5e6b14b) and streaming then works in Puma 6 (that code does not work in Puma 5). I'm not sure everyone should need to make changes like this, though?
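
For anyone curious, a Rack 3 streaming body is a callable that is handed the stream and owns it, along the lines of this minimal rackup sketch (the commit linked above shows the real change in context):

    # streaming.ru - the body responds to call instead of each
    run lambda { |env|
      body = proc do |stream|
        3.times do |i|
          stream.write("chunk #{i}\n")
          sleep 1
        end
      ensure
        stream.close  # the body must close the stream when it's done
      end
      [200, { 'content-type' => 'text/plain' }, body]
    }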

@MSP-Greg
Member

You shouldn't need to configure Puma to make chunked responses work correctly. Am I reading this wrong?

Possibly. What is a chunked response? The Sinatra example is a somewhat contrived example of streaming, as it's sending 46 bytes in 1.5 seconds. Is that a realistic setup for streaming?

A long time ago, I believe I checked curl and wrk to make sure they would error on an incorrectly formatted chunked response. As I recall, both did show an error. Using the benchmark code in benchmarks/local and/or curl, you can send requests generating chunked responses, and they are all correct.

I didn't start out to improve Puma, I started out to develop better metrics for response time. It became obvious that array/enum bodies were slowing down when the number of 'steps' increased, especially under load. Lots of Ruby code can essentially run at 'light speed', but writes to sockets cannot.

So, the refactored code attempts to limit the number of writes. This may mean combining the headers and body if the body is small, performing one write as opposed to one for each, or accumulating the array/enum body, rather than writing each enumeration. As data from myself and others showed, this resulted in large improvements for response time.

But, if one is using an Enum body to mimic a streaming body, this can hinder the speed of delivery if the enum cannot be done at 'light speed'. This includes affecting TTFB.

Additionally, when discussions about Rack middleware were happening, the idea of computing the content length for a 'knowable' body was discussed and seemed like a good idea. Code below. Maybe this should be removed for Array/Enum bodies, and they should just be sent chunked? Added benefit would be that the full body would not need to be accumulated before sending.

puma/lib/puma/request.rb

Lines 141 to 154 in 1a3a46a

# below converts app_body into body, dependent on app_body's characteristics, and
# resp_info[:content_length] will be set if it can be determined
if !resp_info[:content_length] && !resp_info[:transfer_encoding] && status != 204
  if app_body.respond_to?(:to_ary)
    length = 0
    if array_body = app_body.to_ary
      body = array_body.map { |part| length += part.bytesize; part }
    elsif app_body.is_a?(::File) && app_body.respond_to?(:size)
      length = app_body.size
    elsif app_body.respond_to?(:each)
      body = []
      app_body.each { |part| length += part.bytesize; body << part }
    end
    resp_info[:content_length] = length

At some point, optimizing this relates to the metrics of an app's Array/Enum bodies, as in how many members they have and what the bytesize range of those members is.

@dentarg
Member Author

dentarg commented Oct 29, 2022

@ioquatix Sorry to bother you, but would you like to weigh in on this issue?

@ioquatix
Contributor

ioquatix commented Oct 29, 2022

As discussed, buffering is critical to the latency/throughput trade-offs of an application.

I believe that when streaming, you should not be buffering - it becomes an application concern to decide the throughput and latency trade off.

I model this in Falcon using the concept of ready?. If a body is ready, reading from it won't block.

A user stream can have a small internal buffer. If there is a chunk in the buffer, then ready? would be true (because a chunk is immediately available). If the user stream is empty (the application has not generated a chunk), ready? is false (reading would block until a chunk is available).

For an Array and other bodies that respond to to_ary, we should always consider them ready?. This is one of the reasons for explicitly modelling enumerable bodies vs streaming bodies. If you recall, Rack::ETag was not able to distinguish between these two cases and ended up breaking streaming on a fairly regular basis.

At the output layer, you can take advantage of ready? to avoid latency issues. For example, if the response body is always ready, we don't need to flush and can let TCP continue the buffering: https://github.com/socketry/protocol-http1/blob/e6a9235102986a7a5462aea251f2fc9cdc00d65b/lib/protocol/http1/connection.rb#L288. You can search for other instances of ready? in that code to see how the buffering strategy works.

The actual implementation of ready? is non-trivial for Rack 2 but fairly straightforward in Rack 3.

For enumerable bodies: https://github.com/socketry/protocol-rack/blob/73705f364ba009eb1bbabbad88de1e627cb37c09/lib/protocol/rack/body/enumerable.rb#L50-L53 (now that I look at it, I realise the first check for Array is redundant :p).

For streaming body the default is used (from inheritance): https://github.com/socketry/protocol-http/blob/5f9c3d4fcda5bf89e8e033d8e1ff210f0dc8342d/lib/protocol/http/body/readable.rb#L47-L52 which is false.

The implication of this, as outlined above, is that streaming responses will map output chunks to TCP packets, and if the user does a poor job of buffering the output, that's on them (according to the implementation/design). You could obviously introduce your own buffering for streaming responses, but the trade-off would be latency, and I think, all things considered, that is the wrong trade-off.

If you want to improve the buffering in this case, I'd advise you to check TCP_NODELAY and TCP_CORK.
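
In Ruby terms, the write loop that strategy implies looks roughly like this (a sketch of the idea only, not Falcon's code; stream is assumed to be a buffered writer around the socket):

    body.each do |chunk|
      stream.write(chunk)
      # If the next chunk is not immediately available, flush so the
      # client sees this one now; if the body is ready?, keep
      # accumulating and let the buffer coalesce writes for throughput.
      stream.flush unless body.ready?
    end
    stream.flush  # write out anything still buffered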

@ioquatix
Contributor

Additionally, when discussions about Rack middleware were happening, the idea of computing the content length for a 'knowable' body was discussed and seemed like a good idea. Code below. Maybe this should be removed for Array/Enum bodies, and they should just be sent chunked? Added benefit would be that the full body would not need to be accumulated before sending.

My opinion is: you should ignore the content-length in the Rack response headers. It's dangerous if incorrect (e.g. due to buggy code), and it can cause security issues: http://projects.webappsec.org/w/page/13246931/HTTP-Response-Splitting.

Falcon extracts the length and may use it in cases where it makes sense (https://github.com/socketry/protocol-rack/blob/97ec51b0f015a7d2a6b70a283a4d6a34c08d5d0c/lib/protocol/rack/body.rb#L16-L20), but it's not used generally, i.e. it's only used for enumerable bodies: https://github.com/socketry/protocol-rack/blob/97ec51b0f015a7d2a6b70a283a4d6a34c08d5d0c/lib/protocol/rack/body/enumerable.rb#L20-L27.

When writing it out (protocol specific), we try to validate that the response body length matches the specified content-length, and fail if not: https://github.com/socketry/protocol-http1/blob/e6a9235102986a7a5462aea251f2fc9cdc00d65b/lib/protocol/http1/connection.rb#L283-L285.

It's less of an issue for HTTP/2 since the binary framing does not depend on the content-length header (which serves as more of a hint). That being said, there are some very odd proxy implementations which convert HTTP/1 -> HTTP/2 and completely fail at handling content-length headers. e.g. check https://portswigger.net/research/http2 specifically "H2.CL Desync on Netflix".

So... my advice is:

  • User/application-provided content-length should not be trusted and can be a major security issue.
  • Generating the content length makes total sense to me (e.g. when you know it's an Array, it's cheap).
  • You can decide the trade-off: Falcon accepts a user-provided content-length but validates it at the protocol level (sketched below), OR
  • you can never trust the user-provided content-length and recompute it, or drop it completely (allowing you to use chunked encoding).
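
A minimal sketch of the 'accept but validate' option (illustrative names, not Puma's internals; the validation mirrors what the protocol-http1 link above does):

    declared = headers['content-length']&.to_i
    written  = 0
    body.each do |chunk|
      written += chunk.bytesize
      socket.write(chunk)
    end
    if declared && written != declared
      # The status line and headers are already out, so the safest
      # recovery is to fail and close the connection rather than let
      # the client mis-frame the bytes.
      raise "content-length mismatch: declared #{declared}, wrote #{written}"
    end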

@MSP-Greg
Member

@ioquatix Thanks for your responses here and elsewhere.

Generating content length makes total sense to me (e.g. when you know it's an Array, it's cheap).

That's one thing that's bothered me about the Rack spec: 'The Enumerable Body must respond to each. It must only be called once.' If the body is an Array, why the limitation to only call each once? Or: call each once to determine content-length, then call it again to transmit, buffering as desired? Obviously, both calls would occur before calling close.

JFYI, re content-length: I'm only referring to app-provided content-length for use in the response.

@ioquatix
Contributor

ioquatix commented Oct 29, 2022

That's one thing that's bothered me about the Rack spec -'The Enumerable Body must respond to each. It must only be called once'. If the body is an Array, why the limitation to only call each once?

The assumption is that the enumerable body can be any kind of object (e.g. a file) and you can't assume it will not be stateful when calling each. However, if you determine that it's an Array instance, it would stand to reason that you could call each as many times as you want without side effects (assuming that the user has not sub-classed it and done something funky).

If you are concerned about this (and potentially other) edge cases, there would be nothing wrong with calling body = body.to_ary. But I think the Array case is so common, and the most easily optimised, that it's obvious. However, I'd be happy to accept a PR to clarify this in the Rack spec.

The assumption is that to_ary isn't a (slow) blocking operation. Obviously, for a file, it might have internally blocking reads... calling to_ary could potentially require a lot of memory... etc.

@MSP-Greg
Member

Just to make sure all are aware: to_a and to_ary are similar, but different. Enumerable has a to_a method; Array has a to_ary method (which returns self), and also a to_a method, because Enumerable is in its ancestor chain.
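
A quick console check of the distinction:

    a = [1, 2]
    a.to_ary.equal?(a)           # => true  (Array#to_ary returns self)
    (1..2).respond_to?(:to_ary)  # => false (Enumerable provides to_a, not to_ary)
    (1..2).to_a                  # => [1, 2]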

for a file, it might have internally blocking reads

That, along with the 'text'-based iteration, was the reason for my PR in Rack...

@ioquatix
Contributor

Thinking about it further, File#each in binary mode should probably just yield page sized chunks.
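
Since File#each currently yields lines, that would amount to something like this wrapper (a sketch; each_page and the 16 KiB page size are illustrative):

    def each_page(file, page_size = 16_384)
      return enum_for(:each_page, file, page_size) unless block_given?
      # read returns nil at EOF, ending the loop
      while (chunk = file.read(page_size))
        yield chunk
      end
    end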

@MSP-Greg
Member

Probably a good idea, but I wonder how many things might break...

Re this issue, I think it would be helpful for an app to define whether an enumerable body (not an Array) requires stream-based writing, or whether the full enumeration happens fast enough that buffering can be considered.

@ioquatix
Contributor

I think the current answer is:

  • Enumerable bodies can be buffered but don't need to be.
  • Streaming bodies shouldn't be buffered.

@MSP-Greg
Member

As in a streaming body responds to call?

@ioquatix
Contributor

Yes, as per the description in the Rack spec (enumerable bodies respond to each; otherwise it's a streaming body and must respond to call).
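
Put as code, the dispatch the spec implies is roughly this (an illustrative sketch; write_with_content_length and write_chunk are hypothetical helpers, not Puma's):

    if body.respond_to?(:to_ary)
      parts = body.to_ary                        # safe to buffer fully
      write_with_content_length(socket, parts)   # content-length is cheap here
    elsif body.respond_to?(:each)
      body.each { |chunk| write_chunk(socket, chunk) }  # may buffer, needn't
    else
      body.call(socket)  # streaming body: hand over the stream, don't buffer
    end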

@MSP-Greg
Member

Thanks. I thought as much. But, the issue is whether enumerable bodies that should be streaming bodies exist in the wild.

Or, the issue with the Sinatra file is that it's assuming enum bodies are not buffered, and hence, TTFB should be immediate. But since it's buffered, it is not...

@ioquatix
Contributor

Or, the issue with the Sinatra file is that it's assuming enum bodies are not buffered, and hence, TTFB should be immediate. But since it's buffered, it is not...

Sinatra should not be using enumerable bodies for streaming, or it should have a body that responds to each but not to_ary. This indicates that you can't buffer it, basically, IMHO, because the presence of to_ary indicates there is some way that you can generate a buffered representation of it. That doesn't mean you don't buffer in the network or TCP layer - but if you choose to do that, you are making an explicit throughput/latency trade off and maybe in this case, the application doesn't expect that.
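
Concretely, such a body might be shaped like this (a sketch of the convention being described, not Sinatra's actual helper):

    # Responds to #each but exposes no #to_ary, signalling: iterate me
    # as chunks become available, and don't assume I can be pre-buffered.
    class SlowBody
      def each
        3.times do |i|
          yield "tick #{i}\n"
          sleep 1  # chunks are produced over time
        end
      end
    end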

Thinking bigger picture, one thing to be aware of is that web browsers typically buffer roughly 8 KiB before they start streaming and/or processing data (unless it's a real-time WebSocket or Server-Sent Events connection; browsers deal with those without buffering, in my experience). So what I'm saying is: if you do a little buffering on the network and server side, in practice it might not be that bad, because clients are probably going to do at least some buffering too. In other words, even if you don't buffer anything on the server, the client may still choose to, and thus for Sinatra the result might be the same in practice, unless they are doing WebSockets / SSE / full-duplex fetch / etc.

@dentarg
Member Author

dentarg commented Oct 30, 2022

the issue is whether enumerable bodies that should be streaming bodies exist in the wild

That would be all of them? :) "Streaming bodies" were just now introduced (#2740, rack/rack#1745). Sinatra has been around for 15 years.

Okay, so we have found the bug: streaming in Sinatra shouldn't be buffered, as there's no to_ary method in https://github.com/sinatra/sinatra/blob/v3.0.2/lib/sinatra/base.rb#L436-L491 nor in https://github.com/sinatra/sinatra/blob/v3.0.2/sinatra-contrib/lib/sinatra/streaming.rb.

Yes, I think the point of streaming in Sinatra was to enable SSE: https://github.com/sinatra/sinatra/tree/v3.0.2#streaming-responses

@ioquatix
Contributor

ioquatix commented Oct 30, 2022

As you've clearly pointed out, there was no mechanism for indicating the buffering strategy that should be used before Rack 3, and thus in general, streaming with Rack was impossible without relying on optional or implementation specific behaviour. Even Rack::ETag couldn't do the right thing and I'd consider Rack 2's implementation broken with no possible fix w.r.t. streaming.

dentarg added a commit to dentarg/sinatra that referenced this issue Nov 8, 2022
Actually, test the master branch of Puma like we do with Rack.

Pin regular jobs to Puma 5.x due to the streaming bug: puma/puma#3000
@jarthod

jarthod commented Nov 14, 2022

Hello, I just want to give a bit of feedback here. When I tried upgrading to Puma 6, I noticed it broke streaming in my Rails app too. I'm using my render-later library, which uses classic Rails controller streaming (plus a couple of hacks to actually get it to work, as it's mostly broken by default). It's easy to reproduce the problem, as I already wrote a spec; running the render-later specs (rake) with Puma 6 gives:

[render-later] (master) > rake
.......F
Failure:
RenderLater::EngineTest#test_streaming_works [/home/bigbourin/work/render-later/test/render-later/engine_test.rb:33]:
Not enough time between first and last chunk.
Expected 0.00048107299999999853 to be > 1.

It works fine with Puma 5, though.

I already had a look at #3004 and tried that branch (MSP-Greg:00-size-to-first-byte), and I can confirm that using this version (gem 'puma', github: 'MSP-Greg/puma', branch: '00-size-to-first-byte') seems to fix the problem for my use case ✔️ :

[render-later] (master) > rake
..........
Finished in 2.366557s, 4.2255 runs/s, 8.0285 assertions/s.
10 runs, 19 assertions, 0 failures, 0 errors, 0 skips

Honestly, I did not spend much time investigating the changes and how the PR fixes this; I can wait for 6.0.1. But as it seems my use case was not Puma 6.0.0 ready, if you need my opinion on some decisions, do not hesitate to ask (I can spend more time investigating if needed too).

Thank you all for your work 🙇

@MSP-Greg
Member

@jarthod Thank you for testing PR #3004.

It's an update to #2896, which attempted to determine properties of the response body and adjust how the response body was assembled/transmitted. That came out of running benchmarks using various body type/size combinations and seeing where Puma's performance might be improved.

#2896 was a bit too aggressive and didn't account for streamed bodies that were enumerations. #3004 has added more tests and CI jobs, so hopefully it's ready for production...

@jarthod

jarthod commented Nov 15, 2022

Ok, thanks for the summary! So I shall wait for this to be merged then.

And if I understand correctly, with Rack 3 the preferred way to stream is now "streaming bodies" (callable) instead of enumerable, but as this was just implemented and requires Rack 3, I suppose this won't change for a while in Rails?

@ioquatix
Contributor

We will be working on updates to Rails pretty soon.

@jarthod

jarthod commented Nov 15, 2022

@ioquatix Ok great, feel free to ping me to test the branch :)
