
Sinatra streaming broken in Puma 6.0.0 #3000

Closed
dentarg opened this issue Oct 25, 2022 · 29 comments · Fixed by #3004

@dentarg
Member

dentarg commented Oct 25, 2022

See sinatra/sinatra#1832 for more details. In short: this simple app, https://github.com/sinatra/sinatra/blob/v3.0.2/examples/stream.ru, works fine with Puma 5.6.5, but with 6.0.0 it waits until all the sleeps have finished before the response is returned (this is more noticeable if you increase the sleep time in the example).

As noted in sinatra/sinatra#1832 (comment) the example runs as expected in falcon (v0.42.3).
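
For reference, the linked example boils down to roughly this (paraphrased; see the URL above for the exact source). The stream block emits 46 bytes over 1.5 seconds:

    # stream.ru, roughly
    require 'sinatra/base'

    class Stream < Sinatra::Base
      get '/' do
        content_type :txt
        stream do |out|
          out << "It's gonna be legen -\n"
          sleep 0.5
          out << " (wait for it) \n"
          sleep 1
          out << "- dary!\n"
        end
      end
    end

    run Stream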

@dentarg
Member Author

dentarg commented Oct 25, 2022

If I change IO_BUFFER_LEN_MAX to a low value like 1, the streaming works:

puma/lib/puma/request.rb

Lines 24 to 25 in 8159aa4

# max size for io_buffer, force write when exceeded
IO_BUFFER_LEN_MAX = 1_024 * 1_024 * 4

Used here

puma/lib/puma/request.rb

Lines 303 to 325 in 03ed6c8

# for array bodies, flush io_buffer to socket when size is greater than
# IO_BUFFER_LEN_MAX
if chunked
  body.each do |part|
    next if (byte_size = part.bytesize).zero?
    io_buffer.append byte_size.to_s(16), LINE_END, part, LINE_END
    if io_buffer.length > IO_BUFFER_LEN_MAX
      fast_write_str socket, io_buffer.to_s
      io_buffer.reset
    end
  end
  io_buffer.write CLOSE_CHUNKED
else
  body.each do |part|
    next if part.bytesize.zero?
    io_buffer.write part
    if io_buffer.length > IO_BUFFER_LEN_MAX
      fast_write_str socket, io_buffer.to_s
      io_buffer.reset
    end
  end
end
fast_write_str(socket, io_buffer.to_s) unless io_buffer.length.zero?

The buffering was introduced in #2896
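
One way to watch this from the client side (a sketch; it assumes the example app is running on localhost:9292, rackup's default): with 5.6.5 the chunks print about half a second apart, with 6.0.0 they all arrive together at the end.

    require 'net/http'

    Net::HTTP.start('localhost', 9292) do |http|
      http.request(Net::HTTP::Get.new('/')) do |res|
        # read_body with a block yields each body fragment as it arrives
        res.read_body do |chunk|
          puts "#{Time.now.strftime('%H:%M:%S.%L')}  #{chunk.inspect}"
        end
      end
    end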

@MSP-Greg
Member

What about something like 64kB?

@dentarg
Member Author

dentarg commented Oct 26, 2022

Sure, it would probably help some cases. What about making it configurable? Is that a good idea?

I suspect a more reliable solution for Sinatra is to switch to the new streaming body (#2740), which wouldn't be buffered, right? I need to try this myself.

@MSP-Greg
Member

What about making it configurable? Is that a good idea?

Yes. I'm adding a PR with size_to_first_byte as a new DSL option. Not sure what the default should be; 32 kB for now.
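
If it ships as described, usage would presumably look like this in a Puma config file (a hypothetical sketch; the option name comes from the comment above, and the final shape in the PR may differ):

    # config/puma.rb
    size_to_first_byte 32_768  # flush the body buffer to the socket after 32 kB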

nateberkopec linked a pull request Oct 28, 2022 that will close this issue
@nateberkopec
Member

I don't like the idea of closing a bug with a config setting. You shouldn't need to configure Puma to make chunked responses work correctly. Am I reading this wrong?

@dentarg
Member Author

dentarg commented Oct 28, 2022

I don't think you are, and I agree.

Then the buffering should be opt-in? Or removed completely? Or is there another way?

@dentarg
Member Author

dentarg commented Oct 28, 2022

I suspect a more reliable solution for Sinatra is to switch to the new streaming body (#2740), which wouldn't be buffered, right? I need to try this myself.

I tried this (dentarg/testssl.web@5e6b14b) and streaming then works in Puma 6 (that code does not work in Puma 5). I'm not sure everyone should need to make changes like this, though?
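
For anyone curious, a Rack 3 streaming body is a callable that is handed the stream and owns it, along the lines of this minimal rackup sketch (the commit linked above shows the real change in context):

    # streaming.ru - the body responds to call instead of each
    run lambda { |env|
      body = proc do |stream|
        3.times do |i|
          stream.write("chunk #{i}\n")
          sleep 1
        end
      ensure
        stream.close  # the body must close the stream when it's done
      end
      [200, { 'content-type' => 'text/plain' }, body]
    }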

@MSP-Greg
Member

You shouldn't need to configure Puma to make chunked responses work correctly. Am I reading this wrong?

Possibly. What is a chunked response? The Sinatra example is a somewhat contrived example of streaming, as it's sending 46 bytes in 1.5 seconds. Is that a realistic setup for streaming?

A long time ago, I believe I checked curl and wrk to make sure they would error on an incorrectly formatted chunked response. As I recall, both did show an error. Using the benchmark code in benchmarks/local and/or curl, you can send requests generating chunked responses, and they are all correct.

I didn't start out to improve Puma, I started out to develop better metrics for response time. It became obvious that array/enum bodies were slowing down when the number of 'steps' increased, especially under load. Lots of Ruby code can essentially run at 'light speed', but writes to sockets cannot.

So, the refactored code attempts to limit the number of writes. This may mean combining the headers and body if the body is small, performing one write as opposed to one for each, or accumulating the array/enum body, rather than writing each enumeration. As data from myself and others showed, this resulted in large improvements for response time.

But, if one is using an Enum body to mimic a streaming body, this can hinder the speed of delivery if the enum cannot be done at 'light speed'. This includes affecting TTFB.

Additionally, when discussions about Rack middleware were happening, the idea of computing the content length for a 'knowable' body was discussed and seemed like a good idea. Code below. Maybe this should be removed for Array/Enum bodies, and they should just be sent chunked? Added benefit would be that the full body would not need to be accumulated before sending.

puma/lib/puma/request.rb

Lines 141 to 154 in 1a3a46a

# below converts app_body into body, dependent on app_body's characteristics, and
# resp_info[:content_length] will be set if it can be determined
if !resp_info[:content_length] && !resp_info[:transfer_encoding] && status != 204
  if app_body.respond_to?(:to_ary)
    length = 0
    if array_body = app_body.to_ary
      body = array_body.map { |part| length += part.bytesize; part }
    elsif app_body.is_a?(::File) && app_body.respond_to?(:size)
      length = app_body.size
    elsif app_body.respond_to?(:each)
      body = []
      app_body.each { |part| length += part.bytesize; body << part }
    end
    resp_info[:content_length] = length

At some point, optimizing this relates to the metrics of an app's Array/Enum bodies, as in how many members they have and what the bytesize range of those members is.

@dentarg
Member Author

dentarg commented Oct 29, 2022

@ioquatix Sorry to bother you, but would you like to weigh in on this issue?

@ioquatix
Contributor

ioquatix commented Oct 29, 2022

As discussed, buffering is critical to the latency/throughput trade-offs of an application.

I believe that when streaming, you should not be buffering - it becomes an application concern to decide the throughput and latency trade off.

I model this in Falcon using the concept of ready?. If a body is ready, reading from it won't block.

A user stream can have a small internal buffer. If there is a chunk in the buffer, then ready? would be true (because a chunk is immediately available). If the user stream is empty (the application has not generated a chunk), ready? is false (reading would block until a chunk is available).

For an Array and other bodies that respond to to_ary, we should always consider them ready?. This is one of the reasons for explicitly modelling enumerable bodies vs streaming bodies. If you recall, Rack::ETag was not able to distinguish between these two cases and ended up breaking streaming on a fairly regular basis.

At the output layer, you can take advantage of ready? to avoid latency issues. For example, if the response body is always ready, we don't need to flush and can let TCP continue the buffering: https://github.com/socketry/protocol-http1/blob/e6a9235102986a7a5462aea251f2fc9cdc00d65b/lib/protocol/http1/connection.rb#L288. You can search for other instances of ready? in that code to see how the buffering strategy works.

The actual implementation of ready? is non-trivial for Rack 2 but fairly straightforward in Rack 3.

For enumerable bodies: https://github.com/socketry/protocol-rack/blob/73705f364ba009eb1bbabbad88de1e627cb37c09/lib/protocol/rack/body/enumerable.rb#L50-L53 (now that I look at it, I realise the first check for Array is redundant :p).

For streaming body the default is used (from inheritance): https://github.com/socketry/protocol-http/blob/5f9c3d4fcda5bf89e8e033d8e1ff210f0dc8342d/lib/protocol/http/body/readable.rb#L47-L52 which is false.

The implication of this, as outlined above, is that streaming responses will map output chunks to TCP packets, and if the user does a poor job of buffering the output, that's on them (according to the implementation/design). You could obviously introduce your own buffering for streaming responses, but the trade-off would be latency, and I think, all things considered, that is the wrong trade-off.

If you want to improve the buffering in this case, I'd advise you to check TCP_NODELAY and TCP_CORK.
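
In Ruby terms, the write loop that strategy implies looks roughly like this (a sketch of the idea only, not Falcon's code; stream is assumed to be a buffered writer around the socket):

    body.each do |chunk|
      stream.write(chunk)
      # If the next chunk is not immediately available, flush so the
      # client sees this one now; if the body is ready?, keep
      # accumulating and let the buffer coalesce writes for throughput.
      stream.flush unless body.ready?
    end
    stream.flush  # write out anything still buffered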

@ioquatix
Contributor

Additionally, when discussions about Rack middleware were happening, the idea of computing the content length for a 'knowable' body was discussed and seemed like a good idea. Code below. Maybe this should be removed for Array/Enum bodies, and they should just be sent chunked? Added benefit would be that the full body would not need to be accumulated before sending.

My opinion is: you should ignore the content-length in the Rack response headers. It's dangerous if incorrect (e.g. due to buggy code), and it can cause security issues: http://projects.webappsec.org/w/page/13246931/HTTP-Response-Splitting.

Falcon extracts the length and may use it in cases where it makes sense (https://github.com/socketry/protocol-rack/blob/97ec51b0f015a7d2a6b70a283a4d6a34c08d5d0c/lib/protocol/rack/body.rb#L16-L20), but it's not used generally, i.e. it's only used for enumerable bodies: https://github.com/socketry/protocol-rack/blob/97ec51b0f015a7d2a6b70a283a4d6a34c08d5d0c/lib/protocol/rack/body/enumerable.rb#L20-L27.

When writing it out (protocol specific), we try to validate that the response body length matches the specified content-length, and fail if not: https://github.com/socketry/protocol-http1/blob/e6a9235102986a7a5462aea251f2fc9cdc00d65b/lib/protocol/http1/connection.rb#L283-L285.

It's less of an issue for HTTP/2 since the binary framing does not depend on the content-length header (which serves as more of a hint). That being said, there are some very odd proxy implementations which convert HTTP/1 -> HTTP/2 and completely fail at handling content-length headers. e.g. check https://portswigger.net/research/http2 specifically "H2.CL Desync on Netflix".

So... my advice is:

  • User/application-provided content-length should not be trusted and can be a major security issue.
  • Generating the content length makes total sense to me (e.g. when you know it's an Array, it's cheap).
  • You can decide the trade-off: Falcon accepts a user-provided content-length but validates it at the protocol level (sketched below), OR
  • you can never trust the user-provided content-length and recompute it, or drop it completely (allowing you to use chunked encoding).
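
A minimal sketch of the 'accept but validate' option (illustrative names, not Puma's internals; the validation mirrors what the protocol-http1 link above does):

    declared = headers['content-length']&.to_i
    written  = 0
    body.each do |chunk|
      written += chunk.bytesize
      socket.write(chunk)
    end
    if declared && written != declared
      # The status line and headers are already out, so the safest
      # recovery is to fail and close the connection rather than let
      # the client mis-frame the bytes.
      raise "content-length mismatch: declared #{declared}, wrote #{written}"
    end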

@MSP-Greg
Member

@ioquatix Thanks for your responses here and elsewhere.

Generating content length makes total sense to me (e.g. when you know it's an Array, it's cheap).

That's one thing that's bothered me about the Rack spec: 'The Enumerable Body must respond to each. It must only be called once.' If the body is an Array, why the limitation to only call each once? Or: call each once to determine content-length, then call it again to transmit, buffering as desired? Obviously, both calls would occur before calling close.

JFYI, re content-length: I'm only referring to app-provided content-length for use in the response.

@ioquatix
Contributor

ioquatix commented Oct 29, 2022

That's one thing that's bothered me about the Rack spec -'The Enumerable Body must respond to each. It must only be called once'. If the body is an Array, why the limitation to only call each once?

The assumption is that the enumerable body can be any kind of object (e.g. a file) and you can't assume it will not be stateful when calling each. However, if you determine that it's an Array instance, it would stand to reason that you could call each as many times as you want without side effects (assuming that the user has not sub-classed it and done something funky).

If you are concerned about this (and potentially other) edge cases, there would be nothing wrong with calling body = body.to_ary. But I think the Array case is so common, and the most easily optimised, that it's obvious. However, I'd be happy to accept a PR to clarify this in the Rack spec.

The assumption is that to_ary isn't a (slow) blocking operation. Obviously, for a file, it might have internally blocking reads... calling to_ary could potentially require a lot of memory... etc.

@MSP-Greg
Member

Just to make sure all are aware: to_a and to_ary are similar, but different. Enumerable has a to_a method; Array has a to_ary method (which returns self), and also a to_a method, because Enumerable is in its ancestor chain.
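
A quick console check of the distinction:

    a = [1, 2]
    a.to_ary.equal?(a)           # => true  (Array#to_ary returns self)
    (1..2).respond_to?(:to_ary)  # => false (Enumerable provides to_a, not to_ary)
    (1..2).to_a                  # => [1, 2]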

for a file, it might have internally blocking reads

That, along with the 'text'-based iteration, was the reason for my PR in Rack...

@ioquatix
Contributor

Thinking about it further, File#each in binary mode should probably just yield page sized chunks.
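
Since File#each currently yields lines, that would amount to something like this wrapper (a sketch; each_page and the 16 KiB page size are illustrative):

    def each_page(file, page_size = 16_384)
      return enum_for(:each_page, file, page_size) unless block_given?
      # read returns nil at EOF, ending the loop
      while (chunk = file.read(page_size))
        yield chunk
      end
    end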

@MSP-Greg
Member

Probably a good idea, but I wonder how many things might break...

Re this issue, I think it would be helpful for an app to define whether an enumerable body (not an Array) requires stream-based writing, or whether the full enumeration happens fast enough that buffering can be considered.

@ioquatix
Contributor

I think the current answer is:

  • Enumerable bodies can be buffered but don't need to be.
  • Streaming bodies shouldn't be buffered.

@MSP-Greg
Member

As in a streaming body responds to call?

@ioquatix
Contributor

Yes, as per the description in the Rack spec (enumerable bodies respond to each; otherwise it's a streaming body and must respond to call).
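
Put as code, the dispatch the spec implies is roughly this (an illustrative sketch; write_with_content_length and write_chunk are hypothetical helpers, not Puma's):

    if body.respond_to?(:to_ary)
      parts = body.to_ary                        # safe to buffer fully
      write_with_content_length(socket, parts)   # content-length is cheap here
    elsif body.respond_to?(:each)
      body.each { |chunk| write_chunk(socket, chunk) }  # may buffer, needn't
    else
      body.call(socket)  # streaming body: hand over the stream, don't buffer
    end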

@MSP-Greg
Member

Thanks. I thought as much. But, the issue is whether enumerable bodies that should be streaming bodies exist in the wild.

Or, the issue with the Sinatra file is that it's assuming enum bodies are not buffered, and hence, TTFB should be immediate. But since it's buffered, it is not...

@ioquatix
Contributor

Or, the issue with the Sinatra file is that it's assuming enum bodies are not buffered, and hence, TTFB should be immediate. But since it's buffered, it is not...

Sinatra should not be using enumerable bodies for streaming, or it should have a body that responds to each but not to_ary. This indicates that you can't buffer it, basically, IMHO, because the presence of to_ary indicates there is some way that you can generate a buffered representation of it. That doesn't mean you don't buffer in the network or TCP layer - but if you choose to do that, you are making an explicit throughput/latency trade off and maybe in this case, the application doesn't expect that.
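
Concretely, such a body might be shaped like this (a sketch of the convention being described, not Sinatra's actual helper):

    # Responds to #each but exposes no #to_ary, signalling: iterate me
    # as chunks become available, and don't assume I can be pre-buffered.
    class SlowBody
      def each
        3.times do |i|
          yield "tick #{i}\n"
          sleep 1  # chunks are produced over time
        end
      end
    end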

Thinking bigger picture, one thing to be aware of is that web browsers typically buffer roughly 8 KiB before they start streaming and/or processing data (unless it's a real-time WebSocket or Server-Sent Events connection; browsers deal with those without buffering, in my experience). So what I'm saying is: if you do a little buffering on the network and server side, in practice it might not be that bad, because clients are probably going to do at least some buffering too. In other words, even if you don't buffer anything on the server, the client may still choose to, and thus for Sinatra the result might be the same in practice, unless they are doing WebSockets / SSE / full-duplex fetch / etc.

@dentarg
Member Author

dentarg commented Oct 30, 2022

the issue is whether enumerable bodies that should be streaming bodies exist in the wild

That would be all of them? :) "Streaming bodies" were just now introduced (#2740, rack/rack#1745). Sinatra has been around for 15 years.

Okay, so we have found the bug: streaming in Sinatra shouldn't be buffered, as there's no to_ary method in https://github.com/sinatra/sinatra/blob/v3.0.2/lib/sinatra/base.rb#L436-L491 nor in https://github.com/sinatra/sinatra/blob/v3.0.2/sinatra-contrib/lib/sinatra/streaming.rb.

Yes, I think the point of streaming in Sinatra was to enable SSE: https://github.com/sinatra/sinatra/tree/v3.0.2#streaming-responses

@ioquatix
Contributor

ioquatix commented Oct 30, 2022

As you've clearly pointed out, there was no mechanism for indicating the buffering strategy that should be used before Rack 3, and thus in general, streaming with Rack was impossible without relying on optional or implementation specific behaviour. Even Rack::ETag couldn't do the right thing and I'd consider Rack 2's implementation broken with no possible fix w.r.t. streaming.

dentarg added a commit to dentarg/sinatra that referenced this issue Nov 8, 2022
Actually, test the master branch of Puma like we do with Rack.

Pin regular jobs to Puma 5.x due to the streaming bug: puma/puma#3000
@jarthod

jarthod commented Nov 14, 2022

Hello, I just want to give a bit of feedback here. When I tried upgrading to Puma 6, I noticed it broke streaming in my Rails app too. I'm using my render-later library, which uses classic Rails controller streaming (plus a couple of hacks to actually get it to work, as it's mostly broken by default). It's easy to reproduce the problem, as I already wrote a spec; running the render-later specs (rake) with Puma 6 gives:

[render-later] (master) > rake
.......F
Failure:
RenderLater::EngineTest#test_streaming_works [/home/bigbourin/work/render-later/test/render-later/engine_test.rb:33]:
Not enough time between first and last chunk.
Expected 0.00048107299999999853 to be > 1.

It works fine with Puma 5, though.

I already had a look at #3004 and tried that branch (MSP-Greg:00-size-to-first-byte), and I can confirm that using this version (gem 'puma', github: 'MSP-Greg/puma', branch: '00-size-to-first-byte') seems to fix the problem for my use case ✔️ :

[render-later] (master) > rake
..........
Finished in 2.366557s, 4.2255 runs/s, 8.0285 assertions/s.
10 runs, 19 assertions, 0 failures, 0 errors, 0 skips

Honestly, I did not spend much time investigating the changes and how the PR fixes this; I can wait for 6.0.1. But as it seems my use case was not Puma 6.0.0 ready, if you need my opinion on some decisions, do not hesitate to ask (I can spend more time investigating if needed too).

Thank you all for your work 🙇

@MSP-Greg
Member

@jarthod Thank you for testing PR #3004.

It's an update to #2896, which attempted to determine properties of the response body and adjust how the response body was assembled/transmitted. That came out of running benchmarks using various body type/size combinations and seeing where Puma's performance might be improved.

#2896 was a bit too aggressive and didn't account for streamed bodies that were enumerations. #3004 has added more tests and CI jobs, so hopefully it's ready for production...

@jarthod

jarthod commented Nov 15, 2022

Ok, thanks for the summary! So I shall wait for this to be merged then.

And if I understand correctly, with Rack 3 the preferred way to stream is now "streaming bodies" (callable) instead of enumerable, but as this was just implemented and requires Rack 3, I suppose this won't change for a while in Rails?

@ioquatix
Contributor

We will be working on updates to Rails pretty soon.

@jarthod

jarthod commented Nov 15, 2022

@ioquatix Ok great, feel free to ping me to test the branch :)
