Sinatra streaming broken in Puma 6.0.0 #3000
If I change Lines 24 to 25 in 8159aa4 …
Used here: Lines 303 to 325 in 03ed6c8 …
The buffering was introduced in #2896
What about something like 64kB?
Sure, it would probably help some cases. What about making it configurable? Is that a good idea? I suspect a more reliable solution for Sinatra is to switch to the new streaming body (#2740), which wouldn't be buffered, right? Need to try this myself.
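For context, a minimal sketch of what a Rack 3 "streaming body" (#2740) looks like, assuming a server-provided stream object (stubbed here with StringIO): the body responds to call instead of each, so the server hands it the raw stream and does no buffering of its own.

```ruby
require "stringio"

# A Rack 3 streaming body responds to #call and receives the stream;
# the server writes nothing on the body's behalf, so nothing is buffered.
streaming_body = proc do |stream|
  3.times { |i| stream.write("chunk #{i}\n") }
  stream.close
end

# StringIO stands in for the real socket-backed stream the server passes.
io = StringIO.new
streaming_body.call(io)
puts io.string
```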
Yes. Adding a PR with …
I don't like the idea of closing a bug with a config setting. You shouldn't need to configure Puma to make chunked responses work correctly. Am I reading this wrong?
I don't think you are, and I agree. Should the buffering then be opt-in? Or removed completely? Or is there another way?
I tried this (dentarg/testssl.web@5e6b14b) and streaming then works in Puma 6 (this code does not work in Puma 5). Not sure everyone should need to make changes like this, though?
Possibly. What is a chunked response? The Sinatra example is a somewhat contrived example of streaming, as it's sending 46 bytes in 1.5 seconds. Is that a realistic setup for streaming? A long time ago, I believe I checked curl and wrk to make sure they would error on an incorrectly formatted chunked response. As I recall, both did show an error.

Using the benchmark code in …: I didn't start out to improve Puma, I started out to develop better metrics for response time. It became obvious that array/enum bodies were slowing down when the number of 'steps' increased, especially under load. Lots of Ruby code can essentially run at 'light speed', but writes to sockets do not. So, the refactored code attempts to limit the number of writes. This may mean combining the headers and body if the body is small, performing one write as opposed to one for each, or accumulating the array/enum body rather than writing each enumeration. As data from myself and others showed, this resulted in large improvements for response time.

But, if one is using an enum body to mimic a streaming body, this can hinder the speed of delivery if the enumeration cannot be done at 'light speed'. This includes affecting TTFB.

Additionally, when discussions about Rack middleware were happening, the idea of computing the content length for a 'knowable' body was discussed and seemed like a good idea (code at Lines 141 to 154 in 1a3a46a). Maybe this should be removed for array/enum bodies, and they should just be sent chunked? An added benefit would be that the full body would not need to be accumulated before sending.
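The 'knowable body' idea can be sketched as follows (the helper name is mine, not Puma's actual code): if the body is an Array of Strings, its total byte size can be computed up front, letting the server set Content-Length and send everything in one write instead of chunking.

```ruby
# Hypothetical helper, not Puma's implementation: compute Content-Length
# for a body whose full contents are already in memory.
def knowable_length(body)
  return nil unless body.is_a?(Array) && body.all? { |part| part.is_a?(String) }
  body.sum(&:bytesize)
end

body = ["Hello, ", "streaming ", "world\n"]
puts knowable_length(body)  # total bytes across all parts

# An enumerator that produces chunks lazily is not 'knowable' up front.
puts knowable_length(Enumerator.new { |y| y << "lazy" }).inspect
```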
At some point, optimizing this relates to the metrics of an app's array/enum bodies, as in how many members they have and what the bytesize range of the members is.
@ioquatix Sorry to bother you, but would you like to weigh in on this issue?
As discussed, buffering is critical to the latency/throughput trade-offs of an application. I believe that when streaming, you should not be buffering; it becomes an application concern to decide the throughput and latency trade-off. I model this in Falcon using the concept of …

A user stream can have a small internal buffer. If there is a chunk in the buffer, then …

For an …

At the output layer, you can take advantage of …

The actual implementation of …

For enumerable body: https://github.com/socketry/protocol-rack/blob/73705f364ba009eb1bbabbad88de1e627cb37c09/lib/protocol/rack/body/enumerable.rb#L50-L53 (now that I look on it, I realise the first check for …)

For streaming body the default is used (from inheritance): https://github.com/socketry/protocol-http/blob/5f9c3d4fcda5bf89e8e033d8e1ff210f0dc8342d/lib/protocol/http/body/readable.rb#L47-L52 which is …

The implication of this, as outlined above, is that streaming responses will map output chunks to TCP packets, and if the user does a poor job of buffering the output, that's on them (according to the implementation/design). You could obviously introduce your own buffering for streaming responses, but the trade-off would be latency, and I think, all things considered, that is the wrong trade-off. If you want to improve the buffering in this case, I'd advise you to check …
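The buffering model described here can be illustrated with a toy sketch (my own simplification, not Falcon's code): the output layer coalesces chunks that are already available into one write, and stops accumulating as soon as the next chunk isn't ready.

```ruby
# Toy body: everything is already in memory, so the next chunk is always
# available until the parts run out. A network-backed body would report
# "not ready" when producing the next chunk would block.
class ArrayBody
  def initialize(parts)
    @parts = parts.dup
  end

  def ready?
    !@parts.empty?
  end

  def read
    @parts.shift
  end
end

body = ArrayBody.new(["a", "b", "c"])
buffer = +""
buffer << body.read while body.ready?
puts buffer  # all three chunks coalesced into a single write
```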
My opinion is, you should ignore the content-length in the Rack response headers. It's dangerous if incorrect (e.g. buggy code); it can cause security issues: http://projects.webappsec.org/w/page/13246931/HTTP-Response-Splitting. Falcon extracts the length and may use it in cases where it makes sense (https://github.com/socketry/protocol-rack/blob/97ec51b0f015a7d2a6b70a283a4d6a34c08d5d0c/lib/protocol/rack/body.rb#L16-L20), but it's not used generally, i.e. it's only used for enumerable bodies: https://github.com/socketry/protocol-rack/blob/97ec51b0f015a7d2a6b70a283a4d6a34c08d5d0c/lib/protocol/rack/body/enumerable.rb#L20-L27. When writing it out (protocol specific), we try to validate that the response body length matches the specified content-length, and fail if not: https://github.com/socketry/protocol-http1/blob/e6a9235102986a7a5462aea251f2fc9cdc00d65b/lib/protocol/http1/connection.rb#L283-L285. It's less of an issue for HTTP/2, since the binary framing does not depend on the …

So... my advice is: …
|
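The validation step described above (fail when the body doesn't match the declared Content-Length) might look like this sketch, assuming a simple string-backed writer; this is illustrative, not protocol-http1's implementation:

```ruby
# Hypothetical writer: count the bytes actually written and fail on
# mismatch, so a buggy app-provided Content-Length can't produce a
# malformed (and potentially exploitable) response.
def write_validated(declared, parts, out)
  written = parts.sum do |part|
    out << part
    part.bytesize
  end
  unless written == declared
    raise "Content-Length mismatch: declared #{declared}, wrote #{written}"
  end
  written
end

out = +""
puts write_validated(5, ["Hel", "lo"], out)

begin
  write_validated(10, ["short"], +"")
rescue RuntimeError => e
  puts e.message
end
```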
@ioquatix Thanks for your responses here and elsewhere.
That's one thing that's bothered me about the Rack spec: 'The Enumerable Body must respond to each. It must only be called once'. If the body is an Array, why the limitation to only call …?

JFYI, re content-length, I'm only referring to app-provided content-length for use in the response.
The assumption is that the enumerable body can be any kind of object (e.g. a file) and you can't assume it will not be stateful when calling each. However, if you determine that it's an …

If you are concerned about this (and potentially other) edge cases, there would be nothing wrong with calling …

The assumption is that …
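The statefulness concern can be sketched directly: an Array (or anything offering to_ary) yields a snapshot whose enumeration is side-effect free, so it is safe to buffer or measure, while a stateful body simply won't respond to to_ary. Illustrative only:

```ruby
# If the body converts cleanly to an Array, it is safe to buffer or
# measure; otherwise treat it as potentially stateful and enumerate
# it exactly once.
def safe_snapshot(body)
  body.respond_to?(:to_ary) ? body.to_ary : nil
end

puts safe_snapshot(["a", "b"]).inspect                        # safe to buffer
puts safe_snapshot(Enumerator.new { |y| y << "x" }).inspect   # nil: don't buffer
```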
Just to make sure all are aware, …
That, along with the 'text'-based iteration, was the reason for my PR in Rack...
Thinking about it further, …
Probably a good idea, but I wonder how many things might break... Re this issue, I think it would be helpful for an app to define whether an enumerable body (not an Array) requires 'stream'-based writing, or whether the full enumeration should happen fast enough that buffering can be considered.
I think the current answer is: …
As in, a streaming body responds to …?
Yes, as per the description in the Rack spec (enumerable bodies respond to each; otherwise it's a streaming body and must respond to call).
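That dispatch rule from the Rack 3 spec can be expressed directly as a sketch:

```ruby
# Rack 3 body dispatch: responding to #each makes it an enumerable body;
# otherwise it must respond to #call and is treated as a streaming body.
def body_kind(body)
  if body.respond_to?(:each)
    :enumerable
  elsif body.respond_to?(:call)
    :streaming
  else
    raise TypeError, "not a valid Rack body"
  end
end

puts body_kind(["hello"])                        # enumerable
puts body_kind(proc { |stream| stream.close })   # streaming
```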
Thanks, I thought as much. But the issue is whether enumerable bodies that should be streaming bodies exist in the wild. Or rather, the issue with the Sinatra file is that it assumes enum bodies are not buffered, and hence that TTFB should be immediate. But since it's buffered, it is not...
Sinatra should not be using enumerable bodies for streaming, or it should have a body that responds to …

Thinking bigger picture, one thing to be aware of is that web browsers typically buffer 8KiB, more or less, before they start streaming and/or processing data, unless it's a real-time WebSocket or Server-Sent Events connection; browsers deal with those without buffering, in my experience. So what I'm saying is: if you do a little buffering on the network side and server side, in practice it might not be that bad, because clients are probably going to do at least some level of buffering too. In other words, even if you don't buffer anything on the server, the client may still choose to, and thus for Sinatra the result might be the same in practice... unless they are doing WebSockets / SSE / full-duplex fetch / etc.
That would be all of them? :) "Streaming bodies" were only just introduced (#2740, rack/rack#1745); Sinatra has been around for 15 years. Okay, so we have found the bug: using streaming in Sinatra shouldn't be buffered, as there's no …

Yes, I think the point of streaming in Sinatra was to enable SSE: https://github.com/sinatra/sinatra/tree/v3.0.2#streaming-responses
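Since SSE is the motivating use case, here is the wire format a Sinatra stream would be emitting — a minimal sketch of text/event-stream framing (the helper name is mine):

```ruby
# Server-Sent Events framing: an optional "event:" line, one or more
# "data:" lines, terminated by a blank line. Buffering the response
# defeats the purpose, since each event should reach the client promptly.
def sse_event(data, event: nil)
  frame = +""
  frame << "event: #{event}\n" if event
  data.each_line(chomp: true) { |line| frame << "data: #{line}\n" }
  frame << "\n"
end

print sse_event("hello", event: "greeting")
```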
EDIT: working on a fix |
As you've clearly pointed out, there was no mechanism for indicating the buffering strategy that should be used before Rack 3, and thus, in general, streaming with Rack was impossible without relying on optional or implementation-specific behaviour. Even …
Actually, test the master branch of Puma like we do with Rack. Pin regular jobs to Puma 5.x due to the streaming bug: puma/puma#3000
Hello, I just want to give a bit of feedback here. As I tried upgrading to Puma 6, I noticed it broke streaming in my Rails app too. I'm using my render-later library, which uses classic Rails controller streaming (plus a couple of hacks to actually get it to work, as it's mostly broken by default). It's easy to reproduce the problem, as I wrote a spec already, so running …
It works fine with Puma 5, though. But I already had a look at #3004 and tried with this branch (…
So honestly I did not spend much time investigating the changes and how the MR fixes this; I can wait for …

Thank you all for your work 🙇
@jarthod Thank you for testing PR #3004. It's an update to #2896, which attempted to determine properties of the response body and adjust how the response body was assembled/transmitted. That came out of running benchmarks using various body type/size combinations and seeing where Puma's performance might be able to be improved. #2896 was a bit too aggressive and didn't account for streamed bodies that were enumerations. #3004 has added more tests and CI jobs, so hopefully it's ready for production... |
Ok, thanks for the summary! So I shall wait for this to be merged then. And if I understand correctly, with Rack 3 the preferred way to stream is now "streaming bodies" (…
We will be working on updates to Rails pretty soon. |
@ioquatix Ok great, feel free to ping me to test the branch :) |
See sinatra/sinatra#1832 for more details, but this simple app https://github.com/sinatra/sinatra/blob/v3.0.2/examples/stream.ru works fine with Puma 5.6.5; with 6.0.0 it waits until all the sleeps have finished before the response is returned (this is more noticeable if you increase the sleep time in the example). As noted in sinatra/sinatra#1832 (comment), the example runs as expected in Falcon (v0.42.3).
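The failure mode can be reduced to a pure-Ruby sketch (a hypothetical stand-in for the linked stream.ru, with the sleeps shortened): with an enumerable body whose chunks are produced slowly, accumulating the whole enumeration before the first write pushes time-to-first-byte from near zero to the sum of all the sleeps.

```ruby
# Stand-in for the Sinatra example: chunks arrive over time.
slow_body = Enumerator.new do |yielder|
  3.times do |i|
    sleep 0.01  # the real example sleeps much longer
    yielder << "tick #{i}\n"
  end
end

# Buffered delivery (the behaviour reported above): nothing is written
# until the enumeration completes, so TTFB is roughly the sum of sleeps.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
buffered = slow_body.to_a.join
ttfb_buffered = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

puts buffered
puts format("first byte available after %.0f ms (all sleeps elapsed first)",
            ttfb_buffered * 1000)
```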