Drop disk buffering and support for rewindable inputs. #1148
Comments
Yes. I plan on revising the spec for Rack 3 (master should be 3). It will remove the requirement for a rewindable body. If you have time to do that work, then please start sending PRs (starting with updates to the SPEC file). Thanks! |
I recently ran into this limitation when implementing streaming requests/responses with With HTTP/2, one can stream a request/response indefinitely, and the requirement for supporting I'd suggest that the |
I would be happy to work on the SPEC file but do you want to have a discussion here first? What, if any, progress/decisions have already been made? |
Just as an addendum to the first point, chunks of data returned by |
I noticed that here rack/lib/rack/multipart/parser.rb Line 68 in 4b33af1
rack.input must also respond to eof? which is not documented AFAIK.
|
Here is what I implemented to handle reading from a chunked data source (e.g. https://github.com/socketry/falcon/blob/master/lib/falcon/input.rb It works reasonably well but the requirement for rewinding means that all data must be cached... |
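For readers following along, a minimal sketch of an IO-like input that reads from a one-shot sequence of chunks without any rewind support. This is not Falcon's or Rack's implementation; the class name ChunkedInput and its internals are hypothetical, and it only assumes the chunk source responds to each and yields strings.

```ruby
# Hypothetical class for illustration; not the Falcon or Rack implementation.
class ChunkedInput
  # chunks: any object whose #each yields String chunks exactly once.
  def initialize(chunks)
    @chunks = chunks.to_enum(:each)
    @buffer = String.new(encoding: Encoding::BINARY)
    @finished = false
  end

  # Follows the IO#read convention: with no length, return the rest of the
  # stream ("" at end of stream); with a length, return at most that many
  # bytes, or nil at end of stream.
  def read(length = nil, outbuf = nil)
    if length.nil?
      fill until @finished
      data = @buffer
      @buffer = String.new(encoding: Encoding::BINARY)
    else
      fill while @buffer.bytesize < length && !@finished
      data = @buffer.byteslice(0, length)
      @buffer = @buffer.byteslice(length, @buffer.bytesize) ||
                String.new(encoding: Encoding::BINARY)
      data = nil if data.empty? && length > 0
    end
    return data unless outbuf
    outbuf.replace(data || "")
    data && outbuf
  end

  # True once every chunk has been consumed and the buffer is drained.
  def eof?
    fill while @buffer.empty? && !@finished
    @buffer.empty? && @finished
  end

  private

  # Pull one more chunk into the buffer, or mark the source as exhausted.
  def fill
    @buffer << @chunks.next.b
  rescue StopIteration
    @finished = true
  end
end
```

Because nothing is retained after it has been read, memory use stays bounded by the largest chunk plus the requested length, which is the property a rewind requirement takes away.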
After implementing a client and server over HTTP/1.0, HTTP/1.1 and HTTP/2.0, I'd like to suggest that having When writing the response, it's useful to terminate the stream after the headers are written in HTTP/2.0 and in order to do this you need to ask |
I'd propose that the response body supports both Just for completeness, but outside the scope of rack, for writable bodies, the following definition is sufficient: |
I've been studying more about the various options available. I found some interesting information about It looks like the forward looking API is centred around streams ( |
I wanted to add two more links for reference: 1/ Here is a proposal for WebSockets over HTTP/2 which may or may not see practical usage: https://github.com/mcmanus/draft-h2ws/blob/master/draft-mcmanus-httpbis-h2-websockets-03.mkd 2/ Here is an interesting discussion about the state of bi-directional streaming, specifically with clients implemented within the browser (e.g. using It will be interesting to see how these issues pan out. |
@tenderlove How can we move forward with this? |
I think we all agree on dropping the Is there any place where Rack needs to read the request body more than once? If it reads it the first time and parses it, I don't see why it would need to read it again. I also think it's fine that it continues to automatically parse multipart form data encoded request bodies (handled by The pain point that I'm personally interested in is web servers needing to implement rewindability. The existing
I opened a pull request a while ago to eliminate the |
Agreed.
Agreed. I also think it would make more sense for the request body to be parsed in a similar fashion to rack middleware. e.g. a stack of parsers which transform the input body into something more useful.
Agreed.
Yes, that's unfortunately underspecified. +1 on your PR to fix this issue. |
Something like:

    def call(env)
      if body = env['rack.body'] and multipart?(env)
        env['rack.body'] = Multipart.new(body)
      end
      return @app.call(env)
    end

The same could be done for POST requests, e.g. detect and parse Middleware would obviously have to be aware of the input body format somehow. I've deliberately used |
I guess I should mention that if we went down the route of adding |
@ioquatix I can see where Rack calls
Yeah, an opt-in middleware sounds nice. |
Ha, still looking for that :D |
Actually I found one place where we use it (but I gate the functionality around I'm not sure I want to find out what would happen if someone was uploading a multi-GB file. |
I've been doing some more work in this area. Another problem that becomes apparent is the mismatch of high level response and low level transport. There are at least two examples of this: The way I've been thinking about it, there are transport level headers such as the above, which really should be the responsibility of the web server. Rack should be providing the basic infrastructure on which to handle requests and responses, but the actual protocol/transport level headers/behaviour should be specified by the underlying server/protocol. In this case, there is a bit of a mash up between the layers. |
I worked around this problem. Essentially, I use a memory buffer to cache the input, and provide support for rewind, but only if it matches a content-type which rack internally calls rewind on. https://github.com/socketry/falcon/blob/master/lib/falcon/adapters/rewindable.rb I know this is a hack but it was the simplest way to work around this issue which has been outstanding for almost a year. I still believe that removing the need for rewind is ultimately a good idea. Legacy apps that need it could use a middleware to implement the above logic. |
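A rough sketch of the workaround described above, assuming an in-memory buffer applied only to selected content types. This is not the code in the linked Falcon adapter; the class name BufferForRewind and the list of media types are assumptions made for illustration.

```ruby
require 'stringio'

# Hypothetical middleware; the name and the media type list are assumptions.
class BufferForRewind
  # Media types whose bodies are buffered so that #rewind works.
  BUFFERED_TYPES = %w[
    application/x-www-form-urlencoded
    multipart/form-data
  ].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    input = env['rack.input']
    if input && buffer?(env['CONTENT_TYPE'])
      # Read the whole body into memory once; StringIO provides #rewind.
      env['rack.input'] = StringIO.new(input.read.to_s)
    end
    @app.call(env)
  end

  private

  # Compare only the media type, ignoring parameters such as the boundary.
  def buffer?(content_type)
    return false unless content_type
    media_type = content_type.split(';', 2).first.to_s.strip.downcase
    BUFFERED_TYPES.include?(media_type)
  end
end
```

Bodies of any other content type pass through untouched, so a streaming upload is never copied into memory by this layer.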
@jeremyevans are we in agreement to remove the requirement for input to be rewindable? |
I'm OK with removing the requirement if we ship a middleware that will take a non-rewindable input and make it rewindable by buffering it. That way users depending on the previous behavior can use the middleware for backwards compatibility. |
Can we move such middleware to |
I think we should ship such a middleware in rack itself. We have |
This will automatically wrap rack.input with Rack::RewindableInput, for compatibility with middleware and applications that expect rewindable input. Related to rack#1148, but this does not contain any SPEC changes. It's possible for servers targeting Rack 2 compatibility to use this middleware to implement the compatibility.
I've just been thinking about what streaming multi-part might look like:

    input = env['rack.input']
    multipart = Multipart.new(input)

    # Streaming
    multipart.each do |part|
      part.name
      contents = part.read
    end

|
This is basically implemented by #1804, so I'll close this now. |
I'm curious about Rack's requirement for rewindable input. Because of this requirement, the Rack::Multipart class will buffer a read-once stream to disk, presumably so rewind is available. Modern cloud-based NAS throughput imposes limitations on how well this feature scales. Also, RAM is wildly cheap these days. Considering how the entire Rack ecosystem could depend on rewindable streams, I was wondering: is it a good time for future versions of Rack 2.x to drop disk buffering as a default behavior, and move it to an optional Rack middleware specifically for handling "large" bodies / file uploads?

I'd like to propose an alternative strategy for handling "large" bodies.

(0) Remove any use of Tempfile in Rack::Multipart or other default body parsers.

(1) Provide a configurable post body buffer limit max_body_buffer_length. If content_length.present? && content_length <= max_body_buffer_length, then read it ASAP (ideally non-blocking) into rack.input as StringIO or other. Else, if content_length.nil? || content_length > max_body_buffer_length, then do not read the stream. This would mean that no chunked encodings are handled by default, and no buffering to disk for streams larger than the configured limit.

(2) When request.params or request.body is accessed, if body was not parsed due to length or chunked encoding, report a warning message that can be disabled.

(3) Provide automatic "large body" parsing as a middleware module plugin that can be optionally installed.
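Step (1) of the proposal could be sketched roughly as follows. This is an illustration only, not Rack API: the helper name buffer_small_body, the constant, and the 1 MiB default are all assumptions, and the read here is a plain blocking read rather than the non-blocking one the proposal prefers.

```ruby
require 'stringio'

# Assumed default limit of 1 MiB; the proposal does not specify a value.
MAX_BODY_BUFFER_LENGTH = 1024 * 1024

# Hypothetical helper: buffer the body in memory only when Content-Length is
# present and within the limit. Returns true if the body was buffered, and
# false if the stream was left untouched (missing length, e.g. chunked
# encoding, or a body larger than the limit).
def buffer_small_body(env, limit: MAX_BODY_BUFFER_LENGTH)
  raw = env['CONTENT_LENGTH'].to_s
  length = raw.match?(/\A\d+\z/) ? raw.to_i : nil

  if length && length <= limit
    body = env['rack.input'].read(length).to_s
    env['rack.input'] = StringIO.new(body)
    true
  else
    false
  end
end
```

The boolean result is one way a caller could decide whether to emit the warning described in step (2) when the params or body are accessed later.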