Add rack.response_finished to Rack::Lint #1802
No. |
Is |
Other than the key, it seems to be compatible:
There is no advantage in adding a different name for behavior already de facto standardized in existing webservers. |
@jeremyevans I understand your point. However, since we are NOW formalising it, we could change the name. I don't think |
Unless we can identify significant problems, I believe we should adopt existing de facto standards as already implemented. @tenderlove agrees (#1701 (comment)):
Here the question is just about what env key to use, which is purely a bikeshed issue. The bikeshed is already painted, there is no point in painting it a different color, even if the different color looks nicer. |
@jeremyevans that's not what @tenderlove said: #1777 (comment) |
I would like to add, we CAN validate that these are called by the server, by wrapping them - the same way we validate the body was either called or enumerated IIRC. |
What I quoted is what @tenderlove said in issue #1701. It is a general statement about how we should treat a de facto spec. It looks like what @tenderlove states in #1777 is in favor of this particular case, even though this case goes against his advice in #1701. I will leave it to @tenderlove to explain why the general statement does not apply to this particular case.
I suppose we could do that, but I don't see the advantage. Lint is not about validating how servers handle responses, merely that the response provided to the server is in the correct format. Even assuming we wanted Lint to do this, when could you actually validate them? You could use a timeout that fails if it isn't called. How long would such a timeout be? Would it start when the body was closed? I'm not sure how such a timeout could reliably raise an appropriate exception, and I think it's a fool's errand to try such an approach. |
@jeremyevans I think these are all really good questions. Does the design being hard to test indicate a more fundamental issue? All the issues you mention seem like reasonable things... like, when does this execute? After 1 year? When the server shuts down? Should we allow it to be deferred? Should it run on a background thread? Does this indicate design issues we need to address? |
This updates Rack::Lint to validate that `rack.response_finished` is an array of callables when present in the `env`. e.g. procs, lambdas, or objects that respond to `call`. This validates that: * `rack.response_finished` is an array * The contents of the array all respond to `call`
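The two checks described above can be sketched in plain Ruby. This is a minimal illustration, not the code from this PR: the helper name `check_response_finished` and the error class `ResponseFinishedLintError` are hypothetical, and it treats an absent key as valid because the description says the check applies only "when present in the `env`".

```ruby
# Hypothetical names for illustration; this is not Rack::Lint's implementation.
class ResponseFinishedLintError < StandardError; end

RACK_RESPONSE_FINISHED = 'rack.response_finished'

def check_response_finished(env)
  # The key is optional, so an env without it passes.
  return true unless env.key?(RACK_RESPONSE_FINISHED)

  callables = env[RACK_RESPONSE_FINISHED]

  # Check 1: the value must be an Array.
  unless callables.is_a?(Array)
    raise ResponseFinishedLintError,
      "#{RACK_RESPONSE_FINISHED} must be an Array, got #{callables.class}"
  end

  # Check 2: every element must respond to call (procs, lambdas, Method objects, ...).
  callables.each do |callable|
    unless callable.respond_to?(:call)
      raise ResponseFinishedLintError,
        "#{RACK_RESPONSE_FINISHED} values must respond to call, got #{callable.inspect}"
    end
  end

  true
end
```

Note that this only validates the shape of the value handed to the server. It says nothing about whether or when the server invokes the callables.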
I can attempt to answer some of these questions in the spec. I'll try adding some details about how/when these callables should be called under I'm also happy to revert the name back to |
@tenderlove I think your guidance is needed here to decide whether we should officially bless the de facto standard |
I'm fine with either name, but my issues would be:
The way I see it, this is a more efficient handler for Rack::BodyProxy. The typical usage I imagine would be things like closing database handles or indicating other resources can be freed. It's not semantically more complex than what we have, but it exposes a 2nd way to achieve the same thing which users should be concerned about. Every Considering Finally my last concern is one of responsibility. |
Co-authored-by: Samuel Williams <samuel.williams@oriontransfer.co.nz>
I like the name The use case I think of is closing database connections after a request is finished. If some callback said everything should be skipped, it might mean leaked connections, and that wouldn't be a good thing. Open ended question: do we want to do anything with the return value of the callbacks, and do we want to pass anything to them? Or should we just punt that for later? @BlakeWilliams I think this is a great feature and I'd like to see it in Rack 3.0 (If you have the time of course!!) |
I started playing around with real implementations and realised we are missing a critically useful part, which is the ability to feed any client errors back to the callback. I actually don’t mind what we call it, but after actually trying it I want something better. Semantically this isn’t about a response or reply or even when it’s finished. It can also be when there is an error, and for internal or virtual requests it has nothing to do with an actual response or reply. After investigating the naming scheme, the My updated proposal is this PR + replaces env with an optional error argument. |
Also if it was up to me I’d seriously consider adding an argument to |
By the way I don't want to seem like I'm trying to make a unilateral decision, but I can't add commits to this PR. I was planning on merging this PR largely as is, but found that in practice it's not sufficient and ended up squashing all my commits on the new PR. Anyway, feel free to re-open this PR if we want to discuss it or work on this design further, that's totally fine, I just didn't feel comfortable asking @BlakeWilliams to continue working on something when I don't really know what I want to propose yet - I need to spend some time working through the implementation in Falcon too. |
Reopening as #1932 was closed. |
There has been a lot of discussion on this. I'm a strong believer that this needs to be compatible with the existing semantics where possible, and I'm no longer convinced this should be a required feature. I changed my opinion after implementing it in Falcon and reviewing existing use cases. The main use cases we have been informed of are:
Both use cases would be satisfied by an optional
module Rack
  def self.Ensure(env, app, &block)
    if after_reply = env['rack.after_reply']
      after_reply << block
      return app.call(env)
    else
      begin
        response = app.call(env)
        response[2] = BodyProxy.new(response[2]) {block.call}
        # Return the full response triple, not the BodyProxy assigned above.
        response
      rescue => error
        block.call(error)
        raise error
      end
    end
  end
end

response = Rack::Ensure(env, @app) do |error|
  # Best effort to be called, error may be set if the response failed to be sent.
end
Assuming no one will agree with The biggest limitation of this is the inability to provide error information to the callback when the The code above also hopefully demonstrates that there are semantic differences. We could have more elaborate exception handling in the after_reply branch to better match the BodyProxy implementation... not sure what is better. I would hypothesize that there is little performance difference between the use of |
If we are willing to bend the rules a little bit: `response[2] = BodyProxy.new(response[2]) {block.call($!)}` could be acceptable / best effort. |
I couldn't help but flesh this out. Not convinced by any of the method names or
module Rack
  # Usage:
  #   # In config.ru:
  #   use Rack::AfterReply # unless feature?(:after_reply)
  #
  #   # In application:
  #   response = Rack::AfterReply.call(env, @app) do |error|
  #     # Best effort to be called, error may be set if the response failed to be sent.
  #   end
  class AfterReply
    def initialize(app)
      @app = app
    end

    RACK_AFTER_REPLY = 'rack.after_reply'

    # Expected to be called by the server.
    def self.apply(env, error)
      return unless after_reply = env[RACK_AFTER_REPLY]

      # Callbacks run in reverse registration order; a failing callback is logged and skipped.
      while callback = after_reply.pop
        begin
          callback.call(error)
        rescue => callback_error
          env['rack.errors'].puts(callback_error.full_message)
        end
      end
    end

    # Provides a de facto implementation for servers that don't support it.
    def call(env)
      env[RACK_AFTER_REPLY] ||= []
      @app.call(env)
    rescue => error
      # Re-raise so the failure still reaches the server; `error` stays set for the ensure clause.
      raise
    ensure
      self.class.apply(env, error)
    end

    # Expected to be used by the middleware/application:
    def self.call(env, app, &block)
      if after_reply = env[RACK_AFTER_REPLY]
        after_reply << block
        return app.call(env)
      else
        begin
          response = app.call(env)
          response[2] = BodyProxy.new(response[2]) {block.call($!)}
          return response
        rescue => error
          block.call(error)
          raise error
        end
      end
    end
  end
end |
GitHub is also using it in a few other places, like persisting some cache values at the end of a request which ended up making some of our pages a good bit more performant iirc. Puma has also been utilizing this since 2015 for logs using the
I'm not quite convinced that we need error information in the callbacks, at least based on the use-cases I've run into. Typically by the time you've created the callback I think it's reasonable to expect that the callback will run, regardless of error status. e.g. If we want to flush metrics at the end of a request, or close some kind of connection to a service, it doesn't matter if the client closed the connection.
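To make the metrics-flushing case concrete, here is a minimal sketch of a middleware that registers a callback which ignores error status entirely. The class names `FlushMetrics` and `BufferedMetrics` are hypothetical stand-ins, and the last lines simulate what a supporting server is assumed to do once the response has been written. When the server does not provide `rack.after_reply`, this sketch simply registers nothing.

```ruby
# Hypothetical stand-in for a metrics client that buffers until flushed.
class BufferedMetrics
  attr_reader :flushed

  def initialize
    @pending = []
    @flushed = []
  end

  def record(name)
    @pending << name
  end

  def flush
    @flushed.concat(@pending)
    @pending.clear
  end
end

# Hypothetical middleware; only the env key comes from the discussion.
class FlushMetrics
  RACK_AFTER_REPLY = 'rack.after_reply'

  def initialize(app, metrics)
    @app = app
    @metrics = metrics
  end

  def call(env)
    # Register only when the server provides the callback array.
    if (after_reply = env[RACK_AFTER_REPLY])
      # A proc ignores any arguments it is given, so no error information is needed.
      after_reply << proc { @metrics.flush }
    end

    @app.call(env)
  end
end

metrics = BufferedMetrics.new
app = ->(_env) { metrics.record('request'); [200, {}, ['ok']] }
stack = FlushMetrics.new(app, metrics)

env = { 'rack.after_reply' => [] }
status, _headers, _body = stack.call(env)

# Simulates the server running the callbacks after the response is sent.
env['rack.after_reply'].each(&:call)
```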
I think it would reduce some of the value, but there's also value in having an explicit API for this behavior that's at least a bit more ergonomic than using Also, when I originally wanted to implement the stats flushing behavior I used
|
For what it's worth, both Puma and Unicorn support this functionality today, which results in a pretty large number of applications having The primary reasoning I had for opening this issue and this PR is that it would be ideal if we could make the implementation of those two webservers an official spec, since it is providing value to existing applications. By making it required/official, it means applications depending on that behavior are now able to more easily swap between server implementations if they desired (barring other issues, like thread safety) and have a slightly more consistent feature-set available. I'd find it unfortunate if we decided not to move forward with some variation on this functionality, but it would still exist in Puma/Unicorn and be usable as-is going forward. |
oh, there was also this gem, |
That's a totally reasonable position. However, at best, I feel this can be an optional feature with a standard interface, and I don't think there is any point in introducing it in isolation. I would say that the value for me would be that defining a standard (optional) interface in Rack would allow us to update The counterpoint is, if we can't show a significant advantage, I'd prefer we don't introduce such a feature, because every optional feature or alternative way of doing things adds an extra dimension of complexity to applications and server implementations. Things like Regarding handling errors, it can be useful to know if the request was sent to the client completely or not. One simple use case would be logging the status of the response (sent successfully or not), another would be detecting unusual client behaviour (e.g. |
From the long discussion we had with @tenderlove, it is obvious both he and I are in favor of including If we keep The question at this point is not whether this feature will be in Rack 3, but what key will be used for it, and if switching to |
If we introduce an optional interface, but it's impossible to achieve the same behaviour with non-optional interfaces (i.e. An example of this would be GitHub's own application which apparently depends on this optional interface. Therefore, any server which does not implement it cannot host GitHub's application, because there is no abstraction which allows them to use "either rack.after_reply if it exists or BodyProxy". That's why my initial assessment was that this kind of feature should be non-optional, but I'm no longer of the opinion that this feature actually solves any real problem and instead makes things more complicated. I welcome someone showing some behaviour with this hook that cannot be achieved with |
My understanding is both @tenderlove and I are OK with it being optional or required. So if you would prefer it be required and not optional, we can make it required. |
GitHub's own application also isn't thread safe or fiber safe, meaning any webserver that uses either of those models can't host GitHub's application out of the box. In the Rack project, we should add specifications that are generally useful for users, and can be implemented by webservers. Whether a webserver chooses to implement an optional spec or not is up to them. But if users find that particular feature (be it threads, fibers, processes, or an
Making this feature required makes sense to me from a webserver compatibility perspective.
I think we should keep the new name and pass @BlakeWilliams sorry to keep dragging on with this, but could you update the spec such that the callbacks take 4 params, |
I would personally like to see I'm also not sure about retaining
begin
  response_finished = env['rack.response_finished']
  status, headers, body = app.call(env)

  # Assuming HTTP/1:
  write_status(status)
  write_headers(headers)

  # No callbacks registered (or the key is absent), so nothing needs these objects later.
  if response_finished.nil? || response_finished.empty?
    env = headers = nil # Allow the GC to free up resources.
  end

  write_body(body) # This might be a websocket that lasts for several hours and there might be several thousands of them.
rescue => error
ensure
  invoke_callbacks(response_finished, env, status, headers, error)
end
For this reason alone, I wouldn't support it in Falcon if I couldn't figure out a way to release those objects as early as possible. I like the general idea, but what are the use cases for If we can't solve this problem, I'm completely against this being a required feature as it limits the number of in-flight connections we can handle. |
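The server loop above calls `invoke_callbacks`, which the thread never defines. A minimal sketch of what such a helper might look like follows, under several assumptions: the name and argument order are taken from the call site, the reverse invocation order mirrors the `pop` loop in the `AfterReply` sketch earlier in the thread, and logging a failing callback to `rack.errors` rather than re-raising is likewise borrowed from that sketch.

```ruby
require 'stringio'

# Hypothetical helper matching the call site in the server loop above.
def invoke_callbacks(response_finished, env, status, headers, error)
  # Nothing to do when the server never provided the array.
  return if response_finished.nil?

  # env may already have been released (set to nil) when no callbacks exist.
  errors = env && env['rack.errors']

  response_finished.reverse_each do |callback|
    begin
      callback.call(env, status, headers, error)
    rescue => callback_error
      # One failing callback should not prevent the remaining ones from running.
      errors&.puts("#{callback_error.class}: #{callback_error.message}")
    end
  end
end

log = []
error_stream = StringIO.new
request_env = { 'rack.errors' => error_stream }

callbacks = [
  ->(_env, status, _headers, error) { log << [:first, status, error] },
  ->(_env, _status, _headers, _error) { raise 'boom' },
  ->(_env, status, _headers, error) { log << [:last, status, error] }
]

invoke_callbacks(callbacks, request_env, 200, {}, nil)
```

The sketch makes the retention problem visible: as long as any callback is registered, `env`, `status` and `headers` must stay reachable until the body has been fully written.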
No worries! Happy to update this to match that signature. I do wonder if it would make sense to avoid passing
Is there a benchmark that could measure how much extra memory is in-use to quantify the impact? I'd be curious to know what the baseline usage is vs the usage when headers and env are retained for each request. |
@BlakeWilliams yes I benchmarked this extensively and it's a non-trivial amount of data per request. I don't have the benchmarks in front of me but I'll endeavour to share this with you later this week. To give you some rough numbers (my memory is not great because it was like 2-3 years ago I did this), it was something like 20-30GiB of memory per client and server handling 1 million connections (so about 40-50GiB in total), and IIRC it was on the order of 50-100 objects per connection... so it doesn't take much to, say double the amount of memory used. An By the way, if you do the math, 20 gibibytes / 1 000 000 = ~20 kilobytes per connection, and that's including one fiber per client and one fiber per server connection... that's at least two pages (4kb * 2) + some heap storage. |
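The back-of-envelope figure above can be checked directly. This sketch assumes the lower bound of 20 GiB for one side, 1,000,000 connections, and 4 KiB pages, all taken from the rough numbers quoted in the comment rather than from a measurement.

```ruby
GIB = 1024**3
KIB = 1024

total_bytes = 20 * GIB        # lower bound quoted for one side (client or server)
connections = 1_000_000

per_connection_bytes = total_bytes.to_f / connections
per_connection_kib = per_connection_bytes / KIB

# Two 4 KiB pages per connection, as estimated in the comment.
page_floor_bytes = 2 * 4 * KIB

# Whatever is left after the two pages is the heap storage per connection.
heap_bytes = per_connection_bytes - page_floor_bytes

puts format('%.1f KiB per connection', per_connection_kib)
puts format('%.1f KiB of that beyond the two pages', heap_bytes / KIB)
```

With roughly 21 KiB per connection, retaining even a modest `env` hash and headers for the lifetime of a long-lived connection is a noticeable fraction of the total.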
One usage for such a callback is for writing an access log, which will typically include the
If we do want to benchmark this, we should pick a benchmark that aims to reflect typical production workloads, such as Railsbench. Note that if the callback doesn't receive |
If this feature is not required (and thus, by implication, not required on every request), then when I see that a request is likely to be long lived, I can avoid providing the callback array and avoid the memory costs. That's acceptable. I did the same thing for rewindability in Rack 2.x - I sniffed the content type and only did it on requests where it was going to be used, to avoid the overhead on every request. I think you'll have a hard time showing a performance impact on something like Rails, because of the lack of scalability inherent in its design, e.g. Puma only handling 8-16 simultaneous requests. Falcon is designed to handle 10,000 or more, where the impact is felt more heavily. My long term goal for Falcon is 100,000 active WebSockets per process. So, based on napkin math, you'd see an overhead of about 1-2 GiB to keep these fields around, on a total expected memory usage of 2-4 GiB, so it is about a 50% memory overhead per request... this is a total stab in the dark and I'll try to come back with actual numbers once I introduce a prototype of this callback. So, introducing this feature as optional sets the bar much lower and is probably more acceptable, since a server can opt in depending on the situation it expects. I'll also add that not all servers terminate the HTTP/1 socket with the Rack request, e.g. Falcon, because it supports different underlying protocols, has different adapters and its own internal middleware. |
Okay, this was merged in #1952. |
Part of #1777