Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define Rack::Builder::Config, and support middleware.rackup configuring middleware based on server configuration #1720

Merged
merged 11 commits into from Jan 27, 2022

Conversation

jeremyevans
Copy link
Contributor

This is an alternative approach to #1718. Instead of creating a new
builder for every middleware that responds to rackup, we have a
single configuration object that stores the server's configuration,
such as whether it is multithreaded. If middleware.rackup is defined,
we call it with the configuration object as the first object, and the
remaining arguments and block that would have been passed to new.
The middleware.rackup method can return the app itself if it not
needed, call new with the remaining arguments if it is needed, or
potentially other actions for more complex cases.

To handle the very rare case where a middleware would want to
delegate to other middleware in certain server configurations, and
doesn't know whether the other middleware supports the rackup
method or not, The configuration object supports a rackup method,
which will call rackup on the middleware if defined, or new
otherwise. So if middleware A wants to use external middleware B
in a certain server configuration, middleware A's rackup method
could be something like:

  def self.rackup(config, app)
    if config.multithread?
      config.rackup(MiddlewareB, app)
    else
      new(app)
    end
  end

The configuration rackup method is also used internally to implement
the builder.

This readds Rack::Lock, using the rackup method, which only uses
the Rack::Lock middleware if the configuration indicates the
server is multithreaded.

The advantage of this approach is that it doesn't require exposing
the entire builder API to the middleware, it only exposes the
server configuration, which is all the middleware should need to
appropriately configure itself.

@ioquatix
Copy link
Member

ioquatix commented Nov 3, 2021

I generally like this idea, I'm not strongly in favour of any particular approach, but I would like to explore a bit more about how this works in practice. The idea of a config object is nice since it isolates the responsibilities, but it also introduces additional complexities. However, I think the net gain for users is that it simplifies more complex construction of the application and provides a very well defined interface for the kinds of things people should depend on.

The only thing I wonder about is whether we should allow config to contain user specific keys/values. I imagine that such a model could be really useful but at the expense of opening up the design to more complexity. However, the advantage would be that users could depend on a configuration context for their own usage rather than having multiple sources of truth for configuration.

@ioquatix
Copy link
Member

@jeremyevans do you mind rebasing this on master?

Copy link
Member

@ioquatix ioquatix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great. Just some points for us to discuss and iron out.

CHANGELOG.md Outdated
@@ -34,6 +34,10 @@ All notable changes to this project will be documented in this file. For info on

- [[CVE-2020-8184](https://nvd.nist.gov/vuln/detail/CVE-2020-8184)] Do not allow percent-encoded cookie name to override existing cookie names. BREAKING CHANGE: Accessing cookie names that require URL encoding with decoded name no longer works. ([@fletchto99](https://github.com/fletchto99))

### Removed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add a note regarding the addition of the new configuration class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can do that. Note that most users won't be using it, only middleware/server authors.

README.rdoc Outdated
@@ -77,7 +77,6 @@ middleware:
* Rack::Files, for serving static files.
* Rack::Head, for returning an empty body for HEAD requests.
* Rack::Lint, for checking conformance to the \Rack API.
* Rack::Lock, for serializing requests using a mutex.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't end up removing this right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I'll restore it.

# Config stores settings on what the server supports, such as whether it
# is multithreaded.
class Config
def initialize(multithread: true, reentrant: multithread)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should have a more generic interface **options so that different servers can pack different information into this configuration.

If we choose to do this, maybe we should also expose attr :options?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea here is that server authors can subclass this class to provide a richer API if they need to. Currently, the only use for the configuration is for concurrency. The issue with **options is that it turns typos into silent failures.


# Re-entrancy is a feature of event-driven servers which may perform non-blocking operations. When
# an operation blocks, that particular request may yield and another request may enter the application stack.
def reentrant?; @reentrant; end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel about this configuration option name?

I started wondering if we should call it multifiber or even replace multithread with parallel? and reentrant? with concurrent?. multithread? sounds like an implementation detail.

I think we should try to iron this interface out as it's going to be important going forward that it means exactly what we want it to mean for the servers that need to support it. We might want to add a table showing potential values for these configuration options w.r.t. servers and their respective configuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If multithread? is an implementation detail, so is multifiber?. parallel? is not accurate on CRuby, since Ruby code does not execute in parallel on CRuby (though that is also an implementation detail). Replacing reentrant? with concurrent? seems like a good idea to me, so I'll make that change.

I would like to get rid of multithread?, but I think it it is necessary until we drop Ruby 2.7 support. That is because on Ruby 2.7 and below, you cannot use Rack::Lock in the concurrent but not multithread case, since it uses Mutex internally.

builder = self.new(config: config)

# Create a top level scope with self as the builder instance:
binding = TOPLEVEL_BINDING.eval('->(builder){builder.instance_eval{binding}}').call(builder)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not particularly important, but in another project I moved TOPLEVEL_BINDING.eval('->(builder){builder.instance_eval{binding}}') into a separate constant, i.e.

# Top level of file:
module Rack; end
Rack::BUILDER_CONTEXT = ->(builder){builder.instance_eval{binding}}

# ...
        binding = BUILDER_CONTEXT.call(builder)

@eregon this should be the same thing right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems fine to me. I'd like the constant to be under Rack::Builder. So I plan to use Rack::Builder::EVAL_CONTEXT?

lib/rack/lock.rb Outdated
@mutex.unlock
@env[RACK_MULTITHREAD] = @old_rack_multithread
def self.rackup(config, app)
if config.multithread?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fiber scheduler now correctly handles locks so this might be relevant no matter what? i.e. we might prefer to use reentrant? as it's currently defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do this for concurrent?, but I think only on Ruby 3+. I'll modify the code to handle that.

lib/rack/lock.rb Outdated Show resolved Hide resolved
Copy link
Contributor Author

@jeremyevans jeremyevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ioquatix I pushed a new commit and tried to respond to all of the points you raised.

CHANGELOG.md Outdated
@@ -34,6 +34,10 @@ All notable changes to this project will be documented in this file. For info on

- [[CVE-2020-8184](https://nvd.nist.gov/vuln/detail/CVE-2020-8184)] Do not allow percent-encoded cookie name to override existing cookie names. BREAKING CHANGE: Accessing cookie names that require URL encoding with decoded name no longer works. ([@fletchto99](https://github.com/fletchto99))

### Removed
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can do that. Note that most users won't be using it, only middleware/server authors.

README.rdoc Outdated
@@ -77,7 +77,6 @@ middleware:
* Rack::Files, for serving static files.
* Rack::Head, for returning an empty body for HEAD requests.
* Rack::Lint, for checking conformance to the \Rack API.
* Rack::Lock, for serializing requests using a mutex.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I'll restore it.

# Config stores settings on what the server supports, such as whether it
# is multithreaded.
class Config
def initialize(multithread: true, reentrant: multithread)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea here is that server authors can subclass this class to provide a richer API if they need to. Currently, the only use for the configuration is for concurrency. The issue with **options is that it turns typos into silent failures.


# Re-entrancy is a feature of event-driven servers which may perform non-blocking operations. When
# an operation blocks, that particular request may yield and another request may enter the application stack.
def reentrant?; @reentrant; end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If multithread? is an implementation detail, so is multifiber?. parallel? is not accurate on CRuby, since Ruby code does not execute in parallel on CRuby (though that is also an implementation detail). Replacing reentrant? with concurrent? seems like a good idea to me, so I'll make that change.

I would like to get rid of multithread?, but I think it it is necessary until we drop Ruby 2.7 support. That is because on Ruby 2.7 and below, you cannot use Rack::Lock in the concurrent but not multithread case, since it uses Mutex internally.

builder = self.new(config: config)

# Create a top level scope with self as the builder instance:
binding = TOPLEVEL_BINDING.eval('->(builder){builder.instance_eval{binding}}').call(builder)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems fine to me. I'd like the constant to be under Rack::Builder. So I plan to use Rack::Builder::EVAL_CONTEXT?

lib/rack/lock.rb Outdated Show resolved Hide resolved
lib/rack/lock.rb Outdated
@mutex.unlock
@env[RACK_MULTITHREAD] = @old_rack_multithread
def self.rackup(config, app)
if config.multithread?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do this for concurrent?, but I think only on Ruby 3+. I'll modify the code to handle that.

lib/rack/builder.rb Outdated Show resolved Hide resolved
@@ -31,6 +38,30 @@ module Rack
# You can use +map+ to construct a Rack::URLMap in a convenient way.

class Builder
# Config stores settings on what the server supports, such as whether it
# is multithreaded.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would make sense for us to explain that we expect severs may sub-class this to provide additional details?

Also, considering this usage, does it make more sense to provide a generic options hash?

Otherwise, I imagine we might have:

def rackup(config, app)
  if config.respond_to?(:puma_thing)
    puma_thing = config.puma_thing
    ...

While I'm okay with the general idea of a strong interface, I'm a little concerned about how we should use it if puma/falcon/thin/unicorn start adding server-specific interfaces.

We should also consider the case where Rack is an interface rather than a gem. To this end, it feels like options is a stronger contender since it's simpler. But it's also less pleasant to use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, I prefer your design, but I think we need to consider some specific usage scenarios.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the configuration would only be used for Rack::Lock. I'm not aware of any other use case, so making it more flexible and prone to silent failures doesn't seem like a good trade-off. If you have ideas for how this will be used by servers, that information would definitely be helpful. That said, I'm not strongly against the options hash approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rack for the most part tries to separate "Rack the SPEC" and "Rack the GEM". This configuration interface touches both worlds. I personally think Rack would benefit from richer interfaces, but I also appreciate the original design which was servers could follow "Rack the SPEC" without depending on "Rack the GEM". IIRC, Passenger does not pull in "Rack the GEM" and instead just follows the spec. In that case, @FooBarWidget would need a bespoke implementation of Rack::Builder::Config but it's not specified anywhere except for the implementation in the GEM. Maybe this is a longer term project - we should define config.ru - i.e. use, run, the context for evaluation, the builder and config interfaces, etc.

Frankly it feels a lot harder to define an interface like Rack::Builder::Config - the building blocks we've used for "Rack the SPEC" has been Hash, Array, String, and other simple types. In any such spec, I guess we would leave the actual class as anonymous and just define the interface, e.g.

Your application may respond to `rackup` in which case it will be invoked with a configuration object which contains at least the following interface:
  multithread? -> true | false
  concurrent? -> true | false

I guess this all hinges on how much we define config.ru and Rack::Builder as "SPEC" vs "GEM". cc @tenderlove your input would be useful too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the middleware users and server implementers using hashes for this config would be better. It is easier to check for a hash value than it is to check if the config object respond to some method the only some server configs provide.

For what I get of this implementation, the only reason why a hash as config would not work right now is because the object allows servers to implement the rackup method, but do we see them needing this kind of feature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Rack::Builder directly has the same issues that using Rack::Builder::Config does in terms of SPEC. I agree from a SPEC perspective, rack avoids custom objects. However, config.ru was never part of SPEC (which is only regarding the rack protocol), so I'm not sure it applies.

The issue with using a plain hash is that it won't have the equivalent of Config#rackup, so middleware cannot pass the configuration to other middleware loaded by that middleware. However, maybe there are no such middleware that actually need that. Switching to a plain hash seems fine, I can work on that change if that's the direction we want to go.

However, maybe we should rethink the idea of middleware autoconfiguration based on configuration parameters? Is it really needed? Maybe we can just have Rack::Lock always lock, and users who know their apps are not concurrency-safe can use it. If they use it on a non-concurrent webserver (e.g. Unicorn), it will use a lock when it doesn't need to, but uncontested mutex is not slow, so there is no real problem with doing so. What are your thoughts on abandoning the idea of middleware configuration? We would still remove rack.multithread/rack.multiprocess/rack.run_once.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your thoughts on abandoning the idea of middleware configuration? We would still remove rack.multithread/rack.multiprocess/rack.run_once.

I'm positive for that idea. To be fair the only middleware that I saw which uses this is Rack::Lock, which Rails doesn't include anymore by default and uses it own config config.allow_concurrency to configure it.

Copy link
Member

@ioquatix ioquatix Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we still expose Builder#options for server configuration/specific details at load time. Otherwise there is no per-builder side channel for server configuration/details at all. We don't need to handle the rackup model in the same PR, if it's too hard, we can drop it. Users can write use MyMiddleware.rackup(self) if they want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the need for a per-builder side channel. The complexity isn't worth it, IMO. Looks like @rafaelfranca agrees. So unless another committer is in favor of adding it, we should probably just commit the removal of rack.multithread/rack.multiprocess/rack.run_once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let's do that to start with.

ioquatix and others added 11 commits January 26, 2022 16:37
These variables generally come too late to be useful. Removed `Rack::Lock` which depends on these variables.
This adds Rack::Builder::Config, which stores information on the
server's configuration, such as whether it is multithreaded or
supports reentrancy.

Middleware can use the configuration by defining a rackup method
in addition to a new method.  If the rackup method is defined,
it is called instead of the new method, with the configuration
as the first argument and with the remaining arguments and block
the same as what would be passed to new.

In cases where the server's configuration indicates the middleware
is not needed, the middleware rackup method can be just return the
app itself.

To handle the very rare case where a middleware would want to
delegate to other middleware in certain server configurations, and
doesn't know whether the other middleware supports the rackup
method or not, The configuration object supports a rackup method,
which will call rackup on the middleware if defined, or new
otherwise.  So if middleware A wants to use external middleware B
in a certain server configuration, middleware A's rackup method
could be something like:

  def self.rackup(config, app)
    if config.multithread?
      config.rackup(MiddlewareB, app)
    else
      new(app)
    end
  end

The configuration rackup method is also used internally to implement
the builder.

This readds Rack::Lock, using the rackup method, which only uses
the Rack::Lock middleware if the configuration indicates the
server is multithreaded.

The advantage of this approach is that it doesn't require exposing
the entire builder API to the middleware, it only exposes the
server configuration, which is all the middleware should need to
appropriately configure itself.
Avoids the need to use TOPLEVEL_BINDING.
… on Ruby 2

Ruby 2 Mutex only works for multiple threads, not multiple fibers.

Such apps are probably still broken at runtime unless they are
using their own locks, but this matches the Rack 2 behavior.
This correctly doesn't release the lock until after the body is
closed. Restore the related tests as well.  Just make changes to
avoid use of rack.multithread.
Copy link
Member

@ioquatix ioquatix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great.

@jeremyevans jeremyevans merged commit 0f8bb0f into rack:master Jan 27, 2022
czj added a commit to czj/flipper that referenced this pull request Nov 13, 2023
When using Rack >= 3.0.0 you get this error message on boot an app with Flipper UI :

```
warning: Rack::File is deprecated and will be removed in Rack 3.1
```

This PR detects if `Rack::Files` is defined and tries to use it instead.

Source : rack/rack#1720
albertski added a commit to albertski/lamby that referenced this pull request Jan 26, 2024
These constants were removed from Rack in rack/rack#1720
Stam098c referenced this pull request Feb 14, 2024
* Add rack.response_finished to Rack::Lint

This updates Rack::Lint to validate that `rack.response_finished` is an
array of callables when present in the `env`. e.g. procs, lambdas, or
objects that respond to `call`.

This validates that:

* `rack.response_finished` is an array
* The contents of the array all respond to `call`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants