Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Managing environments. #1546

Closed
ioquatix opened this issue Feb 3, 2020 · 32 comments
Closed

Managing environments. #1546

ioquatix opened this issue Feb 3, 2020 · 32 comments
Assignees
Milestone

Comments

@ioquatix
Copy link
Member

ioquatix commented Feb 3, 2020

RACK_ENV as a variable has many confusing use cases, and many of them don't completely align up.

  • In some cases, it's used to configure the middleware in config.ru. Value include :test, :development, and :production. Frameworks are migrating to use APP_ENV instead.
  • In Rack::Server it's used to control the default middleware. Values include :development and :deployment.

I propose that we do the following:

1/ Change Rack::Builder to take an optional environment parameter, which can be used during the construction of the config.ru.

# config.ru

if environment?(:production)
  use ReportErrors
else
  use LogErrors
end

# ...

Then:

Rack::Builder.new(environment: 'production')

We can still feed RACK_ENV into this if required, but we decouple it and provide a non-global state interface for users.

2/ Provide some kind of "load an app" function under Rack::Handler.

Rack::Handler.load_app(path, environment)

The point is, if Rack::Builder is not sufficient for generating the app, what is? Do we want to provide an interface for servers like Puma, Falcon, so they can essentially say: give me a rack middleware, for this environment, from this configuration file.

In particular, I was looking at the middleware situation. Based on what @tenderlove told me, we can almost remove environment from the server, but I feel like there is value in automatically adding lint and so on during development.

So, either we clearly define values for environment and what middleware they load, or we leave that up to downstream libraries, e.g. rack-test can add Rack::Lint.

Some discussion here would be good. However, I may try throwing together a PR just so we have something actionable.

@ioquatix
Copy link
Member Author

ioquatix commented Feb 3, 2020

One more question I have, I was looking at the default middleware.

      def default_middleware_by_environment
        m = Hash.new {|h, k| h[k] = []}
        m["deployment"] = [
          [Rack::ContentLength],
          logging_middleware,
          [Rack::TempfileReaper]
        ]
        m["development"] = [
          [Rack::ContentLength],
          logging_middleware,
          [Rack::ShowExceptions],
          [Rack::Lint],
          [Rack::TempfileReaper]
        ]

        m
      end
  • Rack::ContentLength. The changes to Rack::Response revealed that many users are unable to set the content-length header correctly. That was surprising. We have Rack::ContentLength but IMHO it should be deprecated and removed. It's a bad idea and encourages inefficient code.

  • Rack::TempfileReaper. I don't understand this. To me, if you are handling multipart data, it almost seems like if you don't have it, you will eventually run out of storage space? But I checked, most times RACK_ENV is :production which means this would not be used. I also checked puma, unicorn and falcon and I cannot see any place where this middleware is used. If we have things that are critical to the reliable operation of Rack, should we not ensure this is enabled on all middleware stacks handed to the server?

  • logging_middleware. On CGI, this is skipped, I guess because CGI outputs to stdout and for some reason logging is not going to stderr? This is very confusing, but again, either this should be handled by the server, or the user supplied config.ru. Should Rack provide logging by default?

In falcon I implemented my own logging which is more detailed, so I'd probably want to skip this.

Looking at rack/server.rb I wonder if the entire thing should just be removed, along with all the handlers (maybe moved to a separate gem). I don't think servers are using that code. rackup is not even that useful... but I do see some value in it.

If we can figure out what the long term vision for Rack looks like, and how we want servers to integrate with it, I think we'd proabably be able to answer some of these questions.

@ioquatix
Copy link
Member Author

ioquatix commented Feb 3, 2020

Here is an example of part of one of my own config.ru files:

#!/usr/bin/env rackup
# frozen_string_literal: true

require_relative 'config/environment'

require 'rack/freeze'

if RACK_ENV == :production
	# Handle exceptions in production with a error page and send an email notification:
	use Utopia::Exceptions::Handler
	use Utopia::Exceptions::Mailer
else
	# We want to propate exceptions up when running tests:
	use Rack::ShowExceptions unless RACK_ENV == :test
end

Interestingly enough, we could actually def RACK_ENV in builder and have it do something useful to mitigate this specific issue, but I've also seen RACK_ENV deeply embedded into application frameworks.

@ioquatix
Copy link
Member Author

ioquatix commented Feb 3, 2020

What's even more crazy, is that puma includes an entire copy of Rack::Builder: https://github.com/puma/puma/blob/master/lib/puma/rack/builder.rb

@jeremyevans
Copy link
Contributor

Going over the various proposals in this issue:

  • Add Rack::Builder#environment?

This seems fine to me. However, we should definitely keep backwards compatibility and continue to use RACK_ENV everywhere it is currently used. case ENV['RACK_ENV'] should remain a way to configure applications based on environments.

  • Add Rack::Handler.load_app

I don't see the benefits of this, but I also don't maintain a webserver.

  • Rack::ContentLength middleware

This middleware doesn't overwrite an existing incorrect Content-Length, so it doesn't help people who set it incorrectly. However, I think we should keep it where it currently is for backwards compatibility.

  • Rack::TempfileReaper

Without this middleware, tempfiles are not closed after the request, so they will not be cleaned up until garbage collection (this behavior comes from Tempfile using ObjectSpace.define_finalizer). So in general not removing them after the request does not cause production issues. Now, if the process does not exit normally, they may remain in the file system, but that's probably the least of your worries at that point.

  • logging_middleware

I think Rack should continue to use this for the development and deployment environments for backwards compatibility.

  • Removing rack/server.rb

I'm against removing it. Now, In terms of handlers:

  • scgi is pretty much dead and could be removed. I added it in 2008. I think it still works, but every server that supports it probably supports fastcgi.
  • thin hasn't had a commit in about 2.5 years, though I think there are people still actively using it.
  • lsws I have no idea. It was added in 2007, and last significant update in 2009. The server still seems actively maintained, though.
  • cgi and fastcgi we should keep as those are commonly supported by external webservers.
  • webrick we should keep as it is the only server we can guarantee to be there.

@ioquatix
Copy link
Member Author

ioquatix commented Feb 3, 2020

Thanks for your feedback.

Given that none of the middleware is strictly required, I think we should just remove all of them. The on the specific things that require lint, e.g. rack-test, they can build the middleware appropriately.

Thanks for explaining about TempfileReaper, since I didn't know about it. That being said, I don't think any production server is actually using it.

Regarding the handlers. I'm on the fence. I think the bar should be that we have active tests to confirm it's working. In that regard, having a separate repo, e.g. rack-handler-scgi might make more sense.

I also took a look at how we could modify Rack::Builder. The current Server::Options seems irrelevant, and I don't believe it's used in production. Right now, it appears that you can parse options from the first comment line in a config.ru file, but I don't think I've ever seen this even once in production.

I think what we should aim to do is:

  • Retain all current behaviour for a Rack 2.2 release.
  • Start work towards Rack 3.0 by chopping and changing some of these interfaces.

@ioquatix ioquatix added this to the v3.0.0 milestone Feb 3, 2020
@jeremyevans
Copy link
Contributor

I still think we should keep the current middleware by default for backwards compatibility. If users don't want the current middleware, they can use a non development/deployment environment. The alternative is adding deprecation warnings for all usage of the development/deployment environment, which seems less palatable.

I'm OK removing handlers without tests. We should add deprecation warnings to such handlers in 2.2, so that people that may be relying on them can write tests. There already exist tests for webrick, thin, and cgi.

Parsing command line options in config.ru files is part of the public Rack API. I don't have a strong opinion about that, so I'm not opposed to removing that feature as long as it is deprecated properly first.

@ioquatix
Copy link
Member Author

ioquatix commented Feb 3, 2020

I don’t think anyone is using RACK_ENV=deployment thoughts. @tenderlove indicated a desire for removing the default middleware. I think it’s only being used by rackup so in practice no one is actually using it. More investigation required.

@jeremyevans
Copy link
Contributor

Regarding the default development/deployment middleware, if you and @tenderlove are in favor of removing them, I'm OK with that in 3.0 as long as it is documented.

However, you should reconsider the idea that nobody is using rackup. It is probably used more than you think. Additionally, other tools that don't use rackup may still follow rackup's conventions. For example, Unicorn follows rackup's handling of the development and deployment environments, so a change in rackup will likely trigger a change in Unicorn, which many people are using.

@tenderlove
Copy link
Member

I'm ok with removing the default middleware in v3.0.0. My overall "vision" is that Rack defines an interface that webservers and applications can use to communicate. It should have no opinions about what a default middleware is, as it's just defining an interface. Also, it means that theoretically apps that speak Rack could work with webservers that speak Rack and never depend on Rack itself.

That said, I think it would be nice for Rack to provide things like:

  • Ways to test things (mock request / response)
  • A default request / response object (of course you can define your own)
  • Multipart parsing

Basically things that are nice to have but are never between the app and the webserver. Does that make sense as a direction?

I think keeping the webrick handler around is fine as an example implementation for a webserver, but I do think we should rm the rest.

@tenderlove
Copy link
Member

I should have also mentioned: providing middleware seems fine too, but since my goal is for Rack to be "optional" in an app we shouldn't make defaults.

@ioquatix
Copy link
Member Author

ioquatix commented Feb 4, 2020

Right - so if the server implements the SPEC, it can load apps any way it likes, it doesn't need to be Rack::Builder. That's an interesting POV I didn't consider before.

@ioquatix
Copy link
Member Author

ioquatix commented Feb 4, 2020

I've been thinking about this over the past day and I came to some conclusions:

  • ENV variables are okay but need to be documented very clearly. We have a bit of a mess now with RACK_ENV and I also looked at my own code and it's not clean w.r.t. this variable.
  • The community seems to have standardised on APP_ENV, which means the environment the app is running in.
  • I've also seen and used DATABASE_ENV.

Firstly, let me state that I hate abbreviations, and _ENV is no exception. One reason for this is it's not necessarily clear to beginner and non-native speakers what _ENV means. So, a thread that runs through my design is trying to avoid this or at least begrudgingly accepting it because it's so entrenched. It also seems kind of weird to me because ENV is a hash, yet RACK_ENV is a string which indexes into a hash. So there is a bit of cognitive dissonance.

As we can see, there are different environments and we use a single key to provide indirection w.r.t. which environment we want to select, often used to load a configuration file with the same name, e.g. production.yaml or .env.development, or indexing into a list of connection parameters, or selecting a different database name.

There ARE some more things which are probably "global" state in rack which are passed as env variables per-request:

  • rack.early_hints
  • rack.version
  • rack.multithread
  • rack.multiprocess
  • rack.runonce
  • rack.hijack?

(I understand you could probably argue either way for some of them).

Regarding building a rack app, things like rack.multithread and rack.multiprocess are not known until the first request. If the middleware is trying to build some internal caching data structure, at app builder time, it needs to know whether it needs to be reentrant/thread-safe or can make optimisations regarding isolation/multi-process.

So, with that all in mind, I believe we should introduce a new interface for dealing with this.

Something like:

module Rack
  class Environment < Hash
    def self.build(env = ENV, multithread: false, multiprocess: false)
      environment = self.new
      
      environment['rack.env'] = env['RACK_ENV']&.to_sym
      
      if app_env = env['APP_ENV']
        environment['app.env'] = app_env.to_sym
      else
        environment['app.env'] = environment['rack.env']
      end
      
      environment['rack.multithread'] = multithread
      environment['rack.multiprocess'] = multiprocess
    end
    
    def multithreaded?
      self['rack.multithread']
    end
    
    def multiprocess?
      self['rack.multiprocess']
    end
    
    def production?
      self['app.env'] == :production
    end
    
    def development?
      self['app.env'] == :development
    end
    
    def test?
      self['app.env'] == :test
    end
  end
  
  class Builder
    def initialize(..., environment:)
      @environment = environment
    end
    
    attr :environment
  end
end

# config.ru

if environment.production?
  use SendEmailErrorReports
end

if environment.multithreaded?
  use Rack::Lock # oh no
end

However, if at all possible, I'd like to think about whether APP_ENV and similar abbreviations are a good way forward.

@ioquatix
Copy link
Member Author

ioquatix commented Feb 4, 2020

Also, maybe multithread and multiprocess are not helpful... we should consider something that indicates whether middleware will be required to service more than one request at a time.

We could also use it for servers to advertise features, e.g.

rack.reentrant = true
server.websockets = true
server.http1 = true
server.http2 = true
server.this_feature = false
server.that_feature = true

etc

This would be used by middleware to ensure the returned app can function (or not) on the given server.

@jeremyevans
Copy link
Contributor

I'm against requiring Rack specific classes as part of the Rack API. As @tenderlove has mentioned, web servers and web applications need not require rack at all, as long as both conform to the rack API. I strongly feel the request env should remain a plain hash.

The only place where Rack currently checks RACK_ENV is in the server and handlers. Some handlers use RACK_ENV to decide which address to bind to, and I don't think we want to change that as it can cause security issues. If we drop the default middleware for development and deployment, the server doesn't need to care about RACK_ENV.

I'm against adding support for APP_ENV. I think Rack applications can respect it, but it should not effect Rack itself.

@ioquatix
Copy link
Member Author

ioquatix commented Feb 4, 2020

I'm against requiring Rack specific classes as part of the Rack API.

I don't think it needs to be part of the SPEC. Because we don't include the definition of Rack::Builder in the SPEC either. So, I think it's okay.

The only place where Rack currently checks RACK_ENV is in the server and handlers.

As this is internal implementation detail, it can be reworked to avoid the use of a global variable IMHO, an would generally be a good idea.

If we drop the default middleware for development and deployment, the server doesn't need to care about RACK_ENV.

It depends on how we want to expose this to the servers, and what kind of interface we should expect servers to conform to. Rack::Builder and config.ru are entirely contraptions of Rack and not the Rack SPEC. So, I have no issue with adjusting how they work in Rack 3.0 if it makes sense and makes it easier for users and servers.

Regarding APP_ENV, I agree, except that to me, providing some standardisation around this will make middleware more compatible. It doesn't have to be a global. But yes, maybe it's introducing too much surface area.

@jeremyevans
Copy link
Contributor

I'm against requiring Rack specific classes as part of the Rack API.

I don't think it needs to be part of the SPEC. Because we don't include the definition of Rack::Builder in the SPEC either. So, I think it's okay.

I think that is fairly dangerous. If all users of Builder end up with env being a custom Rack object, most software designed to work with Rack will start relying on that interface. You end up with the de facto standard being the custom Rack object API, instead of the actual SPEC standard.

I'm fine with servers (e.g. Falcon) using custom Hash subclasses for env, but I strongly feel that Rack itself should not do so.

Note that the methods you are proposing for env I would be fine with adding to Rack::Request, though I don't think we should add app.env or APP_ENV. env keys specific to Rack should start with rack., and ENV variables rack uses should start with RACK_, IMO.

@ioquatix
Copy link
Member Author

ioquatix commented Feb 5, 2020

I think that is fairly dangerous. If all users of Builder end up with env being a custom Rack object,

That's not my proposal though.

It's an entirely new interface to expose the state of the world to the builder instance, rather than depending on global variables.

Whatever you call it and however you implement it, I feel it should be certainly possible to do the following (the methods/arguments are just to illustrate the issue):

production_app = Rack::Builder.new(:production).load('config.ru').to_app
development_app = Rack::Builder.new(:development).load('config.ru).to_app
thread_safe_app = Rack::Builder.new(..., multithread: true).load('config.ru').to_app

Depending on global state, or some kind of per request state (in the case of multithread) just seems completely wrong to me.

So, do you have any ideas how we can solve this?

@jeremyevans
Copy link
Contributor

I think that is fairly dangerous. If all users of Builder end up with env being a custom Rack object,

That's not my proposal though.

Sorry, I got confused since you are using the same keys that are currently used in the request environment per SPEC, and it's named Environment. I think I understand now that this environment is just a helper for builder scripts/.ru files.

So, do you have any ideas how we can solve this?

Is this actually enough of a problem to be worth solving? I currently don't think the benefit of adding the API is worth the cost. It recreates much of what web frameworks do, and I think it is something best left to web frameworks. That said, I'm not strongly against it, just not in favor.

In terms of implementation, if it only supports the methods you are defining, there is no reason it should be a Hash subclass, a simple object with instance variables should be sufficient. Alternatively, it could be a Struct subclass. This is unless you want to support the following in .ru files:

if environment[:some_setting]
  use SomeMiddleware
end

@ioquatix
Copy link
Member Author

ioquatix commented Feb 5, 2020

Okay, I will spend some time thinking about what makes the most sense.

@boazsegev
Copy link

RACK_ENV as a variable has many confusing use cases, and many of them don't completely align up.

I agree with this assessment. I find that :deployment is rarely used anymore. I find myself preferring the use of :production, which is much more prevalent (and, I'm happy to report, has no default middleware)...

...It would be nice if this behavior was updated to reflect common community practices.

I should have also mentioned: providing middleware seems fine too, but since my goal is for Rack to be "optional" in an app we shouldn't make defaults.

I agree with @tenderlove on that one.

As both an app developer and a server maintainer, these defaults are sometimes surprising. I've had bug reports related to the default middleware producing changes to the response (and developers blaming my server).

Rack defines an interface that webservers and applications can use to communicate.

Thank you @tenderlove for pointing out that the Rack interface might be used without the Rack gem.

The iodine server supports this approach (a "rackless rack") and I used this approach before when I realized I wasn't using any Rack features.

@ioquatix
Copy link
Member Author

ioquatix commented Feb 6, 2020

Summary of Server::Options handling.

Puma

https://github.com/puma/puma/blob/4c85facff3b54bae1468ddbafa677d2d53f4b3f0/lib/puma/rack/builder.rb#L158-L173

Assessment: Options are an instance of a custom Options class. First line is parsed.

Iodine

https://github.com/boazsegev/iodine/blob/71d4d6baf8feca7c78706497e220393d2cf6ab11/exe/iodine#L60-L78

Assessment: Options are an instance of Hash and are ignored.

Falcon

https://github.com/socketry/falcon/blob/b8e9ca5aeaf748bec550a54a8bcdeb7f7c3d3fe7/lib/falcon/command/serve.rb#L80

Assessment: Standard Rack::Builder usage. Options are parsed but ignored as not relevant.

Unicorn

https://github.com/defunkt/unicorn/blob/080bbe7a14bdb0373eaf25384003e51eec6ab748/lib/unicorn.rb#L41-L62

Assessment: Completely custom parser and options handling.

Passenger

https://github.com/phusion/passenger/blob/b30bdbe66b3f41d108f160e22778549ef82184d3/src/helper-scripts/rack-loader.rb#L84-L97

Assessment: Completely custom parser, ignores options.

@ioquatix
Copy link
Member Author

ioquatix commented Feb 6, 2020

I wanted to know if any code is actually using Server::Options as parsed by the first line in config.ru. It should start with #\.

Rack Codebase

koyoko% find . -name "*.ru" | xargs head -n 2
==> ./example/protectedlobster.ru <==
# frozen_string_literal: true


==> ./example/lobster.ru <==
# frozen_string_literal: true


==> ./test/cgi/sample_rackup.ru <==
# frozen_string_literal: true


==> ./test/cgi/test.ru <==
#!../../bin/rackup
# frozen_string_literal: true

==> ./test/rackup/config.ru <==
# frozen_string_literal: true


==> ./test/builder/frozen.ru <==
# frozen_string_literal: true


==> ./test/builder/end.ru <==
# frozen_string_literal: true


==> ./test/builder/options.ru <==
# frozen_string_literal: true


==> ./test/builder/bom.ru <==
run -> (env) { [200, { 'Content-Type' => 'text/plain' }, ['OK']] }

==> ./test/builder/line.ru <==
# frozen_string_literal: true


==> ./test/builder/comment.ru <==
# frozen_string_literal: true

Assessment: Not in use.

Does anyone have any suggestion how we can check all of GitHub?

@ioquatix ioquatix modified the milestones: v3.0.0, v2.3.0 Feb 7, 2020
@ioquatix
Copy link
Member Author

I realised this is probably going to introduce breaking changes, because of the way Rack::Builder works. I could not figure out a way around it. Maybe @eregon has some more meta-magic. But I couldn't see a way to take the current approach and use a pre-existing builder, e.g.

eval('Rack::Builder.new{ ...}')

vs

builder = Rack::Builder.new(options)
eval('???')

I've also been thinking about this in the context of servers.

Do we want config.ru to introduce global variables, etc. Should it really be running at TOPLEVEL_BINDING? Because in testing, if you try to load a config.ru file that does this multiple times, you end up clobbering globals and constants, etc. So I wonder if we should think about what we want for Rack 3.0.

@jeremyevans
Copy link
Contributor

I believe config.ru should continue to run at TOPLEVEL_BINDING. That is the expectation and I don't think any benefits of changing it are worth the loss of backwards compatibility. Note that if you provide a non-.ru file, it is just required, and required files are evaluated at TOPLEVEL_BINDING, so changing this would also introduce incompatibility between .ru and non-.ru files.

@ioquatix
Copy link
Member Author

That is the expectation and I don't think any benefits of changing it are worth the loss of backwards compatibility.

The benefits are:

  • Better isolation of application state which potentially aids graceful reloading, testing, etc.
  • Simpler implementation in Rack::Builder - maybe impossible to actually use options as shown above if we try to maintain current implementation.

In addition, I far prefer people using require for code/constants than trying to stick an entire web application into a config.ru file.

Although, I think this needs more experimentation. I don't disagree with you that it breaks some level of backwards compatibility.

@eregon
Copy link
Contributor

eregon commented Feb 11, 2020

I realised this is probably going to introduce breaking changes, because of the way Rack::Builder works. I could not figure out a way around it. Maybe @eregon has some more meta-magic. But I couldn't see a way to take the current approach and use a pre-existing builder

Just replacing
https://github.com/rack/rack/pull/1545/files#diff-ae0c3a877f25d11360996846d72db9acR114
with

builder = Rack::Builder.new(options)
binding = TOPLEVEL_BINDING.eval('builder.instance_eval { binding }')

should work, no?

Do we want config.ru to introduce global variables, etc. Should it really be running at TOPLEVEL_BINDING?

It's running in a copy of TOPLEVEL_BINDING, it doesn't add local variables to TOPLEVEL_BINDING.
Or is the config.ru actually using $global variables? That sounds wrong.

Because in testing, if you try to load a config.ru file that does this multiple times, you end up clobbering globals and constants, etc.

Right, constants in config.ru will be constants of Object.
You could wrap everything under a module, much like load(file, wrap=true) does, but that would likely cause incompatibilities.

Global variables are global no matter what, so there is no way to fix that except not using them.

@eregon
Copy link
Contributor

eregon commented Feb 11, 2020

It's worth noting than running code in a copy of TOPLEVEL_BINDING is no different than code running in a file loaded by require. Both have the same constant nesting (just Object).
Or maybe your point is that require loads a given file only once, but config.ru might be loaded multiple times in the same process?

@ioquatix
Copy link
Member Author

ioquatix commented Feb 11, 2020

module Rack
	class Builder
		def initialize(**options)
			@options = options
		end
		
		def self.load(script, **options)
			builder = self.new(options)
			binding = TOPLEVEL_BINDING.eval('builder.instance_eval { binding }')
			
			binding.eval(script)
		end
		
		def debug
			pp @options
		end
	end
end

Rack::Builder.load("self.debug", test: true)
# undefined local variable or method `builder' for main:Object (NameError)

@eregon
Copy link
Contributor

eregon commented Feb 11, 2020

Ah yes, of course, but we can fix that:

builder = self.new(options)
b = TOPLEVEL_BINDING.dup
b.local_variable_set :builder, builder
binding = b.eval('builder.instance_eval { binding }')

That will have the side effect to also have that builder variable in the final binding.
So you might want to give it a name less likely to conflict.
You can also set that variable to nil, but it will still be in Binding#local_variables.

@ioquatix
Copy link
Member Author

Haha, you are magician.

@eregon
Copy link
Contributor

eregon commented Feb 11, 2020

Could also pass the builder via a lambda, very similar:

builder = self.new(options)
binding = TOPLEVEL_BINDING.eval('-> builder { builder.instance_eval { binding } }').call(builder)
eval builder_script, binding, file
builder.to_app

It still leaks that one local variable though.

@jeremyevans
Copy link
Contributor

With the removal of the Rack::Builder::Config from #1720, and the closure of #1718, due to the general thought that the complexity is not needed, I think this can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants