
BUNDLE_GEMFILE is incorrectly set to parent Gemfile in worker processes (when not setting BUNDLE_GEMFILE explicitly) #2133

Closed
cjlarose opened this issue Feb 28, 2020 · 17 comments · Fixed by #2154

@cjlarose
Member

cjlarose commented Feb 28, 2020

Describe the bug

Before #1893, if you used prune_bundler and started bundle exec puma without explicitly setting BUNDLE_GEMFILE, workers would not have a value set for the environment variable BUNDLE_GEMFILE. This is desirable when you want workers to use the Gemfile located in the application directory, especially if that directory is different from the directory that you started the puma master process in (such as with a phased restart).

After #1893, workers have BUNDLE_GEMFILE unexpectedly set to the path of the Gemfile used by the master process. I think this forces Bundler in the workers to use that Gemfile instead of the one in the worker's working directory, as it would if the environment variable were unset.

I think the intent of that PR was that if you start your server with BUNDLE_GEMFILE=Gemfile.rails6 bundle exec puma, workers would have BUNDLE_GEMFILE set to the same value. But it ignores the fact that Bundler itself sets a value for ENV['BUNDLE_GEMFILE'] even if it wasn't specified at startup time. Workers now end up inheriting this value, when they didn't before.
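
Here's a minimal sketch of that point (the script name is made up): run it with bundle exec in a directory containing a Gemfile, without exporting BUNDLE_GEMFILE yourself, and it still prints a Gemfile path because Bundler sets the variable on its own.

# show_gemfile_env.rb -- a sketch, not part of the repro. Run with
# `bundle exec ruby show_gemfile_env.rb` without setting BUNDLE_GEMFILE first;
# Bundler fills the variable in anyway, and that is the value workers inherit.
puts "BUNDLE_GEMFILE: #{ENV['BUNDLE_GEMFILE'].inspect}"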

One consequence of this bug is that it is no longer possible to delete from your server the Gemfile that was used when you booted the puma master process. If you do, workers fail on the next phased restart. I imagine there are other bugs related to changing the gems in the Gemfile for new versions of your application, but I haven't been able to reproduce those.

Puma config:

port 3000
workers 1
prune_bundler
directory '/usr/src/releases/current'

on_worker_boot do
  pp "BUNDLE_GEMFILE: #{ENV['BUNDLE_GEMFILE']}"
end

To Reproduce

I wrote a script that reproduces the error here: https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors/tree/regression-1893 (I reused a repo from an unrelated issue)

Before #1893, workers boot and print BUNDLE_GEMFILE: followed by nothing (the variable is unset). After that PR, workers boot and print BUNDLE_GEMFILE: /usr/src/releases/1582850222334837300/Gemfile, referring to the path of the Gemfile that was used by the puma master process.

Expected behavior

If BUNDLE_GEMFILE was unspecified when you started the master process, it should remain unspecified in the worker processes.


@nateberkopec
Member

Er, 4.3.2 only contains the security issue fix mentioned in the History.

https://github.com/puma/puma/compare/v4.3.1..v4.3.2

@cjlarose
Member Author

Oh yeah, that's my bad. The issue is in master, but not yet in a released version of puma.

@cjlarose
Member Author

I updated my repro app to demonstrate that the old Gemfile seems to be active in the new workers: removing activesupport from the Gemfile and Gemfile.lock has no effect, since the new workers read the old Gemfile.

@seven1m
Contributor

seven1m commented Mar 4, 2020

Yikes, I can have a go at fixing this since I broke it in #1893.

@seven1m
Contributor

seven1m commented Mar 4, 2020

@cjlarose I pulled down your repo and switched to the regression-1893 branch you linked.

I see that the Gemfile path does not update as expected, so that seems like a good reproduction.

However, when I comment out the one line of substance from #1893, your reproduction fails with:

Starting phased restart
[1753] - Starting phased worker restart, phase: 1
[1753] + Changing to /usr/src/releases/current
Command phased-restart sent success
[1753] - Stopping 1755 for phased upgrade...
[1753] - TERM sent to 1755...
[1776] + Gemfile in context: /usr/src/releases/1583341290066060201/Gemfile
"BUNDLE_GEMFILE: "
[1753] - Worker 0 (pid: 1776) booted, phase: 1
Testing connection to puma master
2020-03-04 17:01:41 +0000: Rack app error handling request { GET / }
#<LoadError: cannot load such file -- active_support>
config.ru:5:in `require'
config.ru:5:in `block (2 levels) in <main>'
/usr/src/releases/1583341290066060201/.bundle/ruby/2.7.0/gems/rack-2.0.9/lib/rack/content_length.rb:15:in `call'
/usr/src/releases/1583341290066060201/.bundle/ruby/2.7.0/gems/rack-2.0.9/lib/rack/reloader.rb:45:in `call'
/usr/src/releases/1583341279318312627/.bundle/ruby/2.7.0/bundler/gems/puma-317c1cf63931/lib/puma/configuration.rb:228:in `call'
/usr/src/releases/1583341279318312627/.bundle/ruby/2.7.0/bundler/gems/puma-317c1cf63931/lib/puma/server.rb:688:in `handle_request'
/usr/src/releases/1583341279318312627/.bundle/ruby/2.7.0/bundler/gems/puma-317c1cf63931/lib/puma/server.rb:476:in `process_client'
/usr/src/releases/1583341279318312627/.bundle/ruby/2.7.0/bundler/gems/puma-317c1cf63931/lib/puma/server.rb:332:in `block in run'
/usr/src/releases/1583341279318312627/.bundle/ruby/2.7.0/bundler/gems/puma-317c1cf63931/lib/puma/thread_pool.rb:134:in `block in spawn_thread'

I think I do see what you are saying, and I will continue to try to write a good test to prove the problem, but I just wanted to let you know that the reproduction isn't quite good enough to go on at the moment.

@cjlarose
Member Author

cjlarose commented Mar 4, 2020

@seven1m Yeah I'm sorry that I made the repro kinda hastily. I figured it was better to document what I found quickly before the regression made it into a puma release to limit impact.

That said, I think that failure you're seeing is actually the expected behavior. The first version of the application contains activesupport in the Gemfile and Gemfile.lock. This is the version of the application that workers boot up in when the puma master process is started.

Then the script prepares a second version of the application that removes activesupport from the Gemfile and performs a bundle install. Then, we perform a phased restart.

If the new workers are using the new Gemfile, then I expect them to fail when they require 'active_support'. This is the behavior I experience with Puma prior to #1893. However, if you include the changes from #1893, workers just boot up successfully (they're able to require 'active_support', even though it should not be accessible).

In retrospect, it probably would have been clearer if the new version of the application had instead required a new gem that wasn't present in the old version. That's my mistake. Sorry for the confusion.
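
For reference, a config.ru along these lines exercises that flow (this is just a sketch, not the exact file from the repro):

# Sketch of a config.ru for this scenario. Once the new Gemfile (without
# activesupport) is in effect, freshly booted workers should raise
# LoadError: cannot load such file -- active_support when handling a request.
run lambda { |env|
  require 'active_support'
  [200, { 'Content-Type' => 'text/plain' }, ['active_support loaded']]
}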

@seven1m
Contributor

seven1m commented Mar 4, 2020

Oh, I see! OK that makes sense, thank you. I understand not wanting this bug in a release. 😬

I'll keep poking at it!

@cjlarose
Member Author

cjlarose commented Mar 4, 2020

Thanks for taking the time to look into it. I have some notes on what I think might be happening, which I'll share in case they help.

  1. Puma master process starts with no explicit value set for BUNDLE_GEMFILE
  2. Assuming the master process started with bundle exec puma, Bundler::SharedHelpers#set_bundle_variables sets the environment variable BUNDLE_GEMFILE to the path of the Gemfile it found.
  3. In prune_bundler, the value of BUNDLE_GEMFILE is propagated to the new puma-wild process.
  4. Workers are forked from the puma-wild process, so they still have the value of BUNDLE_GEMFILE set.

I think (haven't verified) that Bundler might preserve the original value of BUNDLE_GEMFILE in the environment variable BUNDLER_ORIG_BUNDLE_GEMFILE. I think one thing that might work is to capture the value of ENV['BUNDLER_ORIG_BUNDLE_GEMFILE'] in the master process before it execs, and use that value (if it was present) to set BUNDLE_GEMFILE in the workers, but I'm not 100% sure.
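
A rough sketch of that idea, assuming BUNDLER_ORIG_BUNDLE_GEMFILE really does hold the pre-Bundler value (this is not the actual Puma code):

# Sketch only: restore the user's original BUNDLE_GEMFILE (or the lack of one)
# for workers, instead of the value Bundler filled in for the master process.
# Bundler may store a sentinel string here when the variable was originally
# unset, which a real implementation would also need to handle.
original_gemfile = ENV['BUNDLER_ORIG_BUNDLE_GEMFILE']

if original_gemfile.nil? || original_gemfile.empty?
  ENV.delete('BUNDLE_GEMFILE')   # let each worker resolve its own Gemfile
else
  ENV['BUNDLE_GEMFILE'] = original_gemfile
end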

@seven1m
Contributor

seven1m commented Mar 4, 2020

@cjlarose I took your approach in this commit and it seems to work. It makes your reproduction repo do what we expect (raise LoadError: cannot load such file -- active_support) and it still passes the test_preserve_bundler_env.rb test.

But wow, this is a kludge. 😓 And I don't have a good, simple test case to add to the test suite either.

@nateberkopec do you have any advice here? Adding a new BUNDLER_ORIG_ORIG_BUNDLE_GEMFILE env var seems like a kludge on top of a kludge.

@nateberkopec
Member

BUNDLER_ORIG_ORIG_BUNDLE_GEMFILE

Haha, I don't have a better suggestion, as you two have both dug into this issue far more than I have. I agree that this is not a great solution.

I'd also point out that #2120 is related here and will have to get merged eventually, once it's working.

@AlexWayfer
Contributor

Hello all!

I'd be glad to work on #2120 and resolve this… stuff with Bundler.

Please describe what needs to be done and what you're expecting.

As I understand it, you want to:

  1. Have a commit aaa with Gemfile v1
  2. Start puma with workers and prune_bundler (with bundle exec or not? 🤔 without it, it'd be hard to resolve some git dependencies)
  3. Pull a commit bbb with Gemfile v2
  4. Perform a phased restart of puma so the workers get the updated bundle (the master process is untouched)

Am I right?

The main question for me: is that OK? What if Gemfile v2 includes a puma update, so the master process should be restarted too? Is that something to resolve manually ("if we're updating the bundle with a puma update, let's restart puma completely")?

@cjlarose
Member Author

cjlarose commented Mar 5, 2020

@AlexWayfer That summary sounds correct. That's how I typically use puma when I want to update my application in an environment that uses mutable infrastructure (reusing existing live servers).

What if Gemfile v2 has puma update?

This doesn't work. I think what happens is that workers try to activate the new version of puma, but since an old version is already activated, RubyGems throws an exception, so workers die and respawn in a loop. This is actually true of any change in Gemfile v2 to the version of puma or of any gem that is a runtime dependency of your puma master process (nio4r especially), but also anything else you choose to run in your puma master process (such as puma_worker_killer). This is a separate problem, documented in #2018 and again in #1875.
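
To illustrate the conflict (version numbers here are arbitrary, and this is not Puma code), assume both versions are installed and the older one is already active:

# RubyGems refuses to activate two versions of the same gem in one process.
gem 'nio4r', '= 2.3.1'   # activated earlier, e.g. by the running master process
gem 'nio4r', '= 2.5.2'   # raises Gem::LoadError: already activated nio4r-2.3.1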

Is it something for manual resolution?

Yes, typically. This is something devs might have to communicate to operators, unfortunately. If there's a puma upgrade, we have to perform normal or "hot" restarts, not "phased" restarts.

AlexWayfer added a commit to AlexWayfer/puma that referenced this issue Mar 6, 2020
Also add tests for `GEM_HOME` environment variable preserving.

Also drop `BUNDLE_GEMFILE` preserving, resolve puma#2133
@cjlarose
Member Author

cjlarose commented Mar 6, 2020

@seven1m What is the value of the BUNDLER_ORIG_ORIG_BUNDLE_GEMFILE environment variable? Is it possible to just capture the value of ENV['BUNDLER_ORIG_BUNDLE_GEMFILE'] in prune_bundler as a local variable instead?

cjlarose@e6eb765
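
Roughly what I have in mind (hypothetical command line and names, not the real prune_bundler internals):

# Sketch of the local-variable idea: capture the pre-Bundler value, then only
# pass BUNDLE_GEMFILE along to the re-exec'd process if the user actually set
# it at startup; otherwise make sure the child starts without it.
original_gemfile = ENV['BUNDLER_ORIG_BUNDLE_GEMFILE']

wild_env = { 'BUNDLE_GEMFILE' => nil }            # a nil value unsets it in the child
wild_env['BUNDLE_GEMFILE'] = original_gemfile unless original_gemfile.to_s.empty?

wild_command = ['ruby', '-Ilib', 'bin/puma-wild'] # placeholder command line

Kernel.exec(wild_env, *wild_command)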

@AlexWayfer
Contributor

@seven1m What is the value of the BUNDLER_ORIG_ORIG_BUNDLE_GEMFILE environment variable?

They're suggesting adding such a variable; it doesn't have a value yet.

AlexWayfer added a commit to AlexWayfer/puma that referenced this issue Mar 6, 2020
Also add tests for `GEM_HOME` environment variable preserving.

Also drop `BUNDLE_GEMFILE` preserving, resolve puma#2133

Also disable local `path` for `bundle install` in CI,
it makes `nio4r` unavailable.

Also install (update) `bundler` in CI,
old versions are incompatible.
@seven1m
Contributor

seven1m commented Mar 6, 2020

@cjlarose your commit e6eb765 seems to work for my test. If it works for your use case, then awesome!

@nateberkopec
Member

I'd love to test this locally, but prune-bundler is looping infinitely for me.

 speedshop $ bundle exec ruby -Ilib bin/puma -b tcp://127.0.0.1:32112 -w 1 --prune-bundler test/rackup/hello.ru 
[8016] * Pruning Bundler environment
[8016] * Pruning Bundler environment
[8016] * Pruning Bundler environment
[8016] * Pruning Bundler environment
[8016] * Pruning Bundler environment
[8016] * Pruning Bundler environment

@cjlarose
Member Author

cjlarose commented Mar 7, 2020

@nateberkopec I wasn't able to reproduce that behavior locally on commit e6eb765.

Can you try again on my new PR (#2154)?

If it fails for you, can you list your Ruby version and Bundler version?
