
BUNDLE_GEMFILE is incorrectly set to parent Gemfile in worker processes (when not setting BUNDLE_GEMFILE explicitly) #2133

Closed
cjlarose opened this issue Feb 28, 2020 · 17 comments · Fixed by #2154

@cjlarose
Member

cjlarose commented Feb 28, 2020

Describe the bug

Before #1893, if you used prune_bundler and started bundle exec puma without explicitly setting BUNDLE_GEMFILE, workers would not have a value set for the environment variable BUNDLE_GEMFILE. This is desirable when you want workers to use the Gemfile located in the application directory, especially if that directory is different from the directory that you started the puma master process in (such as with a phased restart).

After #1893, workers have BUNDLE_GEMFILE unexpectedly set to the path of the Gemfile used by the master process. I think this forces Bundler in the workers to use that Gemfile instead of the one in the worker's working directory, as it would if the environment variable were unset.

I think the intent of that PR was that if you start your server with BUNDLE_GEMFILE=Gemfile.rails6 bundle exec puma, workers would have BUNDLE_GEMFILE set to the same value. But it ignores the fact that Bundler itself sets a value for ENV['BUNDLE_GEMFILE'] even if it wasn't specified at startup time. Workers now end up inheriting this value, when they didn't before.
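
Here's a minimal sketch of that point (the script name is made up): run it with bundle exec in a directory containing a Gemfile, without exporting BUNDLE_GEMFILE yourself, and it still prints a Gemfile path because Bundler sets the variable on its own.

# show_gemfile_env.rb -- a sketch, not part of the repro. Run with
# `bundle exec ruby show_gemfile_env.rb` without setting BUNDLE_GEMFILE first;
# Bundler fills the variable in anyway, and that is the value workers inherit.
puts "BUNDLE_GEMFILE: #{ENV['BUNDLE_GEMFILE'].inspect}"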

One consequence of this bug is that it is no longer possible to delete from your server the Gemfile that was used when you booted the puma master process. If you do, workers fail on the next phased restart. I imagine there are other bugs related to changing the gems in the Gemfile for new versions of your application, but I haven't been able to reproduce those.

Puma config:

port 3000
workers 1
prune_bundler
directory '/usr/src/releases/current'

on_worker_boot do
  pp "BUNDLE_GEMFILE: #{ENV['BUNDLE_GEMFILE']}"
end

To Reproduce

I wrote a script that reproduces the error here: https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors/tree/regression-1893 (I reused a repo from an unrelated issue)

Before #1893, workers boot and print BUNDLE_GEMFILE: followed by nothing (the variable is unset). After that PR, workers boot and print BUNDLE_GEMFILE: /usr/src/releases/1582850222334837300/Gemfile, referring to the path of the Gemfile that was used by the puma master process.

Expected behavior

If BUNDLE_GEMFILE was unspecified when you started the master process, it should remain unspecified in the worker processes.


@nateberkopec
Member

Er, 4.3.2 only contains the security issue fix mentioned in the History.

https://github.com/puma/puma/compare/v4.3.1..v4.3.2

@cjlarose
Member Author

Oh yeah, that's my bad. The issue is in master, but not yet in a released version of puma.

@cjlarose
Member Author

I updated my repro app to demonstrate that the old Gemfile seems to be active in the new workers: removing activesupport from the Gemfile and Gemfile.lock has no effect, since the new workers read the old Gemfile.

@seven1m
Contributor

seven1m commented Mar 4, 2020

Yikes, I can have a go at fixing this since I broke it in #1893.

@seven1m
Contributor

seven1m commented Mar 4, 2020

@cjlarose I pulled down your repo and switched to the regression-1893 branch you linked.

I see that the Gemfile path does not update as expected, so that seems like a good reproduction.

However, when I comment out the one line of substance from #1893, your reproduction fails with:

Starting phased restart
[1753] - Starting phased worker restart, phase: 1
[1753] + Changing to /usr/src/releases/current
Command phased-restart sent success
[1753] - Stopping 1755 for phased upgrade...
[1753] - TERM sent to 1755...
[1776] + Gemfile in context: /usr/src/releases/1583341290066060201/Gemfile
"BUNDLE_GEMFILE: "
[1753] - Worker 0 (pid: 1776) booted, phase: 1
Testing connection to puma master
2020-03-04 17:01:41 +0000: Rack app error handling request { GET / }
#<LoadError: cannot load such file -- active_support>
config.ru:5:in `require'
config.ru:5:in `block (2 levels) in <main>'
/usr/src/releases/1583341290066060201/.bundle/ruby/2.7.0/gems/rack-2.0.9/lib/rack/content_length.rb:15:in `call'
/usr/src/releases/1583341290066060201/.bundle/ruby/2.7.0/gems/rack-2.0.9/lib/rack/reloader.rb:45:in `call'
/usr/src/releases/1583341279318312627/.bundle/ruby/2.7.0/bundler/gems/puma-317c1cf63931/lib/puma/configuration.rb:228:in `call'
/usr/src/releases/1583341279318312627/.bundle/ruby/2.7.0/bundler/gems/puma-317c1cf63931/lib/puma/server.rb:688:in `handle_request'
/usr/src/releases/1583341279318312627/.bundle/ruby/2.7.0/bundler/gems/puma-317c1cf63931/lib/puma/server.rb:476:in `process_client'
/usr/src/releases/1583341279318312627/.bundle/ruby/2.7.0/bundler/gems/puma-317c1cf63931/lib/puma/server.rb:332:in `block in run'
/usr/src/releases/1583341279318312627/.bundle/ruby/2.7.0/bundler/gems/puma-317c1cf63931/lib/puma/thread_pool.rb:134:in `block in spawn_thread'

I think I do see what you are saying, and I will continue to try to write a good test to prove the problem, but I just wanted to let you know that the reproduction isn't quite good enough to go on at the moment.

@cjlarose
Member Author

cjlarose commented Mar 4, 2020

@seven1m Yeah I'm sorry that I made the repro kinda hastily. I figured it was better to document what I found quickly before the regression made it into a puma release to limit impact.

That said, I think that failure you're seeing is actually the expected behavior. The first version of the application contains activesupport in the Gemfile and Gemfile.lock. This is the version of the application that workers boot up in when the puma master process is started.

Then the script prepares a second version of the application that removes activesupport from the Gemfile and performs a bundle install. Then, we perform a phased restart.

If the new workers are using the new Gemfile, then I expect them to fail when they require 'active_support'. This is the behavior I experience with Puma prior to #1893. However, if you include the changes from #1893, workers just boot up successfully (they're able to require 'active_support', even though it should not be accessible).

In retrospect, it probably would have been clearer if the new version of the application had instead required a new gem that wasn't present in the old version. That's my mistake. Sorry for the confusion.
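
For reference, a config.ru along these lines exercises that flow (this is just a sketch, not the exact file from the repro):

# Sketch of a config.ru for this scenario. Once the new Gemfile (without
# activesupport) is in effect, freshly booted workers should raise
# LoadError: cannot load such file -- active_support when handling a request.
run lambda { |env|
  require 'active_support'
  [200, { 'Content-Type' => 'text/plain' }, ['active_support loaded']]
}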

@seven1m
Contributor

seven1m commented Mar 4, 2020

Oh, I see! OK that makes sense, thank you. I understand not wanting this bug in a release. 😬

I'll keep poking at it!

@cjlarose
Member Author

cjlarose commented Mar 4, 2020

Thanks for taking the time to look into it. I have some notes on what I think might be happening, which I'll share in case they help.

  1. Puma master process starts with no explicit value set for BUNDLE_GEMFILE
  2. Assuming the master process started with bundle exec puma, Bundler::SharedHelpers#set_bundle_variables sets the environment variable BUNDLE_GEMFILE to the path of the Gemfile it found.
  3. In prune_bundler, the value of BUNDLE_GEMFILE is propagated to the new puma-wild process.
  4. Workers are forked from the puma-wild process, so they still have the value of BUNDLE_GEMFILE set.

I think (haven't verified) that Bundler might preserve the original value of BUNDLE_GEMFILE in the environment variable BUNDLER_ORIG_BUNDLE_GEMFILE. I think one thing that might work is to capture the value of ENV['BUNDLER_ORIG_BUNDLE_GEMFILE'] in the master process before it execs, and use that value (if it was present) to set BUNDLE_GEMFILE in the workers, but I'm not 100% sure.
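
A rough sketch of that idea, assuming BUNDLER_ORIG_BUNDLE_GEMFILE really does hold the pre-Bundler value (this is not the actual Puma code):

# Sketch only: restore the user's original BUNDLE_GEMFILE (or the lack of one)
# for workers, instead of the value Bundler filled in for the master process.
# Bundler may store a sentinel string here when the variable was originally
# unset, which a real implementation would also need to handle.
original_gemfile = ENV['BUNDLER_ORIG_BUNDLE_GEMFILE']

if original_gemfile.nil? || original_gemfile.empty?
  ENV.delete('BUNDLE_GEMFILE')   # let each worker resolve its own Gemfile
else
  ENV['BUNDLE_GEMFILE'] = original_gemfile
end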

@seven1m
Contributor

seven1m commented Mar 4, 2020

@cjlarose I took your approach in this commit and it seems to work. It makes your reproduction repo do what we expect (raise LoadError: cannot load such file -- active_support) and it still passes the test_preserve_bundler_env.rb test.

But wow, this is a kludge. 😓 And I don't have a good, simple test case to add to the test suite either.

@nateberkopec do you have any advice here? Adding a new BUNDLER_ORIG_ORIG_BUNDLE_GEMFILE env var seems like a kludge on top of a kludge.

@nateberkopec
Member

BUNDLER_ORIG_ORIG_BUNDLE_GEMFILE

Haha, I don't have a better suggestion, as you two have both dug into this issue far more than I have. I agree that this is not a great solution.

I'd also point out that #2120 is related here and will have to get merged eventually, once it's working.

@AlexWayfer
Contributor

Hello all!

I'd be glad to work on #2120 and resolve this… stuff with Bundler.

Please describe what needs to be done and what you're expecting.

As I understand it, you want to:

  1. Have a commit aaa with Gemfile v1
  2. Start puma with workers and prune_bundler (with bundle exec or not? 🤔 without it, it'd be hard to resolve some git dependencies)
  3. Pull a commit bbb with Gemfile v2
  4. Perform a phased restart of puma so the workers get the updated bundle (the master process is untouched)

Am I right?

The main question for me: is that OK? What if Gemfile v2 includes a puma update, so the master process should be restarted too? Is that something to resolve manually ("if we're updating the bundle with a puma update, let's restart puma completely")?

@cjlarose
Member Author

cjlarose commented Mar 5, 2020

@AlexWayfer That summary sounds correct. That's how I typically use puma when I want to update my application in an environment that uses mutable infrastructure (reusing existing live servers).

What if Gemfile v2 has puma update?

This doesn't work. I think what happens is that workers try to activate the new version of puma, but since an old version is already activated, RubyGems throws an exception, so workers die and respawn in a loop. This is actually true of any change in Gemfile v2 to the version of puma or of any gem that is a runtime dependency of your puma master process (nio4r especially), but also anything else you choose to run in your puma master process (such as puma_worker_killer). This is a separate problem, documented in #2018 and again in #1875.
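
To illustrate the conflict (version numbers here are arbitrary, and this is not Puma code), assume both versions are installed and the older one is already active:

# RubyGems refuses to activate two versions of the same gem in one process.
gem 'nio4r', '= 2.3.1'   # activated earlier, e.g. by the running master process
gem 'nio4r', '= 2.5.2'   # raises Gem::LoadError: already activated nio4r-2.3.1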

Is it something for manual resolution?

Yes, typically. This is something devs might have to communicate to operators, unfortunately. If there's a puma upgrade, we have to perform normal or "hot" restarts, not "phased" restarts.

AlexWayfer added a commit to AlexWayfer/puma that referenced this issue Mar 6, 2020
Also add tests for `GEM_HOME` environment variable preserving.

Also drop `BUNDLE_GEMFILE` preserving, resolve puma#2133
@cjlarose
Member Author

cjlarose commented Mar 6, 2020

@seven1m What is the value of the BUNDLER_ORIG_ORIG_BUNDLE_GEMFILE environment variable? Is it possible to just capture the value of ENV['BUNDLER_ORIG_BUNDLE_GEMFILE'] in prune_bundler as a local variable instead?

cjlarose@e6eb765
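
Roughly what I have in mind (hypothetical command line and names, not the real prune_bundler internals):

# Sketch of the local-variable idea: capture the pre-Bundler value, then only
# pass BUNDLE_GEMFILE along to the re-exec'd process if the user actually set
# it at startup; otherwise make sure the child starts without it.
original_gemfile = ENV['BUNDLER_ORIG_BUNDLE_GEMFILE']

wild_env = { 'BUNDLE_GEMFILE' => nil }            # a nil value unsets it in the child
wild_env['BUNDLE_GEMFILE'] = original_gemfile unless original_gemfile.to_s.empty?

wild_command = ['ruby', '-Ilib', 'bin/puma-wild'] # placeholder command line

Kernel.exec(wild_env, *wild_command)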

@AlexWayfer
Contributor

@seven1m What is the value of the BUNDLER_ORIG_ORIG_BUNDLE_GEMFILE environment variable?

They're suggesting adding such a variable; it doesn't have a value yet.

AlexWayfer added a commit to AlexWayfer/puma that referenced this issue Mar 6, 2020
Also add tests for `GEM_HOME` environment variable preserving.

Also drop `BUNDLE_GEMFILE` preserving, resolve puma#2133

Also disable local `path` for `bundle install` in CI,
it makes `nio4r` unavailable.

Also install (update) `bundler` in CI,
old versions are incompatible.
@seven1m
Contributor

seven1m commented Mar 6, 2020

@cjlarose your commit e6eb765 seems to work for my test. If it works for your use case, then awesome!

@nateberkopec
Member

I'd love to test this locally, but prune-bundler is looping infinitely for me.

 speedshop $ bundle exec ruby -Ilib bin/puma -b tcp://127.0.0.1:32112 -w 1 --prune-bundler test/rackup/hello.ru 
[8016] * Pruning Bundler environment
[8016] * Pruning Bundler environment
[8016] * Pruning Bundler environment
[8016] * Pruning Bundler environment
[8016] * Pruning Bundler environment
[8016] * Pruning Bundler environment

@cjlarose
Member Author

cjlarose commented Mar 7, 2020

@nateberkopec I wasn't able to reproduce that behavior locally on commit e6eb765.

Can you try again on my new PR (#2154)?

If it fails for you, can you list your Ruby version and Bundler version?
