Create new worker processes in separate memory spaces from master process #2407
Conversation
Re the last CI run, the only MRI failure (ubuntu-18.04, Ruby 2.6) was due to incorrect code that had never been noticed before. See b17579010ab. Due to timing & luck, it's passed quite a bit previously.
@MSP-Greg Thanks for that! I rebased to include your commit.
...in clusters that support phased restarts. If worker processes are created from the puma master process with just a `fork`, then any gems loaded by the puma master are already "activated" in the workers. This is mitigated in part through the use of the `prune_bundler` configuration option, which ejects the puma master from the application's bundler context on startup, unloading all gems except those used by the puma master process (typically, that's nio4r as well as anything specified in `extra_runtime_dependencies`).

A fairly typical release strategy demonstrates the incompleteness of `prune_bundler`. Imagine that in order to release a new version of an application, users add the new version to the server, change a release symlink to point to the new version, initiate a phased restart on the cluster, and then delete any old version of the application from disk (including its gems). When a worker starts up and executes `Bundler.setup`, it'll try to verify that each gem with native extensions has been compiled by inspecting the gem directory on disk. Since workers necessarily share the gems activated by the puma master, Bundler will look for the gems on disk associated with the release that was active at the time the puma master process started. If those gems have since been deleted, Bundler raises `GemNotFound` and the workers fail to boot. This happens most frequently with the nio4r gem since it 1) is loaded by the puma master process (even when `prune_bundler` is enabled), and 2) has native extensions. But the same can happen to any gem that matches those two conditions, such as the `ffi` gem if the puma master is configured to use `puma_worker_killer`.

The fix introduced in this commit creates workers using `spawn` instead of `fork`. `spawn` creates a new subprocess with a separate memory space (`spawn` uses `fork` then `exec` on Unix-like systems). The result is that workers share no gems with the puma master. Workers can therefore start up successfully even if the master's gems have been removed from disk. It's also possible to use different versions of gems in the workers than what's loaded into the master. The cost is some additional memory usage, since spawned workers can't leverage the COW memory optimization that workers forked from the master enjoy. However, the old `fork`-based method is still used in two cases:

1. If `preload_app` is enabled, since phased restarts aren't supported with `preload_app` anyway. This option behaves as it did before: it forgoes the ability to perform phased restarts but maximizes the opportunity to use COW memory.
2. When workers fork from other workers (when `fork_worker` is enabled). Similar to `preload_app`, the main benefit of this option is to reduce memory by leveraging COW memory. The new spawn-based strategy is used only when creating `worker 0`, the process from which other workers fork.
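The distinction between the two strategies can be sketched in plain Ruby. This is an illustration only, not puma's actual worker-boot code: a forked child inherits everything the parent has loaded, while a spawned child boots a fresh process image.

```ruby
require "rbconfig"

# Stand-in for a gem the master process has already activated.
MASTER_ONLY = "loaded in the master process"

# fork: the child shares the parent's memory image, so it sees the
# constant (and, in real life, every gem the master has loaded).
pid = fork { exit(defined?(MASTER_ONLY) ? 0 : 1) }
Process.wait(pid)
fork_inherits = $?.success?

# spawn: fork + exec, so the child is a brand-new Ruby process that
# knows nothing about the parent's constants or activated gems.
pid = Process.spawn(RbConfig.ruby, "-e", "exit(defined?(MASTER_ONLY) ? 0 : 1)")
Process.wait(pid)
spawn_inherits = $?.success?

puts "fork inherits master state:  #{fork_inherits}"  # true
puts "spawn inherits master state: #{spawn_inherits}" # false
```

The same property that lets spawned workers load a different release's gems is what costs them the COW sharing a forked child gets for free.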
Workers created by `fork`ing from the puma master process inherit all of the signal handlers established in `Launcher#setup_signals`. But since workers can now be created via `spawn`, they do not automatically inherit those signal handlers, so we set up SIGINFO here explicitly. The other signal handlers are already established by `Worker#run`.
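This follows from `exec` semantics: handled signals are reset to their default disposition in the new process image. A small illustration (using `USR1` for portability, since `SIGINFO` only exists on BSD-derived systems; this is not puma's actual code):

```ruby
require "rbconfig"

# Install a handler in the parent, as Launcher#setup_signals would.
Signal.trap("USR1") { puts "handled in parent" }

# A fork'ed child would inherit that trap. A spawn'ed (fork + exec)
# child does not: asking a fresh process for USR1's previous
# disposition reports the system default, so a spawned worker has to
# call Signal.trap again for itself.
child_disposition = IO.popen(
  [RbConfig.ruby, "-e", 'print Signal.trap("USR1", "IGNORE")']
).read
puts "USR1 disposition in spawned child: #{child_disposition}"
```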
Because workers are created in a new process image via `spawn`, they can't easily read an in-memory Configuration object in the puma master process. Instead, the workers need to read the configuration from the command-line args or from a config file.
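As a sketch of that handoff (the file contents and names here are made up for illustration, not puma's real interface): the master passes the config file path on the command line, and the spawned worker re-reads the file on boot rather than receiving an in-memory object.

```ruby
require "rbconfig"
require "tempfile"

# The master knows a config file on disk; a spawned worker can't see
# the master's in-memory Configuration object, so the path travels
# through ARGV instead.
config = Tempfile.new(["puma_config_sketch", ".rb"])
config.write("port 3000 # re-evaluated by each spawned worker\n")
config.flush

# Master side: spawn the worker with the config path as an argument.
worker_output = IO.popen(
  [RbConfig.ruby, "-e", "print File.read(ARGV[0])", config.path]
).read

puts worker_output
```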
The original motivation for `extra_runtime_dependencies` was to activate gems in the puma master process after "pruning" it with `prune_bundler`. This is useful for gems that need to be loaded in the master process, such as `puma_worker_killer`. The change to use `spawn` to create new worker processes does _not_ break this: gems listed in `extra_runtime_dependencies` are still activated as expected in the puma master process. However, a side effect of `extra_runtime_dependencies` was that those dependencies would be loaded in the puma worker processes as well (since workers were just forked from the master). The integration test was testing that `extra_runtime_dependencies` were loaded in the worker processes instead of in the master process. This change modifies the test to make assertions on the master process's gems directly.
It's no longer necessary to support phased restarts.
Is that even worth it? The amount of code/headache from that feature is considerable. We could eject this feature.
What effects will this have on how signal handling currently works in Puma? I'm particularly thinking about sending signals to the master process.
Yeah I think the removal of Importantly, though, I think this PR doesn't introduce any breaking changes as far as semver goes. Removing |
I'll look more into this. |
It feels like this is a downstream fix to an upstream problem. Is there not some bundler or Ruby API that we can use to tell our process to do the right thing?
Could you reproduce this behavior outside of puma using only bundler? If so, then we could report it upstream to bundler, and maybe we can also brainstorm some workarounds here. My initial gut reaction is: I want shared memory whenever possible. Processes are expensive (from a memory standpoint), so I want to benefit from CoW whenever possible.
Yeah I can create a small reproducible script for this and report it to Bundler.
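In the meantime, the shape of the failure can be shown without Bundler at all. This is a simplified analogue, not the actual repro script: code that's already loaded stays usable in memory, but anything that re-checks the disk, as `Bundler.setup` does for gems with native extensions, comes up empty after a deploy deletes the old release.

```ruby
require "tempfile"

# Stand-in for a gem belonging to the old release.
lib = Tempfile.new(["fake_gem", ".rb"])
lib.write("FAKE_GEM = :activated\n")
lib.flush

require lib.path        # "activate" it, as the puma master does at boot
File.delete(lib.path)   # the deploy tool deletes the old release's gems

# The in-memory copy survives, but a fresh disk lookup fails -- the
# gap that Bundler::GemNotFound falls into when a worker boots.
in_memory = defined?(FAKE_GEM) ? true : false
on_disk   = File.exist?(lib.path)
puts "activated in memory: #{in_memory}, present on disk: #{on_disk}"
```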
I think you might be right. This change is essentially a trade-off that'll cost puma more memory, but gives us the following:
Fixing the bug in Bundler (or finding an appropriate workaround in puma) wouldn't give us these other benefits, but it would help make it possible for folks who are stuck on 3.x to upgrade. I'm totally cool with accepting that the proposed change might not be worth the cost. Also, I should mention that it's also totally possible to fix the original problem described in #2018 by just not loading the |
Opened rubygems/rubygems#4004 for the Bundler bug.
Just in general, removing |
That sounds good. Let's close this and I'll take another stab at #2125 (I'll open a separate PR, I think, and fix the tests and rebase and everything). If the bundler bug gets fixed too, that's great, but we don't need to make that our blocker.
Description
Fixes #2018
A fairly common and reasonable deployment strategy has been broken since the introduction of `nio4r` in puma 4.0. Imagine that a user is using puma in cluster mode. In order to deploy a new version of an application, a user's deployment tool does this:

1. Adds the new version of the application to the server.
2. Updates the release symlink to point to the new version (the `directory` in the config file).
3. Initiates a phased restart on the cluster.
4. Deletes old versions of the application (including their gems) from disk.

During the next deployment, workers will fail to start. They'll encounter the error `Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound)`. This is, in part, due to a quirk in Bundler where Bundler will raise this error if you're trying to activate a gem whose native extensions are no longer present on disk (even if the gem is already loaded into memory). However, this exception reveals a deeper problem related to the intimate relationship between the gems used by the puma master process and those used by the puma worker processes.

Users can, in part, separate the gems used in workers from those used in the puma master by turning on `prune_bundler`. That option will "eject" the puma master process from the full Bundler context, re-activating only the small number of gems used by the puma master process, including `puma`, `nio4r`, and gems added to `extra_runtime_dependencies`. The `GemNotFound` issue, however, reveals that even `prune_bundler` isn't always sufficient in severing the relationship between gems used by the master and worker processes. In addition to being unable to delete gems from old releases, the existing `prune_bundler`-based solution doesn't allow a user to upgrade the version of `nio4r` or any gem in `extra_runtime_dependencies` with a phased restart.

The solution proposed in this PR is to place worker processes in a completely separate memory space. The proposed change uses `spawn` to spin up new workers (this is essentially a `fork` followed by an `exec` on Unix-like systems). Since workers use a separate memory space, the gems they load can be totally different from those loaded in the puma master process. The drawback, of course, is that this `spawn`-based strategy doesn't allow one to take advantage of COW memory between the puma master and the puma workers. For this reason, we still use the old `fork` method in two cases:

1. If `preload_app` is enabled, since phased restarts aren't supported with `preload_app` anyway. This option behaves as it did before: it forgoes the ability to perform phased restarts in exchange for minimizing the amount of memory used by fully leveraging the benefit of COW memory.
2. When workers fork from other workers (when `fork_worker` is enabled). Similar to `preload_app`, the main benefit of this option is to reduce memory by leveraging COW memory. The new `spawn`-based strategy is used only when creating `worker 0`, the process from which other workers fork. Subsequent workers are `fork`ed in the traditional way from `worker 0`.

What does this mean for `prune_bundler`? Before this change, it was necessary to use `prune_bundler` if you wanted phased restarts and the ability to change gem versions between different releases of your application. After this change, `prune_bundler` won't be necessary in this case. `prune_bundler` will essentially just be a memory optimization for the puma master process, since it'll shed some gems from the puma master.

Your checklist for this pull request
- If this PR doesn't need a changelog entry, I have added `[changelog skip]` or `[ci skip]` to the pull request title.
- If this PR doesn't need tests, I added `[ci skip]` to the title of the PR.
- I have referenced the related issue(s) with "#issue" in the PR description or my commit messages.