
Create new worker processes in separate memory spaces from master process #2407

Closed
wants to merge 8 commits into from

Conversation

cjlarose
Member

@cjlarose cjlarose commented Oct 2, 2020

Description

Fixes #2018

A fairly common and reasonable deployment strategy has been broken since the introduction of nio4r in puma 4.0. Imagine that a user is using puma in cluster mode. In order to deploy a new version of an application, a user's deployment tool does this (a rough sketch follows the list):

  1. Add new version of the application's source to the server in a new directory
  2. Update the "current release" symlink to point to the new release directory (the symlink is configured as puma's directory in the config file)
  3. Issue a phased restart
  4. After verifying that the release was successful, delete the old version of the application from disk
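For concreteness, a minimal sketch of what that setup might look like in the puma config (paths and worker counts are illustrative, not taken from the PR):

```ruby
# config/puma.rb -- "current" is the symlink the deploy tool flips on each release
directory "/srv/myapp/current"   # illustrative path
workers 4

# After flipping the symlink, the deploy tool triggers a phased restart with
# `pumactl phased-restart` (or by sending SIGUSR1 to the master), then deletes
# the old release directory -- including its installed gems -- from disk.
```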

During the next deployment, workers will fail to start. They'll encounter the error Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound). This is, in part, due to a quirk in Bundler where Bundler will raise this error if you're trying to activate a gem whose native extensions are no longer present on disk (even if the gem is already loaded into memory). However, this exception reveals a deeper problem related to the intimate relationship between the gems used by the puma master process and those used by the puma worker processes.

Users can, in part, separate the gems used in workers from those used in the puma master by turning on prune_bundler. That option will "eject" the puma master process from the full Bundler context, re-activating only the small number of gems used by the puma master process, including puma, nio4r, and gems added to extra_runtime_dependencies. The GemNotFound issue, however, reveals that even prune_bundler isn't always sufficient to sever the relationship between the gems used by the master and worker processes. In addition to being unable to delete gems from old releases, the existing prune_bundler-based solution doesn't allow a user to upgrade the version of nio4r or any gem in extra_runtime_dependencies with a phased restart.
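As a rough illustration of the pruned-master setup described above (the extra dependency listed is just an example):

```ruby
# config/puma.rb
prune_bundler   # re-exec the master outside the application's full Bundler context
# Gems the pruned master itself still needs must be listed explicitly:
extra_runtime_dependencies ["puma_worker_killer"]
```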

The solution proposed in this PR is to place worker processes in a completely separate memory space. The proposed change uses spawn to spin up new workers (this is essentially a fork followed by an exec on Unix-like systems). Since workers use a separate memory space, the gems they load can be totally different than those loaded in the puma master process. The drawback, of course, is that this spawn-based strategy doesn't allow one to take advantage of COW memory between the puma master and the puma workers. For this reason, we still use the old fork method in two cases (a rough sketch of both approaches follows the list):

  1. If preload_app is enabled, since phased restarts aren't supported with preload_app anyway. This option behaves as it did before: it forgoes the ability to perform phased restarts in exchange for minimizing the amount of memory used by fully leveraging the benefit of COW memory.
  2. When workers fork from other workers (when fork_worker is enabled). Similar to preload_app, the main benefit of this option is to reduce memory by leveraging COW memory. The new spawn-based strategy is used only when creating worker 0, the process from which other workers fork. Subsequent workers are forked in the traditional way from worker 0.
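A minimal, plain-Ruby sketch of the difference between the two strategies (this is not Puma's actual code; run_worker, the Gemfile path, and the worker command are placeholders):

```ruby
require "rbconfig"

# fork: the child starts as a copy of the master, so every gem the master has
# already activated (nio4r, etc.) is loaded in the worker from the old release.
pid = fork { run_worker }   # placeholder worker entry point

# spawn (fork + exec on Unix-like systems): the child is a fresh process image
# and activates its own gems, so it can use the current release's bundle.
pid = Process.spawn(
  { "BUNDLE_GEMFILE" => "/srv/myapp/current/Gemfile" },  # illustrative env
  RbConfig.ruby, "/srv/myapp/current/bin/worker"         # illustrative command
)
Process.detach(pid)
```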

What does this mean for prune_bundler? Before this change, it was necessary to use prune_bundler if you wanted phased restarts and the ability to change gem versions between different releases of your application. After this change, prune_bundler won't be necessary in this case. prune_bundler will essentially just be a memory optimization for the puma master process since it'll shed some gems from the puma master.

Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added an entry to History.md if this PR fixes a bug or adds a feature. If it doesn't need an entry to HISTORY.md, I have added [changelog skip] or [ci skip] to the pull request title.
  • I have added appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

@MSP-Greg
Member

MSP-Greg commented Oct 3, 2020

Re the last CI run, the only MRI failure (ubuntu-18.04, Ruby 2.6) was due to incorrect code that had never been noticed before. See b17579010ab. Due to timing & luck, it had passed quite a bit previously...

@cjlarose
Member Author

cjlarose commented Oct 3, 2020

@MSP-Greg Thanks for that! I rebased to include your commit.

...in clusters that support phased restarts.

If worker processes are created from the puma master process with just a
`fork`, then any gems loaded by the puma master are already "activated"
in the workers. This is mitigated in part through the use of the
`prune_bundler` configuration option, which ejects the puma master from
the application bundler context on startup, unloading all gems except
those used by the puma master process (typically, that's nio4r as well
as anything specified in `extra_runtime_dependencies`).

A fairly typical release strategy demonstrates the incompleteness of
`prune_bundler`. Imagine that in order to release a new version of an
application, users add the new version to the server, change a release
symlink to point to the new version, initiate a phased restart on the
cluster, and then delete any old version of the application from disk
(including the gems). When a worker starts up and executes
`Bundler.setup`, it'll try to verify that each gem with native
extensions has been compiled by inspecting the gem directory on disk.
Since workers necessarily share the gems activated by the puma master,
Bundler will look for the gems on disk associated with the release that
was active at the time the puma master process started. If those gems
have since been deleted, Bundler raises `GemNotFound` and the workers
fail to boot.

This happens most frequently with the nio4r gem since it 1) is loaded
by the puma master process (even when `prune_bundler` is enabled) and
2) has native extensions. But the same can happen to any gem that
matches those two conditions, such as the `ffi` gem if the puma master
is configured to use `puma_worker_killer`.

The fix introduced in this commit creates workers using `spawn`
instead of `fork`. `spawn` creates a new subprocess with a separate
memory space (`spawn` uses `fork` then `exec` on Unix-like
systems). The result is that workers share no gems with the puma master.
Workers can therefore start up successfully even if the master's gems
have been removed from disk. It's also possible to use different
versions of gems in the workers than what's loaded into the master.

The cost is some additional memory usage since workers that fork from
the master process can't leverage COW memory optimization. However, the
old `fork`-based method is still used in two cases:

1) If `preload_app` is enabled, since phased restarts aren't supported
   with `preload_app` anyway. This option behaves as it
   did before: it forgoes the ability to perform phased restarts but
   maximizes the opportunity to use COW memory.
2) When workers fork from other workers (when `fork_worker` is enabled).
   Similar to `preload_app`, the main benefit of this option is to
   reduce memory by leveraging COW memory. The new spawn-based strategy
   is used only when creating `worker 0`, the process from which other
   workers fork.

Workers created by `fork`ing from the puma master process inherit all of
the signal handlers established in `Launcher#setup_signals`. But
since workers can now be created via `spawn`, those workers do not
automatically inherit those signal handlers, so we set up SIGINFO here
explicitly. Other signal handlers are established already by
`Worker#run`.
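Roughly, re-establishing that handler in a spawned worker looks like the following sketch (the logging helper is a placeholder; the guard is needed because SIGINFO only exists on BSD-derived platforms such as macOS):

```ruby
# Spawned workers don't inherit the master's Signal.trap handlers, so SIGINFO
# has to be registered again inside the worker process itself.
if Signal.list.key?("INFO")
  Signal.trap("INFO") do
    Thread.list.each { |t| log_thread_backtrace(t) }  # placeholder helper
  end
end
```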

Because workers are created in a new process image via `spawn`, they
can't easily read an in-memory Configuration object in the puma master
process.

Instead, the workers need to read the configuration from the
command-line args or from a config file.

The original motivation for `extra_runtime_dependencies` was to
activate gems in the puma master process after "pruning" the master
process with `prune_bundler`. This is useful for activating gems that
need to be loaded in the master process, such as `puma_worker_killer`.

The change to use `spawn` to create new worker processes does _not_
break this: `extra_runtime_dependencies` still works in that the gem
will still be activated as expected in the puma master process. However,
a side-effect of `extra_runtime_dependencies` was that those
dependencies would be loaded in puma worker processes as well (since
workers were just forked from the master).

The integration test was testing that `extra_runtime_dependencies` were
loaded in the worker processes instead of in the master process. This
change modifies the test to make assertions on the master processes'
gems directly.

It's no longer necessary to support phased restarts.

@nateberkopec
Member

prune_bundler will essentially just be a memory optimization for the puma master process since it'll shed some gems from the puma master.

Is that even worth it? The amount of code/headache from that feature is considerable. We could eject this feature.

@nateberkopec
Member

What effects will this have on how signal handling currently works in Puma? I'm particularly thinking around sending signals to the master process.

@cjlarose
Member Author

cjlarose commented Oct 7, 2020

prune_bundler will essentially just be a memory optimization for the puma master process since it'll shed some gems from the puma master.

Is that even worth it? The amount of code/headache from that feature is considerable. We could eject this feature.

Yeah, I think the removal of prune_bundler is worth considering. We can do some more testing and measuring separately on whether or not it really is much of a memory optimization, but my guess is that the cost of maintaining it would no longer be worth the benefit it provides.

Importantly, though, I think this PR doesn't introduce any breaking changes as far as semver goes. Removing prune_bundler would, so I think we'd have to wait at least until 6.0 to do it.

@cjlarose
Member Author

cjlarose commented Oct 7, 2020

What effects will this have on how signal handling currently works in Puma? I'm particularly thinking around sending signals to the master process.

I'll look more into this.

@schneems
Contributor

schneems commented Oct 7, 2020

It feels like this is a downstream fix to an upstream problem. Is there not some bundler or Ruby API that we can use to tell our process to do the right thing?

This is, in part, due to a quirk in Bundler where Bundler will raise this error if you're trying to activate a gem whose native extensions are no longer present on disk (even if the gem is already loaded into memory)

Could you reproduce this behavior outside of puma using only bundler? If so then we could report it upstream to bundler, and maybe we can also brainstorm on some workarounds here.

My initial gut reaction is: I want shared memory whenever possible. Processes are expensive (from a memory standpoint) so I want to benefit from CoW whenever possible.

@cjlarose
Member Author

cjlarose commented Oct 7, 2020

Could you reproduce this behavior outside of puma using only bundler? If so then we could report it upstream to bundler, and maybe we can also brainstorm on some workarounds here.

Yeah I can create a small reproducible script for this and report it to Bundler.
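For reference, such a reproduction might look roughly like the following (an untested sketch; the script actually filed upstream may differ):

```ruby
# Gemfile contains: gem "nio4r"
require "bundler/setup"   # activates the bundle, including nio4r
require "nio"             # nio4r (and its native extension) is now loaded in memory
require "fileutils"

# Simulate a deploy tool deleting the old release's gems from disk
FileUtils.rm_rf(Gem.loaded_specs["nio4r"].full_gem_path)

# Re-run Bundler's setup the way a freshly booted worker would
Bundler.reset!
Bundler.setup   # expected to raise Bundler::GemNotFound, per the report above
```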

My initial gut reaction is: I want shared memory whenever possible. Processes are expensive (from a memory standpoint) so I want to benefit from CoW whenever possible.

I think you might be right. This change is essentially a trade-off that'll cost puma more memory, but gives us the following:

  • The ability for users to upgrade gems in worker processes like nio4r without downtime (using a phased restart on cluster-mode puma)
  • Related: prevent some kinds of issues related to the application using different versions of gems from the puma master (especially the json gem, see Regression in 4.3.3 with require json #2206)
  • The potential to remove code related to prune_bundler and relieve ourselves of the maintenance burden of supporting it. This can simplify configuration for users, especially those that rely on extra_runtime_dependencies

Fixing the bug in Bundler (or finding an appropriate workaround in puma) wouldn't give us these other benefits, but it would help make it possible for folks who are stuck on 3.x to upgrade. I'm totally cool with accepting that the proposed change might not be worth the cost. Also, I should mention that it's also totally possible to fix the original problem described in #2018 by just not loading the nio4r gem in the master at all (workers need the gem, but the master never needs it). I proposed this in #2125 and it was initially rejected, but I think it's probably worth reconsidering since it'll help real people migrate their apps to puma 4.x or 5.x basically immediately.

@cjlarose
Member Author

cjlarose commented Oct 8, 2020

Opened rubygems/rubygems#4004 for the Bundler bug (bundler shares a repository with rubygems these days, who knew?). We can see what the maintainers say there and decide how to move forward based on that.

@nateberkopec
Member

Just in general, removing fork from Puma's base behavior is going to have to clear a very high bar to make me not nervous. It's up there with "we parse HTTP w/the old mongrel extension".

@nateberkopec
Member

@cjlarose I think you're right about re-opening #2125. What do you think - close this, fix the bundler bug (or see where that goes), and reopen/merge 2125?

@cjlarose
Member Author

That sounds good. Let's close this and I'll take another stab at #2125 (I'll open a separate PR I think and fix the tests and rebase and everything). If the bundler bug gets fixed, too, that's great, but we don't need to make that our blocker.


@rmacklin
Contributor

Let's close this and I'll take another stab at #2125 (I'll open a separate PR I think and fix the tests and rebase and everything).

For reference, that newer PR is #2427

Labels
feature, waiting-for-review (waiting on review from anyone)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

phased-restart: Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound)
6 participants