Clean up and format markdown documentation (puma#2714)
* Revise and format architecture docs [ci skip]

* Fix typo and format compile options doc [ci skip]

* Revise and format deployment doc [ci skip]

* Revise and format plugins doc [ci skip]

* Fix mechanical typos [ci skip]

* Revise restart.md [ci skip]

* Revise signals.md [ci skip]

* Revise stats.md [ci skip]

* Revise and format systemd.md [ci skip]
jacobherrington authored and JuanitoFatas committed Sep 9, 2022
1 parent 43050f6 commit d4a2967
Showing 9 changed files with 210 additions and 179 deletions.
65 changes: 49 additions & 16 deletions docs/architecture.md
@@ -4,38 +4,71 @@

![https://bit.ly/2iJuFky](images/puma-general-arch.png)

-Puma is a threaded Ruby HTTP application server, processing requests across a TCP or UNIX socket.
+Puma is a threaded Ruby HTTP application server processing requests across a TCP
+and/or UNIX socket.


-Puma processes (there can be one or many) accept connections from the socket via a thread (in the [`Reactor`](../lib/puma/reactor.rb) class). The connection, once fully buffered and read, moves in to the `todo` list, where it will be picked up by a free/waiting thread in the threadpool (the [`ThreadPool`](../lib/puma/thread_pool.rb) class).
+Puma processes (there can be one or many) accept connections from the socket via
+a thread (in the [`Reactor`](../lib/puma/reactor.rb) class). The connection,
+once fully buffered and read, moves into the `todo` list, where an available
+thread will pick it up (in the [`ThreadPool`](../lib/puma/thread_pool.rb)
+class).

-Puma works in two main modes: cluster and single. In single mode, only one Puma process is booted. In cluster mode, a `master` process is booted, which prepares (and may boot) the application, and then uses the `fork()` system call to create 1 or more `child` processes. These `child` processes all listen to the same socket. The `master` process does not listen to the socket or process requests - its purpose is mostly to manage and listen for UNIX signals and possibly kill or boot `child` processes.
+Puma works in two main modes: cluster and single. In single mode, only one Puma
+process boots. In cluster mode, a `master` process is booted, which prepares
+(and may boot) the application and then uses the `fork()` system call to create
+one or more `child` processes. These `child` processes all listen to the same
+socket. The `master` process does not listen to the socket or process requests -
+its purpose is primarily to manage and listen for UNIX signals and possibly kill
+or boot `child` processes.
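The single/cluster split described above is driven by the `workers` setting in the configuration file. A minimal sketch, with illustrative values (the file path and numbers are examples, not part of this commit):

```ruby
# config/puma.rb -- illustrative configuration sketch

# workers 0 => single mode: one process, no fork()
# workers N (N >= 1) => cluster mode: a master process fork()s N children
workers 2

# Each worker process runs its own ThreadPool of this size (min, max).
threads 1, 5

# Load the application in the master before forking, so child processes
# can share memory via copy-on-write.
preload_app!
```

`workers`, `threads`, and `preload_app!` are methods of Puma's configuration DSL (`Puma::DSL`); see the DSL reference for the full list.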

-We sometimes call `child` processes (or Puma processes in `single` mode) _workers_, and we sometimes call the threads created by Puma's [`ThreadPool`](../lib/puma/thread_pool.rb) _worker threads_.
+We sometimes call `child` processes (or Puma processes in `single` mode)
+_workers_, and we sometimes call the threads created by Puma's
+[`ThreadPool`](../lib/puma/thread_pool.rb) _worker threads_.

## How Requests Work

![https://bit.ly/2zwzhEK](images/puma-connection-flow.png)

* Upon startup, Puma listens on a TCP or UNIX socket.
-* The backlog of this socket is configured (with a default of 1024). This determines the size of the queue for unaccepted connections. Generally, this setting is unimportant and will never be hit in production use. If the backlog is full, the connection will be refused by the operating system.
-* This socket backlog is distinct from the `backlog` of work as reported by `Puma.stats` or the control server. The backlog as reported by Puma is the number of connections in the process' `todo` set waiting for a thread from the [`ThreadPool`](../lib/puma/thread_pool.rb).
-* By default, a single, separate thread (created by the [`Reactor`](../lib/puma/reactor.rb) class) is used to read and buffer requests from the socket.
-* When at least one worker thread is available for work, the reactor thread listens to the socket and accepts a request, if one is waiting.
+* The backlog of this socket is configured (with a default of 1024). The
+  backlog determines the size of the queue for unaccepted connections.
+  Generally, you'll never hit the backlog cap in production. If the backlog is
+  full, the operating system refuses new connections.
+* This socket backlog is distinct from the `backlog` of work as reported by
+  `Puma.stats` or the control server. The backlog that `Puma.stats` refers to
+  represents the number of connections in the process' `todo` set waiting for
+  a thread from the [`ThreadPool`](../lib/puma/thread_pool.rb).
+* By default, a single, separate thread (created by the
+  [`Reactor`](../lib/puma/reactor.rb) class) reads and buffers requests from the
+  socket.
+* When at least one worker thread is available for work, the reactor thread
+  listens to the socket and accepts a request (if one is waiting).
* The reactor thread waits for the entire HTTP request to be received.
-* The time spent waiting for the HTTP request body to be received is exposed to the Rack app as `env['puma.request_body_wait']` (milliseconds).
-* Once fully buffered and received, the connection is pushed into the "todo" set.
+* Puma exposes the time spent waiting for the HTTP request body to be
+  received to the Rack app as `env['puma.request_body_wait']`
+  (milliseconds).
+* Once fully buffered and received, the connection is pushed into the "todo"
+  set.
* Worker threads pop work off the "todo" set for processing.
-* The worker thread processes the request via `call`ing the configured Rack application. The Rack application generates the HTTP response.
-* The worker thread writes the response to the connection. Note that while Puma buffers requests via a separate thread, it does not use a separate thread for responses.
-* Once done, the thread become available to process another connection in the "todo" set.
+* The worker thread processes the request via `call`ing the configured Rack
+  application. The Rack application generates the HTTP response.
+* The worker thread writes the response to the connection. While Puma buffers
+  requests via a separate thread, it does not use a separate thread for
+  responses.
+* Once done, the thread becomes available to process another connection in the
+  "todo" set.

### `queue_requests`

![https://bit.ly/2zxCJ1Z](images/puma-connection-flow-no-reactor.png)

-The `queue_requests` option is `true` by default, enabling the separate reactor thread used to buffer requests as described above.
+The `queue_requests` option is `true` by default, enabling the separate reactor
+thread used to buffer requests as described above.

-If set to `false`, this buffer will not be used for connections while waiting for the request to arrive.
+If set to `false`, this buffer will not be used for connections while waiting
+for the request to arrive.

-In this mode, when a connection is accepted, it is added to the "todo" queue immediately, and a worker will synchronously do any waiting necessary to read the HTTP request from the socket.
+In this mode, when a connection is accepted, it is added to the "todo" queue
+immediately, and a worker will synchronously do any waiting necessary to read
+the HTTP request from the socket.
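The behavior described here toggles with a single line in the configuration file; a sketch with illustrative comments:

```ruby
# config/puma.rb -- illustrative sketch

# Default: true. The Reactor thread fully buffers each request before
# the connection enters the "todo" set.
queue_requests true

# Setting it to false skips the Reactor: accepted connections go
# straight to "todo", and a worker thread blocks while reading the
# HTTP request from the socket.
# queue_requests false
```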
6 changes: 4 additions & 2 deletions docs/compile_options.md
@@ -1,10 +1,12 @@
# Compile Options

-There are some `cflags` provided to change Puma's default configuration for its C extension.
+There are some `cflags` provided to change Puma's default configuration for its
+C extension.

## Query String, `PUMA_QUERY_STRING_MAX_LENGTH`

-By default, the max length of `QUERY_STRING` is `1024 * 10`. But you may want to adjust it to allow accept larger queries in GET requests.
+By default, the max length of `QUERY_STRING` is `1024 * 10`. But you may want to
+adjust it to accept longer queries in GET requests.

For manual install, pass the `PUMA_QUERY_STRING_MAX_LENGTH` option like this:

105 changes: 53 additions & 52 deletions docs/deployment.md
@@ -1,43 +1,40 @@
# Deployment engineering for Puma

-Puma is software that is expected to be run in a deployed environment eventually.
-You can certainly use it as your dev server only, but most people look to use
-it in their production deployments as well.
+Puma expects to be run in a deployed environment eventually. You can use it as
+your development server, but most people use it in their production deployments.

-To that end, this is meant to serve as a foundation of wisdom how to do that
-in a way that increases happiness and decreases downtime.
+To that end, this document serves as a foundation of wisdom regarding deploying
+Puma to production while increasing happiness and decreasing downtime.

## Specifying Puma

-Most people want to do this by putting `gem "puma"` into their Gemfile, so we'll
-go ahead and assume that. Go add it now... we'll wait.
+Most people will specify Puma by including `gem "puma"` in a Gemfile, so we'll
+assume this is how you're using Puma.

-Welcome back!
-
-## Single vs Cluster mode
+## Single vs. Cluster mode

-Puma was originally conceived as a thread-only web server, but grew the ability to
-also use processes in version 2.
+Initially, Puma was conceived as a thread-only web server, but support for
+processes was added in version 2.

-To run `puma` in single mode (e.g. for a development environment) you will need to
-set the number of workers to 0, anything above will run in cluster mode.
+To run `puma` in single mode (i.e., as a development environment), set the
+number of workers to 0; anything higher will run in cluster mode.

-Here are some rules of thumb for cluster mode:
+Here are some tips for cluster mode:

### MRI

-* Use cluster mode and set the number of workers to 1.5x the number of cpu cores
-  in the machine, minimum 2.
-* Set the number of threads to desired concurrent requests / number of workers.
-  Puma defaults to 5 and that's a decent number.
+* Use cluster mode and set the number of workers to 1.5x the number of CPU cores
+  in the machine, starting from a minimum of 2.
+* Set the number of threads to desired concurrent requests/number of workers.
+  Puma defaults to 5, and that's a decent number.
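The two rules of thumb above are simple arithmetic; as a sketch (the core count and target concurrency are illustrative numbers, not recommendations from this commit):

```ruby
# Illustrative capacity math for MRI cluster mode
cpu_cores = 4

# 1.5x the number of CPU cores, starting from a minimum of 2
workers = [(cpu_cores * 1.5).ceil, 2].max

# threads = desired concurrent requests / number of workers
desired_concurrent_requests = 30
threads_per_worker = (desired_concurrent_requests.to_f / workers).ceil
```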

#### Migrating from Unicorn

* If you're migrating from unicorn though, here are some settings to start with:
* Set workers to half the number of unicorn workers you're using
* Set threads to 2
* Enjoy 50% memory savings
-* As you grow more confident in the thread safety of your app, you can tune the
+* As you grow more confident in the thread-safety of your app, you can tune the
workers down and the threads up.

#### Ubuntu / Systemd (Systemctl) Installation
@@ -48,54 +45,58 @@ See [systemd.md](systemd.md)

**How do you know if you've got enough (or too many workers)?**

-A good question. Due to MRI's GIL, only one thread can be executing Ruby code at a time.
-But since so many apps are waiting on IO from DBs, etc., they can utilize threads
-to make better use of the process.
+A good question. Due to MRI's GIL, only one thread can be executing Ruby code at
+a time. But since so many apps are waiting on IO from DBs, etc., they can
+utilize threads to use the process more efficiently.

-The rule of thumb is you never want processes that are pegged all the time. This
-means that there is more work to do than the process can get through. On the other
-hand, if you have processes that sit around doing nothing, then they're just eating
-up resources.
+Generally, you never want processes that are pegged all the time. That can mean
+there is more work to do than the process can get through. On the other hand, if
+you have processes that sit around doing nothing, then they're just eating up
+resources.

-Watch your CPU utilization over time and aim for about 70% on average. This means
-you've got capacity still but aren't starving threads.
+Watch your CPU utilization over time and aim for about 70% on average. 70%
+utilization means you've got capacity still but aren't starving threads.

**Measuring utilization**

-Using a timestamp header from an upstream proxy server (eg. nginx or haproxy), it's
-possible to get an indication of how long requests have been waiting for a Puma
-thread to become available.
+Using a timestamp header from an upstream proxy server (e.g., `nginx` or
+`haproxy`) makes it possible to gauge how long requests have been waiting for a
+Puma thread to become available.

* Have your upstream proxy set a header with the time it received the request:
* nginx: `proxy_set_header X-Request-Start "${msec}";`
  * haproxy >= 1.9: `http-request set-header X-Request-Start t=%[date()]%[date_us()]`
* haproxy < 1.9: `http-request set-header X-Request-Start t=%[date()]`
-* In your Rack middleware, determine the amount of time elapsed since `X-Request-Start`.
-* To improve accuracy, you will want to subtract time spent waiting for slow clients:
-  * `env['puma.request_body_wait']` contains the number of milliseconds Puma spent
-    waiting for the client to send the request body.
-  * haproxy: `%Th` (TLS handshake time) and `%Ti` (idle time before request) can
-    can also be added as headers.
+* In your Rack middleware, determine the amount of time elapsed since
+  `X-Request-Start`.
+* To improve accuracy, you will want to subtract time spent waiting for slow
+  clients:
+  * `env['puma.request_body_wait']` contains the number of milliseconds Puma
+    spent waiting for the client to send the request body.
+  * haproxy: `%Th` (TLS handshake time) and `%Ti` (idle time before request)
+    can also be added as headers.
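The steps above can be sketched as a small piece of Rack middleware. The class name and the `request.queue_time_ms` env key are hypothetical; `X-Request-Start` and `env['puma.request_body_wait']` are the values this document describes, and the header format assumed here is nginx's `${msec}` (seconds with a millisecond fraction), optionally prefixed with `t=` as in the haproxy examples:

```ruby
# Hypothetical middleware sketch: estimate how long a request waited
# for a Puma thread, from the X-Request-Start header an upstream proxy set.
class RequestQueueTime
  def initialize(app)
    @app = app
  end

  def call(env)
    header = env['HTTP_X_REQUEST_START']
    if header
      # "t=1662681600.123" or "1662681600.123" -> milliseconds since epoch
      start_ms = (header.delete_prefix('t=').to_f * 1000).round
      now_ms = (Time.now.to_f * 1000).round
      wait_ms = now_ms - start_ms
      # Subtract time Puma spent waiting on a slow client body, if known
      wait_ms -= env['puma.request_body_wait'].to_i
      env['request.queue_time_ms'] = [wait_ms, 0].max
    end
    @app.call(env)
  end
end
```

A real middleware would report `request.queue_time_ms` to your metrics system instead of just stashing it in the env.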

## Should I daemonize?

-Daemonization was removed in Puma 5.0. For alternatives, continue reading.
+The Puma 5.0 release removed daemonization. For older versions and alternatives,
+continue reading.

-I prefer to not daemonize my servers and use something like `runit` or `systemd` to
-monitor them as child processes. This gives them fast response to crashes and
+I prefer not to daemonize my servers and use something like `runit` or `systemd`
+to monitor them as child processes. This gives them fast response to crashes and
 makes it easy to figure out what is going on. Additionally, unlike `unicorn`,
-puma does not require daemonization to do zero-downtime restarts.
+Puma does not require daemonization to do zero-downtime restarts.

-I see people using daemonization because they start puma directly via capistrano
-task and thus want it to live on past the `cap deploy`. To these people I say:
-You need to be using a process monitor. Nothing is making sure puma stays up in
-this scenario! You're just waiting for something weird to happen, puma to die,
-and to get paged at 3am. Do yourself a favor, at least the process monitoring
-your OS comes with, be it `sysvinit` or `systemd`. Or branch out
-and use `runit` or hell, even `monit`.
+I see people using daemonization because they start puma directly via Capistrano
+task and thus want it to live on past the `cap deploy`. To these people, I say:
+You need to be using a process monitor. Nothing is making sure Puma stays up in
+this scenario! You're just waiting for something weird to happen, Puma to die,
+and to get paged at 3 AM. Do yourself a favor: at least use the process
+monitoring your OS comes with, be it `sysvinit` or `systemd`. Or branch out and
+use `runit` or hell, even `monit`.

## Restarting

You probably will want to deploy some new code at some point, and you'd like
-puma to start running that new code. There are a few options for restarting
-puma, described separately in our [restart documentation](restart.md).
+Puma to start running that new code. There are a few options for restarting
+Puma, described separately in our [restart documentation](restart.md).
30 changes: 15 additions & 15 deletions docs/plugins.md
@@ -3,36 +3,36 @@
Puma 3.0 added support for plugins that can augment configuration and service
operations.

-2 canonical plugins to look to aid in development of further plugins:
+There are two canonical plugins to aid in the development of new plugins:

* [tmp\_restart](https://github.com/puma/puma/blob/master/lib/puma/plugin/tmp_restart.rb):
Restarts the server if the file `tmp/restart.txt` is touched
* [heroku](https://github.com/puma/puma-heroku/blob/master/lib/puma/plugin/heroku.rb):
-  Packages up the default configuration used by puma on Heroku (being sunset with the release of Puma 5.0)
+  Packages up the default configuration used by Puma on Heroku (being sunset
+  with the release of Puma 5.0)

-Plugins are activated in a puma configuration file (such as `config/puma.rb'`)
+Plugins are activated in a Puma configuration file (such as `config/puma.rb`)
by adding `plugin "name"`, such as `plugin "heroku"`.

-Plugins are activated based simply on path requirements so, activating the
-`heroku` plugin will simply be doing `require "puma/plugin/heroku"`. This
-allows gems to provide multiple plugins (as well as unrelated gems to provide
-puma plugins).
+Plugins are activated based on path requirements, so activating the `heroku`
+plugin is much like `require "puma/plugin/heroku"`. This allows gems to provide
+multiple plugins (as well as unrelated gems to provide Puma plugins).

-The `tmp_restart` plugin is bundled with puma, so it can always be used.
+The `tmp_restart` plugin comes with Puma, so it is always available.

To use the `heroku` plugin, add `puma-heroku` to your Gemfile or install it.

### API

## Server-wide hooks

-Plugins can use a couple of hooks at server level: `start` and `config`.
+Plugins can use a couple of hooks at the server level: `start` and `config`.

-`start` runs when the server has started and allows the plugin to start other
-functionality to augment puma.
+`start` runs when the server has started and allows the plugin to initiate other
+functionality to augment Puma.

-`config` runs when the server is being configured and is passed a `Puma::DSL`
-object that can be used to add additional configuration.
+`config` runs when the server is being configured and receives a `Puma::DSL`
+object that is useful for additional configuration.

-Any public methods in [`Puma::Plugin`](../lib/puma/plugin.rb) are the public API that any plugin may
-use.
+Public methods in [`Puma::Plugin`](../lib/puma/plugin.rb) are treated as a
+public API for plugins.
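A minimal plugin using both hooks might look like the following sketch. The plugin name, the tag value, and the logged message are hypothetical; the `Puma::Plugin.create` block style and the `config`/`start` hook names are the ones this document describes (this assumes the `puma` gem is on the load path):

```ruby
# lib/puma/plugin/hello.rb -- hypothetical example plugin
require "puma/plugin"

Puma::Plugin.create do
  # Runs while the server is being configured; receives a Puma::DSL object.
  def config(dsl)
    dsl.tag "hello-plugin"
  end

  # Runs once the server has started; `launcher` gives access to the
  # running Puma instance.
  def start(launcher)
    launcher.events.log "hello plugin started"
  end
end
```

It would be activated with `plugin "hello"` in the configuration file, which resolves to `require "puma/plugin/hello"`.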
5 changes: 2 additions & 3 deletions docs/rails_dev_mode.md
@@ -2,16 +2,15 @@

## "Loopback requests"

-Be cautious of "loopback requests", where a Rails application executes a request to a server that in turn, results in another request back to the same Rails application before the first request is completed. Having a loopback request will trigger [Rails' load interlock](https://guides.rubyonrails.org/threading_and_code_execution.html#load-interlock) mechanism. The load interlock mechanism prevents a thread from using Rails autoloading mechanism to load constants while the application code is still running inside another thread.
+Be cautious of "loopback requests," where a Rails application executes a request to a server that, in turn, results in another request back to the same Rails application before the first request completes. Having a loopback request will trigger [Rails' load interlock](https://guides.rubyonrails.org/threading_and_code_execution.html#load-interlock) mechanism. The load interlock mechanism prevents a thread from using Rails' autoloading mechanism to load constants while the application code is still running inside another thread.

This issue only occurs in the development environment as Rails' load interlock is not used in production environments. Although we're not sure, we believe this issue may not occur with the new `zeitwerk` code loader.

### Solutions


#### 1. Bypass Rails' load interlock with `.permit_concurrent_loads`

-Wrap the first request inside a block that will allow concurrent loads, [`ActiveSupport::Dependencies.interlock.permit_concurrent_loads`](https://guides.rubyonrails.org/threading_and_code_execution.html#permit-concurrent-loads). Anything wrapped inside the `.permit_concurrent_loads` block will bypass the load interlock mechanism, allowing new threads to access the Rails environment and boot properly.
+Wrap the first request inside a block that will allow concurrent loads: [`ActiveSupport::Dependencies.interlock.permit_concurrent_loads`](https://guides.rubyonrails.org/threading_and_code_execution.html#permit-concurrent-loads). Anything wrapped inside the `.permit_concurrent_loads` block will bypass the load interlock mechanism, allowing new threads to access the Rails environment and boot properly.

###### Example

