Puma 5 Experimental Feature Reports #2258

Closed · nateberkopec opened this issue May 12, 2020 · 19 comments

nateberkopec commented May 12, 2020

Puma 5 contains 3 new experimental performance features:

- nakayoshi_fork: runs GC (and GC.compact, where available) right before forking workers, to improve copy-on-write memory sharing
- fork_worker (refork): forks workers from worker 0 instead of the master process, and adds a refork command
- wait_for_less_busy_worker: makes busier workers briefly delay accepting requests so that less-busy workers pick them up first

Note that all of these experiments are only for cluster mode Puma configs running on MRI.

Part of the reason we're calling them experimental is that we're not sure they'll actually have any benefit. Real-world workloads are often not what we anticipate, and synthetic benchmarks rarely tell us whether a change will be beneficial.

We do not believe any of the new features will have a negative effect or impact the stability of your application. This is either a "it works" or "it does nothing" experiment.

If any of the features turn out to be particularly beneficial, we may make them defaults in future versions of Puma.

If you are using any of the 3 new features, please post before and after results or screenshots to this issue. Note that "it didn't do anything" is still a useful report. Posting ~24 hours of "before" and ~24 hours of "after" data is preferred.
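For reference, a minimal config/puma.rb sketch enabling all three features (the worker count is an illustrative assumption; all three require cluster mode on MRI):

```ruby
# config/puma.rb -- a sketch, not a drop-in config; worker count is illustrative
workers 4

# Fork workers from worker 0 instead of the master process (enables refork)
fork_worker

# Run GC (and GC.compact on supported Rubies) before forking workers,
# to improve copy-on-write memory sharing
nakayoshi_fork true

# Make busier workers briefly delay accepting requests so that
# less-busy workers pick them up first
wait_for_less_busy_worker
```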

@nateberkopec

Codetriage memory before/after adding nakayoshi_fork seems more or less unaffected.

[Screenshot: memory usage, 2020-05-12 12:50 PM]

Codetriage initially had a wee latency spike (second-to-last deploy on the right here), but it has since calmed down; I'm chalking it up to random noise or possibly a bad neighbor (it runs on Heroku 1x dynos).

[Screenshot: latency, 2020-05-12 12:52 PM]


wjordan commented May 15, 2020

Here's memory usage of a production Rails app on a large EC2 instance (m5.12xlarge = 48 vCPUs / 192 GB RAM), testing the fork_worker option (with the fix listed in #2267), compared against another baseline server running the same workload:

[Screenshot: memory utilization, fork_worker vs. baseline]

Triggering refork at 12:36 (and at 12:30 for a smaller app hosted on the same server) resulted in a ~20-30% reduction in memory utilization (starting at ~30% below baseline and narrowing to ~17% below after about an hour). Triggering again at 14:15 reduced memory utilization again, and it started climbing again at roughly the same pace over the next 45 minutes.

There might be different ways to use refork to optimize memory utilization: invoke it once after n requests for a baseline improvement (which may level off a bit over time as ongoing GC eats away at the copy-on-write savings), or invoke it frequently: after every n requests, whenever utilization exceeds a target threshold, or whenever server load dips below a target threshold. Since refork is a relatively lightweight operation (compared to a full app reload), depending on the workload the performance impact of running it rather frequently might actually be minimal (kind of like a server-wide GC operation).
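For the "after every n requests" strategy: fork_worker accepts an optional request threshold. A minimal sketch (the threshold shown is illustrative; 1000 is also Puma's documented default):

```ruby
# config/puma.rb
# Automatically refork once worker 0 has processed 1000 requests.
fork_worker 1000
```

A refork can also be triggered on demand with `pumactl refork` or by sending SIGURG to the Puma master process.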

@nateberkopec

After fixing the config bug in nakayoshi_fork, Codetriage is now showing about a 10% reduction in memory usage 👍


fluke commented Jun 6, 2020

After setting nakayoshi_fork(true) on Puma 5.0.0.beta1, I'm seeing a ~17% reduction in memory. I'm able to run another worker within the same Heroku 2x dyno.

Edit: I went from Puma 4 directly to Puma 5 with nakayoshi_fork, so some of that might just be from the version upgrade.


danmayer commented Jul 9, 2020

No difference seen with wait_for_less_busy_worker, but CPU utilization was already less than 40%, so load might not have been high enough.

@nateberkopec

Yeah, you probably won't see a big difference if utilization isn't higher than ~50-60%.
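(For anyone experimenting with this option: the DSL also accepts an optional delay in seconds. A minimal sketch, where the value shown is Puma's documented default:)

```ruby
# config/puma.rb
# A busier worker waits this many seconds before accepting a new connection,
# giving less-busy workers the first chance to pick it up.
wait_for_less_busy_worker 0.005
```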

@danmayer

We were looking at using Puma 5 and this feature because we have been seeing some odd perf issues related to Puma queues. I can provide more data and capture more metrics, but that's probably best filed in a separate issue. It's safe to say the tail-latency issues we have seen are not affected by these performance features.


oskarpearson commented Sep 25, 2020

We attempted to use nakayoshi_fork today and found some very strange behaviour, seemingly related to https://github.com/rgeo/activerecord-postgis-adapter, rgeo-activerecord, and the rgeo gem.

Most notably, we received an exception from https://github.com/rgeo/rgeo/blob/v2.1.1/lib/rgeo/wkrep/wkb_generator.rb#L159, where the exception raised by raise Error::ParseError, "Unrecognized Geometry Type: #{type}" contained this bizarre string:

Unrecognized Geometry Type: # this method if you wish to change how action methods are called,

It turns out that # this method if you wish to change how action methods are called, comes from https://github.com/rails/rails/blob/070d4afacd3e9721b7e3a4634e4d026b5fa2c32c/actionpack/lib/abstract_controller/base.rb#L199

So... the parameter passed to wkb_generator was part of a code comment in actionpack, an entirely different gem (!!!)

This points at some sort of memory corruption, most likely related to the fact that rgeo uses an underlying C implementation: https://github.com/rgeo/rgeo/tree/master/ext/geos_c_impl

@nateberkopec

@oskarpearson Sounds like a bug in GC.compact... @tenderlove, how should people report that? The Ruby tracker, I would guess?


veganstraightedge commented Oct 27, 2020

In the @crimethinc website Rails app, the Puma 5 experimental features seem to have gained us ~2 GB of memory back.

Before

[Screenshot: memory usage before]

After

[Screenshot: memory usage after]

Past 72 hours

[Screenshot: memory usage, past 72 hours]

@nateberkopec

@veganstraightedge Are you running all three?

@h0jeZvgoxFepBQ2C

I tried the fork_worker option on one of our production machines and it's really amazing. I see the normal CPU spike for loading the first worker, and then no more CPU spikes; just the log entries telling me the workers are starting. I tried to load 8 Puma processes on a 2-CPU machine, and it worked fine, without any problems and with perfect performance (before fork_worker this would have blocked the whole machine for 3-4 minutes). Amazing, thank you so much for this ❤️

@veganstraightedge

> @veganstraightedge Are you running all three?

Yep!

https://github.com/crimethinc/website/pull/1829/files

@benhalpern

Hey everybody, we added the nakayoshi_fork config and saw a ~2% reduction. More info here:

https://forem.dev/foremteam/results-of-implementing-puma-s-nakayoshifork-on-forem-3pdl

Forem is open source at https://github.com/forem/forem so folks are welcome to propose other experiments that might be worth trying.

@fdelache

We applied the wait_for_less_busy_worker configuration on one of Shopify's webhook proxy applications.
It successfully reduced the Puma utilization spikes we had (utilization was often going above 80% for the same volume of incoming requests).

For reference, we compute the Puma utilization as ((puma_running - puma_pool_capacity) + puma_backlog) / puma_max_threads
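A sketch of how a metric like this could be computed in-process from Puma.stats_hash (the cluster-mode keys below exist in Puma 5; reporting the busiest worker is an illustrative assumption, not necessarily Shopify's implementation):

```ruby
# Illustrative sketch, not Shopify's actual metric code.
# In cluster mode, Puma.stats_hash reports per-worker stats under :worker_status.
stats = Puma.stats_hash

utilization = stats[:worker_status].map { |worker|
  s = worker[:last_status]
  busy = s[:running] - s[:pool_capacity] # threads currently handling requests
  (busy + s[:backlog]).to_f / s[:max_threads]
}.max # report the busiest worker's utilization

puts format("puma utilization: %.2f", utilization)
```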

Before

[Screenshot: Puma utilization before]

After (note the different scale for the Y-axis)

[Screenshot: Puma utilization after]


gsaks123 commented Jul 2, 2021

I captured a ~10-day view of memory usage in our app (Tapology) before enabling two of these features. Then I turned on nakayoshi_fork and wait_for_less_busy_worker and left it running for about the same amount of time.

I can't control the scales of these charts, but on our two servers with Puma workers, after ~10 days of uptime:

  • Server 1 was at 37.6% memory usage before and 33.7% after
  • Server 2 was at 39.3% memory usage before and 34.7% after

The two features seem to have had a positive impact, but this is not a perfectly controlled experiment! Out of curiosity I'm considering turning the features off again and letting it run for a bit, to see whether usage clearly reverts to the prior level.

Server 1 Before:
[Screenshot: Front End 1 Before]

Server 1 After:
[Screenshot: Front End 1 After]

Server 2 Before:
[Screenshot: Front End 2 Before]

Server 2 After:
[Screenshot: Front End 2 After]

@nateberkopec

Thanks! To update everyone on my current thinking:

  1. wait_for_less_busy_worker is all upside with no downside. It should become the default in Puma 6.
  2. refork needs more fleshing out, but looks promising.
  3. nakayoshi_fork has very minimal gain and can trigger nasty bugs in C extensions due to its use of GC.compact. Considering removal in Puma 6.

@sbocinec

Sharing my experience with enabling nakayoshi_fork after upgrading our RoR app to Puma 5 (5.2.2), running on Ruby 2.7.3:

Shortly after enabling the option in our production Puma config, we started sporadically receiving a NotImplementedError exception with the message method 'new_with_args' called on unexpected T_IMEMO object (0x00005643e8e79208 flags=0x1607a) when running MySQL ActiveRecord create queries (using the latest available mysql2 gem, v0.5.3). Since disabling the option 7 days ago, no more exceptions like this have been reported.

This is consistent with @nateberkopec's observation above that nakayoshi_fork might trigger bugs in C extensions when GC.compact is executed.

@nateberkopec

Closing. We have removed Nakayoshi Fork, and wait_for_less_busy_worker will be made the default in Puma 6.

maxlazio pushed a commit to gitlabhq/omnibus-gitlab that referenced this issue Sep 11, 2023
Puma 6 dropped support for nakayoshi_fork
(puma/puma#2258). GitLab has been running
Puma 6 since GitLab 16.1
(https://gitlab.com/gitlab-org/gitlab/-/merge_requests/119135).

Changelog: changed