
Puma 4.3.6 - Rails 6.0.3.4 - TLS brings in memory leak #2425

Closed
Melchizedech opened this issue Oct 13, 2020 · 17 comments

@Melchizedech

Melchizedech commented Oct 13, 2020

Describe the bug
After creating a "hello world" Rack application and configuring Puma to use TLS, I see the application use excessive memory.

Puma config:

threads 1, 1
workers 1
ssl_bind '0.0.0.0', 3000, {
    key: "certificates/certificate.key",
    cert: "certificates/certificate.crt",
    verify_mode: 'none',
    no_tlsv1: true,
    no_tlsv1_1: true
  }

bind 'tcp://0.0.0.0:3500'

# Specifies the `pidfile` that Puma will use.
pidfile ENV.fetch("PIDFILE") { "server.pid" }

CLI: docker run --rm -p 3000:3000 -p 3500:3500 puma_tls_memory_leak
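
The Rack app behind this config is just a hello-world endpoint; something equivalent to the following sketch (not the exact file from the repo linked below):

# config.ru — a minimal hello-world Rack app of the kind used in this repro
run ->(env) { [200, { "Content-Type" => "text/plain" }, ["Hello world\n"]] }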

To Reproduce

I've created a repo with a Dockerfile to make this reproducible, using your hello world Rack app:
https://github.com/Melchizedech/rails_puma_tls_memory_leak

Running while :; do curl -k --location 'https://localhost:3000' -v ; done will increase the RAM used by the application from 25MB (RSS) to 60MB (RSS). Then every minute or so, another extra MB (RSS) will be consumed.

If we instead run while :; do curl --location 'http://localhost:3000/' -v ; done, the RAM used by the app increases from 25 to 32 MB (RSS), and eventually the app starts consuming extra MB of RAM as well.
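
In case it helps quantify the numbers above, here is a rough sketch of how the worker's RSS can be sampled once a minute while the curl loop runs (a hypothetical helper, assuming Linux and the server.pid from the config above):

# sample_rss.rb — hypothetical monitoring helper, not part of the repro repo
require "time"

pid = Integer(ARGV[0] || File.read("server.pid").strip)
loop do
  rss_kb = File.read("/proc/#{pid}/status")[/^VmRSS:\s+(\d+)/, 1].to_i
  puts "#{Time.now.utc.iso8601} RSS=#{rss_kb / 1024} MB"
  sleep 60
end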

I definitely think there is an issue in Puma's TLS handling.

Here is the graph of one application we use; it is a small Rails app that receives close to no user traffic. The only "real" traffic it receives comes from the liveness probe.

Cluster A :
App in a production cluster

Cluster B :
App in another production cluster

Here are some screenshots of the RAM consumption of a brand new Rails app running in API-only mode, behind a similar configuration, while we hit a non-existent route.

Just after boot:
App after having booted in production mode

After 6 min of constant cURL
App after 6min of stress test on a non-existing route using HTTPS

After 28 min of constant cURL
App after 28min of stress test on a non-existing route using HTTPS
While I was testing the brand new Rails API app to reproduce the issue (before switching to the Rack hello world app), I tried jemalloc. I saw a strong improvement, though RAM usage was still growing, at a much, much slower rate.
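
One caveat for anyone repeating the jemalloc comparison: it is easy to think jemalloc is in use when it isn't. A quick hypothetical check from inside the Ruby process (Linux only):

# hypothetical check: is jemalloc actually mapped into this process?
loaded = File.readlines("/proc/self/maps").any? { |line| line.include?("jemalloc") }
puts(loaded ? "jemalloc is loaded" : "jemalloc is NOT loaded")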

Expected behavior
I would expect RAM consumption not to grow indefinitely when I use TLS with Puma.

Desktop (please complete the following information):

  • OS: Arch Linux (and also Debian via Docker)
  • Puma version: 4.3.6 (also reproduced on 5.0.2)

Any help debugging/investigating this is welcome.

@nateberkopec
Member

Is there any way you could post a graph of what your cluster looks like without TLS enabled? Maybe put an SSL terminator in front of Puma, like nginx? Of course, we'll also be interested in whether this reproduces on 5.x.

Your production graphs are compelling. I'll take a look at the repro.

@Melchizedech
Author

I can try deploying that app without TLS in one of our staging environments and give you the results in several days.

Using such a solution won't work. (TL;DR: this app requires encrypted traffic within our internal network; traffic between [outside, internal] <=> nginx would be encrypted, but not nginx <=> Puma.) We will most likely use a service mesh or a similar technology to remove the need for TLS within Puma.

Here are three screenshots for Puma 4.3.6 with HTTP.
At boot time
puma_http_beg

After 7min
puma_http_7min

After 27min
puma_http_27min

Here are three screenshots of Puma 5.0.2 with HTTPS.

At boot time
puma5_beg

After 6 minutes
puma5_6min

After 33 minutes
puma5_33min

These screenshots use the same hello world Rack app; the only difference was a change in the Gemfile:

diff --git a/Gemfile b/Gemfile
index 8418cc4..781766f 100644
--- a/Gemfile
+++ b/Gemfile
@@ -3,4 +3,4 @@ git_source(:github) { |repo| "https://github.com/#{repo}.git" }
 
 ruby '2.7.2'
 
-gem 'puma', '~> 4.1'
+gem 'puma', '~> 5.0'
diff --git a/Gemfile.lock b/Gemfile.lock
index 50eda8b..b55ce66 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -2,14 +2,14 @@ GEM
   remote: https://rubygems.org/
   specs:
     nio4r (2.5.4)
-    puma (4.3.6)
+    puma (5.0.2)
       nio4r (~> 2.0)
 
 PLATFORMS
   ruby
 
 DEPENDENCIES
-  puma (~> 4.1)
+  puma (~> 5.0)
 
 RUBY VERSION
    ruby 2.7.2p137

@nateberkopec
Member

Using such a solution won't work.

No problem, it was a long shot. Just figured I'd ask.

I'm less interested in memory usage after 20 to 30 minutes than I am in rate of growth after 6 to 12 hours of uptime.

If you have the same rate of growth in memory T+0 to T+1h as you do T+5h to T+6h, you have a leak. If it slows down over time, you don't.
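
To make that concrete, a toy sketch of the comparison (the numbers are invented, just to illustrate the rule):

# hour => RSS in MB, as sampled from monitoring (made-up values)
samples = { 0 => 25.0, 1 => 40.0, 5 => 58.0, 6 => 59.0 }
early_growth = samples[1] - samples[0] # T+0h .. T+1h
late_growth  = samples[6] - samples[5] # T+5h .. T+6h
puts(late_growth >= early_growth ? "same rate: looks like a leak" : "growth is slowing: probably not a leak")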

@Melchizedech
Author

Okay, I'll deploy this hello world rack application with TLS, and I'll let it live for a day or two. I'll come back with the monitoring results.

@MSP-Greg
Member

@Melchizedech

Thanks for all the info. Any info on which TLS protocol versions are most likely being used?

In #1735, several people had issues with AWS not supporting TLSv1.3, which forced downgrade negotiation to TLSv1.2.
There wasn't any mention of memory leaks. Hence, I'm wondering if this could be TLSv1.3 related.

I'll try to send several thousand connections to a server locally and see if I can repro a rise in memory.
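
Something along these lines; a sketch rather than the exact script, assuming the repro server from this issue on localhost:3000 with its self-signed certificate:

require "net/http"
require "openssl"

# one fresh TLS connection (and handshake) per request, like the curl loop above
1_000.times do
  Net::HTTP.start("localhost", 3000, use_ssl: true,
                  verify_mode: OpenSSL::SSL::VERIFY_NONE) do |http|
    http.get("/")
  end
end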

@Melchizedech
Author

Hi @MSP-Greg, the TLS conf disables TLS 1.0 and TLS 1.1, leaving us with TLS 1.2 and TLS 1.3.

I don't know which protocol the liveness probe uses, but my guess would be 1.3.

The tests I performed used TLS 1.3.
I have run another test, similar to the previous one, and got the same increase using only TLS 1.2
(while :; do curl --tls-max 1.2 -k --location 'https://localhost:3000' ; done) over a period of more than 30 minutes.

@Melchizedech
Author

FYI: using curl with TLS 1.2 instead of TLS 1.3 (so most likely involving one downgrade phase).

at boot
puma5_tls12_beg

at 9 min
puma5_tls12_9min

at 29 min
puma5_tls12_29min


Regarding the HTTP-only test: it is ongoing.

@Melchizedech
Author

Here is the plot for the same app as described in "Cluster A", but using HTTP. We can see there is no change in RAM consumption compared to TLS.
Screenshot_20201015_181228

@nateberkopec
Member

@Melchizedech Was that app receiving any traffic? I assume the TLS-on app behind your memory graphs was constantly receiving traffic.

@Melchizedech
Author

Melchizedech commented Oct 15, 2020 via email

@MSP-Greg
Member

@Melchizedech

I tested whether there were any Ruby objects that GC was having trouble with, and that doesn't seem to be the issue: running GC released all of them.

This weekend I'll run a few thousand TLS connections at a server and check the memory stats.
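
For anyone who wants to repeat that kind of check, one way to do it (a sketch, not necessarily the exact test used here):

require "objspace"

GC.start(full_mark: true, immediate_sweep: true)
puts "heap_live_slots: #{GC.stat(:heap_live_slots)}"
puts "ruby object bytes: #{ObjectSpace.memsize_of_all}"
# if these stay flat while RSS keeps climbing, the growth is likely in native
# (OpenSSL / malloc) memory rather than in Ruby-visible objects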

@dentarg
Member

dentarg commented Feb 6, 2021

@Melchizedech do you want to try again with the newly released Puma 5.2 series?

@l33z3r

l33z3r commented Oct 23, 2021

Hi guys, has anyone seen this issue resolved with the Puma 5.2 series?
I am having the same issue described here. Thanks!

@dentarg
Member

dentarg commented Oct 31, 2021

@l33z3r maybe you can find out for us?

@l33z3r

l33z3r commented Nov 1, 2021

Hi guys, just to chime in here. The issue I was having with leaking memory was not related to what is described in this issue.

@DaryMerckensGT

Hi guys, just to chime in here. The issue I was having with leaking memory was not related to what is described in this issue.

What was it related to?

@l33z3r

l33z3r commented Feb 16, 2022

Hi guys, just to chime in here. The issue I was having with leaking memory was not related to what is described in this issue.

What was it related to?

It was actually related to a memory leak in this gem: https://github.com/typhoeus/ethon
