
Persistent Connections monopolize a thread #1835

Closed
twalpole opened this issue Jul 8, 2019 · 9 comments · Fixed by #1857

Comments

@twalpole
Contributor

twalpole commented Jul 8, 2019

No description provided.

@nateberkopec
Member

Linking #1565 on a hunch

@Jesus
Contributor

Jesus commented Jul 18, 2019

I've been debugging this. I don't have a fix yet but I have some findings.

This is because the MiniSSL implementation of #read_nonblock blocks here:

puma/lib/puma/minissl.rb

Lines 60 to 66 in 099a447

if @socket.to_io.respond_to?(data)
  @socket.to_io.__send__(data)
elsif data == :wait_readable
  IO.select([@socket.to_io])
else
  IO.select(nil, [@socket.to_io])
end

This is particularly bad because the method is being called from inside the thread pool, which, as the OP accurately reports, monopolizes the thread. At least this only happens when using SSL.

I'm tempted to update that piece of code so it'll raise Errno::EAGAIN instead of blocking. Doing that fixes the issue.
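
To make the idea concrete, here is a toy, standalone sketch of raising instead of blocking when the socket has no data. This is only an illustration of the approach, not the actual MiniSSL patch; note that IO::EAGAINWaitReadable is a subclass of Errno::EAGAIN.

require 'socket'

# Toy sketch: surface the wait condition to the caller instead of blocking
# the thread-pool thread in IO.select.
def read_or_raise(io, size)
  data = io.read_nonblock(size, exception: false)
  case data
  when :wait_readable
    # MiniSSL currently blocks in IO.select at this point, pinning the thread.
    raise IO::EAGAINWaitReadable, 'no data available yet'
  when nil
    nil # peer closed the connection
  else
    data
  end
end

server = TCPServer.new('127.0.0.1', 0)
client = TCPSocket.new('127.0.0.1', server.addr[1])
conn   = server.accept

begin
  read_or_raise(conn, 1024)
rescue IO::EAGAINWaitReadable
  puts 'not readable yet; the reactor could wait on this socket instead'
end

client.write('hello')
sleep 0.1
puts read_or_raise(conn, 1024) # => "hello"

With that shape, the caller (Puma's reactor) decides when to retry, rather than the worker thread sleeping inside IO.select.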

But I want to better understand the reasoning that led to the current behaviour.

@nateberkopec
Member

Linking @Jesus's comment on #1439 #1439 (comment)

@MSP-Greg
Member

MSP-Greg commented Jul 19, 2019

I spun up essentially 4.0.1 locally with -t 1:1, then tried both Chrome & Firefox, and both were responsive. I couldn't find the connection info in Firefox, but Chrome was connected via TLSv1.3. Interesting. I'll look more later...
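
For anyone else trying to reproduce, the kind of single-thread TLS setup I mean looks roughly like this (a sketch only; the cert/key paths, port, and inline app are placeholders, not my exact setup):

# puma.rb -- minimal single-thread TLS config sketch
threads 1, 1
ssl_bind '127.0.0.1', '9292', cert: 'cert.pem', key: 'key.pem'

# Trivial inline app so the repro doesn't depend on a real application.
app do |_env|
  [200, { 'Content-Type' => 'text/plain' }, ["hello\n"]]
end

Start it with puma -C puma.rb and hit it from two clients at once; with a single thread, a held persistent connection should block the second client if the bug is present.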

@Jesus
Contributor

Jesus commented Jul 19, 2019

That's weird; I could reproduce the issue easily using the gist above.

I'll try to write a test in #1857 to catch this.

@ezekg

ezekg commented Jul 19, 2019

I believe I'm seeing this (or similar?) behavior as well on Heroku: Puma 4.0.1, Rails 5.2, Ruby 2.6. I ran load tests with and without keep-alives to try to figure out why my active thread count was so high even though request volume had returned to normal after a quick spike.

When using keep-alives, it looks like Puma maintains idle threads that can't be used by other requests, which I'd imagine reduces throughput until those sockets time out.

[chart: active thread count over time during the load tests, with and without keep-alives]

Another question I have is why a good portion of these idle threads (sockets?) never timed out, even though my persistent_timeout is set to 30. As you can see from the chart, some stayed around for almost an hour and required a forced restart of all dynos. These hanging threads only seem to appear under high load.

When forcefully restarting the dynos, I also saw some R12 errors, which means the process didn't gracefully exit within 30s, but that may be unrelated (I had never seen an R12 before today).

@nateberkopec
Member

@ezekg Can you post the config you used to produce that behavior?

@ezekg

ezekg commented Jul 20, 2019

@nateberkopec here are the relevant parts of my Puma config:

environment 'production'
persistent_timeout 30
threads 32, 32
workers 12

I was load testing with 2 Professional-M dynos. The load test consisted of ~60 RPS for 2 minutes. I saw some timeouts because I'm using rack-timeout with its timeout set to 15s (expected timeouts at this volume). Just thought I'd mention it, as it may be related here. (Though I'm rescuing Rack::Timeout::Error in some custom middleware, so I'm not sure whether Puma would ever see that error or whether it has any effect on these zombie threads.)
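
For context, that custom middleware is roughly along these lines (a simplified sketch with a made-up class name, not my exact code):

# Hypothetical middleware: rescue the rack-timeout error class mentioned
# above so the client gets a 503 instead of the exception bubbling up.
class TimeoutRescuer
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  rescue Rack::Timeout::Error
    [503, { 'Content-Type' => 'text/plain' }, ['Request timed out']]
  end
end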

@ezekg

ezekg commented Jul 20, 2019

@twalpole I think you're right. My bad. I saw that #1565 was linked and figured they were related, but I guess they're not actually related after all. I'll open up a new issue when I get some time to create a reproducible test app that exhibits the behavior I'm seeing. Carry on!

Jesus added a commit to Jesus/puma that referenced this issue Jul 23, 2019
Jesus added a commit to Jesus/puma that referenced this issue Jul 23, 2019
Jesus added a commit to Jesus/puma that referenced this issue Jul 24, 2019
Jesus added a commit to Jesus/puma that referenced this issue Jul 25, 2019
@twalpole twalpole closed this as completed Aug 2, 2019
@nateberkopec nateberkopec added bug and removed question labels Aug 2, 2019
@nateberkopec nateberkopec reopened this Aug 2, 2019