
Persistent Connections monopolize a thread #1835

Closed
twalpole opened this issue Jul 8, 2019 · 9 comments · Fixed by #1857

Comments

@twalpole
Contributor

twalpole commented Jul 8, 2019

No description provided.

@nateberkopec
Member

Linking #1565 on a hunch

@Jesus
Contributor

Jesus commented Jul 18, 2019

I've been debugging this. I don't have a fix yet but I have some findings.

This is because the MiniSSL implementation of #read_nonblock blocks here:

puma/lib/puma/minissl.rb

Lines 60 to 66 in 099a447

if @socket.to_io.respond_to?(data)
  @socket.to_io.__send__(data)
elsif data == :wait_readable
  IO.select([@socket.to_io])
else
  IO.select(nil, [@socket.to_io])
end

This is particularly bad because the method is being called from inside the thread pool, which, as the OP accurately reports, monopolizes the thread. At least this only happens when using SSL.

I'm tempted to update that piece of code so it'll raise Errno::EAGAIN instead of blocking. Doing that fixes the issue.
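
To make the idea concrete, here is a toy, standalone sketch of raising instead of blocking when the socket has no data. This is only an illustration of the approach, not the actual MiniSSL patch; note that IO::EAGAINWaitReadable is a subclass of Errno::EAGAIN.

require 'socket'

# Toy sketch: surface the wait condition to the caller instead of blocking
# the thread-pool thread in IO.select.
def read_or_raise(io, size)
  data = io.read_nonblock(size, exception: false)
  case data
  when :wait_readable
    # MiniSSL currently blocks in IO.select at this point, pinning the thread.
    raise IO::EAGAINWaitReadable, 'no data available yet'
  when nil
    nil # peer closed the connection
  else
    data
  end
end

server = TCPServer.new('127.0.0.1', 0)
client = TCPSocket.new('127.0.0.1', server.addr[1])
conn   = server.accept

begin
  read_or_raise(conn, 1024)
rescue IO::EAGAINWaitReadable
  puts 'not readable yet; the reactor could wait on this socket instead'
end

client.write('hello')
sleep 0.1
puts read_or_raise(conn, 1024) # => "hello"

With that shape, the caller (Puma's reactor) decides when to retry, rather than the worker thread sleeping inside IO.select.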

But I want to better understand the reasoning that led to the current behaviour.

@nateberkopec
Member

Linking @Jesus's comment on #1439 #1439 (comment)

@MSP-Greg
Member

MSP-Greg commented Jul 19, 2019

I spun up essentially 4.0.1 locally with -t 1:1, then tried both Chrome & Firefox, and both were responsive. I couldn't find the connection info in Firefox, but Chrome was connected via TLSv1.3. Interesting. I'll look more later...
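
For anyone else trying to reproduce, the kind of single-thread TLS setup I mean looks roughly like this (a sketch only; the cert/key paths, port, and inline app are placeholders, not my exact setup):

# puma.rb -- minimal single-thread TLS config sketch
threads 1, 1
ssl_bind '127.0.0.1', '9292', cert: 'cert.pem', key: 'key.pem'

# Trivial inline app so the repro doesn't depend on a real application.
app do |_env|
  [200, { 'Content-Type' => 'text/plain' }, ["hello\n"]]
end

Start it with puma -C puma.rb and hit it from two clients at once; with a single thread, a held persistent connection should block the second client if the bug is present.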

@Jesus
Contributor

Jesus commented Jul 19, 2019

That's weird; I could reproduce the issue easily using the gist above.

I'll try to write a test in #1857 to catch this.

@ezekg

ezekg commented Jul 19, 2019

I believe I'm seeing this (or similar?) behavior as well on Heroku: Puma 4.0.1, Rails 5.2, Ruby 2.6. I ran load tests with and without keep-alives to try to figure out why my active thread count was so high even though request volume had returned to normal after a quick spike.

When using keep-alives, it looks like Puma maintains idle threads that can't be used by other requests, which I'd imagine reduces throughput until those sockets time out.

[chart: active thread count over time during the load tests, with and without keep-alives]

Another question I have is why a good portion of these idle threads (sockets?) never timed out, even though my persistent_timeout is set to 30. As you can see from the chart, some stayed around for almost an hour and required a forced restart of all dynos. These hanging threads only seem to appear under high load.

When forcefully restarting the dynos, I also saw some R12 errors, which means the process didn't gracefully exit within 30s, but that may be unrelated (I had never seen an R12 before today).

@nateberkopec
Member

@ezekg Can you post the config you used to produce that behavior?

@ezekg

ezekg commented Jul 20, 2019

@nateberkopec here are the relevant parts of my Puma config:

environment 'production'
persistent_timeout 30
threads 32, 32
workers 12

I was load testing with 2 Professional-M dynos. The load test consisted of ~60 RPS for 2 minutes. I saw some timeouts because I'm using rack-timeout with its timeout set to 15s (expected timeouts at this volume). Just thought I'd mention it, as it may be related here. (Though I'm rescuing Rack::Timeout::Error in some custom middleware, so I'm not sure whether Puma would ever see that error or whether it has any effect on these zombie threads.)
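
For context, that custom middleware is roughly along these lines (a simplified sketch with a made-up class name, not my exact code):

# Hypothetical middleware: rescue the rack-timeout error class mentioned
# above so the client gets a 503 instead of the exception bubbling up.
class TimeoutRescuer
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  rescue Rack::Timeout::Error
    [503, { 'Content-Type' => 'text/plain' }, ['Request timed out']]
  end
end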

@ezekg

ezekg commented Jul 20, 2019

@twalpole I think you're right. My bad. I saw that #1565 was linked and figured they were related, but I guess they're not actually related after all. I'll open up a new issue when I get some time to create a reproducible test app that exhibits the behavior I'm seeing. Carry on!

Jesus added a commit to Jesus/puma that referenced this issue Jul 23, 2019
Jesus added a commit to Jesus/puma that referenced this issue Jul 23, 2019
Jesus added a commit to Jesus/puma that referenced this issue Jul 24, 2019
Jesus added a commit to Jesus/puma that referenced this issue Jul 25, 2019
@twalpole twalpole closed this as completed Aug 2, 2019
@nateberkopec nateberkopec added bug and removed question labels Aug 2, 2019
@nateberkopec nateberkopec reopened this Aug 2, 2019