Ctrl+C takes a long time to quit in 5.0.x #2398
Comments
One thing that could help w/ debugging here would be to use the new `thread_backtraces` pumactl command.

I don't mind helping to debug it on my system, where it reproduces easily, but I don't know how to do that. Searching for `thread_backtraces` didn't turn up anything.

Whoopsie, it's a dash, not an underscore: #2054
Ok, so here is what I did (I hope I did it right), using the same minimal `config.ru`:

```ruby
# config.ru
run lambda { |env| [200, {"Content-Type" => "text/plain"}, ["Hello World"]] }
```

Step 1: I executed:

```
$ puma config.ru -p 3000 --control-url="unix:///tmp/puma.sock" --control-token="token"
```

Step 2: I then accessed http://localhost:3000 in Chrome - this is important (more on this later).

Step 3: I pressed Ctrl+C to stop puma.

Step 4: In another terminal, while I was waiting for the shutdown to complete, I executed:

```
$ pumactl thread-backtraces --control-url="unix:///tmp/puma.sock" --control-token="token" > trace.txt
```

Which provided me with this output:
It is important to note that:
Let me know if I can provide any additional information.

@DannyBen do you mind also seeing if it reproduces with Puma 5.0.2?
Tested with puma 5.0.2, same behavior as 5.0.0. Output of `pumactl thread-backtraces`:
Thanks for that output, that's just the information required. So, the reactor is sleeping, waiting for more to come down the socket. Your threadpool is waiting for threads to finish, and both threads in the threadpool are blocking on more data.

My guess: Chrome opens keepalive requests but Curl doesn't. A keepalive is probably changing Puma's behavior here as to when it decides to cut off a connection and shut down. If you issue a request with curl instead, does the server shut down quickly?
Yes it does.

Ok. Next step will probably be for me or others to repro. I'm wondering what Chrome is doing differently then. It feels like a "slow client" type situation. It's possible that we learn something here that helps stabilize our shutdown behavior for everyone.
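A Ruby stand-in for the curl test mentioned above (an illustrative sketch, assuming the server from the earlier steps is listening on localhost:3000):

```ruby
# one_shot_request.rb
# Issue a single request and close the connection, roughly what
# `curl http://localhost:3000/` does: no idle connection is left behind.
require 'net/http'

response = Net::HTTP.start('localhost', 3000) do |http|
  http.get('/')
end
# The block form of Net::HTTP.start closes the TCP connection when the
# block returns, so Puma has nothing left to wait on at shutdown.
puts "#{response.code} #{response.body}"
```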
I'm able to reproduce the problem on macOS with Chrome Version 85.0.4183.121. I don't see the same behavior in Firefox.

In Firefox, a single TCP connection is opened when requesting the page. In Chrome, two TCP connections are opened when requesting the same page. The client makes two requests, but both go over a single TCP connection; the other connection is therefore unused (Chrome never writes a request to it).

During shutdown (since #2122), puma doesn't immediately close any connections that it hasn't written a response to. In general, this provides more reliable behavior for clients that connected to the server just before the shutdown started. One side effect is that if a client opens a connection but never actually intends to write to it, it can keep the server up and running until an inactivity timeout.

IMHO puma is behaving as it is expected to since the introduction of #2122. Chrome's behavior is a little weird. I found this Chromium bug report that provides some insight: https://bugs.chromium.org/p/chromium/issues/detail?id=116982

The unused TCP connection in Chrome's case seems like it's a feature. Chrome opens it up because most sites require at least two concurrent connections to load the page quickly. I wouldn't expect this to be an issue for most web apps; the "hello world" app exhibits this behavior since it doesn't force the browser to make many additional requests for things like stylesheets, JavaScript, or images.
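To reproduce the "unused connection" case without Chrome, here is a small sketch (illustrative, not from the thread) that opens a TCP connection to the server and never writes to it; with the post-#2122 behavior, pressing Ctrl+C in the Puma terminal while this is running should hang until the inactivity timeout:

```ruby
# idle_connection.rb
# Simulates Chrome's speculative pre-connect: open a TCP connection to the
# server and never send a request on it.
require 'socket'

socket = TCPSocket.new('localhost', 3000) # assumes Puma is listening on port 3000
puts 'Connected, sending nothing. Press Ctrl+C in the Puma terminal now.'
sleep 60                                  # hold the idle connection open
socket.close
```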
Well, good to know it is reproducible. For what it's worth:
This is good to know. It's possible something else is going on in that case. It'd be nice to get a reproducible test case for a more real-world app, but I understand if that's difficult if it's proprietary.
Is this an issue mostly for developers running the puma server on their own machines? I could see that being annoying. In your case, I would definitely recommend taking a look at the option documented at lines 241 to 260 in 4be4069.

I could see this also being useful.
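As an illustration only (it is an assumption that `force_shutdown_after` is the option referenced above), a hard cap on how long a graceful shutdown may take can be set in a Puma config file:

```ruby
# config/puma.rb
# After this many seconds, Puma stops waiting on remaining connections and
# terminates them instead of continuing to drain them gracefully.
force_shutdown_after 5
```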
I suggest modifying
I agree it's an annoyance in development.
CTRL-C sends SIGINT to the process. We have a pretty hard contract and docs around what Puma does when that is sent, because 99% of people manage Puma processes in production with these signals. For me, three action items:
In my own experience, I don't often need to kill and restart puma when I'm working on a Rails app, since Rails has support for autoloading whenever you change controllers/models/etc. I could see it being worth mentioning for non-Rails apps, though.

Actually, fourth concern: how does this Chrome behavior affect single-threaded apps in production? It effectively halves one's capacity. That's kind of severe. For people running Puma with many threads, there's no issue, because nothing ever comes down the pipe so that thread in the threadpool just idles and doesn't use any GVL time.
I can try and create a sample that serves more assets in its root page.
Yes and no. From a developer standpoint, the issue is obvious. But I also suspect this might cause delays when updating apps in production. Is there documentation on how to use it?

And for @nateberkopec: regarding the Ctrl+C - notice I mentioned subsequent Ctrl+C. Not sure it changes anything, but just in case you missed it.

Noted. SIGINT is currently an idempotent operation in Puma, and I think all of our signals probably will remain so for production stability and predictability.

And BTW, the last rename of the issue title will make it harder for people to find the same problem.
I was thinking about this a little bit too, but I don't think it's that bad. Even if you have puma running with a single thread, an idle connection just sits in the Reactor and doesn't occupy a thread in the threadpool.

Unfortunately I think this is not the case here. Take a look at @DannyBen's thread-backtraces output; you'll see he has two threads active in the threadpool.

I think that's just because the server is shutting down. As soon as the Reactor is shut down, it pushes all connections that it had alive into the threadpool. Under normal operating conditions, idle connections don't occupy threads in the threadpool. You're right, though, that if a server had many idle connections at shutdown time, all of them would be pushed into the threadpool. And if that threadpool has only one thread, you basically have to wait for each connection to time out in turn (according to the first-data timeout).
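A rough, deliberately pessimistic sketch of that single-thread worst case (the numbers are illustrative assumptions, not measurements from this issue):

```ruby
# Pessimistic estimate of shutdown wait for a single-threaded server,
# assuming idle connections are drained one at a time and each one sits in
# the single thread until its full timeout elapses.
idle_connections   = 2   # e.g. Chrome's page connection plus its unused pre-connect
first_data_timeout = 30  # seconds; Puma's default wait for a connection's first bytes
threads            = 1

worst_case = (idle_connections.to_f / threads).ceil * first_data_timeout
puts "worst-case shutdown wait: ~#{worst_case}s" # => ~60s
```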
@schneems suddenly there's a connection to The Old H13 Problem ^^ https://devcenter.heroku.com/articles/error-codes#h13-connection-closed-without-response
I believe the below is a good stand-in for a "real app"? The root page serves multiple images, so it should - according to the above discussion - occupy this additional connection. From my tests, it is even worse: shutdown takes much longer than it took with the simpler app. Just save it and run with `ruby server.rb`:

```ruby
# server.rb
require 'bundler/inline'

gemfile do
  source "https://rubygems.org"
  gem 'sinatra'
  gem 'puma', '~> 5.0'
  gem 'icodi'
end

require 'sinatra'

set :port, 3000
set :bind, '0.0.0.0'

get '/' do
  (1..10).map { |i| "<img width=80 src='/image#{i}'>" }.join "\n"
end

get '/image*' do
  content_type 'image/svg+xml'
  Icodi.new.render
end
```

EDIT: Interesting observations:
So, it seems like the reliability of the production environment suffers from this change as well (consistently reproducible).

What if we simplified

Actually, I don't like how much that rolls back #2122. But maybe even a decrease in the default first data timeout might be appropriate? 30 seconds is a long time to wait for a client to say something.
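As an illustration, the first-data wait is already configurable via the Puma DSL, so a development config could lower it from the 30-second default (the value here is an arbitrary example):

```ruby
# config/puma.rb
# How long Puma waits for a connection to send its first bytes before
# giving up on it. The default is 30 seconds.
first_data_timeout 5
```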
Yeah, I think the changes in #2122 were valuable because they fixed a bunch of concurrency bugs related to what happens to requests that were in the Reactor at the time a shutdown begins. The problem with #2122, though, is that pushing connections into the ThreadPool that haven't yet made a request and never will makes it take a long time for the ThreadPool to shut down.

My first thought was that we can potentially be more discerning about which requests we send into the ThreadPool from the Reactor at shutdown time. I'm tempted to do something as aggressive as just removing the `can_close?` check:

```diff
diff --git a/lib/puma/server.rb b/lib/puma/server.rb
index fd78f3b1..49c21c54 100644
--- a/lib/puma/server.rb
+++ b/lib/puma/server.rb
@@ -291,7 +291,7 @@ module Puma
     # will wake up and again be checked to see if it's ready to be passed to the thread pool.
     def reactor_wakeup(client)
       shutdown = !@queue_requests
-      if client.try_to_finish || (shutdown && !client.can_close?)
+      if client.try_to_finish
         @thread_pool << client
       elsif shutdown || client.timeout == 0
         client.timeout!
```

But of course that would mean that if the server had received some bytes of the request, but not all of them, we'd close the connection immediately during shutdown, which isn't desirable.

Instead, I think it might be worth reexamining whether or not it really makes sense for us to even force requests into the ThreadPool from the Reactor at shutdown time at all. We can instead let the Reactor do what it does best (waiting for the appropriate amount of time until a connection sends data, then finally passing buffered clients to the ThreadPool) until it has no more connections.

The benefit of that approach is that clients aren't treated differently just because they made a request close to the time the server was about to shut down (as a reminder, this could just be during a hot restart or a phased restart, not necessarily someone killing the server entirely with a SIGINT). In practice, though, this can be annoying because phased restarts on a single worker, hot restarts, and explicit shutdowns would all have to wait on idle connections before completing.

Before #2122, when a shutdown started, we effectively closed all connections in the Reactor immediately. One way of thinking about this is that the Reactor was behaving as though every pending client's timeout had already expired. I think one kinda elegant solution to this problem is that during a shutdown, the Reactor should just set a much shorter timeout on clients that haven't sent any data yet. An improvement on this system would be to use a separate, shutdown-specific timeout rather than reusing the first-data timeout.

Hopefully that makes sense. Also curious about @wjordan's thoughts.
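A conceptual sketch of that last idea (a simplified illustration only, not Puma's actual implementation; `received_any_data?` and `timeout_at` are invented names): on shutdown, the Reactor could shorten the deadline of connections that have sent nothing instead of handing them to the ThreadPool:

```ruby
# On shutdown, clients with no data get a short grace period; clients with a
# partial request keep their normal first-data timeout in the Reactor.
SHUTDOWN_GRACE = 1 # seconds, illustrative value

def begin_shutdown(reactor_clients, now = Time.now)
  reactor_clients.each do |client|
    next if client.received_any_data? # keep waiting for the rest of the request
    client.timeout_at = now + SHUTDOWN_GRACE
  end
end
```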
Here are a few thoughts/responses:
I actually think this change could be interesting:
Also worth noting that
I think this would probably be acceptable. Although RFC 2616 does say 'Servers SHOULD always respond to at least one request per connection, if at all possible,' it looks like this requirement was removed from the updated RFC 7230, with the note that "some extraneous requirements about when servers are allowed to close connections prematurely have been removed." As far as rolling back #2122 goes, it's probably more important to wait on open connections that already sent part of a request than it is to wait for open connections that haven't sent any data at all.
Varnish defaults its
Interesting and probably feasible to simplify the server logic along these lines, but not really relevant to the current issue: graceful shutdown would wait on open connections just the same whether they are idling in the Reactor or the ThreadPool.

I think dropping open+unused connections at shutdown should be absolutely fine; no extra shutdown-specific (and/or interrupt-specific) timeouts should be necessary.
Makes sense! I can open a PR that just modifies

Ok I'm caught up here - let's do it @cjlarose

Just leaving this here for reference: It appears the Chrome pre-connects that may be causing this issue have been made more aggressive in the February release, which may be why it wasn't noticed earlier. https://bugs.chromium.org/p/chromium/issues/detail?id=85229

@nateberkopec would you mind changing the title of this issue back to "Ctrl+C takes a long time to quit in 5.0.x"? I think it's more likely to be the issue title folks search for (#2484 is an example of someone bringing up the same issue).

@cjlarose Just to clarify, my issue doesn't appear to be the same as this, as my issue(s) is still present in the
Describe the bug:
Since version 5.0.0 it seems like Ctrl+C just hangs for a few seconds (~5-15) before quitting. Hitting Ctrl+C again while it hangs does not terminate it. This is quite reproducible, and happens only after there was at least a single request to the server.
Puma config:
I first experienced it with Sinatra, but noticed this is reproducible with plain puma, no config.
To Reproduce:
Using the same minimal example provided in the issue template:
1. Create this `config.ru` file (see the snippet after this list)
2. Run it with:
3. Access the server in a browser
4. Press Ctrl+C to stop the server
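For reference, the collapsed `config.ru` from step 1 presumably matches the minimal one quoted earlier in the thread:

```ruby
# config.ru
run lambda { |env| [200, {"Content-Type" => "text/plain"}, ["Hello World"]] }
```

It can be run with `puma config.ru -p 3000`.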
Expected behavior
As with the 4.x versions, Ctrl+C is expected to exit quickly.
Desktop (please complete the following information):