
Ctrl+C takes a long time to quit in 5.0.x #2398

Closed
DannyBen opened this issue Oct 1, 2020 · 32 comments · Fixed by #2460

Comments

@DannyBen

DannyBen commented Oct 1, 2020

Describe the bug:

Since version 5.0.0, it seems like Ctrl+C hangs for a few seconds (~5-15) before quitting. Hitting Ctrl+C again while it hangs does not terminate the server. This is quite reproducible, and happens only after there has been at least one request to the server.

Puma config:

I first experienced it with Sinatra, but noticed this is reproducible with plain puma, no config.

To Reproduce:

Using the same minimal example provided in the issue template:

1. Create this config.ru file

# config.ru
run lambda { |env| [200, {"Content-Type" => "text/plain"}, ["Hello World"]] }

2. Run it with:

$ rackup -p 3000 -o 0.0.0.0

3. Access the server in a browser

open http://localhost:3000/

4. Press Ctrl+C to stop the server

Expected behavior

As with the 4.x versions, Ctrl+C is expected to exit quickly.

Desktop (please complete the following information):

  • OS: Ubuntu 18.04.5 LTS (headless via Vagrant, on a Windows host)
  • Puma Version: 5.0.0
  • Ruby Version: 2.7.0p0
  • Rack Version: 2.2.3
@nateberkopec
Member

One thing that could help w/debugging here would be to use the new thread_backtraces command to print the stack as you're waiting for it to shut down.

@DannyBen
Author

DannyBen commented Oct 1, 2020

One thing that could help w/debugging here would be to use the new thread_backtraces command

I don't mind helping to debug it on my system, where it reproduces easily, but I don't know how to do that.

Searching thread_backtraces shows 0 results in the code.

@nateberkopec
Member

Whoopsie, it's a dash, not an underscore: #2054

@DannyBen
Author

DannyBen commented Oct 2, 2020

Ok, so here is what I did (I hope I did it right):

Using the same minimal config.ru:

# config.ru
run lambda { |env| [200, {"Content-Type" => "text/plain"}, ["Hello World"]] }

Step 1: I executed:

$ puma config.ru -p 3000 --control-url="unix:///tmp/puma.sock" --control-token="token"

Step 2: I then accessed http://localhost:3000 in Chrome - this is important (more on this later).

Step 3: I pressed Ctrl+C to stop puma

Step 4: In another terminal, while I am waiting for the shutdown to complete, I executed:

$ pumactl thread-backtraces --control-url="unix:///tmp/puma.sock" --control-token="token" > trace.txt

Which provided me with this output:

Command thread-backtraces sent success
[{"name":"Thread: TID-14k","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/single.rb:60:in `join'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/single.rb:60:in `run'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/launcher.rb:171:in `run'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/cli.rb:80:in `run'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/bin/puma:10:in `<top (required)>'","/usr/share/rvm/gems/ruby-2.7.0/bin/puma:23:in `load'","/usr/share/rvm/gems/ruby-2.7.0/bin/puma:23:in `<main>'"]},{"name":"Thread: TID-154 puma reactor","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/reactor.rb:136:in `select'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/reactor.rb:136:in `run_internal'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/reactor.rb:318:in `block in run_in_thread'"]},{"name":"Thread: TID-15o puma threadpool reaper","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/thread_pool.rb:304:in `sleep'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/thread_pool.rb:304:in `block in start!'"]},{"name":"Thread: TID-168 puma threadpool trimmer","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/thread_pool.rb:304:in `sleep'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/thread_pool.rb:304:in `block in start!'"]},{"name":"Thread: TID-16s puma server","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/server.rb:290:in `select'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/server.rb:290:in `handle_servers'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/server.rb:262:in `block in run'"]},{"name":"Thread: TID-17c puma server","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/thread_pool.rb:345:in `join'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/thread_pool.rb:345:in `each'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/thread_pool.rb:345:in `shutdown'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/server.rb:920:in 
`graceful_shutdown'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/server.rb:325:in `handle_servers'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/server.rb:262:in `block in run'"]},{"name":"Thread: TID-17w puma threadpool 001","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/client.rb:250:in `select'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/client.rb:250:in `finish'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/server.rb:209:in `block in run'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/thread_pool.rb:141:in `block in spawn_thread'"]},{"name":"Thread: TID-18g puma threadpool 002","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/client.rb:250:in `select'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/client.rb:250:in `finish'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/server.rb:209:in `block in run'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/thread_pool.rb:141:in `block in spawn_thread'"]},{"name":"Thread: TID-190 puma threadpool 001","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/launcher.rb:216:in `backtrace'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/launcher.rb:216:in `block in thread_status'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/launcher.rb:212:in `each'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/launcher.rb:212:in `thread_status'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/app/status.rb:63:in `call'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/server.rb:620:in `handle_request'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/server.rb:377:in `process_client'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/server.rb:231:in `block in run'","/store/gems/ruby-2.7.0/gems/puma-5.0.0/lib/puma/thread_pool.rb:141:in `block in spawn_thread'"]}]

It is important to note that:

  1. When accessing the server through curl instead of Chrome, Ctrl+C exits immediately.
  2. When accessing the server through Chrome and then pressing Ctrl+C: while waiting, if I shut down Chrome, puma exits immediately.
  3. Just to verify, I also tested the same flow with puma 4.3.6 and the problem was not present.

Let me know if I can provide any additional information.

@dentarg
Member

dentarg commented Oct 2, 2020

@DannyBen do you mind also seeing if it reproduces with Puma 5.0.2?

@DannyBen
Author

DannyBen commented Oct 2, 2020

Tested with puma 5.0.2, same behavior as 5.0.0

Output of pumactl thread-backtraces:

Command thread-backtraces sent success
[{"name":"Thread: TID-1po","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/single.rb:61:in `join'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/single.rb:61:in `run'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/launcher.rb:171:in `run'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/cli.rb:80:in `run'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/bin/puma:10:in `<top (required)>'","/store/gems/ruby-2.7.0/bin/puma:23:in `load'","/store/gems/ruby-2.7.0/bin/puma:23:in `<top (required)>'","/store/gems/ruby-2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:63:in `load'","/store/gems/ruby-2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:63:in `kernel_load'","/store/gems/ruby-2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:28:in `run'","/store/gems/ruby-2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:476:in `exec'","/store/gems/ruby-2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'","/store/gems/ruby-2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'","/store/gems/ruby-2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor.rb:399:in `dispatch'","/store/gems/ruby-2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:30:in `dispatch'","/store/gems/ruby-2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/base.rb:476:in `start'","/store/gems/ruby-2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:24:in `start'","/store/gems/ruby-2.7.0/gems/bundler-2.1.4/exe/bundle:46:in `block in <top (required)>'","/store/gems/ruby-2.7.0/gems/bundler-2.1.4/lib/bundler/friendly_errors.rb:123:in `with_friendly_errors'","/store/gems/ruby-2.7.0/gems/bundler-2.1.4/exe/bundle:34:in `<top (required)>'","/usr/share/rvm/gems/ruby-2.7.0/bin/bundle:23:in `load'","/usr/share/rvm/gems/ruby-2.7.0/bin/bundle:23:in `<main>'"]},{"name":"Thread: TID-1q8 puma reactor","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/reactor.rb:136:in `select'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/reactor.rb:136:in 
`run_internal'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/reactor.rb:307:in `block in run_in_thread'"]},{"name":"Thread: TID-1qs puma threadpool reaper","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/thread_pool.rb:308:in `sleep'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/thread_pool.rb:308:in `block in start!'"]},{"name":"Thread: TID-1rc puma threadpool trimmer","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/thread_pool.rb:308:in `sleep'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/thread_pool.rb:308:in `block in start!'"]},{"name":"Thread: TID-1rw puma server","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/server.rb:305:in `select'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/server.rb:305:in `handle_servers'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/server.rb:277:in `block in run'"]},{"name":"Thread: TID-1sg puma server","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/thread_pool.rb:362:in `join'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/thread_pool.rb:362:in `each'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/thread_pool.rb:362:in `shutdown'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/server.rb:939:in `graceful_shutdown'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/server.rb:342:in `handle_servers'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/server.rb:277:in `block in run'"]},{"name":"Thread: TID-1t0 puma threadpool 001","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/client.rb:254:in `select'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/client.rb:254:in `finish'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/server.rb:222:in `block (2 levels) in run'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/thread_pool.rb:337:in `with_force_shutdown'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/server.rb:221:in `block in run'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/thread_pool.rb:145:in `block in 
spawn_thread'"]},{"name":"Thread: TID-1tk puma threadpool 002","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/client.rb:254:in `select'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/client.rb:254:in `finish'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/server.rb:222:in `block (2 levels) in run'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/thread_pool.rb:337:in `with_force_shutdown'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/server.rb:221:in `block in run'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/thread_pool.rb:145:in `block in spawn_thread'"]},{"name":"Thread: TID-1u4 puma threadpool 001","backtrace":["/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/launcher.rb:219:in `backtrace'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/launcher.rb:219:in `block in thread_status'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/launcher.rb:215:in `each'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/launcher.rb:215:in `thread_status'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/app/status.rb:63:in `call'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/server.rb:638:in `block in handle_request'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/thread_pool.rb:337:in `with_force_shutdown'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/server.rb:637:in `handle_request'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/server.rb:394:in `process_client'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/server.rb:250:in `block in run'","/store/gems/ruby-2.7.0/gems/puma-5.0.2/lib/puma/thread_pool.rb:145:in `block in spawn_thread'"]}]

@DannyBen DannyBen changed the title Ctrl+C takes a long time to quit in 5.0.0 Ctrl+C takes a long time to quit in 5.0.x Oct 2, 2020
@nateberkopec
Member

Thanks for that output, that's just the information required.

So, the reactor is sleeping, waiting for more data to come down the socket.

Your threadpool is waiting for threads to finish.

And both threads in the threadpool are blocking on more data.

My guess: Chrome opens keepalive connections but curl doesn't. A keepalive is probably changing Puma's decision here about when to cut off a connection and shut down.

If you issue a request with --keepalive-time 30 and curl, does Puma still shut down immediately?

@DannyBen
Author

DannyBen commented Oct 2, 2020

If you issue a request with --keepalive-time 30 and curl, does Puma still shut down immediately?

Yes it does.

@nateberkopec
Member

Ok. Next step will probably be for me or others to repro.

I'm wondering what Chrome is doing differently then. It feels like a "slow client" type situation. It's possible that we learn something here that helps stabilize our shutdown behavior for everyone.

@nateberkopec nateberkopec changed the title Ctrl+C takes a long time to quit in 5.0.x Ctrl+C takes a long time to quit in 5.0.x if request from Chrome is open Oct 2, 2020
@cjlarose
Member

cjlarose commented Oct 2, 2020

I'm able to reproduce the problem on macOS Chrome Version 85.0.4183.121. I don't see the same behavior in Firefox.

In Firefox, a single TCP connection is opened when requesting http://localhost:9292. The client makes two requests, one for GET / and another for GET /favicon.ico, on the same persistent HTTP connection.

In Chrome, two TCP connections are opened when requesting the same page. The client makes the same two requests, but both on a single TCP connection; the other connection is therefore unused (Chrome never writes a request to it).

During shutdown (since #2122), puma doesn't immediately close any connections that it hasn't written a response to. In general, this provides more reliable behavior for clients that connected to the server just before the shutdown started. One side-effect is that if a client opens a connection but never actually intends on writing to it, it can keep the server up and running until an inactivity timeout.

IMHO puma is behaving as it is expected to since the introduction of #2122. Chrome's behavior is a little weird.

I found this Chromium bug report that provides some insight: https://bugs.chromium.org/p/chromium/issues/detail?id=116982

Statistics seem to show that roughly 64% of all servers require at least a second connection to fulfill a page load. Hence in only about 1/3 of those "could be wasted" cases is the connection probably wasted. This means that we probably "waste" about 1/3 - 2/3% of all connections, but gain about 30-60ms average latency (3 seconds times 1-2%). We think this is a very positive tradeoff for our users, and a minimal(?) resource impact on servers (that would usually like to see their users responded to faster!).

The unused TCP connection in Chrome's case seems like it's a feature. Chrome opens it up because most sites require at least two concurrent connections to load the page quickly. I wouldn't expect this to be an issue for most web apps. The "hello world" app exhibits this behavior since it doesn't force the browser to make many additional requests for things like stylesheets, JavaScript, or images.
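The connection pattern described above can be mimicked without a browser. Here is a toy sketch (the server, ports, and names are illustrative, not puma code): two client sockets connect, but a request is written on only one, so the server ends up with one readable connection and one that stays idle.

```ruby
require 'socket'

# Toy server on an ephemeral loopback port.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

# Chrome-like behavior: open two connections...
used   = TCPSocket.new('127.0.0.1', port) # carries the real requests
unused = TCPSocket.new('127.0.0.1', port) # speculative, never written to

# ...but write a request on only one of them.
used.write "GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"

conn1 = server.accept
conn2 = server.accept

# Only the connection that received data shows up as readable;
# the other just sits there until some timeout closes it.
readable, = IO.select([conn1, conn2], nil, nil, 0.5)
```

A server that refuses to shut down until every accepted connection has served a request will wait on the idle one, which is essentially what the reporter is seeing.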

@DannyBen
Author

DannyBen commented Oct 2, 2020

Well, good to know it is reproducible.

For what it's worth:

  1. I observed this behavior (prior to reporting) on a standard Sinatra app. So it is not just a "hello world" symptom.
  2. I hope this new puma behavior can somehow be fixed, reverted, or opted-out of. In my case, I had to lock puma version to 4.x since it was noticeable, and other developers complained.
  3. During development, in some cases I find myself having to restart the server several times during a debug / development session (for things that are not reloaded), and this shutdown lag makes it feel very sluggish.
  4. If there is no choice but to keep this behavior, I would at least expect a subsequent Ctrl+C to do a "hard stop" and exit immediately.

@cjlarose
Member

cjlarose commented Oct 2, 2020

I observed this behavior (prior to reporting) on a standard Sinatra app. So it is not just a "hello world" symptom.

This is good to know. It's possible something else is going on in that case. It'd be nice to get a reproducible test case for a more real-world app, but I understand if that's difficult if it's proprietary.

I hope this new puma behavior can somehow be fixed, reverted, or opted-out of. In my case, I had to lock puma version to 4.x since it was noticeable, and other developers complained.

Is this an issue mostly for developers running the puma server on their own machines? I could see that being annoying. In your case, I would definitely recommend taking a look at the force_shutdown_after config option, which allows you to specify that you want the puma server to shut down immediately when requested:

puma/lib/puma/dsl.rb

Lines 241 to 260 in 4be4069

# How long to wait for threads to stop when shutting them
# down. Defaults to :forever. Specifying :immediately will cause
# Puma to kill the threads immediately. Otherwise the value
# is the number of seconds to wait.
#
# Puma always waits a few seconds after killing a thread for it to try
# to finish up it's work, even in :immediately mode.
# @see Puma::Server#graceful_shutdown
def force_shutdown_after(val=:forever)
  i = case val
      when :forever
        -1
      when :immediately
        0
      else
        Float(val)
      end
  @options[:force_shutdown_after] = i
end

If there is no choice but to keep this behavior, I would at least expect a subsequent Ctrl+C to do a "hard stop" and exit immediately.

I could see this also being useful.
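Going by the DSL definition quoted above, the option can be dropped into a puma config file. A minimal sketch (the environment check is illustrative; adjust to your own setup):

```ruby
# config/puma.rb - sketch, not an official recommendation
# Give in-flight threads at most 1 second during shutdown in development,
# so Ctrl+C returns the terminal quickly.
if ENV.fetch("RACK_ENV", "development") == "development"
  force_shutdown_after 1
end
```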

@nateberkopec
Member

nateberkopec commented Oct 2, 2020

I hope this new puma behavior can somehow be fixed, reverted, or opted-out of.

I suggest setting force_shutdown_after to a small value when in development. The rationale for why Puma works the way it does makes total sense to me from a production point of view.

During development, in some cases I find myself having to restart the server several times

I agree it's an annoyance in development.

I would at least expect a subsequent Ctrl+C to do a "hard stop" and exit immediately.

CTRL-C sends SIGINT to the process. We have a pretty hard contract and docs around what Puma does when that is sent, because 99% of people manage Puma processes in production with these signals.

For me, three action items:

  1. For @DannyBen, does force_shutdown_after solve your issue?
  2. Question: Can we set force_shutdown_after to a low number predictably/safely when we know we're in development mode so people don't have to config this themselves? In any case, this change would be delayed until 6.0.
  3. Docs: we should surface this config option higher and document it better? Readme entry? Maybe PR to Rails to add it to the default config?

@cjlarose
Member

cjlarose commented Oct 2, 2020

Docs: we should surface this config option higher and document it better? Readme entry? Maybe PR to Rails to add it to the default config?

In my own experience, I don't often need to kill and restart puma when I'm working on a Rails app, since Rails has support for autoloading whenever you change controllers/models/etc. I could see it being worth mentioning for non-Rails apps, though.

@nateberkopec
Member

Actually, fourth concern: how does this Chrome behavior affect single-threaded apps in production?

It effectively halves one's capacity. That's kind of severe. For people running Puma with many threads, there's no issue, because nothing ever comes down the pipe so that thread in the threadpool just idles and doesn't use any GVL time.

@nateberkopec nateberkopec changed the title Ctrl+C takes a long time to quit in 5.0.x if request from Chrome is open Chrome opens two connections to Puma but sends data down one Oct 2, 2020
@DannyBen
Author

DannyBen commented Oct 2, 2020

It'd be nice to get a reproducible test case for a more real-world app

I can try and create a sample that serves more assets in its root page.

Is this an issue mostly for developers running the puma server on their own machines? I could see that being annoying. In your case, I would definitely recommend taking a look at the force_shutdown_after config option

Yes and no. From a developer's standpoint, the issue is obvious. But I also suspect this might cause delays when updating apps in production. Is there documentation on how to use force_shutdown_after in Sinatra / Rails?

And for @nateberkopec :

in regards to the Ctrl+C - notice I mentioned subsequent Ctrl+C. Not sure it changes anything, but just in case you missed it.

@nateberkopec
Member

in regards to the Ctrl+C - notice I mentioned subsequent Ctrl+C. Not sure it changes anything, but just in case you missed it.

Noted. SIGINT is currently an idempotent operation in Puma, and I think all of our signals probably will remain so for production stability and predictability.

@DannyBen
Author

DannyBen commented Oct 2, 2020

And BTW, the last rename of the issue title will make it harder for people to find the same problem.

@cjlarose
Member

cjlarose commented Oct 2, 2020

Actually, fourth concern: how does this Chrome behavior affect single-threaded apps in production?
It effectively halves one's capacity. That's kind of severe. For people running Puma with many threads, there's no issue, because nothing ever comes down the pipe so that thread in the threadpool just idles and doesn't use any GVL time.

I was thinking about this a little bit too, but I don't think it's that bad. Even if you have puma running with max_threads set to 1, the Reactor is in a separate thread (assuming queue_requests is on, which is the default). Idle connections just sit in the Reactor while threads in the thread pool are working on the connections that are actually servicing requests. I don't think Chrome's behavior should have any noticeable impact on capacity/throughput.

@nateberkopec
Member

I was thinking about this a little bit too, but I don't think it's that bad. Even if you have puma running with max_threads set to 1, the Reactor is in a separate thread (assuming queue_requests is on, which is the default). Idle connections just sit in the Reactor while threads in the thread pool are working on the connections that are actually servicing requests. I don't think Chrome's behavior should have any noticeable impact on capacity/throughput.

Unfortunately I think this is not the case here. Take a look at @DannyBen's thread-backtraces output, you'll see he has two threads active in the threadpool.

@cjlarose
Member

cjlarose commented Oct 2, 2020

Unfortunately I think this is not the case here. Take a look at @DannyBen's thread-backtraces output, you'll see he has two threads active in the threadpool.

I think that's just because the server is shutting down. As soon as the Reactor is shut down, it pushes all connections that it had alive into the threadpool. Under normal operating conditions, idle connections don't occupy threads in the threadpool.

You're right, though, that if a server had many idle connections at shutdown time, all of them would be pushed into the threadpool. And if that threadpool has only one thread, you basically have to wait for each thread to timeout (according to SHUTDOWN_GRACE_TIME = 5 seconds), in serial, assuming you don't have force_shutdown_after configured. That seems like an issue, for sure.
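The serial-timeout scenario just described can be put in back-of-envelope terms. In this sketch, SHUTDOWN_GRACE_TIME mirrors the constant mentioned above, while the connection and thread counts are made up for illustration:

```ruby
# Back-of-envelope worst case: idle connections pushed from the Reactor
# into a one-thread pool at shutdown time out one after another.
SHUTDOWN_GRACE_TIME = 5 # seconds an unresponsive connection holds a thread
idle_connections    = 4 # connections flushed from the Reactor (illustrative)
pool_threads        = 1 # max_threads

# With a single thread, the grace periods cannot overlap:
worst_case_wait = (idle_connections.to_f / pool_threads).ceil * SHUTDOWN_GRACE_TIME
# => 20 seconds before the ThreadPool finishes shutting down
```

With more threads the waits overlap, which is why the problem is most visible with small pools.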

@nateberkopec
Member

@schneems suddenly a connection to The Old H13 Problem ^^ https://devcenter.heroku.com/articles/error-codes#h13-connection-closed-without-response

@DannyBen
Author

DannyBen commented Oct 2, 2020

I believe the app below is a good stand-in for a "real app".

The root page serves multiple images, so it should - according to the above discussion - occupy that additional connection.

From my tests, it is even worse. Shutdown takes much longer than it took with the simpler app.

Just save it and run ruby server.rb.

# server.rb
require 'bundler/inline'

gemfile do
  source "https://rubygems.org"

  gem 'sinatra'
  gem 'puma', '~> 5.0'
  gem 'icodi'
end

# bundler/inline requires the gems listed in the gemfile block, so
# sinatra must not be required before it (it may not be installed yet).
require 'sinatra'

set :port, 3000
set :bind, '0.0.0.0'

get '/' do
  (1..10).map { |i| "<img width=80 src='/image#{i}'>" }.join "\n"
end

get '/image*' do
  content_type 'image/svg+xml'
  Icodi.new.render
end

EDIT

Interesting observations:

  • If you run the server, visit the page in Chrome once only, and press Ctrl+C - shutdown halts for a long time.
  • If you run the server, visit the page twice and then press Ctrl+C - shutdown is immediate.
  • If you run the server, visit the page once, then press Ctrl+C, and while it is "waiting" refresh the page - puma shuts down immediately, after serving only part of the assets.

So, it seems like the reliability of the production environment suffers from this change as well (consistently reproducible).

@nateberkopec
Member

What if we simplified can_close? to just @parsed_bytes == 0? Remove the requirement to serve a request.

The RFC talks through some scenarios.

@nateberkopec
Member

Actually, I don't like how much that rolls back #2122. But maybe even a decrease in the default first data timeout might be appropriate? 30 seconds is a long time to wait for a client to say something.

@cjlarose
Member

cjlarose commented Oct 23, 2020

Yeah, I think the changes in #2122 were valuable because they fixed a bunch of concurrency bugs related to what happens to requests that were in the Reactor at the time a shutdown begins. The problem with #2122, though, is that pushing connections into the ThreadPool that haven't yet made a request (and never will) makes the ThreadPool take a long time to shut down.

My first thought was that we can potentially be more discerning about which requests we send into the ThreadPool from the Reactor at shutdown time. I'm tempted to do something as aggressive as just removing the can_close? check during shutdown entirely.

diff --git a/lib/puma/server.rb b/lib/puma/server.rb
index fd78f3b1..49c21c54 100644
--- a/lib/puma/server.rb
+++ b/lib/puma/server.rb
@@ -291,7 +291,7 @@ module Puma
     # will wake up and again be checked to see if it's ready to be passed to the thread pool.
     def reactor_wakeup(client)
       shutdown = !@queue_requests
-      if client.try_to_finish || (shutdown && !client.can_close?)
+      if client.try_to_finish
         @thread_pool << client
       elsif shutdown || client.timeout == 0
         client.timeout!

But of course that would mean that if the server had received some bytes of the request, but not all of them, we'd close the connection immediately during shutdown, which isn't desirable.

Instead, I think it might be worth reexamining whether or not it really makes sense for us to even force requests into the ThreadPool from the Reactor at shutdown time at all. We can instead let the Reactor do what it does best (waiting for the appropriate amount of time until a connection sends data, then finally passing buffered clients to the ThreadPool) until it has no more connections. I'd love for Reactor#shutdown to just block until all requests leave the Reactor (either because they timed out naturally or because they entered the ThreadPool because the request was ready).

The benefit of that approach is that clients aren't treated differently just because they made a request close to the time the server was about to shut down (as a reminder, this could just be during a hot restart or a phased restart, not necessarily someone killing the server entirely with a Ctrl-C). We'd still give clients the normal amount of time to send data--the full PERSISTENT_TIMEOUT (20 seconds) for connections that have already made a request or the full FIRST_DATA_TIMEOUT (30 seconds) for connections that haven't.

In practice, though, this can be annoying because phased restarts on a single worker, hot restarts, and explicit shutdowns (Ctrl-C) could all take 30 seconds in the worst case. I think it's worth thinking about these cases separately since they have different impacts for users and it's worth thinking about what puma users expect (trying to match closely the behavior prior to #2122).

Before #2122, when a shutdown started, we effectively closed all connections in the Reactor immediately. One way of thinking about this is that the Reactor was behaving as though PERSISTENT_TIMEOUT and FIRST_DATA_TIMEOUT were both 0 seconds for all connections. This is great for the Ctrl-C experience since the server shuts down quickly, but bad for phased restarts and hot restarts since it doesn't really give the opportunity for clients to send their requests if they connected just shortly before a shutdown started. I think that, essentially, we want to be more impatient with regard to PERSISTENT_TIMEOUT and FIRST_DATA_TIMEOUT once a shutdown begins, but not so impatient that we aggressively kill legitimate connections.

I think one kinda elegant solution to this problem is that during a shutdown, the Reactor should just set the timeout_at time to a reasonably short timeout (maybe default to 1 second, let's call it shutdown_data_timeout) for all connections in its @timeout array. This would make it so that if a connection continues to provide chunks of data (with no more than 1 second between chunks), then the request will be served a response. But if the request either stops sending data or if it never sent any data at all, we only end up waiting for at most 1 second during a shutdown.

An improvement on this system would be to use a separate shutdown_data_timeout for the Ctrl-C case and the phased restart or hot restart case. I think for the Ctrl-C case, a very short shutdown_data_timeout is fine (even 0 seconds is probably fine) since the user just wants to kill those connections, returning us to the Ctrl-C behavior prior to #2122. But for phased restarts and hot restarts, we can use a longer shutdown_data_timeout (maybe default it to 5 seconds), since again, we want to give enough time for legitimate requests that were just unlucky enough to arrive close to the time the server was shutting down.
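The core of the proposal is a deadline clamp. A hypothetical sketch (the function name and data shapes are illustrative, not puma's actual Reactor API): when shutdown begins, every connection's deadline is pulled in to at most shutdown_data_timeout from now, so connections that keep sending data still finish, while silent ones expire quickly.

```ruby
# Clamp each connection's deadline to at most `shutdown_data_timeout`
# seconds from `now`. Deadlines already closer than that are kept as-is.
def clamp_deadlines(deadlines, now, shutdown_data_timeout)
  deadlines.map { |deadline| [deadline, now + shutdown_data_timeout].min }
end

now = 100.0
# One FIRST_DATA_TIMEOUT-style deadline, one nearly-expired, one persistent.
deadlines = [now + 30, now + 0.4, now + 20]
clamp_deadlines(deadlines, now, 1.0)
# => [101.0, 100.4, 101.0]
```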

Hopefully that makes sense. Also curious about @wjordan 's thoughts.

@wjordan
Contributor

wjordan commented Oct 23, 2020

Also curious about @wjordan 's thoughts.

Here's a few thoughts/responses:

I would at least expect a subsequent Ctrl+C to do a "hard stop" and exit immediately.

CTRL-C sends SIGINT to the process. We have a pretty hard contract and docs around what Puma does when that is sent, because 99% of people manage Puma processes in production with these signals.

I actually think this change could be interesting:

  • SIGINT is specifically designed as a 'Terminal interrupt' / 'Interactive attention' signal triggered by keyboard input (Ctrl-C). I doubt anyone is managing Puma in production in a foreground process, so I think it's fine to tailor SIGINT behavior to interactive use-cases.
  • The current behavior of SIGINT is nearly identical to SIGTERM, so if anyone's custom Puma-automation script is sending SIGINT (e.g., kill -INT $PID), it could be changed to send SIGTERM instead (and probably should anyway).
  • It's a fairly-common (though not universal) application convention for a second SIGINT to terminate the process more quickly (this convention may date back to default behavior of early UNIX systems). In any case I think it would be familiar behavior to handle a second SIGINT in this way, and expected by some Puma users.

Also worth noting that Ctrl-\ sends SIGQUIT ('Terminal quit signal') which exits immediately, as another workaround for situations where Ctrl-C doesn't shut down fast enough in iterative development.
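The "second SIGINT terminates immediately" convention mentioned above could be modeled like this (a sketch of the proposed behavior, not Puma's current signal handling; the handler class and callback names are illustrative):

```ruby
# Hypothetical two-stage SIGINT handler: first Ctrl-C starts a graceful
# shutdown, second Ctrl-C forces an immediate exit.
class IntHandler
  def initialize(on_graceful:, on_force:)
    @interrupted = false
    @on_graceful = on_graceful
    @on_force = on_force
  end

  def handle_int
    if @interrupted
      @on_force.call    # second Ctrl-C: stop waiting, exit now
    else
      @interrupted = true
      @on_graceful.call # first Ctrl-C: begin graceful shutdown
    end
  end
end

# Wiring it into a trap (in a real server, on_force would call exit!):
# handler = IntHandler.new(on_graceful: -> { server.stop },
#                          on_force:    -> { exit!(1) })
# Signal.trap(:INT) { handler.handle_int }
```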

What if we simplified can_close? to just @parsed_bytes == 0? Remove the requirement to serve a request.

The RFC talks through some scenarios.
Actually, I don't like how much that rolls back #2122.

I think this would probably be acceptable- although RFC 2616 does say 'Servers SHOULD always respond to at least one request per connection, if at all possible,' it looks like this requirement was removed from the updated RFC 7230, with the note that "some extraneous requirements about when servers are allowed to close connections prematurely have been removed."

As far as rolling back #2122, it's probably more important to wait on open connections that already sent part of a request than it is to wait for open connections that haven't sent any data at all.

But maybe even a decrease in the default first data timeout might be appropriate? 30 seconds is a long time to wait for a client to say something.

Varnish defaults its timeout_idle to 5 seconds, for what it's worth. However, in Puma the same first_data_timeout value is currently also used to set a timeout between bytes in a request that has already started being sent (equivalent to between_bytes_timeout in Varnish, which defaults to 60 seconds). Some changes would be needed before aggressively decreasing this timeout for one case and not the other.
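For anyone who wants to tune this today, the timeout is configurable (this assumes Puma's `first_data_timeout` config DSL option; note the caveat above that the same value also governs the between-bytes timeout):

```ruby
# puma.rb -- lower the first-data timeout from the 30-second default.
# Caution: this also shortens the allowed gap between bytes of an
# in-progress request, not just the wait for the first byte.
first_data_timeout 5
```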

Instead, I think it might be worth reexamining whether or not it really makes sense for us to even force requests into the ThreadPool from the Reactor at shutdown time at all. [...] I'd love for Reactor#shutdown to just block until all requests leave the Reactor

Interesting and probably feasible to simplify the server logic along these lines, but not really relevant to the current issue- graceful shutdown would wait on open connections just the same whether they are idling in the Reactor or the ThreadPool.

I think one kinda elegant solution to this problem is that during a shutdown, the Reactor should just set the timeout_at time to a reasonable short timeout [...]
An improvement on this system would be to use separate shutdown_data_timeout values for the Ctrl-C case and the phased restart or hot restart case.

I think dropping open+unused connections at shutdown should be absolutely fine; no extra shutdown-specific (and/or interrupt-specific) timeouts should be necessary.

@cjlarose
Member

Makes sense! I can open a PR that just modifies Client#can_close? to remove the check that ensures that we've served the client a request.
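The proposed change boils down to something like this (a simplified model with hypothetical names, not the real `Puma::Client`): a connection is safe to close at shutdown if and only if it has not parsed any request bytes, with no requirement that a request has already been served.

```ruby
# Hypothetical model of the simplified can_close? check.
class MiniClient
  def initialize
    @parsed_bytes = 0
  end

  def feed(data)
    @parsed_bytes += data.bytesize
  end

  # Before: also required that at least one request had been served on
  # this connection. After: any connection that never sent data can be
  # dropped immediately at shutdown.
  def can_close?
    @parsed_bytes == 0
  end
end
```

An idle pre-connect (like Chrome's speculative sockets) stays closable; a connection mid-request does not.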

@nateberkopec
Member

Ok I'm caught up here - let's do it @cjlarose

@Fjan

Fjan commented Oct 26, 2020

Just leaving this here for reference: It appears the Chrome pre-connects that may be causing this issue have been made more aggressive in the February release, which may be why it wasn't noticed earlier. https://bugs.chromium.org/p/chromium/issues/detail?id=85229

@cjlarose
Member

cjlarose commented Nov 7, 2020

@nateberkopec would you mind changing the title of this issue back to "Ctrl+C takes a long time to quit in 5.0.x"

I think it's more likely to be the issue title folks search for (#2484 is an example of someone bringing up the same issue).

@joshuapinter

@cjlarose Just to clarify, my issue doesn't appear to be the same as this as my issue(s) is still present in the master branch.

@nateberkopec nateberkopec changed the title Chrome opens two connections to Puma but sends data down one Ctrl+C takes a long time to quit in 5.0.x Nov 9, 2020