Add hot_restart_does_not_drop_connections tests [changelog skip] #2423
Suggestion: name it
Good point. Renamed...
In the test 'socket' loop, the sockets were closed inside a conditional: socket.close if socket.is_a?(IO) && !socket.closed? Apparently that wasn't enough (see the failed Actions job), so I changed it to:

    if socket.is_a?(IO) && !socket.closed?
      begin
        socket.close
      rescue Errno::EBADF
      end
    end
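One way closed? can return false while close still raises Errno::EBADF is when the descriptor has already been closed at the OS level through a second Ruby IO object. This is an illustration only; it is not claimed to be what happened in the failed CI job. safe_close is a hypothetical name for the rescue pattern above.

```ruby
# Hypothetical helper wrapping the rescue pattern from the comment above.
# Returns true if this call closed the IO, false otherwise.
def safe_close(io)
  return false unless io.is_a?(IO) && !io.closed?

  begin
    io.close
    true
  rescue Errno::EBADF
    # The OS descriptor was already gone even though the Ruby IO object
    # still reported closed? == false.
    false
  end
end

r, w = IO.pipe
alias_io = IO.for_fd(w.fileno) # a second Ruby IO sharing w's descriptor
w.close                        # closes the descriptor at the OS level
puts alias_io.closed?          # false: Ruby does not know the fd is gone
puts safe_close(alias_io)      # false: close raised Errno::EBADF, which was rescued
r.close
```

The closed? guard only reflects Ruby's own bookkeeping, which is why the rescue is still needed.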
paging @cjlarose
Apologies, I thought I pinged you somewhere over the weekend, but I did not. So, I'd appreciate a review when you can.
Nice work! Just left some minor comments.
test/helpers/integration.rb
pumactl
end

def hot_restart_does_not_drop_connections(num_threads: 1, ttl_requests: 500)
Just a nitpick: I read ttl_requests as "time-to-live requests", which didn't really make sense. total_requests would read a little easier.
def hot_restart_does_not_drop_connections(num_threads: 1, ttl_requests: 500)
  skipped = true
  skip_on :jruby, suffix: <<-MSG
- file descriptors are not preserved on exec on JRuby; connection reset errors are expected during restarts
Seems like the behavior on JRuby should be the same as on Windows: since exec doesn't preserve file descriptors on those platforms, Puma has to unbind and rebind the listening socket, so we lose some connections during restart.
Is there any reason why we run the test on Windows (with the expectation that the number of connection reset errors per thread is bounded above by the number of restarts) but skip it on JRuby? It seems like we can make the same decision in both places.
We can either skip on both platforms (connection reset errors are expected on both, so explicit tests may not be worth having) or run it on both.
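The Windows expectation described above (resets per thread bounded by the restart count) could be written as a small predicate. This is a sketch: resets_within_bound? and its inputs are hypothetical, and the reasoning assumed here is that each client thread has at most one request in flight, so a single restart gap can reset at most one of its connections.

```ruby
# Hypothetical check: every client thread saw no more connection resets
# than there were restarts during the run.
def resets_within_bound?(resets_per_thread, restarts)
  resets_per_thread.all? { |resets| resets <= restarts }
end

puts resets_within_bound?([0, 1, 2], 3) # prints true
puts resets_within_bound?([0, 4], 3)    # prints false
```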
I think I lean towards just skipping it on both. Since we expect some connection reset errors on Windows, when this test runs on Windows, it's basically just testing that we can make requests to a running Puma server on that platform and get a successful response; it doesn't seem like it adds a lot of value on top of existing tests.
I don't have strong feelings either way. I doubt there are many publicly facing Windows Puma instances.
Running it locally, I found the reset errors were bounded and a very small percentage of total requests, and I think most clients will normally retry after a reset. Hence, I included Windows in the tests: if they start failing, something has changed for the worse. Again, I'm not bothered by removing the Windows tests.
Re JRuby and/or TruffleRuby, I'm not sure if I tried them, especially when the code was close to finished...
I just added JRuby in my fork, and it definitely did not pass the new tests. See:
https://github.com/MSP-Greg/puma/actions/runs/302733028
Also, see comment below ('I hate intermittent tests') for more info.
These tests are somewhat dependent on timing. Restarting on JRuby probably takes a bit longer than MRI Rubies, and the gap during restart will be difficult to account for.
Makes sense. In my fork, I changed the test to run for a maximum number of seconds instead of a fixed number of requests: cjlarose@691a5e4
Just something to consider.
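A time-budgeted loop, as opposed to a fixed request count, might look like the following. This is not the code from cjlarose@691a5e4; run_for and the make_request callable are hypothetical stand-ins for whatever actually opens a connection and reads the response. A monotonic clock is used so that wall-clock adjustments during a CI run don't shorten or stretch the budget.

```ruby
# Hypothetical sketch: issue requests until a time budget expires and
# tally completed requests and connection resets.
def run_for(seconds, make_request)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  completed = 0
  resets = 0
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    begin
      make_request.call
      completed += 1
    rescue Errno::ECONNRESET
      resets += 1
    end
  end
  { completed: completed, resets: resets }
end

stats = run_for(0.05, -> { sleep 0.001 })
p stats[:resets] # prints 0
```

With a time budget, a slower platform simply completes fewer requests instead of running the test longer, which sidesteps some of the timing sensitivity discussed above.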
I felt that it was working, and after more runs to check that the tests are stable, some of the quantities in them could probably be adjusted.
I started with the 'single' tests, added the threading, then added the 'cluster' tests. I added the debug output at the end of the test log and noticed that (probably in the threaded cluster test) all the requests were being processed before the first restart. That's when I raised the counts, and later I added the small delay in hello_with_delay.ru.
They all finish fairly quickly. Seeing the threaded cluster test complete 3k requests in under 3 seconds, it seemed fine to leave it that way, even though 3k could probably drop to 2k.
After seeing some runs during the busy CI time (Thurs & Fri late afternoon), we might be able to add a 'something slowed Puma down' assert. We don't really have anything like that in the test suite.
BTW, thanks for taking a look at it, and your PR's...
I hate intermittent tests. With the change from I added a modified version of I removed all tests other than the new ones, which should cause them to run in parallel. Let me run it in my fork a few times...
Co-authored-by: Chris LaRose <cjlarose@gmail.com>
I ran the reduced test set twelve times. All Ubuntu/macOS MRI jobs passed. After six runs, one Windows job failed; I made a small rescue adjustment, and the next six Windows runs all passed. Hence, until the next intermittent failure, this seems stable.
Description
Building on PR #2417, this adds testing for whether restart/USR2 drops connections.
Tests
Adds hot_restart_does_not_drop_connections to helpers/integration.rb. The method is used in both the test_integration_cluster.rb and test_integration_single.rb tests. Each file runs the test two ways: first with all connections created in one thread, and second with connections created in five threads.

Currently the test adds output to the bottom 'debug' section, showing resets, restart count, and connections completed after the first restart.
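The one-thread and five-thread variants share a common shape: split the total request count across client threads and tally the outcomes per thread. A hypothetical sketch, with run_clients and make_request as illustrative names (the real helper talks to a running Puma server and also tracks restarts):

```ruby
# Hypothetical sketch: divide total_requests across num_threads client
# threads (any remainder is dropped here) and collect per-thread stats.
def run_clients(num_threads:, total_requests:, make_request:)
  per_thread = total_requests / num_threads
  threads = Array.new(num_threads) do
    Thread.new do
      stats = { completed: 0, resets: 0 }
      per_thread.times do
        begin
          make_request.call
          stats[:completed] += 1
        rescue Errno::ECONNRESET
          stats[:resets] += 1
        end
      end
      stats # becomes the thread's value
    end
  end
  threads.map(&:value)
end

results = run_clients(num_threads: 5, total_requests: 500, make_request: -> {})
p results.sum { |s| s[:completed] } # prints 500
```

Keeping a separate stats hash per thread avoids sharing mutable counters between threads, so no Mutex is needed.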
A file, rackup/hello_with_delay.ru, has been added and is used in the test. It's a normal 'Hello World' app with a delay of 0.001 seconds before the return. The delay can be increased to whatever is appropriate.

I'm not sure whether the test could also be used to test rolling-restart/USR1.
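A rack app matching that description might look like the following. The actual contents of rackup/hello_with_delay.ru are not shown here, so the header and body are assumptions; in a real .ru file the lambda would be passed to run.

```ruby
# Sketch of a 'Hello World' rack app with a small delay before responding.
# In a .ru file the last line would be: run HELLO_WITH_DELAY
HELLO_WITH_DELAY = lambda do |_env|
  sleep 0.001 # small delay so requests overlap with restarts
  [200, { 'content-type' => 'text/plain' }, ['Hello World']]
end
```

The lowercase header name is accepted by both Rack 2 and Rack 3, which requires lowercase response header keys.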
Integration Tests Misc
Moved cli_pumactl from test_integration_pumactl.rb to helpers/integration.rb, so it can be used in other integration tests when signals aren't supported (Windows).

Moved thread_run_refused from test_integration_cluster.rb to helpers/integration.rb. It's used by hot_restart_does_not_drop_connections in both the 'single' and 'cluster' tests.

Added if ::Process.respond_to?(:fork) to the end of test_integration_cluster.rb. This omits the skips shown for the tests on platforms that do not support fork. I was tired of looking through them on non-MRI failures...

Made small changes to how 'worker' is used when starting Puma in the integration tests. This was needed to allow hot_restart_does_not_drop_connections to work in both the 'single' and 'cluster' tests.

Added a method fast_write to helpers/integration.rb. It's essentially Puma::Request#fast_write with different error handling. It uses syswrite instead of write, which is probably only beneficial in tests that write to a large number of sockets, as in hundreds or more.
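The reason for moving cli_pumactl can be sketched as a choice of restart mechanism: signal Puma directly where USR2 exists, otherwise fall back to pumactl. restart_strategy is a hypothetical helper; Signal.list is Ruby's real table of signal names supported on the current platform.

```ruby
# Hypothetical helper: pick how a test could trigger a hot restart.
# Signal.list maps signal names (without "SIG") to numbers.
def restart_strategy(signals = Signal.list)
  signals.key?('USR2') ? :signal : :pumactl
end

p restart_strategy({ 'USR2' => 31 }) # prints :signal
p restart_strategy({ 'INT' => 2 })   # prints :pumactl
```

Passing the signal table in makes the decision testable on any platform.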
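A minimal sketch of a syswrite-based write loop, assuming the helper needs to handle partial writes. It does not reproduce the actual error handling in helpers/integration.rb or Puma::Request#fast_write; errors simply propagate to the caller.

```ruby
# Hypothetical sketch: syswrite bypasses Ruby's userspace write buffer and
# may write fewer bytes than requested, so keep writing the remainder.
def fast_write(io, str)
  data = str.b # binary copy so byte offsets are exact
  written = 0
  while written < data.bytesize
    written += io.syswrite(data.byteslice(written, data.bytesize - written))
  end
  written
end

r, w = IO.pipe
fast_write(w, "GET / HTTP/1.1\r\n\r\n")
w.close
p r.read # prints "GET / HTTP/1.1\r\n\r\n"
```

syswrite should not be mixed with buffered write calls on the same IO; Ruby warns ("syswrite for buffered IO") if buffered data is pending.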