Prevent connections from entering Reactor after shutdown begins #2377
Conversation
When a `Puma::Server` instance starts to shut down, it sets the instance variable `@queue_requests` to `false` and then shuts down the Reactor. The intent is that after the Reactor shuts down, threads no longer have the option of sending their connections to the Reactor: the connection must either be closed (assuming we've written a response to the socket), or the thread must wait until the client has finished writing its request, then write a response, then finally close the connection.

This works fine most of the time. The problem is that there are races in the implementation of the `ThreadPool` where a thread can see that `@queue_requests` is `true` before the `Server` shutdown has started, then later try to add the request to the Reactor, not knowing that it's already shutting down. Depending on the precise order of operations, one possible outcome is that the `ThreadPool` executes `@reactor.add` after the `@reactor` has closed its `@trigger` pipe, resulting in the error `Error reached top of thread-pool: closed stream (IOError)`. Clients experience connection reset errors as a result.

The fix introduced in this commit adds a `Mutex` that makes it impossible for a thread in the `ThreadPool` to add a client connection to the `@reactor` after `@queue_requests` has been set to `false`.

Co-Authored-By: Leo Belyi <leonid.belyi@mycase.com>
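A minimal sketch of the guard described above, not Puma's actual code: the class and method names here (`QueueGuard`, `queue_to_reactor`, `stop_queueing`) are hypothetical. The idea is that one `Mutex` makes "check `@queue_requests`, then call `@reactor.add`" atomic with respect to the shutdown path, closing the race window:

```ruby
require 'thread'

class QueueGuard
  def initialize(reactor)
    @reactor        = reactor
    @queue_requests = true
    @lock           = Mutex.new
  end

  # Worker-thread path: park a partially-read connection in the reactor.
  # Returns false if shutdown has already begun, so the caller must
  # finish the request inline instead of queueing it.
  def queue_to_reactor(client)
    @lock.synchronize do
      return false unless @queue_requests
      @reactor.add(client) # safe: the reactor's trigger pipe is still open
      true
    end
  end

  # Shutdown path: once this returns, no thread can reach @reactor.add,
  # so the reactor can close its trigger pipe without racing.
  def stop_queueing
    @lock.synchronize { @queue_requests = false }
  end
end
```

Without the mutex, a thread could pass the `@queue_requests` check, get preempted while shutdown flips the flag and closes the pipe, and then call `@reactor.add` on a dead reactor.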
It looks like the solution in this PR ended up quite similar to the portion of #2271 involving
I think I triggered this same race condition when running portions of the test suite more quickly. Based on the comments in that PR, if I recall correctly, the refactoring in #2279 may also fix this same race condition at its source in the Reactor, rather than in the Server through an extra mutex around its operations.
That would be great, but that PR is a bigger project. I'll merge this for now; if we can get rid of the extra mutex eventually, great.
* Test adding connection to Reactor after shutdown: modifies `TestPumaServer#shutdown_requests` to pause `Reactor#add` until after shutdown begins, to ensure requests are handled correctly for this edge case. Adds unit-test coverage for the fix introduced in #2377 and updated in #2279.
* Fix `Queue#close` implementation for Ruby 2.2: allow `ClosedQueueError` to be raised when `Queue#<<` is called.
* Pass `@block` directly instead of `@block.method(:call)`.
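For context on the second item: `Queue#close` and `ClosedQueueError` are only built into Ruby 2.3 and later, so older Rubies needed a compatibility shim. A small sketch of the behavior such a shim must match, run here on a modern Ruby where these are native:

```ruby
q = Queue.new
q << :job          # pushing to an open queue works
q.close

begin
  q << :late_job   # pushing after close must fail loudly...
rescue ClosedQueueError => e
  puts e.class     # ...by raising ClosedQueueError, not silently dropping
end

puts q.pop         # draining items queued before close still works
```

The important subtlety is the asymmetry: `<<` raises after close, while `pop` continues to drain already-queued items (and returns `nil` once the closed queue is empty) so in-flight work isn't lost during shutdown.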
Description
This change addresses one source of connection reset errors related to the shutdown process of `Puma::Server`. Details about the particular problem are described in #2337 (comment) and in the commit messages.

Why does this PR not include tests?
The most straightforward way to test this change is to add an integration test that repeatedly fires requests at a server that is restarting via hot restarts (`SIGUSR2`). The expectation is that no client connection experiences a connection reset error, or better, that every client request gets a successful response.

I've written such an integration test, but unfortunately there are other sources of connection reset errors that prevent the test from passing reliably. I suggest we continue to document and catalog those errors in #2337.
My claim is that this change demonstrably fixes one problem that causes connection reset errors and gets us closer to the point where puma reliably handles all client connections it accepted before shutting down, even if the change doesn't fix every kind of problem. It is possible to use the reproduction steps in #2337 to verify that the error `Error reached top of thread-pool: closed stream (IOError)` is resolved by this PR.

Does this change have an effect on puma's maximum throughput?
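To make the quoted error concrete: the Reactor wakes its event loop by writing a byte to an internal trigger pipe, and once shutdown has closed that pipe, any late `add` attempts that write raise `IOError` with exactly the message seen above. A self-contained illustration of that failure mode:

```ruby
r, w = IO.pipe

w << "*"          # the normal wake-up write succeeds
r.read(1)         # the event loop consumes the wake-up byte
w.close           # shutdown closes the trigger pipe

begin
  w << "*"        # the race: a wake-up attempted after shutdown
rescue IOError => e
  puts e.message  # "closed stream", as in the error this PR fixes
end
r.close
```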
Not that I've measured. I did a quick test locally by spinning up a puma server (`bundle exec ruby -Ilib bin/puma -w 4 test/rackup/hello.ru`), then firing off a bunch of requests, but the results seem to be within the margin of error.