Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent connections from entering Reactor after shutdown begins #2377

Merged
merged 1 commit into from Sep 23, 2020

Conversation

cjlarose
Copy link
Member

@cjlarose cjlarose commented Sep 22, 2020

Description

This change addresses one source of connection reset errors related to the shutdown process of Puma::Sever. Details about the particular problem are described in #2337 (comment) and in the commit messages.

Why does this PR not include tests?

The most straightforward way to test this change is to add an integration test that repeatedly fires off requests to a server that is restarting via hot restarts (SIGUSR2). The expectation is that no client connection experiences a connection reset error, or better, that every client request gets a successful response.

I've written such an integration test, but unfortunately, there are other sources of connection reset errors that make it so that the test does not pass reliably. I suggest we continue to document and catalog those errors in #2337.

My claim is that this change demonstrably fixes one problem that causes connection reset errors and gets us closer to the point that puma reliably handles all client connections that it accepted before shutting down, even if the change doesn't fix every kind of problem. It is possible to use the reproduction steps in #2337 to verify that the error Error reached top of thread-pool: closed stream (IOError) is resolved by this PR.

Does this change have an effect on puma's maximum throughput?

Not that I've measured. I did a quick test locally by spinning up a puma server

bundle exec ruby -Ilib bin/puma -w 4 test/rackup/hello.ru

Then firing off a bunch of requests

wrk -t20 -c20 -d30s http://localhost:9292

but the results seem to be within the margin of error.

Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added an entry to History.md if this PR fixes a bug or adds a feature. If it doesn't need an entry to HISTORY.md, I have added [changelog skip] or [ci skip] to the pull request title.
  • I have added appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

When a `Puma::Server` instance starts to shutdown, it sets the instance
variable `@queue_requests` to `false`, then starts to shut down the
Reactor. The intent here is that after the Reactor shuts down, threads
won't have the option of sending their connections to the Reactor (the
connection must either be closed, assuming we've written a response to
the socket, or the thread must wait until the client has finished
writing its request, then write a response, then finally close the
connection). This works most of the time just fine.

The problem is that there are races in the implementation of the
`ThreadPool` where it's possible for a thread to see that
`@queue_requests` is `true` before the `Server` shutdown has started,
then later try to add the request to the Reactor, not knowing that it's
already shutting down.

Depending on the precise order of operations, one possible outcome is
that the `ThreadPool` executes `@reactor.add` after the `@reactor` has
closed its `@trigger` pipe, resulting in the error `Error reached top of
thread-pool: closed stream (IOError)`. Clients experience connection
reset errors as a result.

The fix introduced in this commit is to add a `Mutex` that makes it
impossible for a thread in the `ThreadPool` to add a client connection
to the `@reactor` after `@queue_requests` has been set to `false`.

Co-Authored-By: Leo Belyi <leonid.belyi@mycase.com>
@wjordan
Copy link
Contributor

wjordan commented Sep 23, 2020

It looks like the solution in this PR ended up quite similar to the portion of #2271 involving @queue_mutex:

Add @queue_mutex and helper methods if_queue_requests and queue_client to Server to handle an edge case where the reactor shuts down while a thread is concurrently queuing a client.

I think I triggered this same race condition when running portions of the test suite more quickly.

Based on the comments in that PR, if I recall the refactoring in #2279 may also fix this same race condition at the source in the Reactor, rather than in the Server through an extra mutex around its operations.

@nateberkopec nateberkopec added bug waiting-for-review Waiting on review from anyone labels Sep 23, 2020
@nateberkopec
Copy link
Member

I recall the refactoring in #2279 may also fix this same race condition at the source in the Reactor, rather than in the Server through an extra mutex around its operations.

That would be great, but that PR is a bigger project. I'll merge this for now, if we can get rid of the extra mutex eventually, great.

@nateberkopec nateberkopec merged commit 7d4df29 into puma:master Sep 23, 2020
@nateberkopec nateberkopec mentioned this pull request Sep 30, 2020
8 tasks
wjordan added a commit to wjordan/puma that referenced this pull request Oct 7, 2020
Modifies `TestPumaServer#shutdown_requests` to pause `Reactor#add` until after
shutdown begins, to ensure requests are handled correctly for this edge case.
Adds unit-test coverage for the fix introduced in puma#2377 and updated in puma#2279.
wjordan added a commit to wjordan/puma that referenced this pull request Oct 7, 2020
Modifies `TestPumaServer#shutdown_requests` to pause `Reactor#add` until after
shutdown begins, to ensure requests are handled correctly for this edge case.
Adds unit-test coverage for the fix introduced in puma#2377 and updated in puma#2279.
nateberkopec pushed a commit that referenced this pull request Oct 8, 2020
)

* Test adding connection to Reactor after shutdown
Modifies `TestPumaServer#shutdown_requests` to pause `Reactor#add` until after
shutdown begins, to ensure requests are handled correctly for this edge case.
Adds unit-test coverage for the fix introduced in #2377 and updated in #2279.

* Fix Queue#close implementation for Ruby 2.2
Allow `ClosedQueueError` to be raised when `Queue#<<` is called.

* Pass `@block` directly instead of `@block.method(:call)`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug waiting-for-review Waiting on review from anyone
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants