Pass queued requests to thread pool on server shutdown #2122

wjordan · 2020-02-20T23:37:29Z

Description

Fixes a concurrency bug during server shutdown when queue_requests is enabled (the default), which can can be minimally reproduced by the following sequence:

s1 and s2 both open client connections
s1 sends a request
server shutdown is triggered (but does not complete while s1's request is still in process)
s2 sends a request

Expected: s2 receives a successful response
Actual: s2 receives a 500 response

Here is the test implementation, currently failing on master:

puma/test/test_puma_server.rb

Lines 771 to 788 in 17af6cf

    
           def test_shutdown_queued_request 
        
             server_run app: ->(env) { 
        
               sleep 3 
        
               [204, {}, []] 
        
             } 
        
             s1 = send_http "GET / HTTP/1.1\r\n\r\n" 
        
             s2 = send_http "GET / HTTP/1.1\r\n" 
        
             sleep 1 
        
             @server.stop 
        
             sleep 1 
        
             s2 << "\r\n" 
        
             assert_match /204/, s1.gets 
        
             assert_match /204/, s2.gets 
        
           end

The underlying issue is that in a 'graceful' shutdown, the server stops accepting connections and the thread pool stops accepting new work while it waits up to force_shutdown_after seconds to finish processing currently active requests. However, a client request queued in the reactor will be rejected if it gets received after the thread pool stops accepting new work.

To fix, this PR makes a small change to shut down the reactor before the thread pool, and to disable queue_requests and pass along any remaining non-'idle' client connections to the thread pool when the reactor shuts down. This allows graceful shutdown to behave properly without prematurely aborting any requests it shouldn't.

The logic for determining 'idle' clients that can be closed immediately follows guidance from RFC 2616 section 8.1.4:

Servers SHOULD always respond to at least one request per connection,
if at all possible.

The added test depends on the small fix in #2121 to pass properly on Windows, so this branch currently includes that commit as well.

Your checklist for this pull request

I have reviewed the guidelines for contributing to this repository.
I have added an entry to History.md if this PR fixes a bug or adds a feature. If it doesn't need an entry to HISTORY.md, I have added [changelog skip] the pull request title.
I have added appropriate tests if this PR fixes a bug or adds a feature.
My pull request is 100 lines added/removed or less so that it can be easily reviewed.
If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
I have updated the documentation accordingly.
All new and existing tests passed, including Rubocop.

wjordan · 2020-02-20T23:50:31Z

lib/puma/server.rb

@@ -293,8 +293,6 @@ def run(background=true)
        return run_lopez_mode(background)
      end

-      queue_requests = @queue_requests


This removes a change from cc4ad17, since we need to change behavior based on the updated value of the @queue_requests variable after the reactor is shutdown. I assume this was a performance optimization, but I don't know if it actually makes any measurable difference.

nateberkopec · 2020-03-05T17:28:31Z

lib/puma/client.rb

+    # middle of transmitting a response, unless a network or client failure
+    # is suspected.
+    def idle?
+      @requests_served > 0 && !in_data_phase


Why would we not be idle if @requests_served is 0?

I interpreted the RFC statement "Servers SHOULD always respond to at least one request per connection, if at all possible" to mean that the server shouldn't consider unused (@requests_served == 0) connections to be 'idle', so shouldn't force-close them during a graceful shutdown. This makes sense to me- if a client opens a connection it usually makes at least one request pretty soon after.

Maybe #idle? isn't the clearest term for this method? I used it here because it the same RFC section uses it to refer to a connection 'the server has decided to close':

A client, server, or proxy MAY close the transport connection at any
time. For example, a client might have started to send a new request
at the same time that the server has decided to close the "idle"
connection. From the server's point of view, the connection is being
closed while it was idle, but from the client's point of view, a
request is in progress.

I also considered #can_close?, #should_close? or #closeable? if any of those seem more fitting.

Riiight, so if we haven't served any requests, we can just shut down. Got it.

Any of the close methods you proposed I think would be more fitting.

I don't think @requests_served is a good variable name here. It is (from what I understand) a count of the number of requests that have been received from the client, and not the number of requests served from the server. I think a better name might be @client_request_count.

Given that tidbit, I don't think the logic here is entirely correct. I do want to take a step back from the specific implementation though and make sure we're on the same page at a high level.

Given that this on client.rb, I'm OK with the variable name.

evanphx · 2020-03-05T18:18:32Z

lib/puma/server.rb

@@ -422,11 +420,12 @@ def handle_servers

        @events.fire :state, @status

-        graceful_shutdown if @status == :stop || @status == :restart
        if queue_requests


I presume this needs to be @queue_requests given your change. But I'd also rather you not change this to an ivar read and leave it as a local var.

There were actually two separate queue_requests local vars- one within Server#run, and one within Server#handle_servers.

The local var in #run had to be removed in order for this PR to work at all- see my comment at #2122 (review). However, this one in #handle_servers actually doesn't need to be changed as well, so I left it as-is to minimize the diff.

That said, since this local var in #handle_servers is only ever read on server shutdown and probably has no performance impact, I would lean towards removing it for consistency/simplicity if that's your preference.

This explanation works for me.

nateberkopec · 2020-03-05T20:44:26Z

Needs a change on the idle method name
Leave queue requests as local var

schneems

Thanks a ton for your PR. It was really hard to review, mostly due to the complexity of the subject matter as well as complexity of Puma internals. I can only imagine that it must have taken you much longer to write this PR.

I want to take a step back from the implementation here and talk about some high level questions and goals

Is this a regression?

My high level question is: Did this used to work? Is this a regression or is this something new we're trying to find/fix? It looks like this might be related to #2200 but...I don't see where something changed in the reactor. Did something else get changed somewhere? If so, can we git bisect and understand what broke the behavior?

If you go back to 4.0.0 and run the same test does it pass or fail?

Side note: I would love if this process of seeing when a given test would have worked for failed was easier to do somehow. Committing it into git means we can't git bisect

Regardless of if this was broken or not, another question is: What do we want to happen? While digging in the code I found a few situations, and two of them have semi-undefined behavior:

Scenario A) Client 2 is fully buffered

Client Request 1 comes in and is fully buffered
Client Request 2 comes in and is partially buffered
Server gets shutdown request
Client Request 2 sends rest of its request before shutdown timeout
Server generates and delivers full response of Request 1, sends response
Desired: Reactor passes the request to the thread pool before shutting down for a response

I think we can all agree on this.

Scenario B) Client 2 CAN be (but is not yet) fully buffered

Client Request 1 comes in and is fully buffered
Client Request 2 comes in and is partially buffered
Server gets shutdown request
Client Request 2 sends rest of its request before shutdown timeout
Server generates and delivers full response of Request 1, sends response
Desired: Reactor keeps trying to buffer the request to see if it can pass it to the thread pool to be processed before shutting down?

Is this true? Do we want to do that? The downside is we have no way of knowing if the client will ever send the rest of the request in time. How do we detect the difference between scenario B to scenario C where we need to serve need a default response

Scenario C) Client 2 cannot fully buffered

Client Request 1 comes in and is fully buffered
Client Request 2 comes in and is partially buffered
Server gets shutdown request
Client Request 2 DOES NOT send rest of its request
Server waits for full response of Request 1, sends response
Desired: ???

If we pass a partial response to the threadpool for processing, there's no guarantee that it can do anything with it, it might be a half formed json body, that cannot be decoded. While RFC 2616 says we "should" send at least one response, what should we send? If we pass it to the thread pool in this state it's going to be a 500. I think that's not accurate. A 503 would perhaps be more appropriate.

Questions

Is this a regression?
If you pass a non-fully buffered client request to the thread pool what happens? Does it try to buffer it? cc/ @nateberkopec @evanphx
What do we want to do when the client can be fully buffered? (scenario B)
What do we want to do when the client cannot be fully buffered? (scenario C)

schneems · 2020-04-02T19:38:55Z

lib/puma/client.rb

+    # middle of transmitting a response, unless a network or client failure
+    # is suspected.
+    def idle?
+      @requests_served > 0 && !in_data_phase


I don't think @requests_served is a good variable name here. It is (from what I understand) a count of the number of requests that have been received from the client, and not the number of requests served from the server. I think a better name might be @client_request_count.

Given that tidbit, I don't think the logic here is entirely correct. I do want to take a step back from the specific implementation though and make sure we're on the same page at a high level.

Cover various scenario combinations related to timeout settings, queue_requests configuration, and post/get request type.

wjordan · 2020-04-06T18:33:33Z

My high level question is: Did this used to work? Is this a regression or is this something new we're trying to find/fix?

As far as I can tell, this never used to work since the reactor implementation has existed. For example, on bb2aeb8 (first commit where the queue_requests configuration option is available), the test in this PR fails with queue_requests = true and passes with queue_requests = false.

I only encountered this issue as an intermittent unit-test failure while working on #2123, and worked backwards from there to create a minimal reproducible failing test and fix. In any case it seems possible that #2200 is caused by this underlying issue, though I can't be sure.

Regardless of if this was broken or not, another question is: What do we want to happen?
What do we want to do when the client can be fully buffered? (scenario B)
What do we want to do when the client cannot be fully buffered? (scenario C)

I wasn't sure how Scenario 'A' differed from 'B' (identical except for the 'desired' part?), so I'll just address B and C.

For B, I agree on behavior - for a graceful shutdown, the server should wait to finish receiving the request, and send a response.

However, the implementation is a bit different. When the server gets a shutdown request, it immediately shuts the reactor down (passing pending requests to the thread pool) and sets queue_requests to false. The thread pool then continues reading the partially-buffered client requests and processing responses in its active threads, using the queue_requests = false code path as though the reactor never existed.
For C, the difference from B is that the server doesn't wait forever for the request to finish, it closes the connection after a configurable timeout. (The HTTP spec says that 'Servers will usually have some time-out value beyond which they will no longer maintain an inactive connection,' and allows that 'A client, server, or proxy MAY close the transport connection at any time.')

In Puma, this timeout is configurable by the lesser of first_data_timeout (default 30 seconds) or force_shutdown_after (default forever)

Also, historically (since Add status 408 for when server times out waiting for body content. #451), if a request is in the 'data phase' (request headers have been sent and a request body is in-flight) when it times out, Puma returns a 408 response before it drops the connection instead of just dropping without any response. (Call this Scenario C2) I think this is a minor detail, but ideally behavior during shutdown should be consistent with this existing behavior.
Finally, worth noting that if the server is busy with a response when the force-shutdown timeout completes, the server responds with a 503 error (uncaught Puma::ThreadPool::ForceShutdown exception). Worth pointing this out as a separate related scenario (call it Scenario D).

To clarify the behavior across all combinations of scenarios, I added a few extra test cases to this PR. Let me know if these are clear enough and if the expected behavior is reasonable, or if there are any additional scenarios you think should be covered.

If you pass a non-fully buffered client request to the thread pool what happens? Does it try to buffer it? cc/ @nateberkopec @evanphx

Yes, this is what happens when queue_requests is set to false, and what this PR does when the server gets a shutdown request.

Update method documentation based on feedback.

evanphx

Thanks for the hard work. You arrived at a very understandable place, looks good!

nateberkopec · 2020-04-10T07:04:37Z

👏

aaronjensen · 2020-04-27T19:44:20Z

🎊 Is there a timeframe for the next release of puma?

nateberkopec · 2020-04-27T23:14:18Z

Basically the two PRs listed in the milestone have to get merged: https://github.com/puma/puma/milestone/7

aaronjensen · 2020-04-28T00:05:08Z

Ah, major release is next. Okay, thanks.

wjordan force-pushed the empty_reactor_on_shutdown branch from b379b8a to 17af6cf Compare February 20, 2020 23:42

wjordan changed the title ~~Empty reactor on shutdown~~ Pass queued requests to thread pool on server shutdown Feb 20, 2020

wjordan commented Feb 20, 2020

View reviewed changes

wjordan mentioned this pull request Feb 21, 2020

Faster phased restart and worker timeout #2123

Closed

8 tasks

nateberkopec added the bug label Feb 21, 2020

nateberkopec requested a review from evanphx February 21, 2020 14:56

nateberkopec added the waiting-for-review Waiting on review from anyone label Feb 21, 2020

nateberkopec reviewed Mar 5, 2020

View reviewed changes

evanphx requested changes Mar 5, 2020

View reviewed changes

nateberkopec added waiting-for-changes Waiting on changes from the requestor and removed waiting-for-review Waiting on review from anyone labels Mar 5, 2020

nateberkopec mentioned this pull request Mar 25, 2020

Still getting H13s during autoscaling #2200

Closed

schneems reviewed Apr 2, 2020

View reviewed changes

wjordan force-pushed the empty_reactor_on_shutdown branch 3 times, most recently from e76799b to 9a0e207 Compare April 4, 2020 03:43

wjordan added 2 commits April 6, 2020 10:19

Pass queued requests to thread pool on server shutdown

4aa0f6b

handle ThreadPool::ForceShutdown when waiting for client request

ee3f0c9

wjordan force-pushed the empty_reactor_on_shutdown branch from 9a0e207 to 941262e Compare April 6, 2020 17:58

Add extra shutdown-request test cases

d8b2680

Cover various scenario combinations related to timeout settings, queue_requests configuration, and post/get request type.

wjordan force-pushed the empty_reactor_on_shutdown branch from 941262e to d8b2680 Compare April 6, 2020 18:21

Rename #idle? to #can_close?

9649606

Update method documentation based on feedback.

nateberkopec added waiting-for-review Waiting on review from anyone and removed waiting-for-changes Waiting on changes from the requestor labels Apr 7, 2020

evanphx self-requested a review April 10, 2020 02:24

evanphx approved these changes Apr 10, 2020

View reviewed changes

nateberkopec merged commit 19b2a21 into puma:master Apr 10, 2020

cjlarose mentioned this pull request Sep 22, 2020

Requests fail during phased restarts, hot restarts, and shutdown #2337

Closed

This was referenced Oct 1, 2020

Reactor refactor #2279

Merged

Ctrl+C takes a long time to quit in 5.0.x #2398

Closed

Add hot restart integration test [changelog skip] #2417

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pass queued requests to thread pool on server shutdown #2122

Pass queued requests to thread pool on server shutdown #2122

wjordan commented Feb 20, 2020 •

edited

wjordan Feb 20, 2020

nateberkopec Mar 5, 2020

wjordan Mar 5, 2020 •

edited

nateberkopec Mar 5, 2020

schneems Apr 2, 2020

nateberkopec Apr 7, 2020

evanphx Mar 5, 2020

wjordan Apr 6, 2020

nateberkopec Apr 7, 2020

nateberkopec commented Mar 5, 2020 •

edited

schneems left a comment •

edited

schneems Apr 2, 2020

wjordan commented Apr 6, 2020 •

edited

evanphx left a comment

nateberkopec commented Apr 10, 2020

aaronjensen commented Apr 27, 2020

nateberkopec commented Apr 27, 2020

aaronjensen commented Apr 28, 2020

	def test_shutdown_queued_request
	server_run app: ->(env) {
	sleep 3
	[204, {}, []]
	}

	s1 = send_http "GET / HTTP/1.1\r\n\r\n"
	s2 = send_http "GET / HTTP/1.1\r\n"
	sleep 1

	@server.stop
	sleep 1

	s2 << "\r\n"

	assert_match /204/, s1.gets
	assert_match /204/, s2.gets
	end

Pass queued requests to thread pool on server shutdown #2122

Pass queued requests to thread pool on server shutdown #2122

Conversation

wjordan commented Feb 20, 2020 • edited

Description

Your checklist for this pull request

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wjordan Mar 5, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nateberkopec commented Mar 5, 2020 • edited

schneems left a comment • edited

Choose a reason for hiding this comment

Is this a regression?

Scenario A) Client 2 is fully buffered

Scenario B) Client 2 CAN be (but is not yet) fully buffered

Scenario C) Client 2 cannot fully buffered

Questions

Choose a reason for hiding this comment

wjordan commented Apr 6, 2020 • edited

evanphx left a comment

Choose a reason for hiding this comment

nateberkopec commented Apr 10, 2020

aaronjensen commented Apr 27, 2020

nateberkopec commented Apr 27, 2020

aaronjensen commented Apr 28, 2020

wjordan commented Feb 20, 2020 •

edited

wjordan Mar 5, 2020 •

edited

nateberkopec commented Mar 5, 2020 •

edited

schneems left a comment •

edited

wjordan commented Apr 6, 2020 •

edited