
Setting first_data_timeout triggers an H18 error on Heroku #2539

Closed
pomartel opened this issue Jan 27, 2021 · 12 comments
@pomartel

If I set the first_data_timeout config to a value lower than 30 seconds, the connection is closed correctly when no data has been received. However, an H18 error still gets triggered by Heroku. If I understand this correctly, the desired behaviour is to return an HTTP 408 error instead.

I'm using the latest version of Puma (5.1.1). Here is the Heroku metrics graph showing that H18 errors are triggered after about 5 seconds:

[Screenshot: Heroku metrics graph showing H18 errors after ~5 seconds]

This is also discussed on Stack Overflow.
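For reference, this is the kind of configuration involved — a minimal config/puma.rb sketch, not the reporter's actual file (the 5-second value is an assumption chosen to match the ~5 s service times in the logs later in the thread):

```ruby
# config/puma.rb -- minimal sketch, not the reporter's actual config.
# first_data_timeout is Puma's DSL method for how long to wait for a
# connected client to start sending data before closing the connection.
first_data_timeout 5  # seconds; Heroku's own router timeout is 30 s
```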

@dentarg
Member

dentarg commented Jan 27, 2021

@pomartel Can we also see some logs? I see there are H27s in there too; I didn't notice them in the SO post (I didn't look carefully enough at the screenshot).

@pomartel
Author

pomartel commented Jan 27, 2021

Here are the latest logs with H18 errors. They are all serviced in about 5 seconds, which corresponds to the value I set for first_data_timeout. I don't think the H27s are related:

Jan 27 10:59:02 poll-en heroku/router sock=backend at=error code=H18 desc="Server Request Interrupted" method=POST path="/" host=app.do request_id=9a91eef5-dc86-438f-82ad-b8403d5149a9 fwd="176.9.89.48" dyno=web.1 connect=0ms service=5070ms status=503 bytes=71 protocol=http
Jan 27 11:01:16 poll-en heroku/router sock=backend at=error code=H18 desc="Server Request Interrupted" method=PATCH path="/polls/esail-quiz" host=qz.app.do request_id=14a52eb1-089f-4e75-8156-7cb61de3d14d fwd="109.147.146.153" dyno=web.1 connect=0ms service=6755ms status=503 bytes=71 protocol=https
Jan 27 11:42:13 poll-en heroku/router sock=backend at=error code=H18 desc="Server Request Interrupted" method=POST path="/" host=app.do request_id=43ffcaba-1cb5-492f-a627-a8d60cd0d134 fwd="176.9.89.48" dyno=web.1 connect=0ms service=5082ms status=503 bytes=71 protocol=http
Jan 27 13:08:12 poll-en heroku/router sock=backend at=error code=H18 desc="Server Request Interrupted" method=POST path="/" host=app.do request_id=7c108263-bac3-4b31-b783-55c5076aceda fwd="176.9.89.48" dyno=web.1 connect=0ms service=5151ms status=503 bytes=71 protocol=http
Jan 27 13:21:55 poll-en heroku/router sock=backend at=error code=H18 desc="Server Request Interrupted" method=POST path="/polls/electoral-vote-3108018/forms" host=poll.app.do request_id=4c615739-e12a-49cc-9e16-a0c71910d723 fwd="47.29.21.174" dyno=web.1 connect=0ms service=38396ms status=503 bytes=71 protocol=https
Jan 27 14:09:22 poll-en heroku/router sock=backend at=error code=H18 desc="Server Request Interrupted" method=POST path="/" host=app.do request_id=24db1bc6-7913-4fa8-9317-ad2f2b1e07d8 fwd="148.251.137.42" dyno=web.1 connect=2ms service=5442ms status=503 bytes=71 protocol=http
Jan 27 14:52:10 poll-en heroku/router sock=backend at=error code=H18 desc="Server Request Interrupted" method=POST path="/polls/2457318/forms" host=qz.app.do request_id=610eb0fa-5b37-4c2b-baaa-3ef96d45036a fwd="88.155.162.219" dyno=web.1 connect=0ms service=8619ms status=503 bytes=71 protocol=https
Jan 27 15:20:22 poll-en heroku/router sock=backend at=error code=H18 desc="Server Request Interrupted" method=POST path="/" host=app.do request_id=09dac1c8-1f0b-4f07-ab46-8590ef6695a9 fwd="138.201.76.138" dyno=web.1 connect=1ms service=5111ms status=503 bytes=71 protocol=http
Jan 27 15:44:06 poll-en heroku/router sock=backend at=error code=H18 desc="Server Request Interrupted" method=POST path="/" host=app.do request_id=1c435535-bfdb-43bb-9890-2e946588e6ae fwd="148.251.231.105" dyno=web.1 connect=0ms service=5092ms status=503 bytes=71 protocol=http

@dentarg
Member

dentarg commented Jan 27, 2021

I was wondering whether H18 and H27 happen at the same time (i.e. share the same request_id). I'm not sure a request can have more than one Heroku error code, but I would look for it :)

@nateberkopec
Member

We've currently written the code so that if the client has sent nothing or has sent an incomplete request, we do not write a 408. If the client has written a complete request, we will write a 408.

I think this is actually the exact opposite of what it should be (RFC 7231, §6.5.7):

    The 408 (Request Timeout) status code indicates that the server did
    not receive a complete request message within the time that it was
    prepared to wait.

I'm not sure what we should do if a client object times out after a complete request has been sent.

@nateberkopec
Member

This should be a pretty simple test to write, btw.

@ylecuyer
Contributor

I tested this one and reproduced the H18 error. However, with some debugging I added, I saw that Puma returns a 408 error to Heroku, so I don't understand why the H18 error is still there 🤷‍♂️

@wjordan
Contributor

wjordan commented Apr 19, 2021

We've currently written the code so that if the client has sent nothing or has sent an incomplete request, we do not write a 408. If the client has written a complete request, we will write a 408.

Here is the code's current 408-sending behavior:

puma/lib/puma/client.rb (lines 213 to 216 in 366496b):

    def timeout!
      write_error(408) if in_data_phase
      raise ConnectionError
    end

puma/lib/puma/client.rb (lines 107 to 109 in 366496b):

    def in_data_phase
      !@read_header
    end

This is my interpretation:

  • If the client opens a connection and sends nothing before a timeout: nothing
  • If the client sends an incomplete request (specifically, if the server has received the full header and is now reading the body) before a read timeout: 408
  • If the client sends a complete request then keeps the connection open before a timeout: nothing
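The three cases above can be illustrated with a standalone sketch (FakeClient and the output string are stand-ins, not Puma's real classes; only the timeout!/in_data_phase branching mirrors the snippets quoted above):

```ruby
# Standalone sketch of the timeout branching; not Puma's actual code.
class ConnectionError < StandardError; end

class FakeClient
  attr_writer :read_header

  def initialize(io)
    @io = io
    @read_header = true  # still parsing the request header
  end

  # "In the data phase" means the full header has been read and the
  # server is now reading the request body.
  def in_data_phase
    !@read_header
  end

  # Write a 408 only in the data phase, then drop the connection.
  def timeout!
    @io << "HTTP/1.1 408 Request Timeout\r\n\r\n" if in_data_phase
    raise ConnectionError
  end
end

# Case 1: timeout before any header arrives -- silent close, no 408.
header_phase_out = +""
begin
  FakeClient.new(header_phase_out).timeout!
rescue ConnectionError
end

# Case 2: timeout mid-body (header complete) -- a 408 is written first.
body_phase_out = +""
client = FakeClient.new(body_phase_out)
client.read_header = false
begin
  client.timeout!
rescue ConnectionError
end
```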

Related 503-sending behavior:

  • If the client has written a complete request, but the server force-shuts down while the app is generating a response: 503

    puma/lib/puma/request.rb (lines 90 to 94 in 366496b):

        rescue ThreadPool::ForceShutdown => e
          @events.unknown_error e, client, "Rack app"
          @events.log "Detected force shutdown of a thread"
          status, headers, res_body = lowlevel_error(e, env, 503)

So I think Puma is behaving how we expect (responding with 408 if a request times out when sending a long POST body).

I don't know why Heroku is turning the 408 responses into H18/503. One guess: Heroku has hard-coded 'request-then-response' logic that doesn't properly handle an 'interrupt response' that closes the socket before the full request body has been received (even though that is valid HTTP). I would be interested in whether the same H18 error could be triggered on Heroku if a 413 Payload Too Large error is returned by an application to interrupt POST requests that are too big.

@wjordan
Contributor

wjordan commented Apr 19, 2021

Note issue #2574 may also be in play here: since v5.0.3, the first_data_timeout setting unintentionally works as a 'total request timeout' instead of a first-byte/in-between-bytes timeout, which is how it behaved previously. So a long POST upload taking more than 5 seconds would return a 408 timeout, where it should only time out if the upload sends no data for more than 5 seconds.
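To illustrate the distinction (a standalone sketch, not Puma's implementation): an in-between-bytes timeout resets the clock on every chunk received, so a slow-but-steady upload never trips it, whereas a total-request timeout keeps one fixed deadline for the whole body:

```ruby
require "socket"

# Sketch of a per-read "idle" timeout (not Puma's code): the timer
# restarts each time a chunk arrives. A total-request timeout would
# instead check one fixed deadline for the entire body.
def read_with_idle_timeout(io, idle_timeout)
  body = +""
  loop do
    ready = IO.select([io], nil, nil, idle_timeout)
    raise "no data for #{idle_timeout}s" unless ready
    chunk = io.read_nonblock(4096, exception: false)
    next if chunk == :wait_readable  # spurious wakeup; wait again
    break if chunk.nil?              # EOF: sender finished
    body << chunk
  end
  body
end

# Demo: three chunks 0.2 s apart never trip a 1 s idle timeout,
# even though the whole transfer takes longer than one interval.
sender, receiver = Socket.pair(:UNIX, :STREAM, 0)
writer = Thread.new do
  3.times { sender.write("data "); sleep 0.2 }
  sender.close
end
received = read_with_idle_timeout(receiver, 1.0)
writer.join
```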

@dentarg
Member

dentarg commented Apr 19, 2021

I've examined https://devcenter.heroku.com/articles/http-routing but I can't really spot any clues about what we are seeing here.

I would be interested in whether the same H18 error could be triggered on Heroku if a 413 Payload Too Large error is returned by an application to interrupt POST requests that are too big.

I tried this, with first_data_timeout set to 1, doing curl -s -v -o /dev/null -X POST -T <100M file> <URL> against a Heroku app. HTTP/1.1 503 Service Unavailable is returned and the H13 error is logged:

2021-04-19T20:56:41.833382+00:00 heroku[router]: at=error code=H13 desc="Connection closed without response" method=POST path="/Geekbench-5.1.0-Mac.zip" host=puma-issue-2539.herokuapp.com request_id=3ab60f1d-d974-4187-ab88-20da20e72a06 fwd="..." dyno=web.1 connect=1ms service=1007ms status=503 bytes=0 protocol=https
2021-04-19T20:57:00.744623+00:00 heroku[router]: at=error code=H13 desc="Connection closed without response" method=POST path="/Geekbench-5.1.0-Mac.zip" host=puma-issue-2539.herokuapp.com request_id=7d1d414f-86a4-433e-9412-a9bac986b504 fwd="..." dyno=web.1 connect=0ms service=1020ms status=503 bytes=0 protocol=https

Interestingly, after heroku labs:enable http-end-to-end-continue, "413 Payload Too Large" turned into H18 again:

2021-04-19T21:02:20.418884+00:00 heroku[router]: sock=backend at=error code=H18 desc="Server Request Interrupted" method=POST path="/Geekbench-5.1.0-Mac.zip" host=puma-issue-2539.herokuapp.com request_id=51a1693e-ccbf-4399-ad9c-4b4448fd2191 fwd="..." dyno=web.1 connect=0ms service=1009ms status=503 bytes=25 protocol=https
2021-04-19T21:02:51.898649+00:00 heroku[router]: sock=backend at=error code=H18 desc="Server Request Interrupted" method=POST path="/Geekbench-5.1.0-Mac.zip" host=puma-issue-2539.herokuapp.com request_id=9d46a67d-4687-4d63-9e79-2a386f13c0b7 fwd="..." dyno=web.1 connect=1ms service=1022ms status=503 bytes=25 protocol=https

heroku labs:enable http-end-to-end-continue didn't change anything for "408 Request Timeout", still H18.

cc @schneems do you have more info about the Heroku router internals? or who the people to ping would be?

@dentarg
Member

dentarg commented Apr 19, 2021

returned and the H13 error is logged

Scratch that: H18 is logged. H13 was me doing this too quickly and not adding 413 to the ERROR_RESPONSE hash.

@wjordan
Contributor

wjordan commented Apr 20, 2021

I found a StackOverflow answer whose author confirmed the Heroku issue with their support, and it matches my initial guess:

There is another case that can cause this error that is not documented by Heroku.
If your server responds to the request and closes the connection without reading the request body, then the router will respond with the H18 error. The Heroku router logs will show sock=backend.
In this case your server is not doing anything wrong with respect to the HTTP spec. It's a bug with the Heroku router.
I contacted Heroku technical support about this, and they confirmed the issue. It will be fixed in a new version of the router they are implementing.
The workaround is to always ensure the request body is read on the backend server before closing the connection.

So Puma could theoretically implement a 'Heroku-H18 workaround' option that keeps reading the complete request body even if a 4xx error response is returned before the request completes, but that would defeat the purpose of the 4xx error interrupting the request (which is to prevent unnecessary data from being transferred in the first place). So I think the current behavior is fine from Puma's perspective, and this bug is pretty clearly Heroku's responsibility to fix.
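The workaround quoted above ("always ensure the request body is read... before closing the connection") could be sketched as Rack middleware. The class name, chunk size, and the 4xx trigger are illustrative assumptions, not anything Puma ships:

```ruby
require "stringio"

# Sketch of the H18 workaround: before an error response goes out,
# drain whatever remains of the request body so the socket is not
# closed with unread data. Defeats the purpose of interrupting the
# upload early, as noted above.
class DrainRequestBody
  CHUNK = 64 * 1024

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    if status.to_i >= 400 && (input = env["rack.input"])
      nil while input.read(CHUNK)  # read and discard until EOF
    end
    [status, headers, body]
  end
end

# Demo: an app that rejects uploads with 413 without reading the body.
app = ->(env) { [413, { "content-type" => "text/plain" }, ["Payload Too Large"]] }
wrapped = DrainRequestBody.new(app)
upload = StringIO.new("x" * 100_000)  # stand-in for rack.input
status, _headers, _body = wrapped.call({ "rack.input" => upload })
```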

@nateberkopec nateberkopec added invalid and removed bug labels Apr 23, 2021
@nateberkopec
Member

Closing as "not our problem". If you believe differently, please continue the discussion.
