
Requests to Puma hanging due to issues with Keep-Alive, Content-Length, and HTTP_VERSION headers #1565

Closed
aardvarkk opened this issue Apr 18, 2018 · 12 comments


aardvarkk commented Apr 18, 2018

Issue

I have a situation in which requests to a Puma server hang when the Connection: Keep-Alive header is passed. I noticed that the response Puma returns to the client contains neither a Content-Length header nor chunked transfer encoding. I don't believe that's a valid response, so that's the first issue.

Digging a bit deeper into Puma's source, I noticed a strange thing on line 672 of server.rb. Even though the Version header contained HTTP/1.1, the HTTP_VERSION value on the Puma side somehow showed HTTP/1.1, HTTP/1.1. See the image:

[screenshot: debugger showing HTTP_VERSION set to "HTTP/1.1, HTTP/1.1"]

Since line 672 of server.rb checks for exact equality (http_11 = if env[HTTP_VERSION] == HTTP_11), it doesn't treat the request as HTTP/1.1. It then goes on to respond with neither chunked transfer encoding nor a Content-Length, which I believe is invalid. Unfortunately I don't know how to dig deeper to figure out where the "doubled" version is coming from. When I use Wireshark to look at the data on the wire, I don't see the doubling of the HTTP version.
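
To illustrate the comparison, here's a rough sketch of the check as I understand it (paraphrasing server.rb, not Puma's exact code; the constant names follow puma/const.rb):

# Rough sketch paraphrasing the check around line 672 of server.rb;
# not Puma's exact code. Constant names follow puma/const.rb.
HTTP_VERSION = "HTTP_VERSION".freeze
HTTP_11      = "HTTP/1.1".freeze

env = { HTTP_VERSION => "HTTP/1.1, HTTP/1.1" }   # the doubled value seen in the debugger

http_11 = env[HTTP_VERSION] == HTTP_11
# => false, so Puma takes the HTTP/1.0 path and responds with neither
#    Content-Length nor Transfer-Encoding: chunked.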

Here's the request:

[screenshot: the request]

Here's the response. Notice that the response is HTTP/1.0, the Connection: Keep-Alive header is still there, but there's no Content-Length.

[screenshot: the response]

Sorry I don't have an easy way to reproduce this, but I've tried to include as much information as I can. There seem to be two issues on Puma's side:

  1. The HTTP_VERSION header parsing seems to somehow get doubled or reach an invalid state despite what I believe to be a valid request.
  2. Puma returns a Connection: Keep-Alive response in HTTP/1.0 without a Content-Length, which I don't believe is a valid response.

I can work around this issue by not passing Connection: Keep-Alive to the server, but it seems like there's an underlying issue in Puma that should probably be addressed.

System configuration

Ruby version: 2.3.5
Rails version: 4.2.10
Puma version: 3.11.4

@aardvarkk (Author)

This may be related to the fact that the request carries both a Version header and the HTTP version in the request line. Perhaps they both get parsed (and combined), even though I don't think Version is a standard HTTP header?

@aardvarkk (Author)

I have confirmed that this issue is avoided if the request does not include a Version header. That said, this page states that the Version header is provisionally valid, but it is supposed to indicate the version of an "evolving object", not the HTTP version. Puma appears to process it as a second specifier for the HTTP version, which would be incorrect.
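
For reference, one way to trigger the doubled value is to send a raw request that carries both the request-line version and a Version header. A minimal sketch, assuming a Puma app listening on localhost:9292:

require "socket"

# Minimal repro sketch, assuming a Puma app is listening on localhost:9292.
request = [
  "GET / HTTP/1.1",
  "Host: localhost",
  "Version: HTTP/1.1",
  "Connection: Keep-Alive",
  "",
  ""
].join("\r\n")

sock = TCPSocket.new("localhost", 9292)
sock.write(request)
# Inspect the status line and headers; in the failing case described above,
# the response comes back as HTTP/1.0 with no Content-Length.
puts sock.readpartial(4096)
sock.close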


HoneyryderChuck commented Apr 19, 2018

Interesting bug. HTTP_VERSION shouldn't be used to carry the HTTP version (it's not required by the spec, at least). However, Puma, Unicorn, and Thin (at least) all set it, and I bet that some middlewares rely on it.

Good luck 👍

dannyfallon (Contributor) commented Apr 19, 2018

Unfortunately I don't know how to dig deeper to figure out where the "doubled" version is coming from

It comes from here, set via this and lives in Ragel here and here

Following the CGI/Rack header convention, every client-provided header name gets transformed to uppercase, has - replaced with _, and gets an HTTP_ prefix added. In the case of a Version header, that transforms it to HTTP_VERSION.

The problem is that we're already setting HTTP_VERSION to HTTP/1.1 in Puma (via the C/Java extensions), so we fall into the RFC's behaviour for when the same header is set more than once: the client's value gets appended, producing HTTP/1.1, HTTP/1.1. You could send a request with Version: Blabbedy and get HTTP_VERSION: HTTP/1.1, Blabbedy 😬
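
A rough sketch of those two steps together (not Puma's actual parser, which lives in the C/Java extensions and the Ragel grammar):

# Rough sketch of the two behaviours described above; not Puma's actual parser.

# 1. CGI-style name mangling: "Version" -> "HTTP_VERSION"
def cgi_name(header_name)
  "HTTP_" + header_name.upcase.tr("-", "_")
end

# 2. Repeated keys are comma-joined, per the RFC guidance on duplicate headers.
def add_header(env, name, value)
  key = cgi_name(name)
  env[key] = env.key?(key) ? "#{env[key]}, #{value}" : value
end

env = { "HTTP_VERSION" => "HTTP/1.1" }  # pre-set by the server from the request line
add_header(env, "Version", "Blabbedy")
env["HTTP_VERSION"]                     # => "HTTP/1.1, Blabbedy"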

Though it's not in the Rack spec, this is also a problem in Rack itself (rack/rack#970). We can fix Puma and open PRs on Thin/Unicorn too, but I'd love some thoughts from the Puma maintainers before I do.

@guizmaii

Hi everyone,

Is there any news on this? Can we help? :)

headius (Contributor) commented May 2, 2019

Adding some research here from my exploration...

There may be some interaction with chunked encodings, as was the case for this issue with node.js: nodejs/node-v0.x-archive#940

I did try playing with the chunked header, but it did not appear to help ab -k numbers, at least on a JRuby + Rails setup.

As another really weird data point, @noahgibbs's "RSB" benchmarks seem to work just fine with ab -k against the Rack benchmark app:

Keep-Alive requests:    1000000
Total transferred:      100000000 bytes
HTML transferred:       12000000 bytes
Requests per second:    48072.16 [#/sec] (mean)

Throwing it at a generated Rails app has the hanging behavior. I'm not sure what's different between these.

headius (Contributor) commented May 2, 2019

And for completeness, here are ab runs without and with -k against a simple generated Rails app:

[] ~/projects/rsb/rack_test_app $ ab -c 20 -n 1000 http://localhost:3000/posts/1
...
Complete requests:      1000
Failed requests:        0
Total transferred:      1554596 bytes
HTML transferred:       840000 bytes
Requests per second:    597.56 [#/sec] (mean)
Time per request:       33.470 [ms] (mean)
Time per request:       1.673 [ms] (mean, across all concurrent requests)
Transfer rate:          907.19 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    5   4.2      5      20
Processing:    10   27   7.0     26      50
Waiting:        7   24   7.5     24      49
Total:         11   32   5.9     32      52

Percentage of the requests served within a certain time (ms)
  50%     32
  66%     34
  75%     35
  80%     37
  90%     40
  95%     44
  98%     46
  99%     48
 100%     52 (longest request)

[] ~/projects/rsb/rack_test_app $ ab -c 20 -n 1000 -k http://localhost:3000/posts/1
...
Complete requests:      204
Failed requests:        164
   (Connect: 0, Receive: 0, Length: 164, Exceptions: 0)
Non-2xx responses:      40
Keep-Alive requests:    184
Total transferred:      353464 bytes
HTML transferred:       207162 bytes
Requests per second:    8.20 [#/sec] (mean)
Time per request:       2438.527 [ms] (mean)
Time per request:       121.926 [ms] (mean, across all concurrent requests)
Transfer rate:          13.88 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.8      0       4
Processing:     6 2000 6021.9     19   20230
Waiting:        0   17  10.9     16      59
Total:          6 2000 6021.8     20   20230

Percentage of the requests served within a certain time (ms)
  50%     20
  66%     23
  75%     27
  80%     32
  90%     61
  95%  20221
  98%  20224
  99%  20228
 100%  20230 (longest request)

@noahgibbs (Contributor)

I have a (maybe?) JRuby-specific variant on this problem to report. When I run a very simple Rack server and use the wrk testing tool (https://github.com/wg/wrk), I get 13335 reqs/sec with CRuby 2.6, but 246.65 reqs/sec with JRuby 9.2.5.0.

One way to repro this bug is to start from the public AMI ami-088f9a7191b2befa7. After creating an instance based on it, log in as the "ubuntu" user. You'll have RVM and Ruby available. There's a trivial Rack "Hello, world" app in ~ubuntu/config.ru. From there:

rvm use jruby-9.2.5.0
puma

# on a different login, or after backgrounding Puma
wrk -d 5 http://localhost:9292

If you substitute "rvm use 2.6.0" for "rvm use jruby-9.2.5.0" you should get around 13,000 reqs/sec. With JRuby it's reliably between 247 reqs/sec and 250 reqs/sec.

The AMI isn't doing anything terribly complicated. It builds its own "wrk" binary because wrk isn't reliably packaged. But it's using normal Ubuntu and RVM.

@ioquatix (Contributor)

Just FYI, in Rack 2.1 we use SERVER_PROTOCOL instead of HTTP_VERSION, which might affect this issue.
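
For example, an app that needs the protocol version could prefer SERVER_PROTOCOL and fall back to the server-specific HTTP_VERSION key. A minimal config.ru sketch (not an official Rack recommendation):

# Minimal config.ru sketch: SERVER_PROTOCOL is the standard CGI key,
# HTTP_VERSION is the server-specific key that Puma/Thin/Unicorn have set historically.
run lambda { |env|
  protocol = env["SERVER_PROTOCOL"] || env["HTTP_VERSION"] || "HTTP/1.0"
  [200, { "Content-Type" => "text/plain" }, ["protocol: #{protocol}\n"]]
}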

@nateberkopec (Member)

Turned out this was: GHSA-7xx3-m584-x994

@ioquatix (Contributor)

I found this issue when debugging Puma's performance curve for https://github.com/socketry/falcon-benchmark - I live-streamed the investigation here: https://www.youtube.com/watch?v=2u6JRvKh7Dg. At the time I didn't really realise it was a serious issue. So, sorry about that.

Basically, when you look at Puma's "latency" in response to changes in concurrency, any server with a fixed pool size should show a linear increase in latency as the number of simultaneous requests increases. If you have 8 threads and you make 8 requests that take 10ms each, the same configuration with 16 requests should mean the first 8 requests take 10ms each, and the second 8 requests must wait an extra 10ms, so they take 20ms each.
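
As a sanity check, that expected relationship can be written down as a simple batching calculation (an illustration of the reasoning above, not a measurement):

# Ideal mean latency for a fixed-size pool: connections are served in batches
# of `threads`, so batch k finishes after k * service_time_ms.
def expected_mean_latency_ms(connections:, threads:, service_time_ms:)
  batches = (connections / threads.to_f).ceil
  total = (1..batches).inject(0.0) do |sum, k|
    in_batch = [threads, connections - (k - 1) * threads].min
    sum + k * service_time_ms * in_batch
  end
  total / connections
end

expected_mean_latency_ms(connections: 8,  threads: 8, service_time_ms: 10)  # => 10.0
expected_mean_latency_ms(connections: 16, threads: 8, service_time_ms: 10)  # => 15.0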

If wrk made 16 connections to puma with 8 threads, puma would accept all 16 connections, but only service 8 of them, and the other 8 connections would just block indefinitely. At the end of the test, you had:

  • 8 connections which each serviced duration/latency requests. So if you ran the test for 1 second and the latency was 100ms per request, you'd end up with 10 requests per connection = 80 requests in total.
  • 8 connections which stalled and did not service even one request.

For the second batch of connections, the request was sent but no response was received. Unfortunately, wrk drops them from the statistics calculations - it only considers complete requests. So it reports only 100ms of latency, despite half of the connections having effectively infinite latency.

When you look at this on a graph (which you can see I was making in falcon-benchmark), where you'd normally expect latency to increase as the number of concurrent connections increases, it stayed the same for Puma, which was super confusing to me. If you throw 8 connections at Puma and it has 8 threads, you expect the latency to be equal to the latency of the request itself. But if you throw more connections at Puma, you should encounter queuing latency too, and that wasn't showing up. I made a fork of wrk to reveal these issues: https://github.com/ioquatix/wrk

@nateberkopec (Member)

at the time I didn't really realise it was a serious issue

Neither did we! So don't worry about it. Thanks for the writeup.
