
Requests to Puma hanging due to issues with Keep-Alive, Content-Length, and HTTP_VERSION headers #1565

Closed
aardvarkk opened this issue Apr 18, 2018 · 12 comments


aardvarkk commented Apr 18, 2018

Issue

I have a situation in which requests to a Puma server hang when the Connection: Keep-Alive header is passed. I noticed that the response Puma returns to the client contains neither a Content-Length header nor chunked transfer encoding. I don't believe that's a valid response, so that's the first issue.

Digging a bit deeper into Puma's source, I noticed a strange thing on line 672 of server.rb. Even though the Version header contained HTTP/1.1, the HTTP_VERSION value on the Puma side somehow showed HTTP/1.1, HTTP/1.1. See the image:

[screenshot: debugger showing HTTP_VERSION set to "HTTP/1.1, HTTP/1.1"]

Since line 672 of server.rb checks for exact equality (http_11 = if env[HTTP_VERSION] == HTTP_11), it doesn't treat the request as HTTP/1.1. It then goes on to respond with neither chunked transfer encoding nor a Content-Length, which I believe is invalid. Unfortunately I don't know how to dig deeper to figure out where the "doubled" version is coming from. When I use Wireshark to look at the data on the wire, I don't see the doubling of the HTTP version.
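
To illustrate the comparison, here's a rough sketch of the check as I understand it (paraphrasing server.rb, not Puma's exact code; the constant names follow puma/const.rb):

# Rough sketch paraphrasing the check around line 672 of server.rb;
# not Puma's exact code. Constant names follow puma/const.rb.
HTTP_VERSION = "HTTP_VERSION".freeze
HTTP_11      = "HTTP/1.1".freeze

env = { HTTP_VERSION => "HTTP/1.1, HTTP/1.1" }   # the doubled value seen in the debugger

http_11 = env[HTTP_VERSION] == HTTP_11
# => false, so Puma takes the HTTP/1.0 path and responds with neither
#    Content-Length nor Transfer-Encoding: chunked.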

Here's the request:

[screenshot: the request]

Here's the response. Notice that the response is HTTP/1.0, the Connection: Keep-Alive header is still there, but there's no Content-Length.

[screenshot: the response]

Sorry I don't have an easy way to reproduce this, but I've tried to include as much information as I can. There seem to be two issues on Puma's side:

  1. The HTTP_VERSION header parsing seems to somehow get doubled or reach an invalid state despite what I believe to be a valid request.
  2. Puma returns a Connection: Keep-Alive response in HTTP/1.0 without a Content-Length, which I don't believe is a valid response.

I can work around this issue by not passing Connection: Keep-Alive to the server, but it seems like there's an underlying issue in Puma that should probably be addressed.

System configuration

Ruby version: 2.3.5
Rails version: 4.2.10
Puma version: 3.11.4

@aardvarkk (Author)

This may be related to the fact that the request carries both a Version header and the HTTP version in the request line. Perhaps they both get parsed (and combined), even though I don't think Version is a standard HTTP header?

@aardvarkk (Author)

I have confirmed that this issue is avoided if the request does not include a Version header. That said, this page states that the Version header is provisionally valid, but it is supposed to indicate the version of an "evolving object", not the HTTP version. Puma appears to process it as a second specifier for the HTTP version, which would be incorrect.
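
For reference, one way to trigger the doubled value is to send a raw request that carries both the request-line version and a Version header. A minimal sketch, assuming a Puma app listening on localhost:9292:

require "socket"

# Minimal repro sketch, assuming a Puma app is listening on localhost:9292.
request = [
  "GET / HTTP/1.1",
  "Host: localhost",
  "Version: HTTP/1.1",
  "Connection: Keep-Alive",
  "",
  ""
].join("\r\n")

sock = TCPSocket.new("localhost", 9292)
sock.write(request)
# Inspect the status line and headers; in the failing case described above,
# the response comes back as HTTP/1.0 with no Content-Length.
puts sock.readpartial(4096)
sock.close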


HoneyryderChuck commented Apr 19, 2018

Interesting bug. HTTP_VERSION shouldn't be used to carry the HTTP version (it's not required by the spec, at least). However, Puma, Unicorn, and Thin (at least) all set it, and I bet that some middlewares rely on it.

Good luck 👍

dannyfallon (Contributor) commented Apr 19, 2018

Unfortunately I don't know how to dig deeper to figure out where the "doubled" version is coming from

It comes from here, set via this and lives in Ragel here and here

Following the CGI/Rack header convention, every client-provided header name gets transformed to uppercase, has - replaced with _, and gets an HTTP_ prefix added. In the case of a Version header, that transforms it to HTTP_VERSION.

The problem is that we're already setting HTTP_VERSION to HTTP/1.1 in Puma (via the C/Java extensions), so we fall into the RFC's behaviour for when the same header is set more than once: the client's value gets appended, producing HTTP/1.1, HTTP/1.1. You could send a request with Version: Blabbedy and get HTTP_VERSION: HTTP/1.1, Blabbedy 😬
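
A rough sketch of those two steps together (not Puma's actual parser, which lives in the C/Java extensions and the Ragel grammar):

# Rough sketch of the two behaviours described above; not Puma's actual parser.

# 1. CGI-style name mangling: "Version" -> "HTTP_VERSION"
def cgi_name(header_name)
  "HTTP_" + header_name.upcase.tr("-", "_")
end

# 2. Repeated keys are comma-joined, per the RFC guidance on duplicate headers.
def add_header(env, name, value)
  key = cgi_name(name)
  env[key] = env.key?(key) ? "#{env[key]}, #{value}" : value
end

env = { "HTTP_VERSION" => "HTTP/1.1" }  # pre-set by the server from the request line
add_header(env, "Version", "Blabbedy")
env["HTTP_VERSION"]                     # => "HTTP/1.1, Blabbedy"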

Though it's not in the Rack spec, this is also a problem in Rack itself (rack/rack#970). We can fix Puma and open PRs on Thin/Unicorn too, but I'd love some thoughts from the Puma maintainers before I do.

@guizmaii

Hi everyone,

Is there any news on this? Can we help? :)

headius (Contributor) commented May 2, 2019

Adding some research here from my exploration...

There may be some interaction with chunked encodings, as was the case for this issue with node.js: nodejs/node-v0.x-archive#940

I did try playing with the chunked header, but it did not appear to help ab -k numbers, at least on a JRuby + Rails setup.

As another really weird data point, @noahgibbs's "RSB" benchmarks seem to work just fine with ab -k against the Rack benchmark app:

Keep-Alive requests:    1000000
Total transferred:      100000000 bytes
HTML transferred:       12000000 bytes
Requests per second:    48072.16 [#/sec] (mean)

Throwing it at a generated Rails app has the hanging behavior. I'm not sure what's different between these.

headius (Contributor) commented May 2, 2019

And for completeness, here are ab runs without and with -k against a simple generated Rails app:

[] ~/projects/rsb/rack_test_app $ ab -c 20 -n 1000 http://localhost:3000/posts/1
...
Complete requests:      1000
Failed requests:        0
Total transferred:      1554596 bytes
HTML transferred:       840000 bytes
Requests per second:    597.56 [#/sec] (mean)
Time per request:       33.470 [ms] (mean)
Time per request:       1.673 [ms] (mean, across all concurrent requests)
Transfer rate:          907.19 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    5   4.2      5      20
Processing:    10   27   7.0     26      50
Waiting:        7   24   7.5     24      49
Total:         11   32   5.9     32      52

Percentage of the requests served within a certain time (ms)
  50%     32
  66%     34
  75%     35
  80%     37
  90%     40
  95%     44
  98%     46
  99%     48
 100%     52 (longest request)

[] ~/projects/rsb/rack_test_app $ ab -c 20 -n 1000 -k http://localhost:3000/posts/1
...
Complete requests:      204
Failed requests:        164
   (Connect: 0, Receive: 0, Length: 164, Exceptions: 0)
Non-2xx responses:      40
Keep-Alive requests:    184
Total transferred:      353464 bytes
HTML transferred:       207162 bytes
Requests per second:    8.20 [#/sec] (mean)
Time per request:       2438.527 [ms] (mean)
Time per request:       121.926 [ms] (mean, across all concurrent requests)
Transfer rate:          13.88 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.8      0       4
Processing:     6 2000 6021.9     19   20230
Waiting:        0   17  10.9     16      59
Total:          6 2000 6021.8     20   20230

Percentage of the requests served within a certain time (ms)
  50%     20
  66%     23
  75%     27
  80%     32
  90%     61
  95%  20221
  98%  20224
  99%  20228
 100%  20230 (longest request)

@noahgibbs (Contributor)

I have a (maybe?) JRuby-specific variant on this problem to report. When I run a very simple Rack server and use the wrk testing tool (https://github.com/wg/wrk), I get 13335 reqs/sec with CRuby 2.6, but 246.65 reqs/sec with JRuby 9.2.5.0.

One way to repro this bug is to start from the public AMI ami-088f9a7191b2befa7. After creating an instance based on it, log in as the "ubuntu" user. You'll have RVM and Ruby available. There's a trivial Rack "Hello, world" app in ~ubuntu/config.ru. From there:

rvm use jruby-9.2.5.0
puma

# on a different login, or after backgrounding Puma
wrk -d 5 http://localhost:9292

If you substitute "rvm use 2.6.0" for "rvm use jruby-9.2.5.0" you should get around 13,000 reqs/sec. With JRuby it's reliably between 247 reqs/sec and 250 reqs/sec.

The AMI isn't doing anything terribly complicated. It builds its own "wrk" binary because wrk isn't reliably packaged. But it's using normal Ubuntu and RVM.

@ioquatix (Contributor)

Just FYI, in Rack 2.1 we use SERVER_PROTOCOL instead of HTTP_VERSION, which might affect this issue.
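
For example, an app that needs the protocol version could prefer SERVER_PROTOCOL and fall back to the server-specific HTTP_VERSION key. A minimal config.ru sketch (not an official Rack recommendation):

# Minimal config.ru sketch: SERVER_PROTOCOL is the standard CGI key,
# HTTP_VERSION is the server-specific key that Puma/Thin/Unicorn have set historically.
run lambda { |env|
  protocol = env["SERVER_PROTOCOL"] || env["HTTP_VERSION"] || "HTTP/1.0"
  [200, { "Content-Type" => "text/plain" }, ["protocol: #{protocol}\n"]]
}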

@nateberkopec (Member)

Turned out this was: GHSA-7xx3-m584-x994

@ioquatix (Contributor)

I found this issue when debugging Puma's performance curve for https://github.com/socketry/falcon-benchmark - I live-streamed the investigation here: https://www.youtube.com/watch?v=2u6JRvKh7Dg. At the time I didn't really realise it was a serious issue. So, sorry about that.

Basically, when you look at Puma's "latency" in response to changes in concurrency, any server with a fixed pool size should show a linear increase in latency as the number of simultaneous requests increases. If you have 8 threads and you make 8 requests that take 10ms each, the same configuration with 16 requests should mean the first 8 requests take 10ms each, and the second 8 requests must wait an extra 10ms, so they take 20ms each.
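
As a sanity check, that expected relationship can be written down as a simple batching calculation (an illustration of the reasoning above, not a measurement):

# Ideal mean latency for a fixed-size pool: connections are served in batches
# of `threads`, so batch k finishes after k * service_time_ms.
def expected_mean_latency_ms(connections:, threads:, service_time_ms:)
  batches = (connections / threads.to_f).ceil
  total = (1..batches).inject(0.0) do |sum, k|
    in_batch = [threads, connections - (k - 1) * threads].min
    sum + k * service_time_ms * in_batch
  end
  total / connections
end

expected_mean_latency_ms(connections: 8,  threads: 8, service_time_ms: 10)  # => 10.0
expected_mean_latency_ms(connections: 16, threads: 8, service_time_ms: 10)  # => 15.0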

If wrk made 16 connections to puma with 8 threads, puma would accept all 16 connections, but only service 8 of them, and the other 8 connections would just block indefinitely. At the end of the test, you had:

  • 8 connections which each serviced duration/latency requests. So if you ran the test for 1 second and the latency was 100ms per request, you'd end up with 10 requests per connection = 80 requests in total.
  • 8 connections which stalled and did not service even one request.

For the second batch of connections, the request was sent but no response was received. Unfortunately, wrk drops them from the statistics calculations - it only considers complete requests. So it reports only 100ms of latency, despite half of the connections having effectively infinite latency.

When you look at this on a graph (which you can see I was making in falcon-benchmark), where you'd normally expect latency to increase as the number of concurrent connections increases, it stayed the same for Puma, which was super confusing to me. If you throw 8 connections at Puma and it has 8 threads, you expect the latency to be equal to the latency of the request itself. But if you throw more connections at Puma, you should encounter queuing latency too, and that wasn't showing up. I made a fork of wrk to reveal these issues: https://github.com/ioquatix/wrk

@nateberkopec (Member)

at the time I didn't really realise it was a serious issue

Neither did we! So don't worry about it. Thanks for the writeup.
