Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Response refactor, increase requests-per-second, body type handling #2896

Merged
merged 14 commits into from
Sep 13, 2022

Conversation

MSP-Greg
Copy link
Member

@MSP-Greg MSP-Greg commented Jun 14, 2022

Description

This PR refactors response generation, speeding up operations with various body types and sizes. It also adds code to better handle the various body types returned by Rails and/or Rack. Much of this has been worked on by @guilleiguaran, so thanks for his work on it (see #2892). Essentially, body types fall into five categories:

  1. An Array - Puma will determine the content-length if not provided.
  2. An Enum - this would be chunked by Puma or sent as is if chunked encoding has already been applied.
  3. A File object - dependent on its size, it will either be sent with the headers, or sent after the headers using IO.copy_stream.
  4. An object that responds to to_path - Puma will attempt to open the file, and if successful, will send as above.
  5. An object that responds to call.

This PR has been around for a while, and I'm sure I've run millions of requests thru it. Two items:

  1. Duplicating client errors is difficult. I believe I've got the error handling correct, but we may see some issues with it. Not.sure.
  2. Previously, Puma didn't ever call to_path on a response body. This PR has been updated to do so when the body responds to it, but isn't a File object. It opens the file (based on the name in to_path), and treats it as a File/IO object. This may cause issues with the Rack::Sendfile middleware.

Below summarizes performance using the code in benchmarks/local, all using:
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s tcp6 -Y
wrk -t8 -c16 -d10s
Server cluster mode -w2 -t5:5, bind: tcp6
ruby 3.2.0dev (2022-09-12T13:13:32Z master 2aa8edaec7) +YJIT [x86_64-linux]
Both include 'test/rackup/ci_*.ru files - cache response bodies' (#2952)
Windows WSL2/Ubuntu 22.04


Body    ────────── req/sec ──────────   ─────── req 50% times ─────── Master
 KB     array   chunk  string      io   array   chunk  string      io
1       13780    9719   13581   10964    0.65    1.22    0.68    0.90
10       5920    2769   13203   10831    1.78    4.04    0.80    0.95
100      1063     404   11002    9153   10.54   26.85    0.98    1.21
256       448     159   11264    9039   24.39   65.65    0.92    1.23
512       234      85   10757    8848   45.18  121.76    0.96    1.22
1024      129      43    8822    8543   79.86  237.58    0.77    1.21
2048       68      22    6342    8405  152.13  456.80    1.54    1.21
─────────────────────────────────────────────────────────────────────

Body    ────────── req/sec ──────────   ─────── req 50% times ─────── PR #2896
 KB     array   chunk  string      io   array   chunk  string      io
1       15889   15885   15594   10898   0.200   0.197   0.195   0.960
10      15419   15204   15605   11073   0.297   0.285   0.225   0.728
100     10596    9660   11324    9396   1.020   1.120   0.960   1.180
256      9198    7501   11960    9286   1.120   1.360   0.860   1.190
512      7112    5512   11012    9017   1.430   1.870   0.930   1.190
1024     4355    3315    9302    8762   1.740   2.760   0.709   1.170
2048     2801    1938    6595    8562   3.460   5.240   1.470   1.170
─────────────────────────────────────────────────────────────────────

There are significant speed increases for almost all body types/sizes, with the exception of 'single string' and 'io/file' bodies, which show smaller increases.

The commits:

All three of the below are small refactors

Most of the remaining commits involve code to handle various body types, along with tests.

Please feel free to test these changes.

Closes #2892, closes #2703, closes #2457

Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

@MSP-Greg
Copy link
Member Author

MSP-Greg commented Jun 14, 2022

Below is data for ssl connections. Again, significant speed increases in most of the body types/sizes.

──────────────────────────────────────────────────────────────────────────── ssl
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y

Body    ────────── req/sec ──────────   ─────── req 50% times ─────── Master
 KB     array   chunk  string      io   array   chunk  string      io
1       11090    7303   10931    7657    1.34    1.28    1.32    1.65
10       2267    2263    9755    3911    4.44    4.34    1.20    3.00
100       308     309    4151     673   38.56   38.93    2.96   16.38
256       125     123    2165     275   92.57   92.01    5.19   38.25
512        63      62    1191     145  179.65  178.39    9.07   75.12
1024       31      31     617      74  323.35  315.68   17.02  145.23
2048       15      15     320      38  632.93  646.78   32.60  264.60
─────────────────────────────────────────────────────────────────────

Body    ────────── req/sec ──────────   ─────── req 50% times ─────── PR #2896
 KB     array   chunk  string      io   array   chunk  string      io
1       13802   13492   13820    9576   0.839   0.850   0.838   1.220
10      13131   12708   13067    9077   0.890   0.910   0.890   1.270
100      3799    4312    4482    2772   2.590   2.400   2.320   3.810
256      2177    2029    2231    1460   4.690   4.970   4.560   7.080
512      1175    1110    1196     798   8.400   9.030   8.250  12.800
1024      611     576     637     416  16.040  17.220  15.680  24.320
2048      305     290     325     215  32.290  34.010  30.720  46.800
─────────────────────────────────────────────────────────────────────
ruby 3.2.0dev (2022-06-14T01:21:55Z master 048f14221c) +YJIT [x86_64-linux]

Using Windows WSL2/Ubuntu 22.04 with OpenSSL 3.0.2

@MSP-Greg MSP-Greg changed the title Response refactor Response refactor, increase requests-per-second Jun 15, 2022
@MSP-Greg
Copy link
Member Author

All of the above was done with Windows WSL2/Ubuntu 22.04. But, that's not native, essentially running in a VM.

So, I ran the benchmarks on GitHub Actions, which are 2 core VM's running on larger systems. They list several processors, I looked up two, one had 12 cores, another 32. I ran the test a few times to account for 'noisy neighbors'. Data below. Again, almost all response body type/size combinations run faster than current master...

wrk -t5 -c10 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -T5 -c2 -Y
Server cluster mode -w2 -t5:5, bind: tcp
ruby 3.2.0dev (2022-06-16T16:48:52Z master cd5cafa4a3) +YJIT [x86_64-linux]

═════════════════════════════════════════════════════════════════════ Master
Body    ────────── req/sec ──────────   ─────── req 50% times ───────
 KB     array   chunk  string      io   array   chunk  string      io
1       15190   13626   14527   12669   0.233   0.614   0.551   0.257
10      11129   10219   13308   10879   0.228   0.846   0.434   0.337
100      3454    3310    7736    4412   2.700   2.960   1.190   1.990
256      1595    1493    4513    2252   6.080   6.470   1.920   3.830
512       862     803    2626    1258  11.340  10.970   3.430   7.810
1024      439     419    1517     668  24.280  23.980   6.110  14.820
2048      221     213     733     345  44.230  47.190  11.850  26.920
─────────────────────────────────────────────────────────────────────

═════════════════════════════════════════════════════════════════════ PR #2896
Body    ────────── req/sec ──────────   ─────── req 50% times ───────
 KB     array   chunk  string      io   array   chunk  string      io
1       16005   15106   16250   13138   0.570   0.601   0.568   0.649
10      15634   14384   15077   12620   0.550   0.636   0.547   0.737
100      8499    7470    8817    7038   0.980   1.140   1.040   0.626
256      5215    4131    5281    6813   1.800   2.280   1.810   1.030
512      3133    2464    3155    5339   2.710   3.810   2.880   1.710
1024     1730    1336    1838    3419   5.160   6.540   5.120   2.800
2048      884     688     954    1942   9.760  11.880   9.760   4.860
─────────────────────────────────────────────────────────────────────
wrk latency data, grouped by response body size
═════════════════════════════════════════════════════════════════════    1kB
Type   req/sec    50%     75%     90%     99%    100%  Resp Size
───────────────────────────────────────────────────────────────── Master
array   15190    0.23    4.11    9.11   15.18   44.35      1022
chunk   13626    0.61    1.56    4.80   12.35   24.82      1040
string  14527    0.55    2.05    5.62   14.66   38.41      1022
io      12669    0.26    4.27    9.08   15.32   37.56      1022
───────────────────────────────────────────────────────────────── PR #2896
array   16005    0.57    0.65    0.96    3.27   40.51      1022
chunk   15106    0.60    0.72    1.00    2.71   41.34      1040
string  16250    0.57    0.64    0.87    2.37   18.10      1022
io      13138    0.65    0.88    1.59    4.47   27.82      1022

═════════════════════════════════════════════════════════════════════ 10kB
Type req/sec 50% 75% 90% 99% 100% Resp Size
───────────────────────────────────────────────────────────────── Master
array 11129 0.23 4.44 9.49 15.43 75.46 10318
chunk 10219 0.85 2.00 4.71 12.26 33.53 10318
string 13308 0.43 3.33 8.82 16.75 35.13 10238
io 10879 0.34 4.39 9.84 18.99 34.19 10238
───────────────────────────────────────────────────────────────── PR #2896
array 15634 0.55 0.71 1.48 3.84 26.45 10239
chunk 14384 0.64 0.73 1.03 2.70 42.92 10318
string 15077 0.55 0.74 1.47 3.83 19.55 10238
io 12620 0.74 0.85 1.14 2.76 28.74 10239

═════════════════════════════════════════════════════════════════════ 100kB
Type req/sec 50% 75% 90% 99% 100% Resp Size
───────────────────────────────────────────────────────────────── Master
array 3454 2.70 3.27 3.88 6.41 38.69 103104
chunk 3310 2.96 3.14 3.57 5.69 32.04 103109
string 7736 1.19 1.39 1.79 4.99 42.35 102398
io 4412 1.99 2.50 3.91 10.31 31.69 102398
───────────────────────────────────────────────────────────────── PR #2896
array 8499 0.98 1.48 2.44 7.07 21.90 102398
chunk 7470 1.14 1.53 2.49 6.84 33.93 103107
string 8817 1.04 1.19 1.56 3.99 43.69 102446
io 7038 0.63 17.34 33.80 44.19 63.60 102398

═════════════════════════════════════════════════════════════════════ 256kB
Type req/sec 50% 75% 90% 99% 100% Resp Size
───────────────────────────────────────────────────────────────── Master
array 1595 6.08 6.49 7.41 12.81 49.06 263945
chunk 1493 6.47 7.03 8.10 15.22 74.17 263944
string 4513 1.92 2.43 3.95 11.27 35.72 262248
io 2252 3.83 5.49 9.87 24.42 60.74 262139
───────────────────────────────────────────────────────────────── PR #2896
array 5215 1.80 2.01 2.46 5.33 58.87 262138
chunk 4131 2.28 2.58 3.03 5.59 25.88 263972
string 5281 1.81 2.03 2.40 5.08 21.45 262155
io 6813 1.03 4.07 24.08 42.82 57.61 262139

═════════════════════════════════════════════════════════════════════ 512kB
Type req/sec 50% 75% 90% 99% 100% Resp Size
───────────────────────────────────────────────────────────────── Master
array 862 11.34 11.94 13.37 20.81 44.18 527877
chunk 803 10.97 13.85 20.43 76.32 114.16 527884
string 2626 3.43 4.19 6.00 15.99 40.05 524363
io 1258 7.81 10.53 14.66 25.06 64.31 524290
───────────────────────────────────────────────────────────────── PR #2896
array 3133 2.71 3.63 6.26 16.88 37.30 524244
chunk 2464 3.81 4.41 5.39 9.98 27.48 528044
string 3155 2.88 3.34 4.83 13.47 39.83 524324
io 5339 1.71 2.29 12.73 41.39 69.71 524428

═════════════════════════════════════════════════════════════════════ 1024kB
Type req/sec 50% 75% 90% 99% 100% Resp Size
───────────────────────────────────────────────────────────────── Master
array 439 24.28 26.15 29.12 37.36 54.84 1055750
chunk 419 23.98 25.56 27.65 38.95 61.14 1055735
string 1517 6.11 7.29 9.77 15.31 39.23 1048590
io 668 14.82 17.03 19.49 29.18 52.12 1048582
───────────────────────────────────────────────────────────────── PR #2896
array 1730 5.16 6.89 9.78 35.59 62.04 1048535
chunk 1336 6.54 8.45 13.55 40.03 90.03 1056056
string 1838 5.12 5.82 7.62 13.64 39.44 1048488
io 3419 2.80 3.45 4.86 38.38 61.40 1048474

═════════════════════════════════════════════════════════════════════ 2048kB
Type req/sec 50% 75% 90% 99% 100% Resp Size
───────────────────────────────────────────────────────────────── Master
array 221 44.23 47.96 52.12 120.97 347.25 2111458
chunk 213 47.19 51.63 55.62 64.85 95.46 2111497
string 733 11.85 17.93 27.84 64.10 147.48 2096385
io 345 26.92 31.42 43.47 240.98 314.45 2097178
───────────────────────────────────────────────────────────────── PR #2896
array 884 9.76 14.96 24.83 60.45 105.23 2097201
chunk 688 11.88 18.85 42.21 99.54 154.52 2110244
string 954 9.76 11.24 13.92 24.41 69.11 2096501
io 1942 4.86 6.20 8.41 36.39 64.73 2097527

lib/puma/io_buffer.rb Outdated Show resolved Hide resolved
@@ -12,6 +12,13 @@ module Puma
#
module Request

LEN_BUFFER = 1_024 * 256
STRM_LEN_MAX = 1_024 * 1_024 * 4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "strm"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short for stream. Some of the code here calls the IOBuffer object stream, and some of the current code calls it lines.

Would it be better to call it io_buffer?

Also, much of the code uses io for the client socket. Should it be called socket?

And, thanks again, when I'm writing code, I often use short names. Need to stop doing that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your suggestions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the last commit, renamed as mentioned above. Also, I often needed to remind myself what res_info was. Changed it to resp_info. I see res, and I don't immediately think of response...

LEN_BUFFER = 1_024 * 256
STRM_LEN_MAX = 1_024 * 1_024 * 4

SKT_WRITE_ERR_MSG = "Socket timeout writing data"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like if we used "socket" in the code over "skt"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. See above comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dentarg - fixed

@dentarg
Copy link
Member

dentarg commented Jun 18, 2022

Just some comments from the phone, haven't looked through the actual changes. Not sure when I'll be able to do that.

class IOBuffer < String
def append(*args)
args.each { |a| concat(a) }
class IOBuffer < StringIO
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be any benefit in using Ruby’s IO::Buffer if it’s available?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the API, and it's a bit more 'low level' than StringIO. I switched to StringIO because it's more 'byte' oriented (or encoding agnostic), and it works reasonably well with both String & IO objects.

I haven't benched anything, but I normally use Ruby master locally, so I suppose...

@nateberkopec
Copy link
Member

@MSP-Greg I think is waiting on some variable renames but otherwise looks good to go?

@MSP-Greg
Copy link
Member Author

MSP-Greg commented Sep 4, 2022

@nateberkopec working on this re PR #2892.

Very busy with a (paid) project fixing Ruby code I wrote 10+ years ago. I was pretty new to Ruby then, mostly did C# and js back then...

@MSP-Greg
Copy link
Member Author

MSP-Greg commented Sep 4, 2022

Question: Should an empty response body return a Content-Length: 0 header in the response?

@guilleiguaran
Copy link
Contributor

That's a good question, HTTP allows it and some clients might want to know it (e.g for verification), but I'm not sure how common is having it for empty body responses.

@MSP-Greg
Copy link
Member Author

MSP-Greg commented Sep 4, 2022

@guilleiguaran I updated this, adding checks for if the body is a File io without a Content-Length.

Thanks for all your help here and in PR #2892...

Also, please see eb5d7a35ce8b, that's the code similar to your PR. I start with app_body, and convert it to body. close is called on app_body at the end of the response handling.

From what I can tell, Content-Length: 0 is allowed as a value, and it's passed in the response. Some tests needed changes for it...

I'm not sure how common is having it for empty body responses.

Now it's being set for a body that's either [] or ['']

length = 0
if array_body = app_body.to_ary
body = array_body.map { |part| length += part.bytesize; part }
elsif app_body.is_a?(::File) && app_body.respond_to?(:size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should check if app_body responds to to_path instead of is_a?(::File) to conform with the Rack SPEC?

The SPEC mentions 'File-like' and to_path but doesn't mention the File class specifically.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm looking at ActionDispatch::Response::FileBody, and seeing that it is not a File, but responds to to_path and each is patched to return 16k chunks. Since it's not a File, we can't get the size from it, so it has to be chunked?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added another commit to handle to_path. Reason for the File object check is that Puma can use copy_stream for File objects, which would typically be faster than calling any kind of each iteration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rack 3.0.0 released…

end
end
body.close unless body.closed?
elsif body.respond_to?(:to_ary) && body.length == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting an exception here when the response coming from Rails is already chunked (Rails app sets Transfer-Encoding: chunked header so Puma doesn't try to calculate the Content-Length), in this case body reach this point being a RackBody instance that responds to to_ary but don't implements length:

2022-09-05 01:25:12 -0700 Read: #<NoMethodError: undefined method `length' for #<ActionDispatch::Response::RackBody:0x000000014ed57948>>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking/testing.

Fixed - changed body.respond_to?(:to_ary) to `body.is_a?(::Array)

Maybe the PR needs more tests with RackBody. I'll see about adding some...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MSP-Greg
Copy link
Member Author

MSP-Greg commented Sep 7, 2022

Rebased for Rack 3, added commit test_rack_server.rb - add tests for Rack::Chunked middleware

@MSP-Greg
Copy link
Member Author

MSP-Greg commented Sep 12, 2022

@agbaber

Thank you for the results, sorry they take so long. For the most part, your data tracks with mine, in that many of the speed increases are significant.

I'm wondering if something is wrong with the code, as the Ruby 2.7 string data is much different that Ruby 3.1. I've used Ruby master for most of my local testing.

Do you recall if latency summary section output before the main summary shown above listed the string body sizes as similar to the other sized bodies? Example from my output:

─────────────────────────────────────────────────────────────────────────────────── 2048kB
array      39   266.0   286.6   317.5  1510.0  1940.0   2097155
chunk      16   633.1   675.3   715.9  1370.0  1460.0   2111471
string   2627     3.9    15.1    33.2    51.2    77.8   2097409
io       7682     1.4     6.8    15.4    23.8    65.2   2097244

JFYI, the body size number isn't terribly accurate due to the limited digits in wrk's 'bytes read' output.

EDIT: I'm going to test on Ruby 2.7 locally.

@MSP-Greg
Copy link
Member Author

@agbaber Thanks again. I found the bug, Ruby 3 doesn't raise a "can't modify a frozen string" error, but Ruby 2.7 does. That was the cause of the odd string data in Ruby 2.7. Fixed in PR #2951. Rebased this, so it has the change also...

@JuanitoFatas
Copy link
Contributor

Late to the party, but I tried this on Ruby 3.2.0 Preview 2. Following are the results.

My laptop information

ProductName:	macOS
ProductVersion:	12.4
BuildVersion:	21F79
ruby 3.2.0preview2 (2022-09-09 master 35cfc9a3bb) [arm64-darwin21] without YJIT

master - 0e832e7

$ benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y

Body    ────────── req/sec ──────────   ─────── req 50% times ───────
 KB     array   chunk  string      io   array   chunk  string      io
1       20293   17969   22868   18407    0.36    0.60    0.53    0.57
10      11181    4149   19329   16614    0.74    2.19    0.42    0.65
100      1036     236    8728    7782    8.91   69.89    1.03    1.37
256       905     272    4574    4199    7.71   21.60    1.89    2.37
512       431     149    2729    2326   15.20   41.19    3.55    4.01
1024      172      30    1428    1261   35.52  103.26    6.54    7.58
2048       47      16     759     632  155.84  646.02   12.67   15.15
─────────────────────────────────────────────────────────────────────
wrk -t8 -c16 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y
Server cluster mode -w2 -t5:5, bind: ssl6
Puma repo branch master
ruby 3.2.0preview2 (2022-09-09 master 35cfc9a3bb) [arm64-darwin21]
View Full Result
$ benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y

bundle exec bin/puma -q -b ssl://[::1]:40001?cert=examples/puma/cert_puma.pem&key=examples/puma/puma_keypair.pem&verify_mode=none -S tmp/bench_test_puma.state -w2 -t5:5 --control-url=tcp://127.0.0.1:40010 --control-token=test test/rackup/ci_select.ru

usage: sleep seconds
ruby: warning: Ruby was built without YJIT support
ruby: warning: Ruby was built without YJIT support
[14328] Puma starting in cluster mode...
[14328] * Puma version: 5.6.5 (ruby 3.2.0-p-1) ("Birdie's Version")
[14328] *  Min threads: 5
[14328] *  Max threads: 5
[14328] *  Environment: development
[14328] *   Master PID: 14328
[14328] *      Workers: 2
[14328] *     Restarts: (✔) hot (✔) phased
[14328] * Listening on ssl://[::1]:40001?cert=examples/puma/cert_puma.pem&key=examples/puma/puma_keypair.pem&verify_mode=none
[14328] Use Ctrl-C to stop
[14328] * Starting control server on http://127.0.0.1:40010
[14328] - Worker 1 (PID: 14333) booted in 0.0s, phase: 0
[14328] - Worker 0 (PID: 14332) booted in 0.01s, phase: 0

warm-up
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: a10'   https://[::1]:40001 |  13918 requests in  1.10s, 159.46MB read
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: c10'   https://[::1]:40001 |   6163 requests in  1.10s,  71.10MB read
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: s10'   https://[::1]:40001 |  22974 requests in  1.10s, 263.20MB read
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: i10'   https://[::1]:40001 |  18741 requests in  1.10s, 214.71MB read

wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a1'    https://[::1]:40001 | 204979 requests in 10.10s, 546.71MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c1'    https://[::1]:40001 | 181543 requests in 10.10s, 487.25MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s1'    https://[::1]:40001 | 230961 requests in 10.10s, 615.87MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i1'    https://[::1]:40001 | 185924 requests in 10.10s, 495.81MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a10'   https://[::1]:40001 | 112923 requests in 10.10s,   1.26GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c10'   https://[::1]:40001 |  41905 requests in 10.10s, 483.31MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s10'   https://[::1]:40001 | 195247 requests in 10.10s,   2.18GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i10'   https://[::1]:40001 | 167796 requests in 10.10s,   1.88GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a100'  https://[::1]:40001 |  10461 requests in 10.10s,   1.02GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c100'  https://[::1]:40001 |   2371 requests in 10.03s, 237.69MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s100'  https://[::1]:40001 |  88165 requests in 10.10s,   8.55GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i100'  https://[::1]:40001 |  78638 requests in 10.10s,   7.63GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a256'  https://[::1]:40001 |   9139 requests in 10.10s,   2.25GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c256'  https://[::1]:40001 |   2743 requests in 10.10s, 696.48MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s256'  https://[::1]:40001 |  46207 requests in 10.10s,  11.36GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i256'  https://[::1]:40001 |  42425 requests in 10.10s,  10.43GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a512'  https://[::1]:40001 |   4351 requests in 10.10s,   2.13GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c512'  https://[::1]:40001 |   1509 requests in 10.10s, 764.44MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s512'  https://[::1]:40001 |  27568 requests in 10.10s,  13.51GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i512'  https://[::1]:40001 |  23491 requests in 10.10s,  11.51GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a1024' https://[::1]:40001 |   1733 requests in 10.10s,   1.70GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c1024' https://[::1]:40001 |    298 requests in 10.10s, 304.86MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s1024' https://[::1]:40001 |  14428 requests in 10.10s,  14.12GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i1024' https://[::1]:40001 |  12739 requests in 10.10s,  12.47GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a2048' https://[::1]:40001 |    473 requests in 10.04s,   0.94GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c2048' https://[::1]:40001 |    158 requests in 10.10s, 325.42MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s2048' https://[::1]:40001 |   7668 requests in 10.10s,  15.00GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i2048' https://[::1]:40001 |   6388 requests in 10.10s,  12.50GB read

Type   req/sec    50%     75%     90%     99%    100%  Resp Size  Errors
───────────────────────────────────────────────────────────────────────────────────    1kB
array   20293     0.4     0.5     0.8     3.4    11.0      1023
chunk   17969     0.6     1.1     2.8     5.4    11.6      1040
string  22868     0.5     1.0     2.4     4.3    13.2      1022
io      18407     0.6     0.9     3.2     6.0    15.2      1022
───────────────────────────────────────────────────────────────────────────────────   10kB
array   11181     0.7     1.0     1.3     4.4    15.8     10239
chunk    4149     2.2     7.1    19.5    56.0    96.4     10319
string  19329     0.4     0.6     1.2     4.3    12.1     10238
io      16614     0.7     1.1     3.2     5.9    15.6     10239
───────────────────────────────────────────────────────────────────────────────────  100kB
array    1036     8.9    23.5    72.0   214.2   313.9    102444
chunk     236    69.9   100.9   275.5   496.6   638.2    103344
string   8728     1.0     1.4     2.5     9.4    21.4    102400
io       7782     1.4     2.2     8.0    14.6    28.2    102400
───────────────────────────────────────────────────────────────────────────────────  256kB
array     905     7.7    15.5    25.4    51.5   286.1    262277
chunk     272    21.6    64.8   103.7   337.2  1050.0    264470
string   4574     1.9     2.4     2.9     7.2    28.5    262109
io       4199     2.4     3.1     9.0    20.6    32.4    262148
───────────────────────────────────────────────────────────────────────────────────  512kB
array     431    15.2    37.5    59.6   155.7   473.8    524888
chunk     149    41.2   110.0   174.3   365.3  1010.0    529420
string   2729     3.6     4.3     7.1    36.7    47.4    524522
io       2326     4.0     4.7     5.4    12.3    54.6    524349
─────────────────────────────────────────────────────────────────────────────────── 1024kB
array     172    35.5   100.7   140.7   869.0  1710.0   1051747
chunk      30   103.3   708.9   799.9   913.7  1470.0   1070940
string   1428     6.5     7.9     9.6    26.2    67.5   1049204
io       1261     7.6     8.5     9.5    18.6    73.1   1048938
─────────────────────────────────────────────────────────────────────────────────── 2048kB
array      47   155.8   425.9   523.4  1080.0  1580.0   2121956
chunk      16   646.0   806.9   888.2  1110.0  1160.0   2157877
string    759    12.7    14.6    16.6    31.6   140.6   2098010
io        632    15.2    17.0    20.4    36.0   173.2   2099701

Body    ────────── req/sec ──────────   ─────── req 50% times ───────
 KB     array   chunk  string      io   array   chunk  string      io
1       20293   17969   22868   18407    0.36    0.60    0.53    0.57
10      11181    4149   19329   16614    0.74    2.19    0.42    0.65
100      1036     236    8728    7782    8.91   69.89    1.03    1.37
256       905     272    4574    4199    7.71   21.60    1.89    2.37
512       431     149    2729    2326   15.20   41.19    3.55    4.01
1024      172      30    1428    1261   35.52  103.26    6.54    7.58
2048       47      16     759     632  155.84  646.02   12.67   15.15
─────────────────────────────────────────────────────────────────────
wrk -t8 -c16 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y
Server cluster mode -w2 -t5:5, bind: ssl6
Puma repo branch master
ruby 3.2.0preview2 (2022-09-09 master 35cfc9a3bb) [arm64-darwin21]

[14328] - Gracefully shutting down workers...
[14328] === puma shutdown: 2022-09-13 20:58:46 +0900 ===
[14328] - Goodbye!

 5:14 Total Time

This PR - c32f8ff

$ benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y

Body    ────────── req/sec ──────────   ─────── req 50% times ───────
 KB     array   chunk  string      io   array   chunk  string      io
1       23522   24581   25445   21879   0.405   0.424   0.455   0.537
10      22458   21593   22811   19156   0.472   0.430   0.487   0.568
100      7772    7431    8768    6853   1.100   1.330   1.230   1.210
256      4089    3646    4731    4037   2.150   2.410   2.050   2.210
512      2303    2021    2474    2244   3.960   4.560   3.730   4.100
1024     1170    1075    1442    1262   7.930   8.840   6.510   7.570
2048      565     538     755     656  16.640  17.940  12.770  14.790
─────────────────────────────────────────────────────────────────────
wrk -t8 -c16 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y
Server cluster mode -w2 -t5:5, bind: ssl6
Puma repo branch 00-response-refactor
ruby 3.2.0preview2 (2022-09-09 master 35cfc9a3bb) [arm64-darwin21]
View Full Result
$ benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y

bundle exec bin/puma -q -b ssl://[::1]:40001?cert=examples/puma/cert_puma.pem&key=examples/puma/puma_keypair.pem&verify_mode=none -S tmp/bench_test_puma.state -w2 -t5:5 --control-url=tcp://127.0.0.1:40010 --control-token=test test/rackup/ci_select.ru

usage: sleep seconds
ruby: warning: Ruby was built without YJIT support
ruby: warning: Ruby was built without YJIT support
[13614] Puma starting in cluster mode...
[13614] * Puma version: 5.6.5 (ruby 3.2.0-p-1) ("Birdie's Version")
[13614] *  Min threads: 5
[13614] *  Max threads: 5
[13614] *  Environment: development
[13614] *   Master PID: 13614
[13614] *      Workers: 2
[13614] *     Restarts: (✔) hot (✔) phased
[13614] * Listening on ssl://[::1]:40001?cert=examples/puma/cert_puma.pem&key=examples/puma/puma_keypair.pem&verify_mode=none
[13614] Use Ctrl-C to stop
[13614] * Starting control server on http://127.0.0.1:40010
[13614] - Worker 0 (PID: 13628) booted in 0.01s, phase: 0
[13614] - Worker 1 (PID: 13629) booted in 0.01s, phase: 0

warm-up
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: a10'   https://[::1]:40001 |  26562 requests in  1.10s, 304.31MB read
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: c10'   https://[::1]:40001 |  26710 requests in  1.10s, 308.04MB read
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: s10'   https://[::1]:40001 |  26873 requests in  1.10s, 307.87MB read
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: i10'   https://[::1]:40001 |  21684 requests in  1.10s, 248.42MB read

wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a1'    https://[::1]:40001 | 237588 requests in 10.10s, 633.59MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c1'    https://[::1]:40001 | 248277 requests in 10.10s, 666.33MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s1'    https://[::1]:40001 | 257024 requests in 10.10s, 685.37MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i1'    https://[::1]:40001 | 220993 requests in 10.10s, 589.28MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a10'   https://[::1]:40001 | 226841 requests in 10.10s,   2.54GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c10'   https://[::1]:40001 | 218111 requests in 10.10s,   2.46GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s10'   https://[::1]:40001 | 230415 requests in 10.10s,   2.58GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i10'   https://[::1]:40001 | 193550 requests in 10.10s,   2.17GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a100'  https://[::1]:40001 |  78517 requests in 10.10s,   7.62GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c100'  https://[::1]:40001 |  75054 requests in 10.10s,   7.33GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s100'  https://[::1]:40001 |  88591 requests in 10.10s,   8.60GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i100'  https://[::1]:40001 |  69228 requests in 10.10s,   6.72GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a256'  https://[::1]:40001 |  41302 requests in 10.10s,  10.15GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c256'  https://[::1]:40001 |  36824 requests in 10.10s,   9.11GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s256'  https://[::1]:40001 |  47792 requests in 10.10s,  11.75GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i256'  https://[::1]:40001 |  40768 requests in 10.10s,  10.02GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a512'  https://[::1]:40001 |  23257 requests in 10.10s,  11.40GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c512'  https://[::1]:40001 |  20412 requests in 10.10s,  10.07GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s512'  https://[::1]:40001 |  24992 requests in 10.10s,  12.25GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i512'  https://[::1]:40001 |  22664 requests in 10.10s,  11.11GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a1024' https://[::1]:40001 |  11815 requests in 10.10s,  11.56GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c1024' https://[::1]:40001 |  10864 requests in 10.10s,  10.70GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s1024' https://[::1]:40001 |  14565 requests in 10.10s,  14.25GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i1024' https://[::1]:40001 |  12749 requests in 10.10s,  12.47GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a2048' https://[::1]:40001 |   5656 requests in 10.01s,  11.06GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c2048' https://[::1]:40001 |   5433 requests in 10.10s,  10.70GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s2048' https://[::1]:40001 |   7628 requests in 10.10s,  14.92GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i2048' https://[::1]:40001 |   6625 requests in 10.10s,  12.96GB read

Type   req/sec    50%     75%     90%     99%    100%  Resp Size
─────────────────────────────────────────────────────────────────    1kB
array   23522    0.40    0.63    1.66    3.50   10.67      1022
chunk   24581    0.42    0.66    2.27    4.34   11.19      1040
string  25445    0.46    0.76    2.06    3.75   13.53      1022
io      21879    0.54    0.95    3.15    5.15   13.91      1022
─────────────────────────────────────────────────────────────────   10kB
array   22458    0.47    0.74    2.02    3.77   11.16     10238
chunk   21593    0.43    0.65    1.69    4.07   16.12     10318
string  22811    0.49    0.79    2.20    4.11   11.73     10238
io      19156    0.57    0.96    2.70    5.07   16.60     10238
─────────────────────────────────────────────────────────────────  100kB
array    7772    1.10    1.50    2.18   11.60   28.77    102403
chunk    7431    1.33    1.89    5.60   12.58   22.79    103110
string   8768    1.23    2.04    5.79   10.36   18.90    102445
io       6853    1.21    1.58    2.02    6.84   26.84    102406
─────────────────────────────────────────────────────────────────  256kB
array    4089    2.15    2.67    3.22    6.04   32.39    262218
chunk    3646    2.41    3.00    3.61    7.66   30.20    263806
string   4731    2.05    2.65    6.64   21.66   34.47    262144
io       4037    2.21    2.69    3.22    7.24   30.38    262079
─────────────────────────────────────────────────────────────────  512kB
array    2303    3.96    4.75    5.62   11.08   57.11    524597
chunk    2021    4.56    5.54    6.55   11.35   57.56    528084
string   2474    3.73    4.57    5.40    9.28   42.29    524419
io       2244    4.10    4.88    5.85   10.87   61.56    524574
───────────────────────────────────────────────────────────────── 1024kB
array    1170    7.93    9.37   11.32   21.85   95.88   1048425
chunk    1075    8.84   10.37   12.14   22.93   95.74   1055826
string   1442    6.51    7.65    8.91   21.50   60.49   1048690
io       1262    7.57    8.53    9.59   24.34   94.99   1048879
───────────────────────────────────────────────────────────────── 2048kB
array     565   16.64   20.12   24.35   38.16  176.99   2098685
chunk     538   17.94   21.14   24.58   35.24  184.88   2113056
string    755   12.77   14.66   16.75   32.02  155.45   2098783
io        656   14.79   16.34   18.30   37.37  188.06   2098245

Body    ────────── req/sec ──────────   ─────── req 50% times ───────
 KB     array   chunk  string      io   array   chunk  string      io
1       23522   24581   25445   21879   0.405   0.424   0.455   0.537
10      22458   21593   22811   19156   0.472   0.430   0.487   0.568
100      7772    7431    8768    6853   1.100   1.330   1.230   1.210
256      4089    3646    4731    4037   2.150   2.410   2.050   2.210
512      2303    2021    2474    2244   3.960   4.560   3.730   4.100
1024     1170    1075    1442    1262   7.930   8.840   6.510   7.570
2048      565     538     755     656  16.640  17.940  12.770  14.790
─────────────────────────────────────────────────────────────────────
wrk -t8 -c16 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y
Server cluster mode -w2 -t5:5, bind: ssl6
Puma repo branch 00-response-refactor
ruby 3.2.0preview2 (2022-09-09 master 35cfc9a3bb) [arm64-darwin21]

[13614] - Gracefully shutting down workers...
[13614] === puma shutdown: 2022-09-13 20:50:47 +0900 ===
[13614] - Goodbye!

 5:14 Total Time

@MSP-Greg
Copy link
Member Author

@JuanitoFatas Thanks for the data.

@MSP-Greg MSP-Greg changed the title Response refactor, increase requests-per-second Response refactor, increase requests-per-second, body type handling Sep 13, 2022
@MSP-Greg
Copy link
Member Author

I updated the notes in the first post, and mentioned two possible issues, correct client (socket) error handling, and possible issues if Rack::Sendfile middleware is used.

Otherwise, I think it's time to own it and merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Puma::IOBuffer can be more efficient and used in more places
7 participants