Puma::IOBuffer can be more efficient and used in more places #2457

WJWH · 2020-10-26T19:36:16Z

At the moment, Puma::IOBuffer is just a subclass of String, with the append method overwritten and reset aliased to clear. Strangely, the overwritten append behavior seems to implement the same behavior as a normal String, but that is not important for this issue.

The main use of IOBuffer is during request serving where it is used to accumulate lines for the response status line and headers. During the course of a request, several lines are appended to the empty string and all in the end they are all fed to fast_write in a single call. Because a String in Ruby is implemented as a dynamic array of bytes representing codepoints, these repeated additions will cause the underlying array to be doubled in size and copied whenever the String requires more space. Because the initial size is very low and all the headers together can easily be a few hundred bytes, this doubling occurs several times, incurring allocations and copying each time. It's still amortized O(1), but the constant is higher than it could be.

A simple way to prevent this would be to change IOBuffer class so that it maintains a total_size along with an array of Strings that have been added so far. At the end when to_s is called on it, it allocates a single String with the correct capacity straight away and copies the entire array of strings into that.

The early hints construction in str_early_hints could probably also benefit from being buffered this way. I can create a PR if desired?

The text was updated successfully, but these errors were encountered:

nateberkopec · 2020-10-26T19:38:14Z

Certainly! For inclusion, we'll be looking at changes in results in the benchmarks folder. You can also write your own benchmark in that folder for a "worst case scenario" that you believe will showcase the changes the best.

WJWH · 2020-10-26T23:04:41Z

I rewrote the IOBuffer class in several different ways but did not manage to get it faster than the current implementation. I suspect that the bookkeeping in Ruby land introduces more overhead than the String concatenation operations implemented in C, even if they are slightly more efficient algorithmically speaking.

With this in mind, I changed it back the original implementation but:

Preallocating a 4k string from the start.
Not iterating over appended strings with each but doing it in C with the concat function from String.
Don't clear the entire string (resetting the underlying size to zero) but allocate a new, preallocated String instead.

On the realistic_response benchmark I got results of [6081, 6251, 6181] req/s for my branch vs [6050, 6109, 6035] req/s for the code on master branch (running each version three times). This is technically 1.7% faster but the effect is quite small and the variability is quite high. It could probably be faster by dipping more into C but I quite like the readability and mostly-pure-Ruby-ness of the code as is so I'm not sure.

MSP-Greg · 2020-10-26T23:16:44Z

Anyone ever try the NIO::ByteBuffer class?

nateberkopec · 2020-10-27T01:33:16Z

Linking the original removal of the C extension here: #1980

WJWH · 2020-10-27T10:53:45Z

@MSP-Greg I haven't used the NIO class in particular but this kind of buffers/stringbuilders were what inspired the issue in the first place. As I mentioned above I could not get the Ruby implementation to keep up with the C appending functions for String.

@nateberkopec Interesting to see the discussion on the other topic. For some reason it got into my head that this was pretty old code but it's only been a year or so. In the discussions it's mentioned that we have to support all the way back to Ruby 2.2, is there a list somewhere of supported Rubies? I'm not advocating we ruthlessly drop support but I do want to point out that even 2.4 is no longer supported by the core Ruby team and supporting older versions implicitly encourages people to run potentially unsecure Ruby versions.

After a good nights sleep I had some more ideas and will do some extra tests later this week.

EDIT: After some more digging, it seems that TCP_NODELAY is NOT set by default for most apps, since even though add_tcp_listener and add_ssl_listener have a optimize_for_latency parameter that defaults to true, in this bit in binder it practically always gets set to false unless people explicitly add low_latency=true to their bind uris. This would probably mean that in the vast majority of users will have Nagles algorithm enabled since the option is only mentioned in a single line of the DSL docs. Since the use of IOBuffer is preventing too many short writes, it might be interesting to see if we can just fast_write every string as it comes in and let the OS handle the packet coalescing.

nateberkopec · 2020-10-27T13:53:35Z

is there a list somewhere of supported Rubies

The test suite is the spec - if it ain't in the test suite, we don't support it. We'll investigate the Ruby support issue again when we start thinking about Puma 6.0, but that is at least 1 month away.

MSP-Greg · 2020-10-27T14:35:41Z

is there a list somewhere of supported Rubies

In terms of the supported version:

puma/puma.gemspec

Line 30 in b127d4c

s.required_ruby_version = Gem::Requirement.new(">= 2.2")

As Nate mentioned, CI gives a good idea, as it may show platform info that can't be placed in the gemspec.

Sorry for mentioning NIO::ByteBuffer. At present, string operations aren't considered a bottleneck in request processing.

As to optimize_for_latency, CI has a matrix of three OS's, and there several Ruby platforms/versions. Regardless of the default setting, I think it's probably best to have a minimal number of writes. JFYI, CI creates client requests to test Puma, and we've found that syswrite does improve the stability, (as opposed to write or <<).

WJWH · 2020-11-03T13:18:24Z

I finally got around to benchmarking a few more variants:

Changing IOBuffer to just an Array of strings and then iobuf.each {|str| fast_write(io,str)} had only about 30% the req/s of the current implementation, presumably due to the vastly higher number of syscalls.
Changing IOBuffer to just an Array of strings and using IO.write and IO.flush so that it falls through to the writev() syscall resulted in about 10% extra requests per second although I did not account for nonblocking IO and rescue-ing EWOULDBLOCK and friends. If it reduces reliability it's probably not worth it.

That leaves the few % to be gained from optimizing how the underlying String for the IOBuffer is initialized. Since Ruby 2.2 is still supported that won't work either since setting the initial capacity was only added later. I'll just add it to my longterm todo list and come back in a few years 🙂

nateberkopec · 2020-11-03T15:04:20Z

I'll mark as "6.0" since we'll consider dropping 2.2 support at that time.

calvinxiao · 2021-04-11T13:09:11Z

~~I think append is just an alias of concat~~, oh it's mentioned in #1980

Benchmark shows the current append implementation is slower in args = ["Lorem", "ipsum", "dolor", "sit", "amet"]

Warming up --------------------------------------
              append   114.374k i/100ms
              concat   186.215k i/100ms
Calculating -------------------------------------
              append      1.151M (± 0.9%) i/s -      5.833M in   5.066979s
              concat      1.880M (± 0.9%) i/s -      9.497M in   5.051291s

Comparison:
              concat:  1880249.0 i/s
              append:  1151287.2 i/s - 1.63x  (± 0.00) slower

FooBarWidget · 2021-05-07T14:19:23Z

In Passenger we use writev() to perform a gathered write. This way we don't even have to allocate that final string.

Not sure whether Ruby core has support for writev nowadays. In Passenger we wrote our own support for it, through a native extension.

MSP-Greg · 2021-05-07T15:21:07Z

@FooBarWidget

Thanks for the post.

Not sure whether Ruby core has support for writev nowadays

IO.write has support for multiple arguments, but ssl sockets aren't true IO objects.

Given that, and also issues with encoding and bodies that need Puma to support 'chunking', I personally thought using a StringIO buffer was best, both for headers and the body. See PR #2595.

Again personally, I think a lot of code re IO and sockets was developed when memory and bandwidth were much lower than they are today.

@wjordan and others are looking at other options, like writev.

nateberkopec · 2022-09-04T05:36:50Z

@MSP-Greg I think this is closed by #2896?

MSP-Greg · 2022-09-04T12:32:19Z

Yes. I'll add to the comments

nateberkopec added the perf label Oct 26, 2020

nateberkopec added this to the 6.0.0 milestone Nov 3, 2020

FooBarWidget mentioned this issue May 7, 2021

TCP server socket not optimized for low latency by default, contradicting docs #2623

Closed

MSP-Greg mentioned this issue Sep 4, 2022

Response refactor, increase requests-per-second, body type handling #2896

Merged

7 tasks

MSP-Greg closed this as completed in #2896 Sep 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Puma::IOBuffer can be more efficient and used in more places #2457

Puma::IOBuffer can be more efficient and used in more places #2457

WJWH commented Oct 26, 2020

nateberkopec commented Oct 26, 2020

WJWH commented Oct 26, 2020 •

edited

MSP-Greg commented Oct 26, 2020

nateberkopec commented Oct 27, 2020

WJWH commented Oct 27, 2020 •

edited

nateberkopec commented Oct 27, 2020

MSP-Greg commented Oct 27, 2020

WJWH commented Nov 3, 2020

nateberkopec commented Nov 3, 2020

calvinxiao commented Apr 11, 2021 •

edited

FooBarWidget commented May 7, 2021

MSP-Greg commented May 7, 2021

nateberkopec commented Sep 4, 2022

MSP-Greg commented Sep 4, 2022

Puma::IOBuffer can be more efficient and used in more places #2457

Puma::IOBuffer can be more efficient and used in more places #2457

Comments

WJWH commented Oct 26, 2020

nateberkopec commented Oct 26, 2020

WJWH commented Oct 26, 2020 • edited

MSP-Greg commented Oct 26, 2020

nateberkopec commented Oct 27, 2020

WJWH commented Oct 27, 2020 • edited

nateberkopec commented Oct 27, 2020

MSP-Greg commented Oct 27, 2020

WJWH commented Nov 3, 2020

nateberkopec commented Nov 3, 2020

calvinxiao commented Apr 11, 2021 • edited

FooBarWidget commented May 7, 2021

MSP-Greg commented May 7, 2021

nateberkopec commented Sep 4, 2022

MSP-Greg commented Sep 4, 2022

WJWH commented Oct 26, 2020 •

edited

WJWH commented Oct 27, 2020 •

edited

calvinxiao commented Apr 11, 2021 •

edited