
Client.bucket performance improvements #249

Closed · rafbgarcia opened this issue May 4, 2021 · 15 comments

@rafbgarcia

Hi,

I've run some performance tests against client.bucket, and it seems that channels with hundreds or thousands of users suffer from the array parsing, as you can see in this rubyprof output against the master branch.

I tested a few alternatives:

  • Stored sorted user_ids and used binary search - improved things by just a bit
  • Stored a hash version of user_ids - as you can imagine, this made it much worse
  • Stored user_ids as ",#{user_ids.join(",")}," and checked membership with include?(",#{user_id},") - this made a big improvement (see the sketch after this list)
  • Used Oj instead of JSON - this made it a bit better
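
For reference, a minimal sketch of the delimiter-string idea (variable names are mine, not the actual patch):

user_ids = [101, 202, 303]
stored   = ",#{user_ids.join(",")},"   # => ",101,202,303,"

# Membership is a plain substring scan; the surrounding commas prevent
# partial matches (e.g. 20 matching inside 202).
stored.include?(",202,")   # => true
stored.include?(",20,")    # => false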

Here are the rubyprof outputs for String.include? and Oj + String.include?

And below are IPS benchmarks.

This is the code I used for the tests.

Array.include?

➜  message_bus git:(master) ✗ docker-compose run tests rake performance
Creating message_bus_tests_run ... done
/usr/local/bin/ruby -e "ARGV.each{|f| load f}" spec/performance/allowed.rb spec/performance/publish.rb
Warming up --------------------------------------
             backlog    10.000  i/100ms
Calculating -------------------------------------
             backlog    102.247  (± 2.9%) i/s -    520.000  in   5.090800s

String.include?

➜  message_bus git:(master) ✗ docker-compose run tests rake performance
Creating message_bus_tests_run ... done
/usr/local/bin/ruby -e "ARGV.each{|f| load f}" spec/performance/allowed.rb spec/performance/publish.rb
Warming up --------------------------------------
             backlog    17.000  i/100ms
Calculating -------------------------------------
             backlog    183.294  (± 9.8%) i/s -    918.000  in   5.074522s

String.include? + Oj

➜  message_bus git:(master) ✗ docker-compose run tests rake performance
Creating message_bus_tests_run ... done
/usr/local/bin/ruby -e "ARGV.each{|f| load f}" spec/performance/allowed.rb spec/performance/publish.rb
Warming up --------------------------------------
             backlog    20.000  i/100ms
Calculating -------------------------------------
             backlog    202.522  (± 5.4%) i/s -      1.020k in   5.057622s

@SamSaffron, what do you think about these?

@SamSaffron
Member

SamSaffron commented May 4, 2021

This is very interesting information:

Given: https://gist.github.com/df1a982f065b85c1861905c5a3171448

binary search numbers:  3064824.1 i/s
              string:   388081.3 i/s - 7.90x  (± 0.00) slower
             numbers:    47063.5 i/s - 65.12x  (± 0.00) slower

And given that the issue here is clearly around this line:

user_allowed = msg.user_ids.include?(self.user_id)

Are you 100% sure that simply sorting the list of users and groups upfront and using bsearch will not solve our issue?
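
For reference, a minimal sketch of that approach (standalone example, not message_bus code):

user_ids = [42, 7, 9001, 13].sort    # sort once, when the message is stored
user_id  = 9001

# Array#bsearch in find-any mode: O(log n) vs Array#include?'s O(n)
user_allowed = !user_ids.bsearch { |x| user_id <=> x }.nil?   # => true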

@rafbgarcia
Author

I think the test is a bit misleading, as it should include parsing the values back from Redis, right?

https://gist.github.com/rafbgarcia/1073ad5397c463a38ba81e8fbe6bbe1b

Warming up --------------------------------------
binary search numbers
                       177.000  i/100ms
             numbers   169.000  i/100ms
              string     1.467k i/100ms
Calculating -------------------------------------
binary search numbers
                          1.785k (± 5.3%) i/s -      9.027k in   5.073454s
             numbers      1.711k (± 4.9%) i/s -      8.619k in   5.049765s
              string     14.827k (± 7.3%) i/s -     74.817k in   5.076025s

Comparison:
              string:    14826.6 i/s
binary search numbers:     1784.9 i/s - 8.31x  (± 0.00) slower
             numbers:     1711.2 i/s - 8.66x  (± 0.00) slower
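
A sketch of the shape of that benchmark, with the Redis round-trip represented by a pre-built JSON payload (sizes and report names are illustrative, not the gist's exact code; requires the benchmark-ips gem):

require "benchmark/ips"
require "json"

ids    = (1..10_000).to_a
json   = JSON.generate(ids)          # what would come back from Redis
stored = ",#{ids.join(",")},"        # string-hack representation
target = 9_999

Benchmark.ips do |x|
  x.report("numbers")               { JSON.parse(json).include?(target) }
  x.report("binary search numbers") { JSON.parse(json).bsearch { |v| target <=> v } }
  x.report("string")                { stored.include?(",#{target},") }
  x.compare!
end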

@SamSaffron
Member

The way back from Redis certainly should be counted, I see. So in production you are seeing very high numbers for JSON.parse?

[profiler screenshot]

The proposed optimisation is to limit/optimise JSON parsing rather than cleaning up the 2.25 percent spent in include?

Instead of using JSON serialization for user_ids, maybe we use a more efficient serialization format? I am loath to add more dependencies, but there may be something built in that is super fast we can use. Maybe @jeremyevans has some ideas?
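
For reference, a sketch of two obvious candidates, Ruby's built-in Marshal and the Oj gem in JSON-compatible mode (the actual call sites in message_bus may differ):

require "oj"

ids = [1, 2, 3]

# Built in and binary, but Ruby-only on the wire:
Marshal.load(Marshal.dump(ids))                        # => [1, 2, 3]

# Oj as a near drop-in for JSON.generate/JSON.parse:
Oj.load(Oj.dump(ids, mode: :compat), mode: :compat)    # => [1, 2, 3]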

@SamSaffron
Member

So to summarize per:

https://gist.github.com/e38a3438aa4a27a3f95c5c7386e2cd40


Comparison:
      String hack Oj:    28291.9 i/s
         String hack:    20373.9 i/s - 1.39x  (± 0.00) slower
Numbers binary search Oj:     3736.4 i/s - 7.57x  (± 0.00) slower
  Numbers include Oj:     3492.2 i/s - 8.10x  (± 0.00) slower
Numbers binary search:     2264.1 i/s - 12.50x  (± 0.00) slower
     Numbers include:     2164.2 i/s - 13.07x  (± 0.00) slower
  1. Sure, a PR to optionally allow using Oj vs JSON for serialization is fine (no extra dependency though, and minimal indirection to be added).
  2. I need to sleep on this, but I do not think we can beat the "string" hack; there is no simple way to deserialize without adding cost. A C extension could do better, though it certainly comes with caveats.

@rafbgarcia
Author

Yeah, I agree with you on both points. Your suggestion to explore alternative serialization options was great; I added binary serialization to the list.

https://gist.github.com/rafbgarcia/d043ee02c73b6e518361d60425d5734e

Comparison:
  Binary string hack:    28731.8 i/s
      String hack Oj:    16416.5 i/s - 1.75x  (± 0.00) slower
         String hack:     8848.8 i/s - 3.25x  (± 0.00) slower
Binary numbers binary search:     4192.2 i/s - 6.85x  (± 0.00) slower
  Numbers include Oj:     1763.8 i/s - 16.29x  (± 0.00) slower
Numbers binary search Oj:     1648.6 i/s - 17.43x  (± 0.00) slower
Numbers binary search:     1613.1 i/s - 17.81x  (± 0.00) slower
     Numbers include:     1519.2 i/s - 18.91x  (± 0.00) slower
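
The binary variants above are based on Array#pack; a reconstruction of the idea (not the gist's exact code):

ids    = [7, 42, 9001].sort
packed = ids.pack("l<*")           # 4 bytes per id, little-endian signed

# "Binary numbers binary search": unpack once, then bsearch the sorted ints
target = 42
packed.unpack("l<*").bsearch { |x| target <=> x }   # => 42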

@ignisf

ignisf commented May 5, 2021

@SamSaffron
Member

@ignisf the issue is that marshalling back to a Set from any wire format would kill all the gain it has on offer.

I can see gains with the string hack / Oj even at the low count of 2:

https://gist.github.com/6394c76bb6462edc5043d7c594173c02

    String hack Oj:  3644428.5 i/s
  Numbers include Oj:  3424118.9 i/s - 1.06x  (± 0.00) slower
Numbers binary search Oj:  3176211.7 i/s - 1.15x  (± 0.00) slower
         String hack:  1818291.7 i/s - 2.00x  (± 0.00) slower
     Numbers include:  1573969.7 i/s - 2.32x  (± 0.00) slower
Numbers binary search:  1478193.7 i/s - 2.47x  (± 0.00) slower

Even at 1 it is not slower:

  Numbers include Oj:  4322382.0 i/s
      String hack Oj:  4315847.4 i/s - same-ish: difference falls within error
Numbers binary search Oj:  3847566.7 i/s - 1.12x  (± 0.00) slower
         String hack:  1817312.4 i/s - 2.38x  (± 0.00) slower
     Numbers include:  1606598.3 i/s - 2.69x  (± 0.00) slower
Numbers binary search:  1540899.1 i/s - 2.81x  (± 0.00) slower

@SamSaffron
Member

SamSaffron commented May 5, 2021

I am not sure Marshal.load is going to help unless we swap the entire envelope from JSON to a binary format. We are still going to be stuck on JSON.parse if we don't switch the envelope :(

Switching the envelope possibly has compatibility concerns (though maybe we allow for a full Marshal-based envelope, optionally)... We could allow for a base "serializer/deserializer" class in message_bus, have the JSON-based one be the default, but allow people to opt for a Marshal-based or Oj-based one if they wish?
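
A sketch of what that indirection could look like (class and method names are hypothetical, not message_bus's actual API):

require "json"
require "oj"

class JsonCodec
  def encode(obj)
    JSON.generate(obj)
  end

  def decode(str)
    JSON.parse(str)
  end
end

class OjCodec
  def encode(obj)
    Oj.dump(obj, mode: :compat)
  end

  def decode(str)
    Oj.load(str, mode: :compat)
  end
end

# JSON stays the default; users opt in to a faster codec:
codec = OjCodec.new
codec.decode(codec.encode("user_ids" => [1, 2, 3]))   # => {"user_ids"=>[1, 2, 3]}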

@rafbgarcia
Author

To be honest, I'm not sure how one would transition from JSON to binary.

Would that require clearing all backlogs? Or would we have to create two deserializers: one for the transition and another for the binary format?

However, most of the performance improvement comes from changing the data type; even changing from JSON to Oj is not that significant IMO.

Are you leaning more towards not accepting that change?

@SamSaffron
Member

A transition from a number list to the string hack is going to change the transport format, so we are going to be stuck either making a backwards-incompatible change to get that in, or carrying an "if String do this, if numbers do this" ugly thing in the code.

I think it is fine to offer multiple serializers, including non-compatible ones, as long as they are opt-in; that way you would swap the serializer and dump your backlog to move to the new, faster, more efficient one.

I think we need to take a few days to think about this. There is no trivial perf hack here. Oj vs JSON, though, is quite a big win for the general case of small messages; I am seeing quite a bump there, and it can be done in a backwards-compatible way.

@rafbgarcia
Author

What if we ship a minor version that carries the "if String do this, if numbers do this" ugly thing in the code, and then a major version with the backwards-incompatible string hack, to allow for a smooth transition?

I feel like the opt-in serializer is a simpler change. I'm happy to work on any of these. Just let me know.

@SamSaffron
Member

I am still sleeping on this :) it sure is very tricky.

A fundamental issue I have is that ",1,2,3,4,".include?(',4,') just looks so aesthetically ugly to me :)

I think allowing consumers to opt for a serialization format is ideal here; that is the first change we should drive.

As to how to solve the fundamental perf issue: I think this is a more "general" problem in Ruby, worthy of a general solution.

What we could do here to resolve it (and even be faster than the string hack) is build a C extension in a separate gem:

  1. A new type Int32Set / Int64Set
  2. Supports include?
  3. Supports Marshal dump / load

The C extension can store an Int32Set as a plain int[] array in C, so the whole set only takes up 1 RVALUE.
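
A pure-Ruby stand-in for the proposed interface, just to make the shape concrete (the real C extension would keep the packed ints in native memory):

class Int32Set
  def initialize(ids)
    @packed = ids.sort.pack("l<*")   # one contiguous buffer, 4 bytes per id
  end

  def include?(id)
    !@packed.unpack("l<*").bsearch { |x| id <=> x }.nil?
  end

  # Hooks used by Marshal.dump / Marshal.load
  def _dump(_level)
    @packed
  end

  def self._load(str)
    set = allocate
    set.instance_variable_set(:@packed, str)
    set
  end
end

Marshal.load(Marshal.dump(Int32Set.new([3, 1, 2]))).include?(2)   # => true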

Then the internal interfaces in message_bus can remain exactly as they are today (except for the new Serializer class).

@jeremyevans do you like this plan?

@SamSaffron
Member

@rafbgarcia I knocked this implementation up ... work in progress, but the test suite passes with OjFast:

#250

The new pattern used is thanks to @mame; the limitation is that ids must not be longer than 4 bytes. We could pack the ints more and allow for 8 bytes with a limited perf hit. Packing dynamic sizes, though, is tricky because we would need an extremely custom unpacking algorithm and binary search.

Regardless, the pattern allows for custom codecs, so you could implement the string-based one if you wished. Let's move the discussion to the PR. Going to close this off.

@eregon

eregon commented May 6, 2021

Doesn't Redis have a way to return whether an array/list contains an element?
Then there would be no need to transfer/deserialize the entire list. I thought part of the point of Redis is that it does include some operations like that for efficiency.
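
For reference, Redis sets do offer a server-side membership check (SISMEMBER); with the redis gem that would look roughly like this (key name hypothetical):

require "redis"

redis = Redis.new
redis.sadd("msg:123:user_ids", [101, 202, 303])
redis.sismember("msg:123:user_ids", 202)   # => true, checked server-side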

@SamSaffron
Member

@eregon indeed it does, but it gets super tricky to amend all the internals to support this; it would be a massive change. Plus, the extra round trips would end up hindering performance for the majority of users, who only attempt to restrict message delivery to 2 or 3 users rather than tens of thousands.

It is a tough balancing act here :)
