
Client.bucket performance improvements #249

Closed · rafbgarcia opened this issue May 4, 2021 · 15 comments

@rafbgarcia

Hi,

I've run some performance tests against client.bucket, and it seems that channels with hundreds or thousands of users suffer from the array parsing, as you can see in this rubyprof output against the master branch.

I tested a few alternatives:

  • Stored sorted user_ids and used binary search - improved things by just a bit
  • Stored a hash version of user_ids - as you can imagine, this made it much worse
  • Stored user_ids as ",#{user_ids.join(",")}," and checked membership with include?(",#{user_id},") - this made a big improvement (see the sketch after this list)
  • Used Oj instead of JSON - this made it a bit better
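
For reference, a minimal sketch of the delimiter-string idea (variable names are mine, not the actual patch):

user_ids = [101, 202, 303]
stored   = ",#{user_ids.join(",")},"   # => ",101,202,303,"

# Membership is a plain substring scan; the surrounding commas prevent
# partial matches (e.g. 20 matching inside 202).
stored.include?(",202,")   # => true
stored.include?(",20,")    # => false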

Here are the rubyprof outputs for String.include? and Oj + String.include?

And below are IPS benchmarks.

This is the code I used for the tests.

Array.include?

➜  message_bus git:(master) ✗ docker-compose run tests rake performance
Creating message_bus_tests_run ... done
/usr/local/bin/ruby -e "ARGV.each{|f| load f}" spec/performance/allowed.rb spec/performance/publish.rb
Warming up --------------------------------------
             backlog    10.000  i/100ms
Calculating -------------------------------------
             backlog    102.247  (± 2.9%) i/s -    520.000  in   5.090800s

String.include?

➜  message_bus git:(master) ✗ docker-compose run tests rake performance
Creating message_bus_tests_run ... done
/usr/local/bin/ruby -e "ARGV.each{|f| load f}" spec/performance/allowed.rb spec/performance/publish.rb
Warming up --------------------------------------
             backlog    17.000  i/100ms
Calculating -------------------------------------
             backlog    183.294  (± 9.8%) i/s -    918.000  in   5.074522s

String.include? + Oj

➜  message_bus git:(master) ✗ docker-compose run tests rake performance
Creating message_bus_tests_run ... done
/usr/local/bin/ruby -e "ARGV.each{|f| load f}" spec/performance/allowed.rb spec/performance/publish.rb
Warming up --------------------------------------
             backlog    20.000  i/100ms
Calculating -------------------------------------
             backlog    202.522  (± 5.4%) i/s -      1.020k in   5.057622s

@SamSaffron, what do you think about these?

@SamSaffron
Member

SamSaffron commented May 4, 2021

This is very interesting information:

Given: https://gist.github.com/df1a982f065b85c1861905c5a3171448

binary search numbers:  3064824.1 i/s
              string:   388081.3 i/s - 7.90x  (± 0.00) slower
             numbers:    47063.5 i/s - 65.12x  (± 0.00) slower

And given that the issue here is clearly around this line:

user_allowed = msg.user_ids.include?(self.user_id)

Are you 100% sure that simply sorting the list of users and groups upfront and using bsearch will not solve our issue?
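
For reference, a minimal sketch of that approach (standalone example, not message_bus code):

user_ids = [42, 7, 9001, 13].sort    # sort once, when the message is stored
user_id  = 9001

# Array#bsearch in find-any mode: O(log n) vs Array#include?'s O(n)
user_allowed = !user_ids.bsearch { |x| user_id <=> x }.nil?   # => true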

@rafbgarcia
Author

I think the test is a bit misleading, as it should include parsing the values back from Redis, right?

https://gist.github.com/rafbgarcia/1073ad5397c463a38ba81e8fbe6bbe1b

Warming up --------------------------------------
binary search numbers
                       177.000  i/100ms
             numbers   169.000  i/100ms
              string     1.467k i/100ms
Calculating -------------------------------------
binary search numbers
                          1.785k (± 5.3%) i/s -      9.027k in   5.073454s
             numbers      1.711k (± 4.9%) i/s -      8.619k in   5.049765s
              string     14.827k (± 7.3%) i/s -     74.817k in   5.076025s

Comparison:
              string:    14826.6 i/s
binary search numbers:     1784.9 i/s - 8.31x  (± 0.00) slower
             numbers:     1711.2 i/s - 8.66x  (± 0.00) slower
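
A sketch of the shape of that benchmark, with the Redis round-trip represented by a pre-built JSON payload (sizes and report names are illustrative, not the gist's exact code; requires the benchmark-ips gem):

require "benchmark/ips"
require "json"

ids    = (1..10_000).to_a
json   = JSON.generate(ids)          # what would come back from Redis
stored = ",#{ids.join(",")},"        # string-hack representation
target = 9_999

Benchmark.ips do |x|
  x.report("numbers")               { JSON.parse(json).include?(target) }
  x.report("binary search numbers") { JSON.parse(json).bsearch { |v| target <=> v } }
  x.report("string")                { stored.include?(",#{target},") }
  x.compare!
end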

@SamSaffron
Member

The way back from Redis certainly should be counted, I see. So in production you are seeing very high numbers for JSON.parse?

[profiler screenshot]

The proposed optimisation is to limit/optimise JSON parsing rather than cleaning up the 2.25 percent spent in include?

Instead of using JSON serialization for user_ids, maybe we use a more efficient serialization format? I am loath to add more dependencies, but there may be something built in that is super fast we can use. Maybe @jeremyevans has some ideas?
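
For reference, a sketch of two obvious candidates, Ruby's built-in Marshal and the Oj gem in JSON-compatible mode (the actual call sites in message_bus may differ):

require "oj"

ids = [1, 2, 3]

# Built in and binary, but Ruby-only on the wire:
Marshal.load(Marshal.dump(ids))                        # => [1, 2, 3]

# Oj as a near drop-in for JSON.generate/JSON.parse:
Oj.load(Oj.dump(ids, mode: :compat), mode: :compat)    # => [1, 2, 3]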

@SamSaffron
Member

So to summarize per:

https://gist.github.com/e38a3438aa4a27a3f95c5c7386e2cd40


Comparison:
      String hack Oj:    28291.9 i/s
         String hack:    20373.9 i/s - 1.39x  (± 0.00) slower
Numbers binary search Oj:     3736.4 i/s - 7.57x  (± 0.00) slower
  Numbers include Oj:     3492.2 i/s - 8.10x  (± 0.00) slower
Numbers binary search:     2264.1 i/s - 12.50x  (± 0.00) slower
     Numbers include:     2164.2 i/s - 13.07x  (± 0.00) slower
  1. Sure, a PR to optionally allow using Oj vs JSON for serialization is fine (no extra dependency though, and minimal indirection to be added).
  2. I need to sleep on this, but I do not think we can beat the "string" hack; there is no simple way to deserialize without adding cost. A C extension could do better, though it certainly comes with caveats.

@rafbgarcia
Author

Yeah, I agree with you on both points. Your suggestion to explore alternative serialization options was great; I added binary serialization to the list.

https://gist.github.com/rafbgarcia/d043ee02c73b6e518361d60425d5734e

Comparison:
  Binary string hack:    28731.8 i/s
      String hack Oj:    16416.5 i/s - 1.75x  (± 0.00) slower
         String hack:     8848.8 i/s - 3.25x  (± 0.00) slower
Binary numbers binary search:     4192.2 i/s - 6.85x  (± 0.00) slower
  Numbers include Oj:     1763.8 i/s - 16.29x  (± 0.00) slower
Numbers binary search Oj:     1648.6 i/s - 17.43x  (± 0.00) slower
Numbers binary search:     1613.1 i/s - 17.81x  (± 0.00) slower
     Numbers include:     1519.2 i/s - 18.91x  (± 0.00) slower
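
The binary variants above are based on Array#pack; a reconstruction of the idea (not the gist's exact code):

ids    = [7, 42, 9001].sort
packed = ids.pack("l<*")           # 4 bytes per id, little-endian signed

# "Binary numbers binary search": unpack once, then bsearch the sorted ints
target = 42
packed.unpack("l<*").bsearch { |x| target <=> x }   # => 42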

@ignisf

ignisf commented May 5, 2021

@SamSaffron
Member

@ignisf the issue is that marshalling back to a Set from any wire format would kill all the gain it has on offer.

I can see gains with the string hack / Oj even at the low count of 2:

https://gist.github.com/6394c76bb6462edc5043d7c594173c02

    String hack Oj:  3644428.5 i/s
  Numbers include Oj:  3424118.9 i/s - 1.06x  (± 0.00) slower
Numbers binary search Oj:  3176211.7 i/s - 1.15x  (± 0.00) slower
         String hack:  1818291.7 i/s - 2.00x  (± 0.00) slower
     Numbers include:  1573969.7 i/s - 2.32x  (± 0.00) slower
Numbers binary search:  1478193.7 i/s - 2.47x  (± 0.00) slower

Even at 1 it is not slower:

  Numbers include Oj:  4322382.0 i/s
      String hack Oj:  4315847.4 i/s - same-ish: difference falls within error
Numbers binary search Oj:  3847566.7 i/s - 1.12x  (± 0.00) slower
         String hack:  1817312.4 i/s - 2.38x  (± 0.00) slower
     Numbers include:  1606598.3 i/s - 2.69x  (± 0.00) slower
Numbers binary search:  1540899.1 i/s - 2.81x  (± 0.00) slower

@SamSaffron
Member

SamSaffron commented May 5, 2021

I am not sure Marshal.load is going to help unless we swap the entire envelope from JSON to a binary format. We are still going to be stuck on JSON.parse if we don't switch the envelope :(

Switching the envelope possibly has compatibility concerns (though maybe we allow for a full Marshal-based envelope, optionally)... We could allow for a base "serializer/deserializer" class in message_bus, have the JSON-based one be the default, but allow people to opt for a Marshal-based or Oj-based one if they wish?
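
A sketch of what that indirection could look like (class and method names are hypothetical, not message_bus's actual API):

require "json"
require "oj"

class JsonCodec
  def encode(obj)
    JSON.generate(obj)
  end

  def decode(str)
    JSON.parse(str)
  end
end

class OjCodec
  def encode(obj)
    Oj.dump(obj, mode: :compat)
  end

  def decode(str)
    Oj.load(str, mode: :compat)
  end
end

# JSON stays the default; users opt in to a faster codec:
codec = OjCodec.new
codec.decode(codec.encode("user_ids" => [1, 2, 3]))   # => {"user_ids"=>[1, 2, 3]}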

@rafbgarcia
Author

To be honest, I'm not sure how one would transition from JSON to binary.

Would that require clearing all backlogs? Or would we have to create two deserializers: one for the transition and another for the binary format?

However, most of the performance improvement comes from changing the data type; even changing from JSON to Oj is not that significant IMO.

Are you leaning more towards not accepting that change?

@SamSaffron
Member

A transition from a number list to the string hack is going to change the transport format, so we are going to be stuck either making a backwards-incompatible change to get that in, or carrying an "if String do this, if numbers do this" ugly thing in the code.

I think it is fine to offer multiple serializers, including non-compatible ones, as long as they are opt-in; that way you would swap the serializer and dump your backlog to move to the new, faster, more efficient one.

I think we need to take a few days to think about this. There is no trivial perf hack here. Oj vs JSON, though, is quite a big win for the general case of small messages; I am seeing quite a bump there, and it can be done in a backwards-compatible way.

@rafbgarcia
Author

What if we ship a minor version that carries the "if String do this, if numbers do this" ugly thing in the code, and then a major version with the backwards-incompatible string hack, to allow for a smooth transition?

I feel like the opt-in serializer is a simpler change. I'm happy to work on any of these. Just let me know.

@SamSaffron
Member

I am still sleeping on this :) it sure is very tricky.

A fundamental issue I have is that ",1,2,3,4,".include?(',4,') just looks so aesthetically ugly to me :)

I think allowing consumers to opt for a serialization format is ideal here; that is the first change we should drive.

As to how to solve the fundamental perf issue: I think this is a more "general" problem in Ruby, worthy of a general solution.

What we could do here to resolve it (and even be faster than the string hack) is build a C extension in a separate gem:

  1. A new type Int32Set / Int64Set
  2. Supports include?
  3. Supports Marshal dump / load

The C extension can store an Int32Set as a plain int[] array in C, so the whole set only takes up 1 RVALUE.
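
A pure-Ruby stand-in for the proposed interface, just to make the shape concrete (the real C extension would keep the packed ints in native memory):

class Int32Set
  def initialize(ids)
    @packed = ids.sort.pack("l<*")   # one contiguous buffer, 4 bytes per id
  end

  def include?(id)
    !@packed.unpack("l<*").bsearch { |x| id <=> x }.nil?
  end

  # Hooks used by Marshal.dump / Marshal.load
  def _dump(_level)
    @packed
  end

  def self._load(str)
    set = allocate
    set.instance_variable_set(:@packed, str)
    set
  end
end

Marshal.load(Marshal.dump(Int32Set.new([3, 1, 2]))).include?(2)   # => true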

Then the internal interfaces in message_bus can remain exactly as they are today (except for the new Serializer class).

@jeremyevans do you like this plan?

@SamSaffron
Member

@rafbgarcia I knocked this implementation up ... work in progress, but the test suite passes with OjFast:

#250

The new pattern used is thanks to @mame; the limitation is that ids must not be longer than 4 bytes. We could pack the ints more and allow for 8 bytes with a limited perf hit. Packing dynamic sizes, though, is tricky because we would need an extremely custom unpacking algorithm and binary search.

Regardless, the pattern allows for custom codecs, so you could implement the string-based one if you wished. Let's move the discussion to the PR. Going to close this off.

@eregon

eregon commented May 6, 2021

Doesn't Redis have a way to return whether an array/list contains an element?
Then there would be no need to transfer/deserialize the entire list. I thought part of the point of Redis is that it does include some operations like that for efficiency.
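
For reference, Redis sets do offer a server-side membership check (SISMEMBER); with the redis gem that would look roughly like this (key name hypothetical):

require "redis"

redis = Redis.new
redis.sadd("msg:123:user_ids", [101, 202, 303])
redis.sismember("msg:123:user_ids", 202)   # => true, checked server-side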

@SamSaffron
Member

@eregon indeed it does, but it gets super tricky to amend all the internals to support this; it would be a massive change. Plus, the extra round trips would end up hindering performance for the majority of users, who only attempt to restrict message delivery to 2 or 3 users rather than tens of thousands.

It is a tough balancing act here :)
