Client.bucket performance improvements #249
Comments
This is very interesting information. Given: https://gist.github.com/df1a982f065b85c1861905c5a3171448
And given that the issue here is clearly around this line: message_bus/lib/message_bus/client.rb, line 140 (commit 3955190):
Are you 100% sure that simply sorting the list of users and groups upfront and using bsearch will not solve our issue?
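For reference, the upfront-sort plus bsearch approach could look roughly like this (a sketch, not the actual message_bus code):

```ruby
# Sketch: sort the allowed ids once when the message is published, then
# each membership check is O(log n) via Array#bsearch instead of the
# O(n) scan that Array#include? does.
sorted_ids = [42, 7, 9001, 13].sort   # => [7, 13, 42, 9001]

def allowed?(sorted_ids, user_id)
  # bsearch find-any mode: the block returns <0 / 0 / >0 to steer the search
  !sorted_ids.bsearch { |id| user_id <=> id }.nil?
end

allowed?(sorted_ids, 42)   # => true
allowed?(sorted_ids, 8)    # => false
```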
I think the test is a bit misleading as it should include parsing back values from Redis, right? https://gist.github.com/rafbgarcia/1073ad5397c463a38ba81e8fbe6bbe1b
The way back from Redis certainly should be counted, I see. So in production you are seeing very high numbers for JSON parse? The optimisation proposed is to limit/optimise JSON parsing vs cleaning up the 2.25 percent. Instead of using JSON serialization for user_ids, maybe we use a more efficient serialization format? I am loath to add more dependencies, but there may be something built in that is super fast we can use. Maybe @jeremyevans has some ideas?
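For what it is worth, the stdlib already ships a fast binary format: Marshal. Whether it actually beats JSON for these payloads would need benchmarking against real traffic, but the round trip is simple:

```ruby
require "json"

user_ids = (1..1_000).to_a

json_blob    = JSON.generate(user_ids)  # text envelope: parsing tokenizes every byte
marshal_blob = Marshal.dump(user_ids)   # stdlib binary format, no tokenizing

JSON.parse(json_blob)      == user_ids  # => true
Marshal.load(marshal_blob) == user_ids  # => true
```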
So to summarize per: https://gist.github.com/e38a3438aa4a27a3f95c5c7386e2cd40
Yeah, I agree with you on both points. Your suggestion to explore alternative serialization options was great; I added binary serialization to the list. https://gist.github.com/rafbgarcia/d043ee02c73b6e518361d60425d5734e
Check out https://ruby-doc.org/stdlib-3.0.1/libdoc/set/rdoc/Set.html as well
@ignisf the issue is that marshalling back to a Set from any wire format would kill all the gain it has on offer. I can see gains even with the low count of 2 with the string hack / Oj: https://gist.github.com/6394c76bb6462edc5043d7c594173c02
Even at 1 it is not slower:
I am not sure Marshal.load is going to help unless we swap the entire envelope from JSON to a binary format. We are still going to be stuck on JSON.parse if we don't switch the envelope :( Switching the envelope possibly has compatibility concerns. (But maybe we allow for a full Marshal-based envelope, optionally...) Allow for a base "serializer/deserializer" class in message_bus, have the JSON-based one be the default, but allow people to opt for a Marshal-based or Oj-based one if they wish?
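A minimal sketch of what that pluggable envelope could look like (the class names here are hypothetical; no such API exists in message_bus at this point):

```ruby
require "json"

# Hypothetical codec interface: JSON stays the default for compatibility;
# a Marshal-based (or Oj-based) codec can be opted into, at the cost of
# dumping the existing backlog when switching.
class JsonCodec
  def dump(obj)
    JSON.generate(obj)
  end

  def load(str)
    JSON.parse(str)
  end
end

class MarshalCodec
  def dump(obj)
    Marshal.dump(obj)
  end

  def load(str)
    Marshal.load(str)
  end
end

payload = { "data" => "hello", "user_ids" => [1, 2, 3] }
[JsonCodec.new, MarshalCodec.new].each do |codec|
  codec.load(codec.dump(payload)) == payload  # => true for both
end
```

The internal interfaces would keep passing the same Ruby objects around; only the envelope's dump/load calls would be routed through the configured codec.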
To be honest, I'm not sure how one would transition from JSON to binary. Would that require clearing all backlogs? Or would we have to create two deserializers: one for the transition and another for the binary format? However, most of the performance improvement comes from changing the data type; even changing from JSON to Oj is not very important IMO. Are you leaning more towards not accepting that change?
A transition from a number list to the string hack is going to change the transport format, so we are going to be stuck either making a backwards-incompatible change to get that in, or carrying an ugly "if String do this, if numbers do this" branch in the code. I think it is fine to offer multiple serializers, including non-compatible ones, as long as they are opt-in; that way you would swap the serializer and dump your backlog to move to the new, faster, more efficient one. I think we need to take a few days to think about this; there is no trivial perf hack here. Oj vs JSON, though, is quite big for the general case of small messages. I am seeing quite a bump there, and this can be done in a backwards-compatible way.
What if we ship a minor version with the Oj change? I feel like the opt-in serializer is a simpler change. I'm happy to work on either of these; just let me know.
I am still sleeping on this :) It sure is very tricky. A fundamental issue I have is that I think allowing consumers to opt for a serialization format is ideal here; this is the first change we should drive. As to how to solve the fundamental perf issue: I think this is a more "general" problem in Ruby, worthy of a general solution. What we could do here to resolve this (and even be faster than the string hack) is build a C extension in a separate gem.
The C extension can have it so Int32Set is just stored as an int[] list in C and only takes up 1 RVALUE. Then the internal interfaces in message_bus can remain exactly as they are today (except for the new Serializer class). @jeremyevans, do you like this plan?
@rafbgarcia I knocked this implementation up... work in progress, but the test suite passes with OjFast. The new pattern used is thanks to @mame; the limitation is that ids must not be longer than 4 bytes. We could pack the ints more and allow for 8 bytes with a limited perf hit; packing dynamic sizes, though, is tricky, because we would need an extremely custom unpacking algorithm and binary search. Regardless, the pattern allows for custom codecs, so you could implement the string-based one if you wished. Let's move the discussion to the PR. Going to close this off.
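For anyone following along, the fixed-width packing idea can be sketched in pure Ruby (the PR's actual implementation may differ):

```ruby
# Sort the ids and pack them into one flat String, 4 bytes per id
# ("V" = 32-bit unsigned little-endian), so the whole set costs 1 RVALUE
# instead of one Integer object per id.
def pack_ids(ids)
  ids.sort.pack("V*")
end

# Membership test: binary search over the fixed-width 4-byte slots.
def packed_include?(packed, id)
  slots = packed.bytesize / 4
  !(0...slots).bsearch { |i| id <=> packed[i * 4, 4].unpack1("V") }.nil?
end

packed = pack_ids([9001, 7, 42])
packed_include?(packed, 42)   # => true
packed_include?(packed, 8)    # => false
```

The fixed slot width is what keeps the binary search trivial; this is exactly why variable-length packing would need a custom unpacking algorithm, as noted above.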
Doesn't Redis have a way to return whether an array/list contains an element? |
@eregon indeed it does, but it gets super tricky to amend all the internals to support this; it would be a massive change. Plus, the extra round trips would end up hindering performance for the majority of users, who only attempt to restrict message delivery to 2 or 3 users vs tens of thousands. It is a tough balancing act here :)
Hi,
I've made some performance tests against client.bucket, and it seems like channels with hundreds/thousands of users suffer from the array parsing, as you can see in this rubyprof against the master branch.
I tested a few alternatives:
Replaced the array with the string ",#{user_ids.join(",")}," and checked membership with .include?(",#{user_id},") - this made a big improvement.
Here are the rubyprofs for String.include? and Oj + String.include?
And below are IPS benchmarks.
This is the code I used for the tests.
Array.include?
String.include?
String.include? + Oj
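The string hack in the last two variants boils down to this (a sketch of the technique; the gists above hold the full benchmark code):

```ruby
user_ids = [42, 7, 9001]

# One comma-delimited String instead of an Array of Integers. Wrapping
# commas on both sides prevents partial matches (4 must not match 42).
ids_string = ",#{user_ids.join(",")},"

def can_see?(ids_string, user_id)
  ids_string.include?(",#{user_id},")
end

can_see?(ids_string, 42)  # => true
can_see?(ids_string, 4)   # => false
```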
@SamSaffron, what do you think about these?