Optimized initialization of Redis::Cluster #912
Merged
We were noticing that initializing a Redis Cluster client was fairly slow, and after a coworker dug into it, the cost turned out to come primarily from `build_slot_node_key_map`. The current implementation does a heavy amount of allocation and work: every node in a cluster has its slots iterated over, which leads to about 16,384 iterations per node. Using the redis-rb cluster configuration as an example, it takes about 17ms to initialize and allocates ~49k objects; we saw significantly higher numbers in production, but we're also running a lot more nodes. This PR gets it down to about ~1.5ms and 9 allocations.
The optimization works by building up the primary/secondary node configuration off of `slots_arr` instead. My understanding is that secondaries always mirror the slot configuration of their primary, so this trick is fine. Once we have that, we simply load the slot configuration into our existing hash, which ensures we don't have to allocate a new object for every single slot. When we call `put` (from `MOVED`), we duplicate the map so that only that slot is changed.

I also dropped the use of `Set`, since Ruby sets are fairly inefficient (they're just wrappers around hashes), and the number of secondaries is low enough that calling `include?` on an array is not the end of the world. It also means you no longer have to call `to_a` each time you call `find_node_key_of_slave`. If you somehow had a configuration with tens of thousands of secondaries, your `find_node_key_of_slave` call would be slow anyway, and now only your initialization is slow. Let me know if I'm misunderstanding something about the Redis Cluster setup that makes this not work, though.
My understanding is that the existing Redis tests should cover this, so it doesn't need additional tests, but I'm happy to add something if you want.