Ensure command details are populated for Cluster clients #1026

praboud-stripe · 2021-09-09T13:06:36Z

I recently stumbled over a corner case in the clustered client which is sort of interesting:

Background

If the Redis nodes that the client is connecting to are degraded when CommandLoader is called, the command details hash returned will be {} - the client essentially doesn't understand what any of the commands' syntax is.

redis-rb/lib/redis/cluster/command_loader.rb

Lines 13 to 20 in 688ac95

    
           details = {} 
        
           nodes.each do |node| 
        
             details = fetch_command_details(node) 
        
             details.empty? ? next : break 
        
           end 
        
           details

(This is pretty rare in practice. The queries made by SlotLoader and NodeLoader immediately prior to this need to have worked - otherwise those will raise - but then all nodes must be unreachable when we're running CommandLoader.)

These command details are later used by Command to process a command into a key, so that the command can be routed to the right node. If the command details are empty and we don't understand the command syntax, Command.determine_first_key_position doesn't know which command argument to get the hash from, so Command.extract_first_key returns '':

redis-rb/lib/redis/cluster/command.rb

Lines 50 to 61 in 688ac95

    
           def determine_first_key_position(command) 
        
             case command.first.to_s.downcase 
        
             when 'eval', 'evalsha', 'migrate', 'zinterstore', 'zunionstore' then 3 
        
             when 'object' then 2 
        
             when 'memory' 
        
               command[1].to_s.casecmp('usage').zero? ? 2 : 0 
        
             when 'scan', 'sscan', 'hscan', 'zscan' 
        
               determine_optional_key_position(command, 'match') 
        
             when 'xread', 'xreadgroup' 
        
               determine_optional_key_position(command, 'streams') 
        
             else 
        
               dig_details(command, :first_key_position).to_i

redis-rb/lib/redis/cluster/command.rb

Lines 14 to 16 in 688ac95

    
           def extract_first_key(command) 
        
             i = determine_first_key_position(command) 
        
             return '' if i == 0

...and that means that Cluster.find_node_key returns nil, because we don't know where the command needs to be routed. This is handled relatively gracefully - find_node routes to a random node if we don't have a specific node to route to.

redis-rb/lib/redis/cluster.rb

Lines 258 to 260 in 688ac95

    
           def find_node_key(command, primary_only: false) 
        
             key = @command.extract_first_key(command) 
        
             return if key.empty?

redis-rb/lib/redis/cluster.rb

Lines 272 to 273 in 688ac95

    
           def find_node(node_key) 
        
             return @node.sample if node_key.nil?

However, routing commands randomly means most commands are incurring an extra hop, because the randomly selected node is most likely wrong, and will need to redirect the client to the correct node. This works, but is a bit slower, and unnecessarily takes a dependency on an extra node (ie: if either of the random node or the node our data is on is degraded, the command will fail). The client gets wedged in this wonky state for its entire lifetime, because there's no mechanism to later update the command details. Because clients are intended to be long-lived, for most use cases, this means until the process is restarted.

Change Summary

This patch changes the client to require that the initial fetch of command details succeeds, and raises a CannotConnectError if that's not possible. Users should already handle this error, and retry creating the client later, or use some other suitable failure handling mechanism. Since SlotLoader and NodeLoader can already raise CannotConnectError, this shouldn't be a breaking change to the gem's external interface.

praboud-stripe · 2021-09-14T12:44:32Z

It's my first time contributing, so apologies if there's a set of instructions I missed somewhere, but what's next for getting this merged? AFAICT, a maintainer needs to kick off the CI build (I can't do this myself because this is my first PR to the repo), and then somebody with write access to merge the PR (assuming the build passes).

byroot · 2021-09-14T12:46:30Z

Yeah, my fault, I should have reviewed this a while ago. I'll merge in a few minutes assuming the CI is green.

Thanks for your patch.

praboud-stripe · 2021-09-14T12:47:56Z

No worries, and thanks!

praboud-stripe · 2021-09-14T12:48:25Z

Agh, there's a lint error - lemme send a fix.

Previously, if a clustered client is unable to query any of the nodes for command details during setup, it'd initialize successfully, but without any command details. (This is a bit of a corner case - in order to hit this case, at least one node needs to be reachable when querying nodes + slots, but all nodes would need to be unreachable immediately afterwards when querying command details.) In this case, the client would be unable to determine which node a given command should be routed to. Because the client falls back to routing commands to a random node, and following redirects as necessary, this wouldn't cause any overt problems, but leaves the client following unnecessary redirects for the rest of the lifespan of the client. This patch changes the client to require that the initial fetch of command details succeeds (as is already done with slot and node detail fetching), and raises a CannotConnectError if that's not possible. Users should already handle this error, and retry creating the client later, or use some other suitable failure handling mechanism.

supercaracal approved these changes Sep 11, 2021

View reviewed changes

praboud-stripe force-pushed the cluster-command branch from 4cda62b to ea24b5b Compare September 14, 2021 12:50

byroot merged commit 5c4f2e8 into redis:master Sep 14, 2021

praboud-stripe deleted the cluster-command branch September 14, 2021 12:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure command details are populated for Cluster clients #1026

Ensure command details are populated for Cluster clients #1026

praboud-stripe commented Sep 9, 2021 •

edited

praboud-stripe commented Sep 14, 2021

byroot commented Sep 14, 2021

praboud-stripe commented Sep 14, 2021

praboud-stripe commented Sep 14, 2021

	details = {}

	nodes.each do \|node\|
	details = fetch_command_details(node)
	details.empty? ? next : break
	end

	details

	def determine_first_key_position(command)
	case command.first.to_s.downcase
	when 'eval', 'evalsha', 'migrate', 'zinterstore', 'zunionstore' then 3
	when 'object' then 2
	when 'memory'
	command[1].to_s.casecmp('usage').zero? ? 2 : 0
	when 'scan', 'sscan', 'hscan', 'zscan'
	determine_optional_key_position(command, 'match')
	when 'xread', 'xreadgroup'
	determine_optional_key_position(command, 'streams')
	else
	dig_details(command, :first_key_position).to_i

	def extract_first_key(command)
	i = determine_first_key_position(command)
	return '' if i == 0

	def find_node_key(command, primary_only: false)
	key = @command.extract_first_key(command)
	return if key.empty?

Ensure command details are populated for Cluster clients #1026

Ensure command details are populated for Cluster clients #1026

Conversation

praboud-stripe commented Sep 9, 2021 • edited

Background

Change Summary

praboud-stripe commented Sep 14, 2021

byroot commented Sep 14, 2021

praboud-stripe commented Sep 14, 2021

praboud-stripe commented Sep 14, 2021

praboud-stripe commented Sep 9, 2021 •

edited