I've been testing the StackExchange.Redis library against a Redis cluster with three shards. My topology consists of three physical nodes each running three Redis instances, i.e. one master and two replicas (for the other two shards) on each physical node, resulting in a total of nine Redis instances. Each of the physical nodes has its own public IP address and each Redis instance.
When the master fails over to one of its replicas, the library doesn't seem to handle the MOVED response properly on set commands. Specifically, when the library processes the MOVED response as it tries to set a value on the old master (which is now a replica), it correctly updates ServerSelectionStrategy.map for the given hash slot (i.e. it changes the slot in the array to the ServerEndPoint of the new master). But when it tries to re-send the set command, the logic in ServerSelectionStrategy.Select() chooses the old master again, because the new master's ServerEndPoint isn't marked as a master yet:
private ServerEndPoint FindMaster(ServerEndPoint endpoint, RedisCommand command)
{
    int max = 5;
    do
    {
        if (!endpoint.IsReplica && endpoint.IsSelectable(command)) return endpoint;
        endpoint = endpoint.Master;
    } while (endpoint != null && --max != 0);
    return null;
}
Interestingly, ServerSelectionStrategy.Select() has a comment that states that all the entries in the 'map' are masters, which is correct, so why do we need to even call FindMaster() on the node if we could just use it directly?
ServerEndPoint endpoint = arr[slot], testing;
// but: ^^^ is the MASTER slots; if we want a replica, we need to do some thinking
The end result of the current logic is that the set operation ultimately fails with an InternalServer error since the same endpoint is tried twice and it's no longer the master.
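To make the failure mode concrete, here is a minimal, self-contained sketch of the situation as I understand it. FakeEndPoint and SelectionDemo are hypothetical stand-ins, not library types, and the loop omits the IsSelectable check. It assumes the client's cached flags are stale in the way described above: the promoted node is still flagged as a replica whose Master is the old master.

```csharp
using System;

// Hypothetical, stripped-down stand-in for the library's ServerEndPoint.
sealed class FakeEndPoint
{
    public string Name { get; }
    public bool IsReplica { get; set; }
    public FakeEndPoint Master { get; set; } // null when the node is a master

    public FakeEndPoint(string name) => Name = name;
}

static class SelectionDemo
{
    // Same loop shape as FindMaster above, minus the IsSelectable check.
    public static FakeEndPoint FindMaster(FakeEndPoint endpoint)
    {
        int max = 5;
        do
        {
            if (!endpoint.IsReplica) return endpoint;
            endpoint = endpoint.Master;
        } while (endpoint != null && --max != 0);
        return null;
    }

    public static string Run()
    {
        // Client-side view right after the failover: flags not refreshed yet.
        var oldMaster = new FakeEndPoint("A");
        var promoted = new FakeEndPoint("B") { IsReplica = true, Master = oldMaster };

        // MOVED handling has already pointed the slot at the promoted node...
        var mapEntry = promoted;

        // ...but the walk follows the stale Master link back to the old master.
        return FindMaster(mapEntry).Name;
    }
}
```

With these stale flags, SelectionDemo.Run() returns the old master's name even though the map entry already points at the promoted node, which matches the "same endpoint is tried twice" behaviour.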
Several potential solutions come to mind, but I'm not an expert in the library and would love some feedback.
One solution could be to add an override flag that allows the library to write to the ServerEndPoint because we got the MOVED hint, even though it's still flagged as a replica. That would make sets work until the next auto discovery fixes everything up. In other words, one could extend the logic in PhysicalBridge.WriteMessageToServerInsideWriteLock to check an additional flag on the following line: if (isMasterOnly && ServerEndPoint.IsReplica && (ServerEndPoint.ReplicaReadOnly || !ServerEndPoint.AllowReplicaWrites))
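A rough sketch of what that extra check might look like, written as a standalone predicate so it can be read in isolation. WriteGuardSketch and the movedHint parameter are hypothetical names I made up for illustration; the other parameters just mirror the properties in the condition quoted above.

```csharp
static class WriteGuardSketch
{
    // Returns true when a master-only write should be rejected.
    // 'movedHint' is a hypothetical flag meaning "the server redirected us
    // here via MOVED", so the cached replica flag is assumed to be stale.
    public static bool ShouldRejectWrite(
        bool isMasterOnly,
        bool isReplica,
        bool replicaReadOnly,
        bool allowReplicaWrites,
        bool movedHint)
    {
        if (movedHint) return false; // trust the redirect over the cached role
        return isMasterOnly && isReplica && (replicaReadOnly || !allowReplicaWrites);
    }
}
```

Without the hint, the predicate behaves exactly like the existing condition; with it, the write goes through and the server gets the final say.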
Alternatively, we could kick off an auto discovery operation (cluster nodes command) to refresh the map when it looks like a key has moved, but that seems pretty heavy when we're already getting the MOVED hints.
I'm not 100% sure about this because if there is a MOVED happening (e.g. bad proxy somewhere) this would just continually re-run... but only once every 5 seconds. Overall though, today we linger in a bad state, retrying moves until a discovery happens, and this could be resolved much faster.
Meant to help address #1520, #1660, #2074, and #2020.
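For illustration, the "only once every 5 seconds" behaviour could be sketched with a small throttle like the one below. This is a sketch under my own assumptions, not the library's implementation: MovedReconfigureThrottle and TryAcquire are hypothetical names, and the caller passes in the current time in milliseconds.

```csharp
using System.Threading;

// Hypothetical helper: lets at most one caller through per interval.
sealed class MovedReconfigureThrottle
{
    private readonly long _minIntervalMs;
    private long _lastRunMs;

    public MovedReconfigureThrottle(long minIntervalMs = 5000)
    {
        _minIntervalMs = minIntervalMs;
        // Start one full interval in the past so the first MOVED (at any
        // non-negative timestamp) is allowed to trigger a discovery.
        _lastRunMs = -minIntervalMs;
    }

    // Returns true if the caller may start a discovery at time 'nowMs'.
    public bool TryAcquire(long nowMs)
    {
        long last = Interlocked.Read(ref _lastRunMs);
        if (nowMs - last < _minIntervalMs) return false;

        // Only one of several racing callers wins the compare-and-swap.
        return Interlocked.CompareExchange(ref _lastRunMs, nowMs, last) == last;
    }
}
```

A caller handling a MOVED response would check something like throttle.TryAcquire(Environment.TickCount64) and only kick off the cluster nodes refresh when it returns true, so a misbehaving proxy that keeps sending MOVED costs at most one discovery per interval.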