Too many Cluster redirections when Azure Redis Cache has a failover #1863

Open
Toritos01 opened this issue Mar 6, 2024 · 0 comments

Hello, there seems to be an issue with ioredis 5.3.2 when it is connected to an Azure Redis Cache. With two shards, everything mostly worked fine at first. However, I noticed that when one shard's primary node fails over (this happens periodically in Azure Redis Cache for maintenance), it completely breaks the ioredis connection of my app that was connected at the time. The only fix I have found is to use the "Reboot" feature in the Azure portal to make the other shard fail over as well, after which both shards somehow start working again.

I can consistently reproduce this failure by using the Azure portal "Reboot" feature to make one shard's primary node fail over. When this happens, the shard that failed over completely stops receiving commands from my app (I can tell by running the "Monitor" command in the Redis Console for that shard). I can also tell that this is not an Azure Redis Cache issue, because when I run a different test app, I see some of its commands successfully reach the shard that failed over. This also tells me that the shard is already operational after the failover, but that ioredis still will not connect to it.

This looks to me like ioredis gets stuck in a redirect loop and cannot find its way back to the shard that failed over. The ioredis errors that I see look like this:
{"name":"Error","message":"Too many Cluster redirections. Last error: ReplyError: MOVED <IP>","stack":"Error: Too many Cluster redirections. Last error: ReplyError: MOVED <IP>"}
or sometimes:
{"name":"Error","message":"Too many Cluster redirections. Last error: Error: Connection is closed.","stack":"Error: Too many Cluster redirections. Last error: Error: Connection is closed."}
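For anyone trying to tell these two variants apart in logs, here is a minimal sketch that classifies the error based only on the message text shown above. The helper name `parseRedirectionError` and the `kind` labels are hypothetical and not part of ioredis; the matching relies purely on the message strings quoted in this issue.

```javascript
// Hypothetical helper (not an ioredis API): classifies the aggregate error
// using only the message format quoted in this issue.
const REDIRECTION_PREFIX = 'Too many Cluster redirections. Last error: ';

function parseRedirectionError(err) {
  const message = err && typeof err.message === 'string' ? err.message : '';
  if (!message.startsWith(REDIRECTION_PREFIX)) {
    return null; // not a redirection-exhausted error
  }
  const lastError = message.slice(REDIRECTION_PREFIX.length);
  let kind = 'other';
  if (lastError.includes('MOVED')) {
    kind = 'moved'; // the node kept redirecting the slot elsewhere
  } else if (lastError.includes('Connection is closed')) {
    kind = 'closed'; // the target connection was already closed
  }
  return { kind, lastError };
}
```

Counting how often each `kind` shows up after a failover might help narrow down whether the stale slot map or the dead connection is the main cause.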

Some additional context on my setup:
- Based on the logs, the app I am testing with is mostly sending out "rpop" commands repeatedly
- ioredis 5.3.2
- Node version 18
- Clustering enabled with 2 shards
- Azure Redis Cache v6
- SSL enabled, and using SSL port (6380)
- No multi-key operations across shards are occurring
- The constructor below is what I use to access the cluster:
const redisClient = new redis.Cluster([
  {
    port: 6380,
    host: <hostname>,
  },
], {
  slotsRefreshTimeout: 50000,
  dnsLookup: (address, callback) => callback(null, address),
  showFriendlyErrorStack: true,
  redisOptions: {
    port: 6380,
    host: <hostname>,
    password: <password>,
    connectTimeout: 20000,
    enableReadyCheck: true,
    maxRetriesPerRequest: 3,
    enableOfflineQueue: true,
    enableAutoPipelining: true,
    autoPipeliningIgnoredCommands: ['ping'],
    tls: {
      servername: <hostname>
    }
  },
});
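
One thing that might be worth experimenting with is the cluster-level retry settings. The sketch below layers a few ioredis Cluster options (`retryDelayOnMoved`, `retryDelayOnFailover`, `maxRedirections`, `slotsRefreshInterval`) under the options object from the constructor above. The helper name `withFailoverTuning` and all of the values are assumptions chosen for illustration; whether they actually change the behavior during an Azure failover has not been verified.

```javascript
// Illustrative values only (assumptions, not recommendations).
const FAILOVER_TUNING = {
  retryDelayOnMoved: 100,     // wait briefly before following a MOVED reply
  retryDelayOnFailover: 500,  // wait before retrying when the target node is down
  maxRedirections: 16,        // redirections allowed before the command fails
  slotsRefreshInterval: 5000, // periodically refresh the slot map (ms)
};

// Hypothetical helper: returns a new options object without mutating the input.
// Values already present in baseOptions take precedence over the tuning values.
function withFailoverTuning(baseOptions) {
  return { ...FAILOVER_TUNING, ...baseOptions };
}
```

The result would be passed as the second argument to `new redis.Cluster(...)` in place of the plain options object.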

I would mainly like to know if anyone knows why this happens, or any possible workarounds.
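
As one possible shape for a workaround, here is a minimal sketch of a wrapper that retries a command after running a recovery step whenever the "Too many Cluster redirections" error appears. The names `runWithRecovery`, `command`, and `recover` are hypothetical; the wrapper does not depend on ioredis, and it is an assumption (not confirmed) that a recovery step such as disconnecting and recreating the cluster client clears the stuck state.

```javascript
// Hypothetical wrapper: `command` and `recover` are caller-supplied async functions.
async function runWithRecovery(command, recover, maxAttempts = 2) {
  let lastErr;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      return await command();
    } catch (err) {
      const message = String(err && err.message);
      // Only handle the redirection error from this issue; rethrow anything else.
      if (!message.startsWith('Too many Cluster redirections')) {
        throw err;
      }
      lastErr = err;
      if (attempt < maxAttempts) {
        await recover(err); // e.g. rebuild the client before the next attempt
      }
    }
  }
  throw lastErr; // every attempt hit the redirection error
}
```

With the setup above, `command` could be something like `() => redisClient.rpop(key)`, and `recover` could call `redisClient.disconnect()` and construct a fresh `redis.Cluster` with the same options.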

TL;DR:
Routine Azure Redis Cache failovers cause the "Too many Cluster redirections" error on a service that is running at the time of failover, by causing ioredis to stop sending commands to the shard that failed over, even after it recovers.
