Reconnect on known errors after failover when pushing jobs to Redis

In a Redis cluster setup, failovers will happen. During a failover, a `Redis::CommandError` can be raised for several reasons: the server has become a replica (`READONLY`), the primary reports a "Not enough replicas" error (`NOREPLICAS`), or a blocking command is force-unblocked (`UNBLOCKED`). These errors can occur while pushing a job to Redis, so the client needs to disconnect, reconnect to the current primary, and retry the push once. Otherwise the job is lost. The retry logic mirrors the implementation in `Sidekiq.redis`.
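
The retry-once behaviour described above can be illustrated in isolation with a minimal sketch. It assumes a stand-in connection object: `FakeConn`, `FakeCommandError`, `FAILOVER_ERRORS`, and `push_with_retry` are hypothetical names used only for illustration and are not part of Sidekiq or redis-rb.

```ruby
# Hypothetical error type standing in for a Redis command error.
class FakeCommandError < StandardError; end

# Hypothetical stand-in for a Redis connection. It raises one queued
# error message per push attempt until the queue is empty.
class FakeConn
  attr_reader :reconnects, :pushed

  def initialize(failures)
    @failures = failures
    @reconnects = 0
    @pushed = []
  end

  def push(payload)
    msg = @failures.shift
    raise FakeCommandError, msg if msg
    @pushed << payload
  end

  # A real client would close the socket here so that the next command
  # reconnects; the stand-in only counts the calls.
  def disconnect!
    @reconnects += 1
  end
end

# Error messages that indicate a failover, as matched in the diff.
FAILOVER_ERRORS = /READONLY|NOREPLICAS|UNBLOCKED/

def push_with_retry(conn, payload)
  retryable = true
  begin
    conn.push(payload)
  rescue FakeCommandError => ex
    # Retry exactly once, and only for failover-related errors.
    if retryable && ex.message =~ FAILOVER_ERRORS
      conn.disconnect!
      retryable = false
      retry
    end
    raise
  end
  true
end
```

Because `retryable` is set outside the `begin` block, `retry` does not reset it: a second failover error in a row, or any unrelated error, is re-raised to the caller instead of looping.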
sidekiq · Jan 31, 2022 · ee45f2a
1 parent 10e7feb
Showing 1 changed file with 16 additions and 2 deletions.
diff --git a/lib/sidekiq/client.rb b/lib/sidekiq/client.rb
@@ -189,8 +189,22 @@ def enqueue_in(interval, klass, *args)
 
     def raw_push(payloads)
       @redis_pool.with do |conn|
-        conn.pipelined do |pipeline|
-          atomic_push(pipeline, payloads)
+        retryable = true
+        begin
+          conn.pipelined do |pipeline|
+            atomic_push(pipeline, payloads)
+          end
+        rescue Redis::BaseError => ex
+          # 2550 Failover can cause the server to become a replica, need
+          # to disconnect and reopen the socket to get back to the primary.
+          # 4495 Use the same logic if we have a "Not enough replicas" error from the primary.
+          # 4985 Use the same logic when a blocking command is force-unblocked.
+          if retryable && ex.message =~ /READONLY|NOREPLICAS|UNBLOCKED/
+            conn.disconnect!
+            retryable = false
+            retry
+          end
+          raise
         end
       end
       true