Reconnect on "UNBLOCKED force unblock" errors #4985
Any reason not to trigger on the UNBLOCKED code, just like the others?
I wasn't sure about that part, either - the part I am matching on seems more related to the failover scenario, which seems to be what we're interested in. I don't know what other scenarios could lead to such an error.
I think the uppercase word is the "error code". The message after it is informative and could be translated into other languages, for instance. We should perform logic on the error code only.
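To illustrate the point about matching on the error code only: a Redis error reply starts with an uppercase prefix followed by free-form text, so a check can look at the first word and ignore the rest. The following is a minimal sketch, not the code from this PR. `RECONNECT_CODES` and `reconnect_code?` are hypothetical names that exist in neither Sidekiq nor redis-rb. `READONLY` and `NOREPLICAS` are real Redis error prefixes, listed here on the assumption that they are "the others" mentioned above.

```ruby
# Hypothetical helper, not part of Sidekiq or redis-rb.
# Assumption: these are the error codes that should trigger a reconnect.
RECONNECT_CODES = %w[READONLY NOREPLICAS UNBLOCKED].freeze

# Decide based on the leading error code only, ignoring the
# human-readable text that follows it.
def reconnect_code?(message)
  code = message.to_s.split(" ", 2).first
  RECONNECT_CODES.include?(code)
end

reconnect_code?("UNBLOCKED force unblock")  # code matches
reconnect_code?("ERR key UNBLOCKED")        # word appears, but not as the code
```

Checking only the first word means a change in the wording after the code does not affect the decision.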
That sounds sensible. Changed it.
Oh, one more thing: I also considered handling this directly in the
I think it's better to have a blanket policy to reset the connection upon any of these error codes.
Thanks a lot! @mperham I read somewhere you only do releases every few months. That is totally fine, of course. Is the main branch considered stable, though?
Yes, main is stable.
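A blanket policy of this kind could look roughly like the sketch below: one wrapper that drops the connection and retries once whenever a command fails with one of the listed codes. This is a self-contained illustration under stated assumptions, not Sidekiq's actual implementation. `FakeCommandError`, `FakeConnection`, `RESET_CODES` and `with_reconnect` are all hypothetical stand-ins so the example runs without a Redis server, and the list of codes is assumed.

```ruby
# Stand-ins for the real client classes so the sketch runs on its own.
class FakeCommandError < StandardError; end

class FakeConnection
  attr_reader :disconnects

  def initialize(failures)
    @failures = failures # error messages to raise, one per call
    @disconnects = 0
  end

  # Simulates a blocking fetch: raises the next queued error, if any.
  def brpop(*)
    message = @failures.shift
    raise FakeCommandError, message if message
    "job"
  end

  def disconnect!
    @disconnects += 1
  end
end

# Assumption: the codes that warrant a connection reset.
RESET_CODES = /\A(READONLY|NOREPLICAS|UNBLOCKED)\b/

# Run the block; on a matching error code, reset the connection
# and retry exactly once. Anything else is re-raised unchanged.
def with_reconnect(conn)
  retryable = true
  begin
    yield conn
  rescue FakeCommandError => e
    raise unless retryable && e.message.match?(RESET_CODES)
    conn.disconnect!
    retryable = false
    retry
  end
end

conn = FakeConnection.new(["UNBLOCKED force unblock"])
result = with_reconnect(conn) { |c| c.brpop("queue:default", 2) }
# The first attempt fails, the connection is reset, the retry succeeds.
```

Keeping the policy in one wrapper means every command benefits from it, rather than only the blocking fetch.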
Thanks a lot for Sidekiq!
In our setup with a Redis cluster, we are having regular failovers,
which is fine. We are currently improving the way our application
handles these situations and stumbled across a possible improvement
in Sidekiq itself.
By far the most frequent error in our error monitoring system is the
following:
These errors can occur during Sidekiq's long-running job-fetching
command, which uses Redis' blocking BRPOP primitive. On failover in a
cluster setup, the server interrupts these commands.
Each error causes the worker threads to be restarted, but because the
errors bubble up to the top, they cause a lot of spam in our error
logging systems. As related errors from other commands are already
handled this way (see #2550 and #4495), it seems sensible to handle
this one too.