Explicitly signal that we handled an exception with a retry, fixes #4138 #4141
Under just the right conditions, we could lose a job:

- Job raises an error
- Retry subsystem catches the error and tries to create a retry in Redis, but this raises a "Redis down" exception
- Processor catches the Redis exception and thinks a retry was created
- Redis comes back online just in time for the job to be acknowledged and lost

That's a very specific and rare set of steps, but it can happen.

Instead, have the Retry subsystem raise a specific error signaling that it created a retry. There will be three common cases:

1. Job is successful: job is acknowledged.
2. Job fails, retry is created, Processor rescues specific error: job is acknowledged.
3. Sidekiq::Shutdown is raised: job is not acknowledged.

Now there is another case:

4. Job fails, retry fails, Processor rescues Exception: job is NOT acknowledged. Sidekiq Pro's super_fetch will rescue the orphaned job at some point in the future.
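The four acknowledgement cases can be illustrated with a small in-memory model. This is a hedged sketch, not Sidekiq's actual code: `RetryHandled`, `Shutdown`, `FakeQueue`, `run_with_retry` and `process` are hypothetical names, and the `working` list only loosely stands in for a reliable fetch such as super_fetch, where a fetched job stays recoverable until it is acknowledged.

```ruby
# Hypothetical marker error: raised only after a retry was actually stored.
class RetryHandled < StandardError; end

# Hypothetical stand-in for Sidekiq::Shutdown in this sketch.
class Shutdown < Interrupt; end

# In-memory model of a reliable queue: a fetched job sits in `working`
# until it is acknowledged, so an un-acked job can be recovered later.
class FakeQueue
  attr_reader :pending, :working, :retries

  def initialize(jobs)
    @pending = jobs.dup
    @working = []
    @retries = []
  end

  def fetch
    job = @pending.shift
    @working << job if job
    job
  end

  def acknowledge(job)
    @working.delete(job)
  end
end

# Retry step: runs the job and, on failure, tries to store a retry.
# Raises RetryHandled only when the retry was stored; if storing fails,
# the storage error escapes instead. Shutdown is not a StandardError,
# so it passes straight through.
def run_with_retry(queue, job, redis_up:)
  job[:work].call
rescue StandardError => e
  raise IOError, "Redis down" unless redis_up # simulated storage failure
  queue.retries << job
  raise RetryHandled, e.message
end

# Processor: acknowledge only on success or on the explicit RetryHandled signal.
def process(queue, redis_up: true)
  job = queue.fetch
  ack = false
  begin
    run_with_retry(queue, job, redis_up: redis_up)
    ack = true # case 1: job succeeded
  rescue RetryHandled
    ack = true # case 2: job failed, retry stored
  ensure
    # cases 3 and 4: Shutdown or any other error propagates; job stays un-acked
    queue.acknowledge(job) if ack
  end
end

q = FakeQueue.new([
  { id: 1, work: -> { :ok } },
  { id: 2, work: -> { raise ArgumentError, "boom" } },
  { id: 3, work: -> { raise Shutdown } },
  { id: 4, work: -> { raise ArgumentError, "boom" } }
])

process(q) # case 1
process(q) # case 2
begin
  process(q) # case 3
rescue Shutdown
  # expected: shutdown interrupted the job
end
begin
  process(q, redis_up: false) # case 4
rescue IOError
  # expected: the retry could not be stored
end

p q.working.map { |j| j[:id] } # prints [3, 4]
p q.retries.map { |j| j[:id] } # prints [2]
```

Jobs 3 and 4 staying in the working list is the point of the change: under the old "ack on any exception" behavior, job 4 would have been acknowledged even though no retry existed for it.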
Maybe you could clarify what the actual problem was and how you fixed it?
That's the mechanism I would rely on the most.
Nope, I phrased it poorly, but that scenario should be extremely rare. I mean “the retry cannot be created in Redis due to an unexpected error.”
On Apr 5, 2019, at 16:55, Yuri Smirnov wrote:
> Job fails, retry fails, Processor rescues Exception: job is NOT acknowledged. Sidekiq Pro's super_fetch will rescue the orphaned job at some point in the future.
> That's the mechanism I would rely on the most.
Welp, it has happened already, and we've only been using Sidekiq for about 4 months.
What do you think this is? A fix.
On Apr 5, 2019, at 17:05, Yuri Smirnov wrote:
> Welp, it has happened already, and we've only been using Sidekiq for about 4 months.
> I mean, if there is a problem in the code, we should fix it, right?
There's a bunch of code I still don't understand, but I see that we no longer ack on every exception, and that seems good to me.
I will test this version against the last stable release for my case.
I'll merge this later this week if I don't hear anything.