
Explicitly signal that we handled an exception with a retry, fixes #4138 #4141

Merged
merged 2 commits on Apr 12, 2019

Conversation

@mperham (Collaborator) commented Apr 5, 2019

Under just the right conditions, we could lose a job:

  • Job raises an error
  • Retry subsystem catches error and tries to create a retry in Redis but this raises a "Redis down" exception
  • Processor catches Redis exception and thinks a retry was created
  • Redis comes back online just in time for the job to be acknowledged and lost

That's a very specific and rare set of steps but it can happen.
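
To make the failure mode concrete, here is a rough sketch of the old control flow (illustrative only, not the actual Processor source; `run_with_retry`, `perform`, and `ack` are placeholder names):

```ruby
# Rough sketch of the pre-fix flow (illustrative, not the real source).
# Any exception escaping the retry subsystem was treated as "a retry was
# already written to Redis", so the job was acknowledged regardless.
def process(job)
  acknowledge = false
  begin
    run_with_retry(job) do   # retry subsystem: catches the job's error,
      perform(job)           # writes a retry to Redis, re-raises; if Redis
    end                      # is down, the Redis error escapes instead
    acknowledge = true
  rescue Sidekiq::Shutdown
    acknowledge = false      # shutting down: job will be re-fetched later
  rescue Exception
    acknowledge = true       # assumes a retry exists -- wrong when the error
    raise                    # was "Redis down"; the job is acked below and
  ensure                     # permanently lost once Redis comes back
    ack(job) if acknowledge
  end
end
```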

Instead have the Retry subsystem raise a specific error signaling that it created a retry. There will be three common cases:

  1. Job is successful: job is acknowledged.
  2. Job fails, retry is created, Processor rescues specific error: job is acknowledged.
  3. Sidekiq::Shutdown is raised: job is not acknowledged.

Now there is another case:

  4. Job fails, retry fails, Processor rescues Exception: job is NOT acknowledged. Sidekiq Pro's super_fetch will rescue the orphaned job at some point in the future.
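
For contrast, a minimal sketch of the flow with this change (again illustrative, not the actual diff; the specific error is sketched here as `JobRetry::Handled`, and `create_retry_in_redis`, `perform`, and `ack` remain placeholders):

```ruby
# Minimal sketch of the flow after this change (illustrative, not the diff).
module JobRetry
  # Raised only AFTER the retry was successfully written to Redis, so the
  # processor knows it is safe to acknowledge the job.
  class Handled < RuntimeError; end

  def self.attempt_retry(job, error)
    create_retry_in_redis(job, error)  # if Redis is down, this raises and
    raise Handled                      # Handled is never reached
  end
end

def process(job)
  begin
    perform(job)
  rescue Sidekiq::Shutdown
    raise
  rescue Exception => ex
    JobRetry.attempt_retry(job, ex)
  end
  ack(job)                # case 1: job succeeded
rescue JobRetry::Handled
  ack(job)                # case 2: retry exists in Redis, safe to ack
rescue Sidekiq::Shutdown
  raise                   # case 3: shutting down, do not ack
rescue Exception
  raise                   # case 4: retry could not be created, do NOT ack;
end                       # super_fetch can rescue the orphaned job later
```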



@tycooon commented Apr 5, 2019

Maybe you could clarify what the actual problem was and how you fixed it?

@tycooon commented Apr 5, 2019

> Job fails, retry fails, Processor rescues Exception: job is NOT acknowledged. Sidekiq Pro's super_fetch will rescue the orphaned job at some point in the future.

That's the mechanism I would rely on the most.

@mperham (Collaborator, Author) commented Apr 6, 2019 via email

@tycooon commented Apr 6, 2019

Welp, it already happened, and we have only been using Sidekiq for about 4 months.
I mean, if there is a problem in the code, we should fix it, right?

@mperham (Collaborator, Author) commented Apr 6, 2019 via email

@tycooon commented Apr 6, 2019

There is a bunch of code that I still don't understand, but I see that we are no longer acking on every exception, and that seems cool to me.

@tycooon commented Apr 6, 2019

I will test this version against the latest stable release for my case.

@mperham (Collaborator, Author) commented Apr 9, 2019

I'll merge this later this week if I don't hear anything.
