
Explicitly signal that we handled an exception with a retry, fixes #4138 #4141

Merged
merged 2 commits on Apr 12, 2019

Conversation

@mperham (Collaborator) commented Apr 5, 2019

Under just the right conditions, we could lose a job:

  • Job raises an error
  • Retry subsystem catches error and tries to create a retry in Redis but this raises a "Redis down" exception
  • Processor catches Redis exception and thinks a retry was created
  • Redis comes back online just in time for the job to be acknowledged and lost

That's a very specific and rare set of steps but it can happen.
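
To make the failure mode concrete, here is a rough sketch of the old control flow (illustrative only, not the actual Processor source; `run_with_retry`, `perform`, and `ack` are placeholder names):

```ruby
# Rough sketch of the pre-fix flow (illustrative, not the real source).
# Any exception escaping the retry subsystem was treated as "a retry was
# already written to Redis", so the job was acknowledged regardless.
def process(job)
  acknowledge = false
  begin
    run_with_retry(job) do   # retry subsystem: catches the job's error,
      perform(job)           # writes a retry to Redis, re-raises; if Redis
    end                      # is down, the Redis error escapes instead
    acknowledge = true
  rescue Sidekiq::Shutdown
    acknowledge = false      # shutting down: job will be re-fetched later
  rescue Exception
    acknowledge = true       # assumes a retry exists -- wrong when the error
    raise                    # was "Redis down"; the job is acked below and
  ensure                     # permanently lost once Redis comes back
    ack(job) if acknowledge
  end
end
```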

Instead have the Retry subsystem raise a specific error signaling that it created a retry. There will be three common cases:

  1. Job is successful: job is acknowledged.
  2. Job fails, retry is created, Processor rescues specific error: job is acknowledged.
  3. Sidekiq::Shutdown is raised: job is not acknowledged.

Now there is another case:

  4. Job fails, retry fails, Processor rescues Exception: job is NOT acknowledged. Sidekiq Pro's super_fetch will rescue the orphaned job at some point in the future.
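
For contrast, a minimal sketch of the flow with this change (again illustrative, not the actual diff; the specific error is sketched here as `JobRetry::Handled`, and `create_retry_in_redis`, `perform`, and `ack` remain placeholders):

```ruby
# Minimal sketch of the flow after this change (illustrative, not the diff).
module JobRetry
  # Raised only AFTER the retry was successfully written to Redis, so the
  # processor knows it is safe to acknowledge the job.
  class Handled < RuntimeError; end

  def self.attempt_retry(job, error)
    create_retry_in_redis(job, error)  # if Redis is down, this raises and
    raise Handled                      # Handled is never reached
  end
end

def process(job)
  begin
    perform(job)
  rescue Sidekiq::Shutdown
    raise
  rescue Exception => ex
    JobRetry.attempt_retry(job, ex)
  end
  ack(job)                # case 1: job succeeded
rescue JobRetry::Handled
  ack(job)                # case 2: retry exists in Redis, safe to ack
rescue Sidekiq::Shutdown
  raise                   # case 3: shutting down, do not ack
rescue Exception
  raise                   # case 4: retry could not be created, do NOT ack;
end                       # super_fetch can rescue the orphaned job later
```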



@tycooon commented Apr 5, 2019

Maybe you could clarify what the actual problem was and how you fixed it?

@tycooon commented Apr 5, 2019

> Job fails, retry fails, Processor rescues Exception: job is NOT acknowledged. Sidekiq Pro's super_fetch will rescue the orphaned job at some point in the future.

That's the mechanism I would rely on the most.

@mperham (Collaborator, Author) commented Apr 6, 2019 via email

@tycooon commented Apr 6, 2019

Welp, it already happened, and we have only been using Sidekiq for about 4 months.
I mean, if there is a problem in the code, we should fix it, right?

@mperham (Collaborator, Author) commented Apr 6, 2019 via email

@tycooon commented Apr 6, 2019

There is a bunch of code that I still don't understand, but I see that we are no longer acking on every exception, and that seems cool to me.

@tycooon commented Apr 6, 2019

I will test this version against the latest stable release for my case.

@mperham (Collaborator, Author) commented Apr 9, 2019

I'll merge this later this week if I don't hear anything.
