# Error Handling
I hate to say it but some of your jobs will raise exceptions when executing. It's true.
Sidekiq has a number of features to handle errors of all types.
- Use an error service - Honeybadger, Airbrake, Rollbar, BugSnag, Sentry, Exceptiontrap, Raygun, etc. They're all similar in feature set and pricing; pick one and use it. The error service will send you an email every time there is an exception in a job. (Smarter ones like Honeybadger will send email on the 1st, 3rd and 10th identical error, so your inbox won't be overwhelmed if thousands of jobs are failing.)
- Let Sidekiq catch errors raised by your jobs. Sidekiq's built-in retry mechanism will catch those exceptions and retry the jobs regularly. The error service will notify you of the exception. You fix the bug, deploy the fix and Sidekiq will retry your job successfully.
- If you don't fix the bug within 25 retries (about 21 days), Sidekiq will stop retrying and move your job to the Dead set. You can fix the bug and retry the job manually anytime within the next 6 months using the Web UI.
- After 6 months, Sidekiq will discard the job.
## Error Handlers

Gems can attach to Sidekiq's global error handlers so they will be informed any time there is an error inside Sidekiq. Error services all provide this integration automatically; just include their gem in your application's Gemfile.
You can create your own error handler by providing something which responds to `call(exception, context_hash)`:

```ruby
Sidekiq.configure_server do |config|
  config.error_handlers << proc { |ex, ctx_hash| MyErrorService.notify(ex, ctx_hash) }
end
```
Note that error handlers are only relevant to the Sidekiq server process. They aren't active in the Rails console, for instance.
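Anything that responds to `call(exception, context_hash)` works, not just a proc. A minimal sketch of a class-based handler — the `DeduplicatingHandler` name and its throttling behavior are illustrative, not part of Sidekiq:

```ruby
# A class-based error handler: anything responding to
# call(exception, context_hash) can be appended to config.error_handlers.
class DeduplicatingHandler
  def initialize
    @seen = Hash.new(0)
  end

  # Sidekiq invokes this with the raised exception and a context hash
  # (job class, args, queue, etc.).
  def call(exception, context_hash)
    @seen[exception.class] += 1
    # Only report the 1st, 3rd and 10th occurrence of each error class,
    # mimicking the error-service throttling described above.
    return unless [1, 3, 10].include?(@seen[exception.class])
    warn "#{exception.class}: #{exception.message} (#{context_hash.inspect})"
  end
end

# Registered the same way as the proc example:
# Sidekiq.configure_server do |config|
#   config.error_handlers << DeduplicatingHandler.new
# end
```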
## Backtrace Logging

Enabling backtrace logging for a job will cause the backtrace to be persisted throughout the lifetime of the job. Beware: each backtrace can take 1-4KB of memory in Redis, so a large number of failing jobs can significantly increase your Redis memory usage.

```ruby
sidekiq_options backtrace: true
```

Use caution when enabling backtrace: limit it to a couple of lines, or use an error service to keep track of failures and their associated backtraces.

```ruby
sidekiq_options backtrace: 20 # top 20 lines
```
## Automatic job retry

Sidekiq will retry failures with an exponential backoff using the formula `(retry_count ** 4) + 15 + (rand(30) * (retry_count + 1))` (i.e. 15, 16, 31, 96, 271, ... seconds, plus a random amount of time). It will perform 25 retries over approximately 21 days. Assuming you deploy a bug fix within that time, the job will be retried and successfully processed. After 25 attempts, Sidekiq will move that job to the Dead set, assuming that it will need manual intervention to work.
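The backoff formula is easy to check directly. A quick sketch — `retry_delay` is our own helper, not Sidekiq API, and `jitter` pins the `rand(30)` term to 15, the same assumption the waiting-time table below uses:

```ruby
# Delay in seconds before the next retry, per Sidekiq's formula:
#   (retry_count ** 4) + 15 + (rand(30) * (retry_count + 1))
def retry_delay(retry_count, jitter: 15)
  (retry_count ** 4) + 15 + (jitter * (retry_count + 1))
end

delays = (0...25).map { |count| retry_delay(count) }
puts delays.first(5).inspect # => [30, 46, 76, 156, 346]

total = delays.sum                                  # 1_768_270 seconds
puts "total wait: %.1f days" % (total / 86_400.0)   # ~20.5 days
```

The cumulative sum matches the "approximately 21 days" claim and the final row of the table below (20d 11h 11m 10s = 1,768,270 seconds).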
The maximum number of retries can be configured globally by adding the following to your `sidekiq.yml`:

```yaml
:max_retries: 1
```
This table contains approximate retry waiting times:

| #  | Next retry backoff | Total waiting time |
|----|--------------------|--------------------|
| 1  | 0d 0h 0m 30s       | 0d 0h 0m 30s       |
| 2  | 0d 0h 0m 46s       | 0d 0h 1m 16s       |
| 3  | 0d 0h 1m 16s       | 0d 0h 2m 32s       |
| 4  | 0d 0h 2m 36s       | 0d 0h 5m 8s        |
| 5  | 0d 0h 5m 46s       | 0d 0h 10m 54s      |
| 6  | 0d 0h 12m 10s      | 0d 0h 23m 4s       |
| 7  | 0d 0h 23m 36s      | 0d 0h 46m 40s      |
| 8  | 0d 0h 42m 16s      | 0d 1h 28m 56s      |
| 9  | 0d 1h 10m 46s      | 0d 2h 39m 42s      |
| 10 | 0d 1h 52m 6s       | 0d 4h 31m 48s      |
| 11 | 0d 2h 49m 40s      | 0d 7h 21m 28s      |
| 12 | 0d 4h 7m 16s       | 0d 11h 28m 44s     |
| 13 | 0d 5h 49m 6s       | 0d 17h 17m 50s     |
| 14 | 0d 7h 59m 46s      | 1d 1h 17m 36s      |
| 15 | 0d 10h 44m 16s     | 1d 12h 1m 52s      |
| 16 | 0d 14h 8m 0s       | 2d 2h 9m 52s       |
| 17 | 0d 18h 16m 46s     | 2d 20h 26m 38s     |
| 18 | 0d 23h 16m 46s     | 3d 19h 43m 24s     |
| 19 | 1d 5h 14m 36s      | 5d 0h 58m 0s       |
| 20 | 1d 12h 17m 16s     | 6d 13h 15m 16s     |
| 21 | 1d 20h 32m 10s     | 8d 9h 47m 26s      |
| 22 | 2d 6h 7m 6s        | 10d 15h 54m 32s    |
| 23 | 2d 17h 10m 16s     | 13d 9h 4m 48s      |
| 24 | 3d 5h 50m 16s      | 16d 14h 55m 4s     |
| 25 | 3d 20h 16m 6s      | 20d 11h 11m 10s    |

Hint: this table was calculated under the assumption that `rand(30)` always returns 15.
The Sidekiq Web UI has "Retries" and "Dead" tabs which list failed jobs and allow you to retry, inspect or delete them.
The Dead set is a holding pen for jobs which have failed all their retries. Sidekiq will not retry those jobs; you must manually retry them via the UI. The Dead set is limited by default to 10,000 jobs or 6 months so it doesn't grow infinitely. Only jobs configured with 0 or more retries will go to the Dead set. Use `retry: false` if you want a particular type of job to be executed only once, no matter what happens.
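The 10,000-job and 6-month limits are themselves tunable. A hedged sketch — the `dead_max_jobs` and `dead_timeout_in_seconds` option keys are as found in Sidekiq 5.x; verify them against the version you run before relying on this:

```ruby
Sidekiq.configure_server do |config|
  # Cap the Dead set at 5,000 jobs instead of the default 10,000.
  config.options[:dead_max_jobs] = 5_000
  # Prune dead jobs after 90 days instead of the default ~6 months.
  config.options[:dead_timeout_in_seconds] = 90 * 24 * 60 * 60
end
```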
You can specify the number of retries for a particular worker if 25 is too many:

```ruby
class LessRetryableWorker
  include Sidekiq::Worker
  sidekiq_options retry: 5 # Only five retries and then to the Dead set

  def perform(...)
  end
end
```
Configure job retries to use a lower priority queue so new jobs take precedence:

```ruby
class LowPriorityRetryWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'default', retry_queue: 'bulk' # send retries to the 'bulk' queue

  def perform(...)
  end
end
```
You can disable retry support for a particular worker:

```ruby
class NonRetryableWorker
  include Sidekiq::Worker
  sidekiq_options retry: false # job will be discarded if it fails

  def perform(...)
  end
end
```
Skip retries and send a failed job straight to the Dead set:

```ruby
class NonRetryableWorker
  include Sidekiq::Worker
  sidekiq_options retry: 0

  def perform(...)
  end
end
```
You can prevent a job from going to the Dead set:

```ruby
class NoDeathWorker
  include Sidekiq::Worker
  sidekiq_options retry: 5, dead: false # will retry 5 times and then disappear

  def perform(...)
  end
end
```
The retry delay can be customized using `sidekiq_retry_in`, if needed:

```ruby
class WorkerWithCustomRetry
  include Sidekiq::Worker
  sidekiq_options retry: 5

  # The current retry count and exception are yielded. The return value of the
  # block must be an integer; it is used as the delay, in seconds. A return value
  # of nil will use the default.
  sidekiq_retry_in do |count, exception|
    case exception
    when SpecialException
      10 * (count + 1) # i.e. 10, 20, 30, 40, 50
    end
  end

  def perform(...)
  end
end
```
After retrying so many times, Sidekiq will call the `sidekiq_retries_exhausted` hook on your Worker if you've defined it. The hook receives the queued message and the exception as arguments. This hook is called right before Sidekiq moves the job to the Dead set.

```ruby
class FailingWorker
  include Sidekiq::Worker

  sidekiq_retries_exhausted do |msg, ex|
    Sidekiq.logger.warn "Failed #{msg['class']} with #{msg['args']}: #{msg['error_message']}"
  end

  def perform(*args)
    raise "or I don't work"
  end
end
```
The `sidekiq_retries_exhausted` callback is specific to a Worker class. Starting in v5.1, Sidekiq can also fire a global callback when a job dies:

```ruby
# this goes in your initializer
Sidekiq.configure_server do |config|
  config.death_handlers << ->(job, ex) do
    puts "Uh oh, #{job['class']} #{job["jid"]} just died with error #{ex.message}."
  end
end
```
With this callback, you can email yourself, send a Slack message, etc., so you know something is wrong.
## Process Crashes

If the Sidekiq process segfaults or crashes the Ruby VM, any jobs that were being processed are lost. Sidekiq Pro offers a reliable queueing feature which does not lose those jobs.
Sidekiq's retry mechanism is a set of best practices but many people have suggested various knobs and options to tweak in order to handle their own edge case. This way lies madness. Design your code to work well with Sidekiq's retry mechanism as it exists today or patch the JobRetry class to add your own logic. I'm no longer accepting any functional changes to the retry mechanism unless you make an extremely compelling case for why Sidekiq's thousands of users would want that change.