Job getting lost in case of failed retry #4138

tycooon · 2019-04-03T16:19:22Z

Ruby 2.6.2 with Sidekiq Pro 4.0.5.

I inspected the code and it turns out that in case of exception in Sidekiq::JobRetry#attempt_retry we are still acknowledging the job here which leads to the job being removed from Redis. If we didn't ack the job it would eventually get restored by the SuperFetch as I understand.

Maybe it would be better not to ack the job in case of all unhandled exceptions in Processor#process?

The text was updated successfully, but these errors were encountered:

mperham · 2019-04-03T19:53:20Z

Can you supply the error and backtrace you are seeing?

…

On Apr 3, 2019, at 09:19, Yuri Smirnov ***@***.***> wrote: Ruby 2.6.2 with Sidekiq Pro 4.0.5. I inspected the code and it turns out that in case of exception in Sidekiq::JobRetry#attempt_retry we are still acknowledging the job here which leads to the job being removed from Redis. If we didn't ack the job it would eventually get restored by the SuperFetch as I understand. Maybe it would be better not to ack the job in case of all unhandled exceptions in Processor#process? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

tycooon · 2019-04-04T08:44:12Z

There actually were 2 errors:

Redis::CannotConnectError in JobRetry#local (Error connecting to Redis on 10.208.64.70:6379 (Errno::ECONNREFUSED))

<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:353:in `rescue in establish_connection'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:336:in `establish_connection'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:104:in `block in connect'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:299:in `with_reconnect'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:103:in `connect'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:372:in `ensure_connected'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:224:in `block in process'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:330:in `logging'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:223:in `process'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:193:in `call_pipelined'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:163:in `block in call_pipeline'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:299:in `with_reconnect'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:161:in `call_pipeline'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:2454:in `block in multi'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:50:in `block in synchronize'
<rbenv>/versions/2.6.2/lib/ruby/2.6.0/monitor.rb:230:in `mon_synchronize'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:50:in `synchronize'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:2446:in `multi'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/client.rb:184:in `block in raw_push'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:65:in `block (2 levels) in with'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `handle_interrupt'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `block in with'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `handle_interrupt'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `with'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/client.rb:183:in `raw_push'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-pro-4.0.5/lib/sidekiq/batch/client.rb:41:in `raw_push'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/client.rb:74:in `push'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/client.rb:132:in `push'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/queue_adapters/sidekiq_adapter.rb:30:in `enqueue_at'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/enqueuing.rb:51:in `block in enqueue'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:109:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:17:in `block (3 levels) in <module:Logging>'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:46:in `block in tag_logger'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/tagged_logging.rb:71:in `block in tagged'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/tagged_logging.rb:28:in `tagged'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/tagged_logging.rb:71:in `tagged'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:46:in `tag_logger'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:16:in `block (2 levels) in <module:Logging>'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `instance_exec'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:136:in `run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/enqueuing.rb:49:in `enqueue'
<app>/releases/20190329160821/server/config/initializers/sidekiq.rb:46:in `block in enqueue'
<app>/shared/bundle/ruby/2.6.0/gems/sequel-5.18.0/lib/sequel/database/transactions.rb:36:in `block in after_commit'
<app>/shared/bundle/ruby/2.6.0/gems/sequel-5.18.0/lib/sequel/connection_pool/sharded_threaded.rb:130:in `hold'
<app>/shared/bundle/ruby/2.6.0/gems/sequel-5.18.0/lib/sequel/database/connecting.rb:270:in `synchronize'
<app>/shared/bundle/ruby/2.6.0/gems/sequel-5.18.0/lib/sequel/database/transactions.rb:31:in `after_commit'
<app>/releases/20190329160821/server/config/initializers/sidekiq.rb:46:in `enqueue'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/configured_job.rb:15:in `perform_later'
<app>/releases/20190329160821/server/lib/sidekiq_recurrent_job.rb:25:in `schedule!'
<app>/releases/20190329160821/server/lib/sidekiq_recurrent_job.rb:32:in `perform'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/execution.rb:39:in `block in perform_now'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:109:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/sentry-raven-2.9.0/lib/raven/integrations/rails/active_job.rb:18:in `capture_and_reraise_with_sentry'
<app>/shared/bundle/ruby/2.6.0/gems/sentry-raven-2.9.0/lib/raven/integrations/rails/active_job.rb:12:in `block (2 levels) in included'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `instance_exec'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/i18n-1.6.0/lib/i18n.rb:297:in `with_locale'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/translation.rb:9:in `block (2 levels) in <module:Translation>'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `instance_exec'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:26:in `block (4 levels) in <module:Logging>'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/notifications.rb:168:in `block in instrument'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/notifications/instrumenter.rb:23:in `instrument'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/notifications.rb:168:in `instrument'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:25:in `block (3 levels) in <module:Logging>'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:46:in `block in tag_logger'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/tagged_logging.rb:71:in `block in tagged'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/tagged_logging.rb:28:in `tagged'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/tagged_logging.rb:71:in `tagged'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:46:in `tag_logger'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:22:in `block (2 levels) in <module:Logging>'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `instance_exec'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:136:in `run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/execution.rb:38:in `perform_now'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/execution.rb:24:in `block in execute'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:109:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/railtie.rb:28:in `block (4 levels) in <class:Railtie>'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:87:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:73:in `block in wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:83:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:72:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/railtie.rb:27:in `block (3 levels) in <class:Railtie>'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `instance_exec'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:136:in `run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/execution.rb:22:in `execute'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/queue_adapters/sidekiq_adapter.rb:42:in `perform'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:185:in `execute_job'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:167:in `block (2 levels) in process'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/middleware/chain.rb:128:in `block in invoke'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-pro-4.0.5/lib/sidekiq/batch/middleware.rb:56:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/middleware/chain.rb:130:in `block in invoke'
<app>/shared/bundle/ruby/2.6.0/gems/sentry-raven-2.9.0/lib/raven/integrations/sidekiq.rb:9:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/middleware/chain.rb:130:in `block in invoke'
<app>/shared/bundle/ruby/2.6.0/gems/lamian-1.2.0/lib/lamian/sidekiq_raven_middleware.rb:9:in `block in call'
<app>/shared/bundle/ruby/2.6.0/gems/lamian-1.2.0/lib/lamian.rb:49:in `block in run'
<app>/shared/bundle/ruby/2.6.0/gems/lamian-1.2.0/lib/lamian/logger.rb:27:in `run'
<app>/shared/bundle/ruby/2.6.0/gems/lamian-1.2.0/lib/lamian.rb:49:in `run'
<app>/shared/bundle/ruby/2.6.0/gems/lamian-1.2.0/lib/lamian/sidekiq_raven_middleware.rb:7:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/middleware/chain.rb:130:in `block in invoke'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/middleware/chain.rb:133:in `invoke'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:166:in `block in process'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:137:in `block (6 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_retry.rb:108:in `local'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:136:in `block (5 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/rails.rb:43:in `block in call'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:87:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:73:in `block in wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:87:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:72:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/rails.rb:42:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:132:in `block (4 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:243:in `stats'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:127:in `block (3 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_logger.rb:8:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:126:in `block (2 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_retry.rb:73:in `global'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:125:in `block in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/logging.rb:48:in `with_context'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/logging.rb:42:in `with_job_hash_context'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:124:in `dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:165:in `process'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:83:in `process_one'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:71:in `run'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/util.rb:16:in `watchdog'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/util.rb:25:in `block in safe_thread'

Redis::CommandError in JobRetry#global (LOADING Redis is loading the dataset in memory)

<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:124:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:107:in `block in connect'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:299:in `with_reconnect'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:103:in `connect'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:372:in `ensure_connected'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:224:in `block in process'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:330:in `logging'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:223:in `process'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:123:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:1576:in `block in zadd'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:50:in `block in synchronize'
<rbenv>/versions/2.6.2/lib/ruby/2.6.0/monitor.rb:230:in `mon_synchronize'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:50:in `synchronize'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:1570:in `zadd'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_retry.rb:176:in `block in attempt_retry'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq.rb:97:in `block in redis'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:65:in `block (2 levels) in with'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `handle_interrupt'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `block in with'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `handle_interrupt'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `with'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq.rb:94:in `redis'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_retry.rb:175:in `attempt_retry'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_retry.rb:123:in `rescue in local'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_retry.rb:107:in `local'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:136:in `block (5 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/rails.rb:43:in `block in call'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:87:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:73:in `block in wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:87:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:72:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/rails.rb:42:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:132:in `block (4 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:243:in `stats'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:127:in `block (3 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_logger.rb:8:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:126:in `block (2 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_retry.rb:73:in `global'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:125:in `block in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/logging.rb:48:in `with_context'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/logging.rb:42:in `with_job_hash_context'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:124:in `dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:165:in `process'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:83:in `process_one'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:71:in `run'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/util.rb:16:in `watchdog'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/util.rb:25:in `block in safe_thread'

Then it tried to ack the job (perform LREM) and it succeeded since Redis was already started up. I know that probably in this particular case the reliable_push feature could have helped, but we are not using it in our app because in other cases it's much better to get an error than to lose a job because of Redis config being wrong for some reason.

mperham · 2019-04-05T00:39:09Z

The fetch/ack process is distinct from the retry process so if Redis or the network blips at the right moment, the behavior can be unpredictable. The right solution would be to make those two actions one atomic action; that would require a major architectural overhaul.

One possibly easy patch would be to retry the retry process N times if it fails with a RedisError; that would narrow the window further and might handle your situation where Redis disappears for a few seconds.

tycooon · 2019-04-05T08:40:20Z

OK, and maybe you can clarify why we should ack the job in case of unhandled exceptions in process?

mperham · 2019-04-05T17:23:47Z

How do we distinguish between handled and unhandled?

Once we get to dispatch and inside the retry block, it's assumed that any errors are handled by the retry layer. It will push a copy of the job into the retry set; we need to acknowledge the existing job so we don't create a duplicate.

tycooon · 2019-04-05T20:21:46Z

But as far as I can see, exceptions that were handled by the retry block are not reraised into dispatch? Or am I wrong?

mperham · 2019-04-05T21:06:04Z

Yes, they are reraised here. Once we pass this point it is assumed the retry block inside dispatch handles exceptions. Loss of Redis means we won't create a retry but it also means we shouldn't successfully ack either. Failing the retry operation but succeeded in the ack is what leads to this bad behavior. That kind of "blip" in Redis should be exceedingly rare.

tycooon · 2019-04-05T21:32:20Z

Oh, now I see, thanks. Maybe we can wrap all handled exceptions in similar way as we do in local block using Skip wrapper?

Under just the right conditions, we could lose a job: - Job raises an error - Retry subsystem catches error and tries to create a retry in Redis but this raises a "Redis down" exception - Processor catches Redis exception and thinks a retry was created - Redis comes back online just in time for the job to be acknowledged and lost That's a very specific and rare set of steps but it can happen. Instead have the Retry subsystem raise a specific error signaling that it created a retry. There will be three common cases: 1. Job is successful: job is acknowledged. 2. Job fails, retry is created, Processor rescues specific error: job is acknowledged. 3. Sidekiq::Shutdown is raised: job is not acknowledged Now there is another case: 4. Job fails, retry fails, Processor rescues Exception: job is NOT acknowledged. Sidekiq Pro's super_fetch will rescue the orphaned job at some point in the future.

mperham · 2019-04-05T23:04:14Z

Moving to #4141

(#4141) Under just the right conditions, we could lose a job: - Job raises an error - Retry subsystem catches error and tries to create a retry in Redis but this raises a "Redis down" exception - Processor catches Redis exception and thinks a retry was created - Redis comes back online just in time for the job to be acknowledged and lost That's a very specific and rare set of steps but it can happen. Instead have the Retry subsystem raise a specific error signaling that it created a retry. There will be three common cases: 1. Job is successful: job is acknowledged. 2. Job fails, retry is created, Processor rescues specific error: job is acknowledged. 3. Sidekiq::Shutdown is raised: job is not acknowledged Now there is another case: 4. Job fails, retry fails, Processor rescues Exception: job is NOT acknowledged. Sidekiq Pro's super_fetch will rescue the orphaned job at some point in the future.

mperham closed this as completed Apr 5, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Job getting lost in case of failed retry #4138

Job getting lost in case of failed retry #4138

tycooon commented Apr 3, 2019

mperham commented Apr 3, 2019 via email

tycooon commented Apr 4, 2019 •

edited

mperham commented Apr 5, 2019 •

edited

tycooon commented Apr 5, 2019

mperham commented Apr 5, 2019

tycooon commented Apr 5, 2019

mperham commented Apr 5, 2019

tycooon commented Apr 5, 2019

mperham commented Apr 5, 2019

Job getting lost in case of failed retry #4138

Job getting lost in case of failed retry #4138

Comments

tycooon commented Apr 3, 2019

mperham commented Apr 3, 2019 via email

tycooon commented Apr 4, 2019 • edited

mperham commented Apr 5, 2019 • edited

tycooon commented Apr 5, 2019

mperham commented Apr 5, 2019

tycooon commented Apr 5, 2019

mperham commented Apr 5, 2019

tycooon commented Apr 5, 2019

mperham commented Apr 5, 2019

tycooon commented Apr 4, 2019 •

edited

mperham commented Apr 5, 2019 •

edited