
Job getting lost in case of failed retry #4138

Closed
tycooon opened this issue Apr 3, 2019 · 9 comments

Comments

@tycooon

tycooon commented Apr 3, 2019

Ruby 2.6.2 with Sidekiq Pro 4.0.5.

I inspected the code, and it turns out that if an exception is raised inside Sidekiq::JobRetry#attempt_retry, we still acknowledge the job here, which removes it from Redis. If we didn't ack the job, it would eventually be restored by SuperFetch, as I understand it.

Maybe it would be better not to ack the job for any unhandled exception in Processor#process?
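
For reference, here is a simplified sketch of the flow I mean (paraphrased from Processor#process, not the exact Sidekiq source):

```ruby
# Simplified, paraphrased sketch of Sidekiq::Processor#process -- not verbatim.
# `ack` starts out true, and only Sidekiq::Shutdown flips it to false, so any
# other exception (including a Redis error escaping JobRetry#attempt_retry)
# still reaches the ensure block with ack == true and the job is acknowledged.
def process(work)
  job_hash = Sidekiq.load_json(work.job)
  ack = true
  begin
    dispatch(job_hash, work.queue_name) do |worker|
      execute_job(worker, job_hash["args"])
    end
  rescue Sidekiq::Shutdown
    ack = false # forced shutdown: leave the job in Redis for recovery
  rescue Exception => ex
    handle_exception(ex, job: job_hash)
    raise
  ensure
    work.acknowledge if ack # removes the job from Redis for good
  end
end
```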

@mperham
Collaborator

mperham commented Apr 3, 2019 via email

@tycooon
Author

tycooon commented Apr 4, 2019

There actually were 2 errors:

  1. Redis::CannotConnectError in JobRetry#local (Error connecting to Redis on 10.208.64.70:6379 (Errno::ECONNREFUSED))
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:353:in `rescue in establish_connection'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:336:in `establish_connection'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:104:in `block in connect'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:299:in `with_reconnect'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:103:in `connect'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:372:in `ensure_connected'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:224:in `block in process'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:330:in `logging'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:223:in `process'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:193:in `call_pipelined'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:163:in `block in call_pipeline'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:299:in `with_reconnect'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:161:in `call_pipeline'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:2454:in `block in multi'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:50:in `block in synchronize'
<rbenv>/versions/2.6.2/lib/ruby/2.6.0/monitor.rb:230:in `mon_synchronize'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:50:in `synchronize'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:2446:in `multi'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/client.rb:184:in `block in raw_push'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:65:in `block (2 levels) in with'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `handle_interrupt'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `block in with'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `handle_interrupt'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `with'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/client.rb:183:in `raw_push'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-pro-4.0.5/lib/sidekiq/batch/client.rb:41:in `raw_push'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/client.rb:74:in `push'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/client.rb:132:in `push'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/queue_adapters/sidekiq_adapter.rb:30:in `enqueue_at'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/enqueuing.rb:51:in `block in enqueue'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:109:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:17:in `block (3 levels) in <module:Logging>'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:46:in `block in tag_logger'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/tagged_logging.rb:71:in `block in tagged'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/tagged_logging.rb:28:in `tagged'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/tagged_logging.rb:71:in `tagged'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:46:in `tag_logger'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:16:in `block (2 levels) in <module:Logging>'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `instance_exec'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:136:in `run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/enqueuing.rb:49:in `enqueue'
<app>/releases/20190329160821/server/config/initializers/sidekiq.rb:46:in `block in enqueue'
<app>/shared/bundle/ruby/2.6.0/gems/sequel-5.18.0/lib/sequel/database/transactions.rb:36:in `block in after_commit'
<app>/shared/bundle/ruby/2.6.0/gems/sequel-5.18.0/lib/sequel/connection_pool/sharded_threaded.rb:130:in `hold'
<app>/shared/bundle/ruby/2.6.0/gems/sequel-5.18.0/lib/sequel/database/connecting.rb:270:in `synchronize'
<app>/shared/bundle/ruby/2.6.0/gems/sequel-5.18.0/lib/sequel/database/transactions.rb:31:in `after_commit'
<app>/releases/20190329160821/server/config/initializers/sidekiq.rb:46:in `enqueue'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/configured_job.rb:15:in `perform_later'
<app>/releases/20190329160821/server/lib/sidekiq_recurrent_job.rb:25:in `schedule!'
<app>/releases/20190329160821/server/lib/sidekiq_recurrent_job.rb:32:in `perform'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/execution.rb:39:in `block in perform_now'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:109:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/sentry-raven-2.9.0/lib/raven/integrations/rails/active_job.rb:18:in `capture_and_reraise_with_sentry'
<app>/shared/bundle/ruby/2.6.0/gems/sentry-raven-2.9.0/lib/raven/integrations/rails/active_job.rb:12:in `block (2 levels) in included'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `instance_exec'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/i18n-1.6.0/lib/i18n.rb:297:in `with_locale'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/translation.rb:9:in `block (2 levels) in <module:Translation>'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `instance_exec'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:26:in `block (4 levels) in <module:Logging>'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/notifications.rb:168:in `block in instrument'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/notifications/instrumenter.rb:23:in `instrument'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/notifications.rb:168:in `instrument'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:25:in `block (3 levels) in <module:Logging>'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:46:in `block in tag_logger'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/tagged_logging.rb:71:in `block in tagged'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/tagged_logging.rb:28:in `tagged'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/tagged_logging.rb:71:in `tagged'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:46:in `tag_logger'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/logging.rb:22:in `block (2 levels) in <module:Logging>'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `instance_exec'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:136:in `run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/execution.rb:38:in `perform_now'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/execution.rb:24:in `block in execute'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:109:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/railtie.rb:28:in `block (4 levels) in <class:Railtie>'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:87:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:73:in `block in wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:83:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:72:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/railtie.rb:27:in `block (3 levels) in <class:Railtie>'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `instance_exec'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/callbacks.rb:136:in `run_callbacks'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/execution.rb:22:in `execute'
<app>/shared/bundle/ruby/2.6.0/gems/activejob-5.2.3/lib/active_job/queue_adapters/sidekiq_adapter.rb:42:in `perform'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:185:in `execute_job'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:167:in `block (2 levels) in process'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/middleware/chain.rb:128:in `block in invoke'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-pro-4.0.5/lib/sidekiq/batch/middleware.rb:56:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/middleware/chain.rb:130:in `block in invoke'
<app>/shared/bundle/ruby/2.6.0/gems/sentry-raven-2.9.0/lib/raven/integrations/sidekiq.rb:9:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/middleware/chain.rb:130:in `block in invoke'
<app>/shared/bundle/ruby/2.6.0/gems/lamian-1.2.0/lib/lamian/sidekiq_raven_middleware.rb:9:in `block in call'
<app>/shared/bundle/ruby/2.6.0/gems/lamian-1.2.0/lib/lamian.rb:49:in `block in run'
<app>/shared/bundle/ruby/2.6.0/gems/lamian-1.2.0/lib/lamian/logger.rb:27:in `run'
<app>/shared/bundle/ruby/2.6.0/gems/lamian-1.2.0/lib/lamian.rb:49:in `run'
<app>/shared/bundle/ruby/2.6.0/gems/lamian-1.2.0/lib/lamian/sidekiq_raven_middleware.rb:7:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/middleware/chain.rb:130:in `block in invoke'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/middleware/chain.rb:133:in `invoke'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:166:in `block in process'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:137:in `block (6 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_retry.rb:108:in `local'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:136:in `block (5 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/rails.rb:43:in `block in call'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:87:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:73:in `block in wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:87:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:72:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/rails.rb:42:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:132:in `block (4 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:243:in `stats'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:127:in `block (3 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_logger.rb:8:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:126:in `block (2 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_retry.rb:73:in `global'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:125:in `block in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/logging.rb:48:in `with_context'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/logging.rb:42:in `with_job_hash_context'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:124:in `dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:165:in `process'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:83:in `process_one'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:71:in `run'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/util.rb:16:in `watchdog'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/util.rb:25:in `block in safe_thread'
  2. Redis::CommandError in JobRetry#global (LOADING Redis is loading the dataset in memory)
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:124:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:107:in `block in connect'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:299:in `with_reconnect'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:103:in `connect'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:372:in `ensure_connected'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:224:in `block in process'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:330:in `logging'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:223:in `process'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:123:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:1576:in `block in zadd'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:50:in `block in synchronize'
<rbenv>/versions/2.6.2/lib/ruby/2.6.0/monitor.rb:230:in `mon_synchronize'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:50:in `synchronize'
<app>/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:1570:in `zadd'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_retry.rb:176:in `block in attempt_retry'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq.rb:97:in `block in redis'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:65:in `block (2 levels) in with'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `handle_interrupt'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `block in with'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `handle_interrupt'
<app>/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `with'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq.rb:94:in `redis'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_retry.rb:175:in `attempt_retry'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_retry.rb:123:in `rescue in local'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_retry.rb:107:in `local'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:136:in `block (5 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/rails.rb:43:in `block in call'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:87:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:73:in `block in wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:87:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:72:in `wrap'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/rails.rb:42:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:132:in `block (4 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:243:in `stats'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:127:in `block (3 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_logger.rb:8:in `call'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:126:in `block (2 levels) in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/job_retry.rb:73:in `global'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:125:in `block in dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/logging.rb:48:in `with_context'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/logging.rb:42:in `with_job_hash_context'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:124:in `dispatch'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:165:in `process'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:83:in `process_one'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/processor.rb:71:in `run'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/util.rb:16:in `watchdog'
<app>/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.5/lib/sidekiq/util.rb:25:in `block in safe_thread'

Then it tried to ack the job (perform the LREM), and that succeeded since Redis had already started up. I know that in this particular case the reliable_push feature probably could have helped, but we don't use it in our app because in other cases it's much better to get an error than to silently lose a job when the Redis config is wrong for some reason.

@mperham
Collaborator

mperham commented Apr 5, 2019

The fetch/ack process is distinct from the retry process, so if Redis or the network blips at the right moment, the behavior can be unpredictable. The right solution would be to make those two actions a single atomic action; that would require a major architectural overhaul.

One possibly easy patch would be to retry the retry process N times if it fails with a RedisError; that would narrow the window further and might handle your situation where Redis disappears for a few seconds.
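
Roughly this idea (a hypothetical sketch, not shipped code; the attempt count and backoff are made up):

```ruby
require "sidekiq"

# Hypothetical sketch of "retry the retry" -- not actual Sidekiq code.
# Re-attempt the Redis write a few times so that a brief Redis/network
# blip doesn't abort creating the retry entry.
MAX_WRITE_ATTEMPTS = 3 # assumed value

def push_retry_entry(payload, retry_at)
  attempts = 0
  begin
    Sidekiq.redis { |conn| conn.zadd("retry", retry_at.to_s, payload) }
  rescue Redis::BaseError
    attempts += 1
    raise if attempts >= MAX_WRITE_ATTEMPTS
    sleep(0.5 * attempts) # short backoff before trying the write again
    retry
  end
end
```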

@tycooon
Author

tycooon commented Apr 5, 2019

OK, and could you maybe clarify why we should ack the job in the case of unhandled exceptions in process?

@mperham
Collaborator

mperham commented Apr 5, 2019

How do we distinguish between handled and unhandled?

Once we get to dispatch and inside the retry block, it's assumed that any errors are handled by the retry layer. It will push a copy of the job into the retry set; we need to acknowledge the existing job so we don't create a duplicate.
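
Concretely, the retry layer writes a copy of the job payload into the retry sorted set, roughly like this (simplified; the real delay is computed from the retry count):

```ruby
require "sidekiq"

# Simplified sketch of what JobRetry#attempt_retry does with the failed job:
# a copy of the payload goes into the "retry" sorted set, scored by the time
# at which it should run again. Once that copy exists, the original fetched
# job has to be acknowledged or it would execute twice.
job_hash = { "class" => "SomeWorker", "jid" => "abc123", "args" => [] } # example payload
payload  = Sidekiq.dump_json(job_hash)
retry_at = Time.now.to_f + 30 # simplified; really derived from the retry count

Sidekiq.redis do |conn|
  conn.zadd("retry", retry_at.to_s, payload)
end
```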

@tycooon
Author

tycooon commented Apr 5, 2019

But as far as I can see, exceptions that were handled by the retry block are not reraised into dispatch? Or am I wrong?

@mperham
Collaborator

mperham commented Apr 5, 2019

Yes, they are reraised here. Once we pass this point it is assumed the retry block inside dispatch handles exceptions. Loss of Redis means we won't create a retry, but it also means we shouldn't successfully ack either. Failing the retry operation but succeeding in the ack is what leads to this bad behavior. That kind of "blip" in Redis should be exceedingly rare.

@tycooon
Author

tycooon commented Apr 5, 2019

Oh, now I see, thanks. Maybe we can wrap all handled exceptions in a similar way to what we do in the local block using the Skip wrapper?

mperham added a commit that referenced this issue Apr 5, 2019


Under just the right conditions, we could lose a job:

- Job raises an error
- Retry subsystem catches error and tries to create a retry in Redis but this raises a "Redis down" exception
- Processor catches Redis exception and thinks a retry was created
- Redis comes back online just in time for the job to be acknowledged and lost

That's a very specific and rare set of steps but it can happen.

Instead have the Retry subsystem raise a specific error signaling that it created a retry.  There will be three common cases:

1. Job is successful: job is acknowledged.
2. Job fails, retry is created, Processor rescues specific error: job is acknowledged.
3. Sidekiq::Shutdown is raised: job is not acknowledged.

Now there is another case:

4. Job fails, retry fails, Processor rescues Exception: job is NOT acknowledged. Sidekiq Pro's super_fetch will rescue the orphaned job at some point in the future.
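
In other words, something like this simplified control flow (paraphrased; RetryHandled, perform and work are placeholder names, not the shipped code):

```ruby
require "sidekiq"

# Simplified, paraphrased sketch of the post-fix flow -- names are placeholders.
# The retry subsystem raises a dedicated signal only after the retry entry has
# been written to Redis; the processor acks only on success or on that signal,
# so a failed retry leaves the job unacknowledged for super_fetch to recover.
class RetryHandled < ::RuntimeError; end # stand-in for the specific error

def process(work, job_hash)
  ack = false
  begin
    perform(job_hash)   # 1. job succeeds
    ack = true
  rescue RetryHandled
    ack = true          # 2. retry entry was created, safe to ack
  rescue Sidekiq::Shutdown
    ack = false         # 3. forced shutdown, do not ack
  rescue Exception
    ack = false         # 4. retry itself failed too -- do not ack, re-raise
    raise
  ensure
    work.acknowledge if ack
  end
end
```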
@mperham
Collaborator

mperham commented Apr 5, 2019

Moving to #4141

mperham closed this as completed Apr 5, 2019
mperham added a commit that referenced this issue Apr 12, 2019 (#4141)
