Impossible to restart Sidekiq pro on Heroku - Redis::CommandError: ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context #3886
Comments
What changed yesterday?
On Jul 13, 2018, at 06:23, Hugo BARTHELEMY wrote:
ruby 2.3.7p456
Sidekiq Pro 3.5.3
Redis 3.3.5
Hi all,
Since yesterday, we have seen a lot of errors appear in Sidekiq on one of our applications on Heroku, and since this morning we cannot restart Sidekiq at all. After each restart of the application, Sidekiq crashes with the trace below.
Here is our configuration:
```ruby
require "sidekiq/worker_killer"

SIDEKIQ_REDIS_CONFIG = begin
  uri = URI.parse(APP_CONFIG["sidekiq_redis_url"])
  if APP_CONFIG["redis_ssl_enabled"]
    uri.scheme = "rediss"
    uri.port = Integer(uri.port) + 1
  end
  {
    url: uri.to_s,
    driver: :ruby,
    ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE }
  }
end

# Use global logger & log level for Sidekiq
Sidekiq::Logging.logger = LogStrategy.logger

if Rails.env.test?
  ## Testing
  require "sidekiq/testing"
  Rails.logger.warn "SIDEKIQ configured in fake mode (test env)"
  Sidekiq::Testing.fake!
  # The Sidekiq API does not support testing mode, so we need a running
  # Redis even in test mode.
  # See https://github.com/mperham/sidekiq/wiki/Testing#api
  Sidekiq.configure_client { |config| config.redis = SIDEKIQ_REDIS_CONFIG }
  Sidekiq.configure_server { |config| config.redis = SIDEKIQ_REDIS_CONFIG }
else
  # Duration a job status is kept in Redis (the countdown restarts when
  # the status is updated)
  sidekiq_status_expiration = 15.minutes

  ## Server configuration
  Sidekiq.configure_server do |config|
    # Reliability
    config.super_fetch!
    config.reliable_scheduler!
    # Redis connection config
    config.redis = SIDEKIQ_REDIS_CONFIG
    # Additional middleware(s)
    config.server_middleware do |chain|
      chain.add Sidekiq::Status::ServerMiddleware, expiration: sidekiq_status_expiration
      # Enable query cache during job execution
      chain.add Sidekiq::QueryCacheMiddleware
      # Enable worker killer
      chain.add Sidekiq::WorkerKiller, max_rss: APP_CONFIG["sidekiq_killer_max_rss"],
                                       grace_time: APP_CONFIG["sidekiq_killer_grace_time"]
    end
    config.client_middleware do |chain|
      chain.add Sidekiq::Status::ClientMiddleware, expiration: sidekiq_status_expiration
    end
    # Poll interval for each Sidekiq process
    config.average_scheduled_poll_interval = 5 # seconds
    # Specific DB pool size matching max concurrency
    # => Should at least match the concurrency defined in sidekiq.yml
    # (renamed from `config` to avoid shadowing the Sidekiq config object)
    db_config = ActiveRecord::Base.configurations[Rails.env]
    db_config["pool"] = Sidekiq.options[:concurrency] + 2
    ActiveRecord::Base.establish_connection(db_config)
  end

  # This should not go in a Sidekiq.configure_{client,server} block.
  Sidekiq::Client.reliable_push! unless Rails.env.test?

  ## Client configuration
  Sidekiq.configure_client do |config|
    # Redis connection config
    config.redis = SIDEKIQ_REDIS_CONFIG
    # Additional middleware(s)
    config.client_middleware do |chain|
      chain.add Sidekiq::Status::ClientMiddleware, expiration: sidekiq_status_expiration
    end
  end

  ## Override default options
  Sidekiq.default_worker_options = {
    queue: "api_default",
    retry: 3
  }
end
```
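As an aside, the rediss conversion in the configuration above can be exercised standalone. The URL below is made up for illustration; the `+ 1` relies on the Heroku Redis convention that the TLS listener sits at the plaintext port plus one (an assumption from this config, not an official guarantee):

```ruby
require "uri"

# Hypothetical Heroku-style Redis URL; rewrite it for TLS the way the
# config above does: switch the scheme to rediss and bump the port by one.
uri = URI.parse("redis://h:secret@ec2-1-2-3-4.compute-1.amazonaws.com:6379")
uri.scheme = "rediss"
uri.port = Integer(uri.port) + 1
puts uri.to_s  # => rediss://h:secret@ec2-1-2-3-4.compute-1.amazonaws.com:6380
```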
Here is our backtrace:
```
app[worker.1]: I, [2018-07-13T12:35:11.411473 #4] INFO -- : Schedules Loaded
app[worker.1]: I, [2018-07-13T12:35:11.414173 #4] INFO -- : Starting processing, hit Ctrl-C to stop
app[worker.1]: I, [2018-07-13T12:35:11.421439 #4] INFO -- : SuperFetch activated
app[worker.1]: W, [2018-07-13T12:35:11.448078 #4] WARN -- : SuperFetch: recovered 1 jobs
app[worker.1]: I, [2018-07-13T12:35:11.557614 #4] INFO -- : start
app[worker.1]: I, [2018-07-13T12:35:11.566388 #4] INFO -- : start
app[worker.1]: I, [2018-07-13T12:35:11.575462 #4] INFO -- : start
app[worker.1]: APNSNotifierWorker - starting job for device 9d1c6655-6ef4-4dca-a0c6-722956f621ed
app[worker.1]: APNSNotifierWorker - starting job for device cd42a290-c645-49d0-aedc-022c35bcfe51
app[worker.1]: APNSNotifierWorker - starting job for device 5d81312a-5380-4fc9-a4b8-146cc1806281
app[worker.1]: Socket was remotely closed
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/net-http2-0.14.1/lib/net-http2/client.rb:104:in `rescue in block (2 levels) in ensure_open'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/net-http2-0.14.1/lib/net-http2/client.rb:98:in `block (2 levels) in ensure_open'
app[worker.1]: ** [NewRelic][2018-07-13 12:35:12 +0000 worker.1 (4)] INFO : Starting Agent shutdown
app[worker.1]: W, [2018-07-13T12:35:12.239228 #4] WARN -- : {"context":"processor"}
app[worker.1]: W, [2018-07-13T12:35:12.239329 #4] WARN -- : Redis::CommandError: ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context
app[worker.1]: W, [2018-07-13T12:35:12.239443 #4] WARN -- : /app/vendor/bundle/ruby/2.3.0/bundler/gems/redis-rb-95229df721f7/lib/redis/client.rb:121:in `call'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/newrelic_rpm-3.17.0.325/lib/new_relic/agent/instrumentation/redis.rb:42:in `block in call'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/newrelic_rpm-3.17.0.325/lib/new_relic/agent/datastores.rb:111:in `wrap'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/newrelic_rpm-3.17.0.325/lib/new_relic/agent/instrumentation/redis.rb:41:in `call'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/bundler/gems/redis-rb-95229df721f7/lib/redis.rb:752:in `block in set'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/bundler/gems/redis-rb-95229df721f7/lib/redis.rb:58:in `block in synchronize'
app[worker.1]: /app/vendor/ruby-2.3.7/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/bundler/gems/redis-rb-95229df721f7/lib/redis.rb:58:in `synchronize'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/bundler/gems/redis-rb-95229df721f7/lib/redis.rb:750:in `set'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-pro-3.5.3/lib/sidekiq/pro/super_fetch.rb:60:in `block in orphan_check?'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-4.2.10/lib/sidekiq.rb:95:in `block in redis'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:65:in `block (2 levels) in with'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `handle_interrupt'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `block in with'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `handle_interrupt'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `with'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-4.2.10/lib/sidekiq.rb:92:in `redis'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-pro-3.5.3/lib/sidekiq/pro/super_fetch.rb:59:in `orphan_check?'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-pro-3.5.3/lib/sidekiq/pro/super_fetch.rb:20:in `initialize'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-4.2.10/lib/sidekiq/processor.rb:38:in `new'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-4.2.10/lib/sidekiq/processor.rb:38:in `initialize'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-4.2.10/lib/sidekiq/manager.rb:95:in `new'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-4.2.10/lib/sidekiq/manager.rb:95:in `block in processor_died'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-4.2.10/lib/sidekiq/manager.rb:92:in `synchronize'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-4.2.10/lib/sidekiq/manager.rb:92:in `processor_died'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-4.2.10/lib/sidekiq/processor.rb:76:in `rescue in run'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-4.2.10/lib/sidekiq/processor.rb:67:in `run'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-4.2.10/lib/sidekiq/util.rb:17:in `watchdog'
app[worker.1]: /app/vendor/bundle/ruby/2.3.0/gems/sidekiq-4.2.10/lib/sidekiq/util.rb:26:in `block in safe_thread'
heroku[worker.1]: State changed from up to crashed
```
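For context on the Redis error itself: once a connection issues SUBSCRIBE, the server accepts only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT on that connection until it unsubscribes, which is exactly the message in the trace. If a pooled connection gets stuck in subscribe mode and is handed back out, the SET that SuperFetch's orphan check issues is rejected. A toy model of that server-side rule (ToyRedisConnection is hypothetical, not redis-rb):

```ruby
# Toy model of Redis's subscribe-mode restriction; not the real client.
class ToyRedisConnection
  ALLOWED_WHILE_SUBSCRIBED =
    %w[SUBSCRIBE PSUBSCRIBE UNSUBSCRIBE PUNSUBSCRIBE PING QUIT].freeze

  def initialize
    @subscribed = false
  end

  def call(command, *_args)
    cmd = command.to_s.upcase
    if @subscribed && !ALLOWED_WHILE_SUBSCRIBED.include?(cmd)
      raise "ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context"
    end
    @subscribed = true  if %w[SUBSCRIBE PSUBSCRIBE].include?(cmd)
    @subscribed = false if %w[UNSUBSCRIBE PUNSUBSCRIBE].include?(cmd)
    "OK"
  end
end

conn = ToyRedisConnection.new
conn.call(:subscribe, "events")
begin
  conn.call(:set, "key", "value")   # what the orphan check tries to do
rescue RuntimeError => e
  puts e.message                    # the same ERR as in the backtrace
end
```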
Unfortunately, we have not changed anything.
One other customer has reported this but we were never able to determine a cause or fix.
We are simply paying more attention because this is our staging API and the mobile team is testing its new application version against it. Production works well, though. On the other hand, we noticed that the error
Hello, thanks a lot @mperham for your quick reply! I found out that the

Not sure why, but all the logs we have, in both staging and production, point to this cause. After upgrading net-http2 from version

UPDATE: I didn't manage to really reproduce the issue, even by going back to net-http2

One other interesting thing I found is that I don't have an :error callback configured for my net-http2 connections. Looking at the documentation, it says:

From my point of view, this could be the most probable cause of the problem, so I also added a handler like this to my project. I will try to keep you posted on the results:
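(The handler snippet itself did not survive extraction. net-http2's README documents a `client.on(:error) { |exception| ... }` callback; without one, an asynchronous socket error surfaces in the client's internal reader thread and kills the connection. A hedged pure-Ruby sketch of the pattern, where FakeConnection is a made-up stand-in and not the gem's class:)

```ruby
# Made-up stand-in for a net-http2-style client, to show the :error
# callback pattern; the real gem's API is client.on(:error) { |err| ... }.
class FakeConnection
  def initialize
    @callbacks = Hash.new { |h, k| h[k] = [] }
  end

  def on(event, &block)
    @callbacks[event] << block
  end

  def emit(event, payload)
    handlers = @callbacks[event]
    # With no :error handler registered, an async socket error surfaces
    # as an unhandled exception in the client's reader thread.
    raise payload if event == :error && handlers.empty?
    handlers.each { |cb| cb.call(payload) }
  end
end

conn = FakeConnection.new
errors = []
conn.on(:error) { |err| errors << err.message }  # e.g. report to Bugsnag instead
conn.emit(:error, RuntimeError.new("Socket was remotely closed"))
puts errors.first  # => Socket was remotely closed
```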
I'm pretty sure the original issue that @hugobarthelemy is experiencing is this: ostinelli/apnotic#68. Basically

For us it surfaced a bigger problem, which I described in this issue: ostinelli/apnotic#69. The safe way would be to create a pool manually and make sure to assign an error handler (as @ccyrille pointed out):

```ruby
class Worker
  POOL = ConnectionPool.new(size: 5) do
    connection = Apnotic::Connection.new(...)
    connection.on(:error) do |err|
      Bugsnag.notify(ConnectionError.new(err.inspect))
    end
    connection
  end
end
```

We actually see a few dozen
@mperham Is it possible that super_fetch dropped these jobs?
No, super_fetch will never purposefully delete or drop a job. Poison pill jobs will continue to accumulate until the bug is fixed and they can be executed successfully.
I can confirm that bumping net-http2 resolves this.