Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ent] Occasional Redis::CommandError in concurrent limiter #3854

Closed
krasnoukhov opened this issue May 31, 2018 · 15 comments
Closed

[Ent] Occasional Redis::CommandError in concurrent limiter #3854

krasnoukhov opened this issue May 31, 2018 · 15 comments

Comments

@krasnoukhov
Copy link
Contributor

Ruby version: 2.5.1
Sidekiq / Pro / Enterprise version(s): 5.1.3 / 4.0.3 / 1.7.1

ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context

This is a message of exception that we're getting sometimes, seems like it's coming from limiter's scripting code.

Here's how we use the limiter:

limiter = Sidekiq::Limiter.concurrent("whatever", 1, wait_timeout: 0, lock_timeout: 300, ttl: 3600)
limiter.within_limit { yield }
Backtrace
gems/redis-3.3.5/lib/redis/client.rb:121:in `call': ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context (Redis::CommandError)
from gems/newrelic_rpm-4.6.0.338/lib/new_relic/agent/instrumentation/redis.rb:69:in `call'
from gems/ddtrace-0.11.4/lib/ddtrace/contrib/redis/patcher.rb:73:in `block in call'
from gems/ddtrace-0.11.4/lib/ddtrace/tracer.rb:272:in `trace'
from gems/ddtrace-0.11.4/lib/ddtrace/contrib/redis/patcher.rb:67:in `call'
from gems/redis-3.3.5/lib/redis.rb:2399:in `block in _eval'
from gems/redis-3.3.5/lib/redis.rb:58:in `block in synchronize'
from /home/mongrel/.rbenv/versions/2.5.1/lib/ruby/2.5.0/monitor.rb:226:in `mon_synchronize'
from gems/redis-3.3.5/lib/redis.rb:58:in `synchronize'
from gems/redis-3.3.5/lib/redis.rb:2398:in `_eval'
from gems/redis-3.3.5/lib/redis.rb:2450:in `evalsha'
from gems/sidekiq-ent-1.7.1/lib/sidekiq-ent/scripting.rb:23:in `block in call'
from gems/connection_pool-2.2.1/lib/connection_pool.rb:64:in `block (2 levels) in with'
from gems/connection_pool-2.2.1/lib/connection_pool.rb:63:in `handle_interrupt'
from gems/connection_pool-2.2.1/lib/connection_pool.rb:63:in `block in with'
from gems/connection_pool-2.2.1/lib/connection_pool.rb:60:in `handle_interrupt'
from gems/connection_pool-2.2.1/lib/connection_pool.rb:60:in `with'
from gems/sidekiq-ent-1.7.1/lib/sidekiq-ent/scripting.rb:22:in `call'
from gems/sidekiq-ent-1.7.1/lib/sidekiq-ent/limiter/concurrent.rb:145:in `unlock'
from gems/sidekiq-ent-1.7.1/lib/sidekiq-ent/limiter/concurrent.rb:68:in `ensure in within_limit'
from gems/sidekiq-ent-1.7.1/lib/sidekiq-ent/limiter/concurrent.rb:68:in `within_limit'

I noticed that sidekiq-ent@1.7.1 has something about lua errors, any chance this is related? Thanks!

@mperham
Copy link
Collaborator

mperham commented May 31, 2018

That kind of error would happen if your code manually checked out a Redis connection from the connection pool to use for pubsub. Do you have any app code doing pubsub? If so, show me the subscriber code.

@krasnoukhov
Copy link
Contributor Author

No, we don't really use pubsub. Anything else you can think of that can be related?

@mperham
Copy link
Collaborator

mperham commented May 31, 2018

I can't see how that can happen normally. Can you show me your initializer?

@krasnoukhov
Copy link
Contributor Author

Can I email it to you?

@mperham
Copy link
Collaborator

mperham commented May 31, 2018

Of course, support at contribsys.com

@mperham
Copy link
Collaborator

mperham commented May 31, 2018

I don't see any smoking guns in the initializer either. It's a little more complicated than usual but it's not doing anything abnormal with Redis.

@krasnoukhov
Copy link
Contributor Author

Ok, it's not critical for us, happened just couple times in total. Let me know if you have any ideas how we can help debug this

@krasnoukhov
Copy link
Contributor Author

krasnoukhov commented Jun 29, 2018

I just noticed something in a logs that might be related. So I had these 2 workers that I stopped manually and here's what I saw in a log:

Log
^C2018-06-29T12:28:23.138Z 6878 TID-ox2oef08q INFO: Shutting down
2018-06-29T12:28:23.138Z 6878 TID-ox2oef08q INFO: Terminating quiet workers
2018-06-29T12:28:23.153Z 6878 TID-ox2k9wwh6 INFO: Scheduler exiting...
2018-06-29T12:28:23.212Z 6873 TID-ox2oef08t INFO: Shutting down
2018-06-29T12:28:23.212Z 6873 TID-ox2oef08t INFO: Terminating quiet workers
2018-06-29T12:28:23.222Z 6873 TID-ox2kajtq5 INFO: Scheduler exiting...
2018-06-29T12:28:23.289Z 6878 TID-ox2oef08q INFO: Pausing to allow workers to finish...
2018-06-29T12:28:23.417Z 6873 TID-ox2oef08t INFO: Pausing to allow workers to finish...
2018-06-29T12:28:23.723Z 6873 TID-ox2dqtmz1 Sync::Eligible::PayerEnrollmentsJob::Processor JID-66853d896da52376e371a31c BID-VzEYSYz81sjefw INFO: done: 1.264 sec
2018-06-29T12:28:24.113Z 6873 TID-ox2kaiamd Tracking::Sync::UserJob JID-5fbc5dfb94fadec336193b84 BID-HDCzEv4Wa0HXYg INFO: done: 1.585 sec
2018-06-29T12:28:24.284Z 6873 TID-ox2kaibw1 Tracking::Sync::UserJob JID-766e14ab0452e7bee4a8742b BID-HDCzEv4Wa0HXYg INFO: done: 1.821 sec
2018-06-29T12:28:24.422Z 6873 TID-ox2kajsn1 Tracking::Sync::UserJob JID-277b9b1fbfee950db3aecffd BID-HDCzEv4Wa0HXYg INFO: done: 1.958 sec
2018-06-29T12:28:24.621Z 6878 TID-ox2heg7ti Sync::Eligible::PaymentStatusesJob::Processor JID-213b2c6f8f53c811d2cf239d BID-PCZE36SWpdrCPQ INFO: done: 5.319 sec
2018-06-29T12:28:24.626Z 6873 TID-ox2kajsft Tracking::Sync::UserJob JID-f071ac352b5504395132b946 BID-HDCzEv4Wa0HXYg INFO: done: 2.146 sec
2018-06-29T12:28:24.633Z 6873 TID-ox2bvpyvh Tracking::Sync::UserJob JID-27dfcace222ad23cb2646efc BID-HDCzEv4Wa0HXYg INFO: done: 2.134 sec
2018-06-29T12:28:24.640Z 6873 TID-ox2kajum5 Tracking::Sync::UserJob JID-ce014e2197cc5f8ae89bf650 BID-HDCzEv4Wa0HXYg INFO: done: 2.132 sec
2018-06-29T12:28:31.115Z 6878 TID-ox2oef08q WARN: Terminating 4 busy worker threads
2018-06-29T12:28:31.115Z 6878 TID-ox2oef08q WARN: Work still in progress [#<struct Sidekiq::Pro::SuperFetch::Retriever::UnitOfWork queue="queue:default", job="{\"class\":\"ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper\",\"wrapped\":\"GenerateExportJob\",\"queue\":\"default\",\"args\":[{\"job_class\":\"GenerateExportJob\",\"job_id\":\"222968f7-5196-4d61-b8b3-1f0390872bce\",\"provider_job_id\":null,\"queue_name\":\"default\",\"priority\":null,\"arguments\":[283553],\"executions\":0,\"locale\":\"en\"}],\"retry\":true,\"jid\":\"16763050aa37f4be93283d99\",\"created_at\":1530269309.26067,\"enqueued_at\":1530273372.9312081,\"overrated\":5}", local_queue="queue:sq|phx-a01:6878:82284a5c5393|default">, #<struct Sidekiq::Pro::SuperFetch::Retriever::UnitOfWork queue="queue:default", job="{\"class\":\"ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper\",\"wrapped\":\"GenerateExportJob\",\"queue\":\"default\",\"args\":[{\"job_class\":\"GenerateExportJob\",\"job_id\":\"048a3a94-0fe1-4ef8-9843-852859fea528\",\"provider_job_id\":null,\"queue_name\":\"default\",\"priority\":null,\"arguments\":[235092],\"executions\":0,\"locale\":\"en\"}],\"retry\":true,\"jid\":\"e07f2f7511e40af1501766a1\",\"created_at\":1530269309.20448,\"enqueued_at\":1530274483.6597302,\"error_message\":\"ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context\",\"error_class\":\"Redis::CommandError\",\"failed_at\":1530274461.20487,\"retry_count\":0}", local_queue="queue:sq|phx-a01:6878:82284a5c5393|default">, #<struct Sidekiq::Pro::SuperFetch::Retriever::UnitOfWork queue="queue:default", job="{\"class\":\"Tracking::PaymentsAnalyticsJob\",\"args\":[\"2018-06-29T05:25:00-07:00\"],\"retry\":3,\"queue\":\"default\",\"jid\":\"dd08afd38d04eee49d98cc57\",\"created_at\":1530275100.0115,\"enqueued_at\":1530275258.4291391,\"error_message\":\"execution expired\",\"error_class\":\"Google::Apis::TransmissionError\",\"failed_at\":1530275120.74721,\"retry_count\":1,\"retried_at\":1530275196.91254}", local_queue="queue:sq|phx-a01:6878:82284a5c5393|default">, #<struct Sidekiq::Pro::SuperFetch::Retriever::UnitOfWork queue="queue:default", job="{\"class\":\"ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper\",\"wrapped\":\"GenerateExportJob\",\"queue\":\"default\",\"args\":[{\"job_class\":\"GenerateExportJob\",\"job_id\":\"709ed151-c690-4506-909d-0b3c1b579f53\",\"provider_job_id\":null,\"queue_name\":\"default\",\"priority\":null,\"arguments\":[271856],\"executions\":0,\"locale\":\"en\"}],\"retry\":true,\"jid\":\"163f7480eb4ed9b9104b8e89\",\"created_at\":1530269309.24508,\"enqueued_at\":1530274407.1796002,\"error_message\":\"Mysql2::Error: This connection is still waiting for a result, try again once you have the result: ROLLBACK\",\"error_class\":\"ActiveRecord::StatementInvalid\",\"failed_at\":1530274374.04449,\"retry_count\":0}", local_queue="queue:sq|phx-a01:6878:82284a5c5393|default">]
2018-06-29T12:28:31.117Z 6878 TID-ox2oef08q INFO: SuperFetch: Moving job from queue:sq|phx-a01:6878:82284a5c5393|default back to queue:default
2018-06-29T12:28:31.124Z 6878 TID-ox2oef08q INFO: SuperFetch: Moving job from queue:sq|phx-a01:6878:82284a5c5393|default back to queue:default
2018-06-29T12:28:31.126Z 6878 TID-ox2oef08q INFO: SuperFetch: Moving job from queue:sq|phx-a01:6878:82284a5c5393|default back to queue:default
2018-06-29T12:28:31.129Z 6878 TID-ox2oef08q INFO: SuperFetch: Moving job from queue:sq|phx-a01:6878:82284a5c5393|default back to queue:default
2018-06-29T12:28:31.163Z 6873 TID-ox2oef08t WARN: Terminating 3 busy worker threads
2018-06-29T12:28:31.163Z 6873 TID-ox2oef08t WARN: Work still in progress [#<struct Sidekiq::Pro::SuperFetch::Retriever::UnitOfWork queue="queue:bulk", job="{\"class\":\"Sync::Eligible::PaymentStatusesJob::Processor\",\"args\":[23917324],\"retry\":3,\"queue\":\"bulk\",\"jid\":\"fea27aefcbb9367ad9175176\",\"created_at\":1530249663.43369,\"bid\":\"PCZE36SWpdrCPQ\",\"enqueued_at\":1530275302.390151}", local_queue="queue:sq|phx-a01:6873:2b2d0130ab06|bulk">, #<struct Sidekiq::Pro::SuperFetch::Retriever::UnitOfWork queue="queue:default", job="{\"class\":\"ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper\",\"wrapped\":\"GenerateExportJob\",\"queue\":\"default\",\"args\":[{\"job_class\":\"GenerateExportJob\",\"job_id\":\"4052c4d7-a2b8-4718-b9de-71b677b9843d\",\"provider_job_id\":null,\"queue_name\":\"default\",\"priority\":null,\"arguments\":[120201],\"executions\":0,\"locale\":\"en\"}],\"retry\":true,\"jid\":\"44431f4413f686c49e302e50\",\"created_at\":1530269309.17516,\"enqueued_at\":1530274406.4322174,\"error_message\":\"ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context\",\"error_class\":\"Redis::CommandError\",\"failed_at\":1530274373.29925,\"retry_count\":0}", local_queue="queue:sq|phx-a01:6873:2b2d0130ab06|default">, #<struct Sidekiq::Pro::SuperFetch::Retriever::UnitOfWork queue="queue:default", job="{\"class\":\"ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper\",\"wrapped\":\"GenerateExportJob\",\"queue\":\"default\",\"args\":[{\"job_class\":\"GenerateExportJob\",\"job_id\":\"9c623662-40cf-444e-9250-8b70b8f5c748\",\"provider_job_id\":null,\"queue_name\":\"default\",\"priority\":null,\"arguments\":[223752],\"executions\":0,\"locale\":\"en\"}],\"retry\":true,\"jid\":\"ca627a04f85fec1f13d45020\",\"created_at\":1530269309.20067,\"enqueued_at\":1530274942.529558,\"error_message\":\"EMAIL must have a value\",\"error_class\":\"Vpim::InvalidEncodingError\",\"failed_at\":1530273026.00873,\"retry_count\":1,\"retried_at\":1530274870.82409}", local_queue="queue:sq|phx-a01:6873:2b2d0130ab06|default">]
2018-06-29T12:28:31.166Z 6873 TID-ox2oef08t INFO: SuperFetch: Moving job from queue:sq|phx-a01:6873:2b2d0130ab06|default back to queue:default
2018-06-29T12:28:31.168Z 6873 TID-ox2oef08t INFO: SuperFetch: Moving job from queue:sq|phx-a01:6873:2b2d0130ab06|default back to queue:default
2018-06-29T12:28:31.177Z 6873 TID-ox2oef08t INFO: SuperFetch: Moving job from queue:sq|phx-a01:6873:2b2d0130ab06|bulk back to queue:bulk
2018-06-29T12:28:31.339Z 6878 TID-ox2oef08q INFO: Bye!
2018-06-29T12:28:31.346Z 6873 TID-ox2kajssh Sync::Eligible::PaymentStatusesJob::Processor JID-fea27aefcbb9367ad9175176 BID-PCZE36SWpdrCPQ INFO: done: 8.857 sec
2018-06-29T12:28:31.348Z 6873 TID-ox2kajssh ERROR: Unable to remove job from private queue queue:sq|phx-a01:6873:2b2d0130ab06|bulk: 0
2018-06-29T12:28:31.349Z 6873 TID-ox2oef08t INFO: Bye!
2018-06-29T12:28:31.417Z 6873 TID-ox2kaid3h GenerateExportJob JID-ca627a04f85fec1f13d45020 INFO: fail: 368.882 sec
2018-06-29T12:28:31.553Z 6873 TID-ox2kaibhl GenerateExportJob JID-44431f4413f686c49e302e50 INFO: fail: 905.107 sec
2018-06-29T12:28:31.762Z 6878 TID-ox2k9wymu Tracking::PaymentsAnalyticsJob JID-dd08afd38d04eee49d98cc57 INFO: fail: 53.332 sec
2018-06-29T12:28:31.857Z 6878 TID-ox2k9wx9u GenerateExportJob JID-16763050aa37f4be93283d99 INFO: fail: 1938.906 sec
2018-06-29T12:28:31.859Z 6878 TID-ox2k9vo5u GenerateExportJob JID-163f7480eb4ed9b9104b8e89 INFO: fail: 904.661 sec
2018-06-29T12:28:31.932Z 6878 TID-ox2k9wxk2 ERROR: Unable to remove job from private queue queue:sq|phx-a01:6878:82284a5c5393|default: 0
$

GenerateExportJob is using a concurrent limiter. I'm also curious what "ERROR: Unable to remove job from private queue" means?

@mperham
Copy link
Collaborator

mperham commented Jun 29, 2018

Those two errors are actually normal in that flow: those jobs were already pushed back to default so they "disappeared" from the private queue and the job couldn't remove itself from Redis upon completion -- that's a warning that that job will be executed twice.

2018-06-29T12:28:31.346Z 6873 TID-ox2kajssh Sync::Eligible::PaymentStatusesJob::Processor JID-fea27aefcbb9367ad9175176 BID-PCZE36SWpdrCPQ INFO: done: 8.857 sec
2018-06-29T12:28:31.348Z 6873 TID-ox2kajssh ERROR: Unable to remove job from private queue queue:sq|phx-a01:6873:2b2d0130ab06|bulk: 0

@krasnoukhov
Copy link
Contributor Author

Ok, good to know. The issue with ERR only (P)SUBSCRIBE ... is still puzzling though

@mperham
Copy link
Collaborator

mperham commented Jun 29, 2018

Yeah, still no idea what's causing that.

@krasnoukhov
Copy link
Contributor Author

Is sidekiq using pub/sub internally? I also had this idea, what if I try to use separate redis pool for limiters, do you think it might help? Thanks

@mperham
Copy link
Collaborator

mperham commented Jul 3, 2018 via email

@mperham
Copy link
Collaborator

mperham commented Jul 14, 2018

See #3886

@krasnoukhov
Copy link
Contributor Author

Yup, that's it. I just discovered the same thing this Thursday, but wanted to make sure that was the case first before following up. I'll close this as a duplicate and post my finding on #3886 since it has more useful info in there already

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants