Locks can't be found in UI? #735

Open · JeremiahChurch opened this issue Sep 28, 2022 · 6 comments

@JeremiahChurch (Contributor)

Describe the bug

  1. Attempt to queue a job.
  2. Receive an error that the job can't be enqueued because an existing lock exists.
  3. Search the web UI for the cited lock under Locks, Expiring Locks, and Changelogs: nothing turns up.

I'm attempting to run down some unexpected lock behavior, so I turned logging up. Plenty of locks are shown in the UI, but apparently not all of them.
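
For reference, "turning logging up" was roughly this (a sketch; debug_lua is the v7 config option that enables debugging in the gem's Lua scripts, which I believe lands in the Redis server log):

require 'logger'

Sidekiq.logger.level = Logger::DEBUG  # verbose client/server logging

SidekiqUniqueJobs.configure do |config|
  config.debug_lua = true  # extra logging from the gem's Lua scripts
end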

Expected behavior
Expect to be able to find locks in UI

Current behavior
Locks can't be found in UI

Worker class

class ActivityItems::ItemCreateJob
  include Sidekiq::Worker

  sidekiq_options retry: 3, queue: :activity_feed, lock: :until_and_while_executing, on_conflict: :log

  def perform
    Item.create_job
  end
end
> ActivityItems::ItemCreateJob.perform_async
2022-09-28T20:44:03.851Z pid=1295 tid=8wf uniquejobs=client until_and_while_executing=uniquejobs:1d6a8b849137793a8368e6513b58f436 INFO: Skipping job with id (26bbc9f025e624bd48a1aaad) because lock_digest: (uniquejobs:1d6a8b849137793a8368e6513b58f436) already exists
=> nil

Note the unique job key: uniquejobs:1d6a8b849137793a8368e6513b58f436
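
A way to see the mismatch directly in Redis (a sketch; assumes the standard key layout of one string key per lock plus an index entry in the uniquejobs:digests sorted set, which is what the UI lists):

digest = 'uniquejobs:1d6a8b849137793a8368e6513b58f436'
Sidekiq.redis do |conn|
  p conn.get(digest)                           # job_id currently holding the lock
  p conn.ttl(digest)                           # -1 means the key never expires
  p conn.zscore('uniquejobs:digests', digest)  # nil => missing from the set the UI pages over
end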

Additional context

Gemfile.lock

sidekiq (6.5.7)
  connection_pool (>= 2.2.5)
  rack (~> 2.0)
  redis (>= 4.5.0, < 5)
sidekiq-scheduler (4.0.2)
  redis (>= 4.2.0)
  rufus-scheduler (~> 3.2)
  sidekiq (>= 4)
  tilt (>= 1.4.0)
sidekiq-unique-jobs (7.1.27)

Sidekiq/unique jobs config

Sidekiq.default_job_options = { 'backtrace' => true }

SidekiqUniqueJobs.configure do |config|
  config.lock_info = true # extra key but gives us logging history
end

Sidekiq.configure_server do |config|
  config.client_middleware do |chain|
    chain.add SidekiqUniqueJobs::Middleware::Client
  end

  config.server_middleware do |chain|
    chain.add SidekiqUniqueJobs::Middleware::Server
  end

  SidekiqUniqueJobs::Server.configure(config)
end

Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add SidekiqUniqueJobs::Middleware::Client
  end
end
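
We leave the orphan reaper at its defaults. For reference, these are the knobs (option names as documented for v7; verify against your version):

SidekiqUniqueJobs.configure do |config|
  config.reaper          = :ruby   # :ruby, :lua, or nil to disable
  config.reaper_count    = 1_000   # max orphans removed per run
  config.reaper_interval = 600     # seconds between runs
end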

With the default wildcard search, other locks do show up.

Big love for the gem ❤️ I've been using it across projects for many years now!

@JeremiahChurch (Author) commented Sep 30, 2022

Further debugging: after clicking "Delete All" in the web UI, the locks show as cleared out, but the job still reports that the same key already exists.

> SidekiqUniqueJobs::Digests.new.entries
{"uniquejobs:f4f7dbd00db80a837590354401621908"=>1664566736.618991,
 "uniquejobs:e5a9ea4f8e153a35f6ded5f23fb1e527:RUN"=>1664566738.4878461,
 "uniquejobs:9a10d00cc95305531be30f13b956005c:RUN"=>1664566738.8416214,
 "uniquejobs:2b19449a7adaa1461106a72cabe12a6f:RUN"=>1664566738.8636317,
 "uniquejobs:51cf0c68620db7b16d433b178e39d7e1:RUN"=>1664566739.4959993}

There's no match in the entries.

However, deleting directly by digest does succeed and releases the lock:

> SidekiqUniqueJobs::Digests.new.delete_by_digest 'uniquejobs:1d6a8b849137793a8368e6513b58f436'
2022-09-30T19:41:59.911Z pid=16756 tid=ktw INFO: delete_by_digest(uniquejobs:1d6a8b849137793a8368e6513b58f436) completed in 1ms
=> 2
> ActivityItems::ItemCreateJob.perform_async
=> "324a590dbf84f5315c32cb09"

However, jobs can then be enqueued repeatedly, even with until_and_while_executing. This seems the same as #734?

> ActivityItems::ItemCreateJob.perform_async
=> "4200cb3e8be0a9c5ba32ff7f"
> ActivityItems::ItemCreateJob.perform_async
=> "630c1eba5d46b279ba232a20"
> ActivityItems::ItemCreateJob.perform_async
=> "52389da6ff7d8daf89352f8a"
> ActivityItems::ItemCreateJob.perform_async
=> "1eda77d1672570eedada7b8f"

@lemboywp

We had the exact same problem today.

sidekiq (5.2.9)
sidekiq-unique-jobs (7.1.27)

@pdkproitf commented Nov 1, 2022

Same problem here:

sidekiq (6.4.1)
sidekiq-unique-jobs (7.1.7)

I'm also unable to find the digest in the locked list via SidekiqUniqueJobs::Digests.new.entries.
Here is where it checks the job_id by digest:

  • The previous digest still exists, which leads to prev_job_id differing from new_job_id.
  • Example digest: uniquejobs:4cd6c68aa2b68f076fb793b18fd49f1e.

irb(main):419:0> Redis.current.get('uniquejobs:4cd6c68aa2b68f076fb793b18fd49f1e')
=> "5837f8f8bdd9977f02d0642d"

It seems like the Cleanup Dead Locks task couldn't delete the digest.
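
If so, manually kicking the reaper might confirm it (a sketch; I believe SidekiqUniqueJobs::Orphans::Reaper.call(conn) is the v7 entry point behind that cleanup, but verify before running against production):

Sidekiq.redis do |conn|
  SidekiqUniqueJobs::Orphans::Reaper.call(conn)
end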

@pdkproitf commented Nov 4, 2022

My current workaround:

SidekiqUniqueJobs.reflect do |on|
  on.lock_failed do |job_hash|
    # There is a problem where the job has executed and its digest has been removed from
    # SidekiqUniqueJobs::Digests.new.entries, BUT the lock key holding the job_id still exists.
    # If your worker uses a SidekiqUniqueJobs lock strategy that rejects on conflict, new jobs
    # are rejected even though the original job is no longer around.
    # This handles that case by clearing the invalid digest and re-enqueueing the job.
    if job_hash['class'] == 'YourWorkerClassName' &&
        Redis.current.get(job_hash['lock_digest']) &&
        Redis.current.zrank('uniquejobs:digests', job_hash['lock_digest']).nil?
      SidekiqUniqueJobs::Digests.new.delete_by_digest(job_hash['lock_digest'])
      job_hash['class'].constantize.perform_async(*job_hash['args'])
    end
  end
end
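
The zrank guard is what detects the orphaned state: the digest key still resolves to a job_id, but it has no entry in the uniquejobs:digests sorted set, so the UI and the reaper never see it. Two caveats: the re-enqueue assumes the job's args are safe to replay, and Redis.current is deprecated in newer redis-rb versions, so you may need to substitute your own connection.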

@JeremiahChurch (Author)

pdkproitf's fix ran well for us for a few months, but its reliability has tapered off over the last few weeks.

We upgraded to Sidekiq 7.x and sidekiq-unique-jobs 8.x to pick up a few fixes, to see if that helped.

No positive results.

This is impacting our production enough that I'm diving in and really trying to get to the root of it. I'm wildly unprepared for Lua or Redis internals.

I cannot reproduce the problem on small data sets in dev/test; this is strictly a production problem for us.

The most obvious difference I can find is between digests.entries and digests.page:

[2] [production] pry(main)> ActivityItems::UserItemCreateJob.perform_async 
2023-03-10T01:19:52.703Z pid=19697 tid=g91 uniquejobs=client until_and_while_executing=prod_unique_jobs:cb057bf79139c15aefd77584ea6b5cd9 INFO: Skipping job with id (a319e5a0f9176ccd9f6c2ca2) because lock_digest: (prod_unique_jobs:cb057bf79139c15aefd77584ea6b5cd9) already exists
=> nil
[3] [production] pry(main)> key = 'prod_unique_jobs:cb057bf79139c15aefd77584ea6b5cd9'
=> "prod_unique_jobs:cb057bf79139c15aefd77584ea6b5cd9"
[4] [production] pry(main)> digests = SidekiqUniqueJobs::Digests.new
=> #<SidekiqUniqueJobs::Digests:0x000000000bbb8a18 @key="uniquejobs:digests">
[5] [production] pry(main)> digests.count
=> 44050
[6] [production] pry(main)> digests.entries pattern: key
=> {"prod_unique_jobs:cb057bf79139c15aefd77584ea6b5cd9"=>"1678411182.5969353"}
[7] [production] pry(main)> digests.page pattern: key
=> [45329, 30976, []]
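
To take the gem's paging out of the picture, a direct check of the sorted set may help (a sketch; as far as I can tell entries is SCAN-based while page goes through a Lua script, so a plain lookup splits the difference):

key = 'prod_unique_jobs:cb057bf79139c15aefd77584ea6b5cd9'
Sidekiq.redis do |conn|
  p conn.zscore('uniquejobs:digests', key)  # non-nil score => the member really is in the set
  p conn.zcard('uniquejobs:digests')        # raw cardinality, to compare against digests.count
end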

It seems that others are hitting the same or a similar problem: #758, #734.

Any suggestions or ideas are much appreciated!

@haanhduclinh

We had the exact same issue with the newer versions of sidekiq and sidekiq-unique-jobs:

sidekiq (7.0.9)
sidekiq-unique-jobs (8.0.2)

I think this problem is quite serious because it affects the stability of the system.
