Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jobs are being allowed to run even though there is a lock #734

Open
daapower opened this issue Aug 30, 2022 · 9 comments
Open

Jobs are being allowed to run even though there is a lock #734

daapower opened this issue Aug 30, 2022 · 9 comments

Comments

@daapower
Copy link

Describe the bug
I'm not sure if I am doing something wrong. The unique jobs used to work but recently upgraded sidekiq and unique jobs.

Below is a job that is supposed to be locked until and while executing but as you can see there are 2 of them
["worker.1:3:9598347b346b",
"14frb",
{"queue"=>"default",
"payload"=>
{"retry"=>false,
"queue"=>"default",
"backtrace"=>true,
"lock"=>"until_and_while_executing",
"args"=>[],
"class"=>"RunSponsoredAdsQueue",
"jid"=>"8e11befa3b4600d7c4749941",
"created_at"=>1661867359.8881626,
"lock_timeout"=>nil,
"lock_ttl"=>nil,
"lock_prefix"=>"uniquejobs",
"lock_args"=>[],
"lock_digest"=>"uniquejobs:82157bcb53420755466db7f4590d54e6",
"enqueued_at"=>1661867361.2786112},
"run_at"=>1661867361}],
["worker.1:3:9598347b346b",
"1hwq7",
{"queue"=>"default",
"payload"=>
{"retry"=>false,
"queue"=>"default",
"backtrace"=>true,
"lock"=>"until_and_while_executing",
"args"=>[],
"class"=>"RunSponsoredAdsQueue",
"jid"=>"ecafefaf4c6b71ce01daaee4",
"created_at"=>1661867957.9982724,
"lock_timeout"=>nil,
"lock_ttl"=>nil,
"lock_prefix"=>"uniquejobs",
"lock_args"=>[],
"lock_digest"=>"uniquejobs:82157bcb53420755466db7f4590d54e6",
"enqueued_at"=>1661867959.0355272},
"run_at"=>1661867959}],

Expected behavior
If 1 job is running and has a lock, there shouldn't be another one that is allowed to run

Current behavior
Currently 7 of these jobs are running at the same time

Worker class

class RunSponsoredAdsQueue
  include Sidekiq::Worker
  sidekiq_options queue: 'default', retry: false, lock: :until_and_while_executing

  def perform()
    PpcJob.run_ppc_queue()
  end
end

**Additional context**
    sidekiq (6.5.6)
      connection_pool (>= 2.2.5)
      rack (~> 2.0)
      redis (>= 4.5.0, < 5)
    sidekiq-cron (1.7.0)
      fugit (~> 1)
      sidekiq (>= 4.2.1)
    sidekiq-unique-jobs (7.1.27)
      brpoplpush-redis_script (> 0.1.1, <= 2.0.0)
      concurrent-ruby (~> 1.0, >= 1.0.5)
      sidekiq (>= 5.0, < 8.0)
      thor (>= 0.20, < 3.0)
@mhenrixon
Copy link
Owner

It only looks like 7 are running, 6 of them would just be waiting for the first one to complete.

@daapower
Copy link
Author

daapower commented Aug 30, 2022

I don't fully understand, are you saying 1 is running, the other 6 are waiting for that one to finish before running?

If I execute from the console, unique jobs usually prevents them from running it would just say "start" then immediately "done" so that only 1 job is running.

But for some reason, these jobs are popping up (screenshot below). My worry is that the jobs entail a lot of database read/write and are locking each other in the database preventing them from finishing

image

@mhenrixon
Copy link
Owner

You would probably be better off using until_executed. It looks like you might be hitting a weird edge case.

Is it a problem if a subsequent job is dropped when another is already on the queue waiting to be picked off by the sidekiq worker?

@daapower
Copy link
Author

I think I used to have it until_executed but it was still queueing the jobs so I changed it to until_and_while_executing.

I'm going to try installing another Redis, perhaps something is messed up with my jobs.

In the meantime I added this code to check if there is already a Sidekiq job with my process name, if there is I don't run the code in the worker

workers = Sidekiq::Workers.new
workers.each do |process_id, thread_id, work|
if work.dig("payload","class") ==

Thank you for the help

@Audrius
Copy link

Audrius commented Aug 31, 2022

Investigating, something is also strange on our part too.

Screenshot 2022-08-31 at 16 48 54

module Mailers
  # Worker activated by an AWS CloudWatch ping every 3 minutes. (as of 13/12/2018)
  class ScheduleInstantWorker < BaseMailerWorker
    sidekiq_options queue: 'sidekiq_instants', lock: :until_executed
sidekiq-unique-jobs (7.1.27)
     brpoplpush-redis_script (> 0.1.1, <= 2.0.0)
     concurrent-ruby (~> 1.0, >= 1.0.5)
     sidekiq (>= 5.0, < 8.0)
     thor (>= 0.20, < 3.0)
     

@DCrow
Copy link

DCrow commented Sep 8, 2022

Not sure if this is the same bug, but it is a until_and_while_executing lock bug.

I am able to reproduce this behavior like so

TestJob.perform_async
sleep(5)
SidekiqUniqueJobs::Orphans::Reaper.call
TestJob.perform_async

Sample app here

@mhenrixon
Copy link
Owner

@DCrow That actually does not reproduce anything. Your sidekiq process isn't running and since you have configured a global TTL of 1 hour the reaper would just consider the lock as active after 5 seconds.

@DCrow
Copy link

DCrow commented Sep 12, 2022

@mhenrixon Sorry if I wasn't clear. Prior to executing this script, sidekiq process should also be running.
After executing sidekiq process should output to stdout something like this

2022-09-12T10:06:20.714Z pid=36479 tid=109n class=TestJob jid=14b076f9462dafc5897706d4 INFO: start
"Test start"
2022-09-12T10:06:25.722Z pid=36479 tid=1093 class=TestJob jid=b26285ad778e75147a298288 INFO: start
"Test start"

@jpriollaud
Copy link

@daapower do you have a rescue in your WorkerClass or PpcJob somewhere? If so, what is it doing?

Currently investigating similar behavior in our project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants