PRIMED key grows indefinitely when job re-queues itself in after_unlock callback #787

Open
ttstarck opened this issue May 30, 2023 · 1 comment

ttstarck (Contributor) commented May 30, 2023

Describe the bug
The PRIMED key grows indefinitely for jobs that re-queue themselves in the after_unlock callback, if the job completes and is re-queued within 1 second of when the initial job was queued.

Expected behavior
The PRIMED key is removed, or stays at a small length, when the job is locked or unlocked.

Current behavior
The PRIMED key grows indefinitely, with "1" as every value, when the job is re-queued within 1 second of initially being queued.

Worker class

class AfterUnlockWorker
  include Sidekiq::Job
  # The lock is held from enqueue until the job finishes executing.
  sidekiq_options lock: :until_executed

  def perform(args)
    @args = args
  end

  # Called by sidekiq-unique-jobs after the lock is released;
  # immediately re-queues the same job.
  def after_unlock
    self.class.perform_async(@args)
  end
end

Additional context
Sidekiq Version: 6.5.8
SidekiqUniqueJobs Version: 7.1.29

RSpec example:

# frozen_string_literal: true

require "rspec"
require "sidekiq"
require "sidekiq-unique-jobs"
require "sidekiq_unique_jobs/testing"

Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add(SidekiqUniqueJobs::Middleware::Client)
  end
end

RSpec.configure do |config|
  config.before(:each) do
    Sidekiq::Worker.clear_all
    Sidekiq::Testing.server_middleware do |chain|
      chain.add(SidekiqUniqueJobs::Middleware::Server)
    end
  end
end

RSpec.describe "PRIMED key spec" do
  it "grows PRIMED key indefinitely" do
    Sidekiq::Testing.fake!
    AfterUnlockWorker.perform_async(1)
    lock_digest = AfterUnlockWorker.jobs.first["lock_digest"]
    primed_key_length = -> { Sidekiq.redis { |r| r.llen("#{lock_digest}:PRIMED") } }
    primed_key_values = -> { Sidekiq.redis { |r| r.lrange("#{lock_digest}:PRIMED", 0, -1) }.uniq }
    # Each perform_one runs the job once; its after_unlock immediately
    # re-queues it, priming the lock again within the same second.
    100.times do |i|
      AfterUnlockWorker.perform_one
      puts "Test #{i} - Length of PRIMED key #{primed_key_length.call}" if i % 10 == 0
    end
    puts "Test Finished - Length of PRIMED key #{primed_key_length.call}"
    puts "Test Finished - Unique PRIMED key values #{primed_key_values.call}"
    sleep(1)
    puts "Test Finished - Length of PRIMED key after 1 second sleep #{primed_key_length.call}"
  end
end

Output from running the above spec:

Test 0 - Length of PRIMED key 1
Test 10 - Length of PRIMED key 11
Test 20 - Length of PRIMED key 21
Test 30 - Length of PRIMED key 31
Test 40 - Length of PRIMED key 41
Test 50 - Length of PRIMED key 51
Test 60 - Length of PRIMED key 61
Test 70 - Length of PRIMED key 71
Test 80 - Length of PRIMED key 81
Test 90 - Length of PRIMED key 91
Test Finished - Length of PRIMED key 100
Test Finished - Unique PRIMED key values ["1"]
Test Finished - Length of PRIMED key after 1 second sleep 0

This looks to be because the unlock LUA script pushes "1" onto the QUEUED key when a job is unlocked:
https://github.com/mhenrixon/sidekiq-unique-jobs/blob/main/lib/sidekiq_unique_jobs/lua/unlock.lua#L95-L96
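
A minimal sketch, independent of sidekiq-unique-jobs, of the pattern the output above suggests: a list whose TTL is refreshed on every push keeps growing until the pushes stop, then expires all at once. The key name and 1-second TTL here are illustrative assumptions:

require "redis"

redis = Redis.new # assumes a local Redis on the default port
key = "demo:PRIMED" # hypothetical key, standing in for "#{lock_digest}:PRIMED"

100.times do
  redis.lpush(key, "1")
  redis.pexpire(key, 1_000) # each push refreshes the TTL, so nothing has expired yet
end
puts redis.llen(key) # => 100

sleep(1.1) # let the final TTL lapse
puts redis.llen(key) # => 0, the whole key expired at once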

Unfortunately, I don't have enough context around the locking/unlocking algorithm to know whether this push is actually necessary.

I will say that, after doing some more reading, Redis suggests a much simpler algorithm for distributed locking on a single-master Redis instance (which I think is what most people are running).
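
For reference, a minimal sketch of that single-instance pattern from the Redis documentation: acquire with one atomic SET using NX (only if absent) and PX (auto-expiry), release with a compare-and-delete Lua script. The key name, token, TTL, and critical section here are illustrative:

require "redis"
require "securerandom"

redis = Redis.new
token = SecureRandom.uuid # unique value proving ownership of the lock

if redis.set("my-lock", token, nx: true, px: 30_000)
  begin
    puts "lock held" # critical section goes here
  ensure
    # Delete only if we still own the lock, atomically via Lua.
    release_script = <<~LUA
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    LUA
    redis.eval(release_script, keys: ["my-lock"], argv: [token])
  end
end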

I'd love to hear about the reasoning behind why the QUEUED and PRIMED keys were added.

mhenrixon (Owner) commented
Yes, Redis suggests a simpler algorithm, but I needed more, so I dreamed up my own.

People requested that the lock apply only during runtime, not during queuing: Sidekiq picks the job up, runs it through the server middleware, and the server prevents any duplicates from running simultaneously.
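
A minimal sketch of that runtime-only behavior, assuming sidekiq-unique-jobs' :while_executing lock (the worker name is illustrative):

class RuntimeOnlyWorker
  include Sidekiq::Job
  # :while_executing lets duplicates queue freely; only concurrent
  # execution of jobs with the same arguments is constrained.
  sidekiq_options lock: :while_executing

  def perform(args)
    # work that must not run twice at the same time
  end
end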

People also requested that the locks stay active as long as the job is in Sidekiq. If you want something simpler, I recommend https://github.com/rwojsznis/sidekiq-lock or sidekiq-pro if you have the money.

I don't think you should schedule a new job while the old one is still running. Sidekiq needs to be done processing the current job before you push the next one. Preferably, you should do that outside of this "Redis transaction" or "unit of work" (see the sketch below).
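
One hypothetical reading of that advice, based on the one-second expiry observed in the repro above (an assumption, not a confirmed fix): delay the re-enqueue past the PRIMED key's TTL, so the new push lands outside the old lock's cleanup window.

class AfterUnlockWorker
  include Sidekiq::Job
  sidekiq_options lock: :until_executed

  def perform(args)
    @args = args
  end

  def after_unlock
    # perform_in(2, ...) enqueues the follow-up 2 seconds later, after the
    # 1-second window in which the stale PRIMED entries were observed.
    self.class.perform_in(2, @args)
  end
end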

The QUEUED and PRIMED keys are used to block while the job is being processed. The push of "1" is accompanied by an expiration so that listeners on the QUEUED key can keep listening.

There are things I can do to improve your situation, but you know you are doing an infinite recursion, right?
