
Support for unique jobs #4

Closed
davidcelis opened this issue Feb 7, 2012 · 14 comments

Comments

@davidcelis

I'd love to use sidekiq in lieu of Resque, but I'm currently also using resque-loner. It would be nice to see a way to create jobs in sidekiq that would only be queued up once per payload.

If I can, I'll look through resque-loner's code and see what would be involved in adapting it for use in sidekiq, but I also wanted to open this issue to get input.

@ryanlecompte
Contributor

Hi David,

Sidekiq currently doesn't support queueing jobs uniquely once per payload as you describe. However, I think support for this could be fairly easily added to the project in a generic way. Normally extensions to Sidekiq would be done via custom middleware, but in this case we would want to perform a uniqueness check before the middleware chain ever gets executed.

We could keep track of a Redis set containing a base64-encoded value of each currently queued payload. If a configuration option to only enqueue unique items is enabled, we would consult this set before adding a new job, to make sure the payload doesn't already exist in the to-be-processed queue.

I went ahead and implemented this approach in a separate branch with tests. @mperham, this may not be the direction we want to go here, so feel free to comment otherwise:

https://github.com/mperham/sidekiq/compare/unique_payloads
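
For readers following along, here is a minimal sketch of that set-based check, not the code in the linked branch: it assumes plain redis-rb, JSON-serialized payloads, and a made-up set key, and it leaves the check-then-add race (which comes up later in the thread) unaddressed.

```ruby
require 'redis'
require 'json'
require 'base64'

REDIS = Redis.new
UNIQUE_SET = 'sidekiq:unique_payloads' # hypothetical key name

# Push a payload onto a queue only if an identical payload isn't already tracked.
def push_unique(queue, payload)
  digest = Base64.strict_encode64(payload.to_json)
  # Consult the set first, as described above (note: not atomic as written).
  return false if REDIS.sismember(UNIQUE_SET, digest)
  REDIS.sadd(UNIQUE_SET, digest)
  REDIS.lpush("queue:#{queue}", payload.to_json)
  true
end
```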

@mperham
Collaborator

mperham commented Feb 7, 2012

Ideally, message processing should be idempotent so duplicate jobs won't matter, aside from the performance overhead of duplicate work. I'm going to let this one sit for a few days so I can think about it.

Ryan, I'd also consider using middleware here for the server half; I want to avoid adding non-core features to the manager. The manager is a singleton, and adding more network traffic to its code path will affect scalability.

@ryanlecompte
Contributor

That definitely sounds reasonable to me. I agree that the server half should be done with middleware. I'll explore that idea later tonight in the topic branch that I created and update this issue! Thanks.

@davidcelis
Author

Well, that was fast. I was considering the same sort of logic as what's implemented here, but this also removes the ability to have both unique jobs and original-style jobs.

As for duplicate jobs not mattering: they may well matter less with Sidekiq than with Resque, but in my use case a duplicate would still be wasted time, since the job itself can be time-consuming depending on the number of users in the system. My own project Recommendable currently uses Resque to enqueue users to have the collaborative filtering logic run on them. With a lot of other users in the system, the job can take a couple of minutes (I've been trying to think of ways to optimize this, but the collaborative filtering is currently O(n^2) and I've been banging my head about it). Because of this, I use resque-loner to prevent any user from being queued twice; both jobs would end up yielding identical results.

Anyway, I'm glad you guys are looking into it, and I'm impressed that it's happening this quickly. Thank you!

@rsim

rsim commented Feb 7, 2012

I'm currently using resque-lock-timeout to ensure that only one job per customer account can run in parallel. It would be good if, instead of just class/arguments uniqueness, it were possible to generate a custom job identifier/lock key that is used to check that only one job with that identifier runs in parallel.

@ryanlecompte
Contributor

I went ahead and moved the server part to an after-style middleware. I also changed the name of the configuration option. When this option (ignore_duplicate_jobs) is enabled, a job with a given payload will not be added to the queue if an identical job is already queued or currently executing.

It doesn't address @rsim's comment regarding being able to specify a unique job identifier. I think you could get around this by simply adding your own identifier to the job's payload.

Most recent diff can be seen here: https://github.com/mperham/sidekiq/compare/unique_payloads
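
To make the "after-style" part concrete, here is a rough sketch of a server middleware that clears a job's uniqueness digest once the job has finished, so the same payload can be enqueued again. This is illustrative only, not the branch's actual code; it assumes the hypothetical set key from the earlier sketch and Sidekiq's call(worker, msg, queue) server-middleware interface. Digesting only class + args (rather than the whole message) keeps the digest stable between enqueue and execution.

```ruby
require 'sidekiq'
require 'json'
require 'base64'

class ClearUniquePayload
  def call(worker, msg, queue)
    yield
  ensure
    # Whether the job succeeded or raised, drop its digest so an
    # identical payload may be enqueued again later.
    digest = Base64.strict_encode64([msg['class'], msg['args']].to_json)
    Sidekiq.redis { |conn| conn.srem('sidekiq:unique_payloads', digest) }
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add ClearUniquePayload
  end
end
```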

@mperham
Collaborator

mperham commented Feb 8, 2012

It came to me last night that Sidekiq::Client should probably have pluggable middleware too, so you can modify the message before sending it to Redis or, in this case, stop the send altogether.

@ryanlecompte
Contributor

I do like the idea of having pluggable middleware for the client, a la Faraday. Nice way to separate the concerns and allow injecting logic from the client side, similar to what can be done now on the server side.
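Client middleware did end up shipping in Sidekiq (see the close note further down). As a hedged illustration of how a uniqueness check could sit on the client side, something along these lines; the four-argument signature shown is the later documented one (the original 2012 version may have differed), and the set key and class name are made up:

```ruby
require 'sidekiq'
require 'json'
require 'base64'

class UniqueJobsClientMiddleware
  # Returning without yielding stops the job from being pushed to Redis.
  def call(worker_class, job, queue, redis_pool)
    digest = Base64.strict_encode64([job['class'], job['args']].to_json)
    queued = redis_pool.with { |conn| conn.sismember('sidekiq:unique_payloads', digest) }
    return false if queued
    redis_pool.with { |conn| conn.sadd('sidekiq:unique_payloads', digest) }
    yield
  end
end

Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add UniqueJobsClientMiddleware
  end
end
```

Note that the sismember/sadd pair here is still subject to the race conditions mperham mentions further down; the dedicated unique-jobs implementations he points to later exist to deal with exactly that.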

@rsim

rsim commented Feb 8, 2012

In my use case with resque-lock-timeout I do want the job pushed to the queue; I just don't want any worker to pick it up until the previous job with the same generated job identifier (in my case, the customer account ID) has finished.
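
That pattern (enqueue freely, but let only one job per identifier run at a time) would live on the server side. As a sketch only, neither resque-lock-timeout nor anything from the branch above, a server middleware could take a per-account Redis lock with an expiry and briefly requeue the job when the lock is held; the lock key, TTL, and 30-second retry are illustrative choices, and it assumes the account ID is the worker's first argument. It would be registered via config.server_middleware like the sketch above.

```ruby
class PerAccountLock
  LOCK_TTL = 10 * 60 # seconds; acts as the timeout safety net

  def call(worker, msg, queue)
    account_id = msg['args'].first
    lock_key   = "lock:#{msg['class']}:#{account_id}"
    # SET NX EX: acquire the lock only if nobody else holds it.
    acquired = Sidekiq.redis { |conn| conn.set(lock_key, '1', nx: true, ex: LOCK_TTL) }
    if acquired
      begin
        yield
      ensure
        Sidekiq.redis { |conn| conn.del(lock_key) }
      end
    else
      # Another job for this account is running; try again shortly.
      worker.class.perform_in(30, *msg['args'])
    end
  end
end
```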

@mperham
Collaborator

mperham commented Feb 11, 2012

Fixed by @ryanlecompte's client-side middleware work.

@smathy

smathy commented Jul 31, 2013

Sorry to be a dunce here, but can I confirm (@mperham or @ryanlecompte or anyone else who knows) the fix for how to do this using the middleware would be something like...

In the call method of my middleware class, search through the Sidekiq::Queue I'm passed looking for a job with a matching payload, and only yield if I don't find it.

Do I also have to check Sidekiq::Workers or will the job still be in Sidekiq::Queue even if it's being worked on?

@mperham
Collaborator

mperham commented Jul 31, 2013

Jason, you cannot implement unique jobs using the Sidekiq API. There are many, many race conditions in what you suggest.


@smathy

smathy commented Jul 31, 2013

Hmm, I am confused then, thanks. I thought this was part of the reason to do it on the client; that's why I was asking about Sidekiq::Workers, actually, because I was wondering whether/how my client would have visibility of those.

So then, back to square one: how is the implementation of client middleware a solution to this unique-jobs issue? Do you use some other semaphore/mutex entirely?

@mperham
Collaborator

mperham commented Jul 31, 2013

Please see the various unique-jobs implementations that are already available.

