Support for unique jobs #4
Hi David, Sidekiq currently doesn't support a way to enqueue jobs uniquely, once per payload, as you describe. However, I think support for this could be fairly easily added to the project in a generic way. Normally extensions to Sidekiq would be done via custom middleware, but in this case we would want to perform a uniqueness check before the middleware chain ever gets executed. We could keep a Redis set containing a base64 value of each currently queued payload. If a configuration option to only enqueue unique items is enabled, we would first consult this set before adding a new job to the queue, to make sure it doesn't already exist in the to-be-processed queue. I went ahead and implemented this approach in a separate branch with tests. @mperham, this may not be the direction we would want to go here; feel free to comment otherwise.
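The check described above can be sketched roughly like this. This is an illustrative sketch, not the actual branch: a plain Ruby `Set` and an MD5 digest stand in for the Redis set of encoded payloads, and `UniqueQueue` is a hypothetical name.

```ruby
require 'digest'
require 'json'
require 'set'

# Hypothetical sketch of a queue that rejects duplicate payloads.
# A real implementation would keep the digests in a Redis set
# (SADD / SREM / SISMEMBER) shared by all clients.
class UniqueQueue
  def initialize
    @queue   = []
    @digests = Set.new  # stand-in for the Redis set of queued payloads
  end

  # Enqueue the job only if an identical payload isn't already queued.
  # Returns true if enqueued, false if rejected as a duplicate.
  def push(payload)
    digest = Digest::MD5.hexdigest(JSON.generate(payload))
    return false unless @digests.add?(digest)  # add? is nil when present
    @queue << payload
    true
  end

  # Popping a job releases its uniqueness marker so it can be queued again.
  def pop
    payload = @queue.shift
    @digests.delete(Digest::MD5.hexdigest(JSON.generate(payload))) if payload
    payload
  end
end
```

The key design point is that the marker is added atomically with the enqueue decision; with a real Redis set, `SADD` returning 0 would play the role of `Set#add?` returning nil.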
Ideally, message processing should be idempotent, so duplicate jobs won't matter aside from the performance overhead of duplicate work. I'm going to let this one sit for a few days so I can think about it. Ryan, I'd also consider using middleware here for the server half; I want to avoid adding non-core features to the manager. The manager is a singleton, and adding more network traffic to its code path will affect scalability.
That definitely sounds reasonable to me. I agree that the server half should be done with middleware. I'll explore that idea later tonight in the topic branch that I created and update this issue! Thanks.
Well, that was fast. I was considering the same sort of logic implemented here, but this also removes the ability to have both unique jobs and original-style jobs. As for duplicate jobs not mattering, it may well be that they would matter much less with Sidekiq than with Resque, but in my use case they would still be wasted time, as the job itself can get time-consuming depending on the number of users in the system. My own project Recommendable currently uses Resque to enqueue users to have the collaborative filtering logic run on them. With a lot of other users in the system, the job can take a couple of minutes (I've been trying to think of ways to optimize this, but the collaborative filtering is currently O(n^2) and I've been banging my head about it). Because of this, I use resque-loner to prevent any user from being queued twice; both jobs would end up yielding identical results. Anyway, I'm glad you guys are looking into it, and I'm impressed that it's happening this quickly. Thank you!
I'm currently using resque-lock-timeout to ensure that only one job per customer account can run in parallel. It would be good if, instead of just class/arguments uniqueness, it were possible to generate a custom job identifier/lock key, which would be used to check that only one job with that identifier runs in parallel.
I went ahead and moved the server part to an after-style middleware. I also changed the name of the configuration option: when ignore_duplicate_jobs is enabled, subsequent attempts to schedule a job with the same payload will not be added to the queue if that job is currently queued or currently executing. This doesn't address @rsim's comment about being able to specify a unique job identifier; I think you could get around that by simply adding your own identifier to the job's payload. The most recent diff can be seen here: https://github.com/mperham/sidekiq/compare/unique_payloads
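The after-style server half could look something like the following. This is a hedged sketch, not the code from the branch: the middleware's name is invented, the `call(worker, msg, queue)` signature follows Sidekiq's server middleware convention, and the digest store is anything set-like (in Redis it would be an `SREM`). Because the digest is removed only after the worker finishes, duplicates are rejected while the job is queued *or* executing, matching the behavior described above.

```ruby
require 'digest'
require 'json'
require 'set'

# Illustrative after-style server middleware: release the payload's
# uniqueness marker once the job has finished executing.
class UniquePayloadMiddleware
  def initialize(store)
    @store = store  # anything that responds to delete(digest); Redis SREM in practice
  end

  def call(worker, msg, queue)
    yield  # run the job (and the rest of the middleware chain)
  ensure
    # Runs even if the job raises, so a failed job doesn't wedge its payload.
    @store.delete(Digest::MD5.hexdigest(JSON.generate(msg)))
  end
end
```

Note the `ensure`: without it, a job that raises would leave its digest behind and permanently block identical payloads.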
It came to me last night that Sidekiq::Client should probably have pluggable middleware too, so you can modify the message before sending it to Redis or, in this case, stop the send altogether.
I do like the idea of having pluggable middleware for the client, a la Faraday. Nice way to separate the concerns and allow injecting logic from the client side, similar to what can be done now on the server side. |
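The Faraday-style chain being discussed can be sketched in a few lines. This is a toy illustration of the concept, not Sidekiq's eventual client middleware API: each entry receives the message and a block, may mutate the message, and halts the send simply by not yielding.

```ruby
# Minimal pluggable client middleware chain, Faraday-style.
# The innermost block performs the actual push to Redis.
class ClientChain
  def initialize
    @entries = []
  end

  def add(middleware)
    @entries << middleware
    self
  end

  # Invoke entries in order; any entry that doesn't yield stops the send.
  def invoke(msg, &final)
    chain = @entries.dup
    traverse = lambda do
      if chain.empty?
        final.call(msg)
      else
        chain.shift.call(msg, &traverse)
      end
    end
    traverse.call
  end
end

# Example middleware (hypothetical): drop messages flagged as duplicates.
REJECT_DUPLICATES = lambda do |msg, &block|
  block.call unless msg["duplicate"]  # not yielding halts the send
end
```

A usage example: `ClientChain.new.add(REJECT_DUPLICATES).invoke(msg) { |m| push_to_redis(m) }` would push `msg` only when the middleware yields, which is exactly the hook a uniqueness check needs on the client side.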
In my use case with resque-lock-timeout, I do want the job pushed to the queue; I just don't want any worker to pick it up until the previous job with the same generated job identifier (in my case, the customer account id) has finished.
Fixed by @ryanlecompte's client side middleware work. |
Sorry to be a dunce here, but can I confirm (@mperham or @ryanlecompte or anyone else who knows) the fix for how to do this using the middleware would be something like... In the Do I also have to check |
Jason, you cannot implement unique jobs using the Sidekiq API; there are many, many race conditions in what you suggest.
Hmm, I am confused then, thanks. I thought this was part of the reason to do it on the client; that's why I was asking about the… So then, back to square one: how is the implementation of client middleware a solution to this unique job issue? Do you use some other semaphore/mutex entirely?
Please see the various unique jobs impls that are already available.
I'd love to use sidekiq in lieu of Resque, but I'm currently also using resque-loner. It would be nice to see a way to create jobs in sidekiq that would only be queued up once per payload.
If I can, I'll look through resque-loner's code and see what would be involved in adapting it for use in sidekiq, but I also wanted to open this issue to get input.