Support for unique jobs #105

Open
rosa opened this issue Dec 30, 2023 · 12 comments
rosa commented Dec 30, 2023

We need this feature, but I'm still not sure what it'll look like for Solid Queue. We have two use cases for it that couldn't be more different 😅 :

  • Prevent identical jobs from being enqueued together, keeping just one. In this case, when a job starts running, we want to allow other identical jobs to be enqueued right away. The uniqueness constraint would only apply while the jobs are waiting to be run. It wouldn't apply to scheduled jobs: we could have identical jobs scheduled, and if they run at different times, they'd be allowed to do so. Nope! I realised this is not necessarily true for our use case. We could have a restriction that applies to scheduled jobs and jobs waiting to be run, but it would have to be lifted as soon as jobs are ready to run. This restriction could apply to the solid_queue_ready_executions table alone. A new uniqueness_key with a unique index would work for this case.
  • Truly unique jobs: identical jobs are completely prevented from existing in the system, even after a job has already run (for the time jobs are preserved in the system, which depends on clear_finished_jobs_after). This restriction would apply to the solid_queue_jobs table. A uniqueness_key with a unique index would work in this case. I'd like this feature for Implement cron-style, recurring tasks #104, to prevent multiple jobs being enqueued for the same recurring task at a given time.

I'd like a common way to support both, but that might be tricky, as it also needs to be performant. If I end up with two different implementations, they should be different enough not to be confusing. I could also reframe the second case and, instead of making it part of unique jobs, make it part of the implementation for cron jobs. They are different enough to warrant that distinction.

After realising that the first case can work with the jobs table too, because all we need is to lift the restriction when a job is moved to ready, I think there's a good path for a common solution 🤔
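
To make that concrete, here is a minimal sketch of what the schema piece could look like, assuming a hypothetical uniqueness_key column (names are illustrative, not a committed design):

```ruby
# Hypothetical migration sketching the unique-index idea above.
class AddUniquenessKeyToSolidQueueJobs < ActiveRecord::Migration[7.1]
  def change
    add_column :solid_queue_jobs, :uniqueness_key, :string
    # Multiple NULLs don't collide under a unique index, so jobs
    # without a uniqueness key stay unconstrained.
    add_index :solid_queue_jobs, :uniqueness_key, unique: true
  end
end
```

Lifting the restriction when a job moves to ready could then be as simple as nulling out the key on that job's row.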

rosa self-assigned this Dec 30, 2023

davidpan commented Jan 4, 2024

My current naive method is:

```ruby
# Find the earliest unfinished job with the same concurrency key
# (WaitTime is a constant defined elsewhere in our app).
existing_job = SolidQueue::Job.where(concurrency_key: "TestJob/#{id}", finished_at: nil).order(:scheduled_at).first
TestJob.set(wait: WaitTime).perform_later(id) if existing_job.nil? || Time.now + WaitTime + 5 * 60 < existing_job.scheduled_at
```

Repeating jobs are allowed to have customized execution times, so multiple repeating jobs with different scheduled times can be created. However, repeating jobs whose execution times are close together are not allowed, to avoid wasting resources.


tnclong commented Jan 4, 2024

Is "Truly unique jobs" the responsibility of queue? I think it should be implemented through business tables(e.g. Add a unique index or a flag field to orders table).

If referring to AWS SQS, "unique over a period of time" might make more sense?

any messages sent with the same message deduplication ID are accepted successfully but aren't delivered during the 5-minute deduplication interval.
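
For illustration, that kind of window could be approximated by folding a time bucket into the deduplication key, so identical payloads only collide within the interval (a sketch, not anything Solid Queue provides today):

```ruby
require "digest"

DEDUP_INTERVAL = 5 * 60 # seconds, mirroring SQS's 5-minute window

# Identical (job_class, arguments) pairs produce the same key only
# within the same 5-minute bucket, so duplicates are rejected for
# roughly that long and accepted again afterwards.
def deduplication_key(job_class, arguments, now: Time.now)
  bucket = now.to_i / DEDUP_INTERVAL
  Digest::SHA256.hexdigest([job_class, arguments.inspect, bucket].join("/"))
end
```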


rosa commented Jan 4, 2024

@tnclong, it's definitely "unique over a period of time", in part because nothing could be guaranteed for longer than the period you keep existing jobs. I'm sorry that wasn't clear! This feature is intended mostly to avoid unnecessary work, rather than to guarantee uniqueness across the app's domain and data, which definitely isn't the responsibility of the queue backend.


benoist commented Jan 11, 2024

> I'd like a common way to support both, but that might be tricky as it also needs to be performant.

If the uniqueness is based on the params of the job, wouldn't the difference between unique jobs in transit and truly unique jobs just be when the registered uniqueness key is cleaned up? A transient unique key would be removed after execution, while a truly unique key would be removed only after X amount of time, so it also supports "unique over a period of time". X can be indefinite to support genuinely only-once, but that might not be required in the real world.

Also, in combination with the concurrency limitation, you could have multiple unique jobs queued, but only X concurrent jobs with the same key.
For example, an AccountSendEmailJob.perform_later(account_id, subject, message) could limit concurrent sending based on account_id and uniqueness on account_id, subject, and message. This would send only one email at a time, but would let you schedule different emails.
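
The concurrency half of that example maps onto Solid Queue's existing limits_concurrency controls; the enqueue-time uniqueness half is the part this issue would add. A sketch (AccountSendEmailJob is the hypothetical job from above):

```ruby
class AccountSendEmailJob < ApplicationJob
  # Solid Queue's existing API: at most one job with the same key
  # (the account) runs at a time. This limits execution overlap; it
  # does not deduplicate identical jobs at enqueue time.
  limits_concurrency to: 1, key: ->(account_id, subject, message) { account_id }

  def perform(account_id, subject, message)
    # send the email...
  end
end
```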


nhorton commented Jan 16, 2024

Just a note that this is our biggest blocker to moving from Sidekiq as well. We need the "for a period of time" version.


nhorton commented Jan 16, 2024

Small note that the really great thing here would be if we got upsert in ActiveRecord and could have that underlying a really performant implementation of this, one that didn't need either best-effort behavior or locking. We can survive with best effort, but this is a great example of where upsert would be really helpful.
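
For what it's worth, ActiveRecord's bulk insert_all already expresses skip-if-duplicate semantics against a unique index, which is the primitive being asked for here. A sketch assuming the hypothetical uniqueness_key column from above, with args and key standing in for the serialized arguments and the computed key (this also bypasses Solid Queue's normal enqueue path, so it only illustrates the shape):

```ruby
# ON CONFLICT DO NOTHING against the unique index; with unique_by:
# (PostgreSQL/SQLite), a conflicting row is silently skipped rather
# than raising.
result = SolidQueue::Job.insert_all(
  [{ class_name: "TestJob", arguments: args, uniqueness_key: key }],
  unique_by: :uniqueness_key
)
# On PostgreSQL, the result contains only rows actually inserted,
# so an empty result means a duplicate already existed.
duplicate = result.rows.empty?
```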


rosa commented Jan 16, 2024

@benoist,

> not be the difference between cleaning up a register of uniqueness keys

In theory, yes! In practice, you need to account for when that cleanup happens and how it's done, how you guarantee it happens after that X period of time, and what happens if it fails... and so on.

@nhorton,

> the really great thing here would be if we got Upsert in ActiveRecord and could have that underlying a really performance implementation of this

Yes, totally. This is what I wanted to leverage as well, but it's not trivial to do depending on where in the job lifecycle you want to impose the uniqueness constraints 🤔

I need to put this aside for the next couple of weeks to focus on something else at work, but we really need this as well for HEY, so rest assured we'll come up with something.


nhorton commented Jan 27, 2024

> Yes, totally. This is what I wanted to leverage as well, but it's not trivial to do depending on where in the job lifecycle you want to impose the uniqueness constraints 🤔

Totally understood.

Our company does AI data analysis, and we have crazy amounts of logic around queueing because our jobs are often long-running and will crash data platforms if we don't gate the load. We have a combination of simple uniqueness needs on the enqueue side that upsert would solve, and on the dequeue side we need uniqueness as well as complex, dynamic rate limiting. I say all that to make the point that I worry about variants of this a lot and would be happy to contribute in thought or code.

But most of all - thank you for the work on Rails in general and this feature!

devsaimsohail commented

Hey. I've been searching for how to handle cron jobs with Solid Queue, but unfortunately haven't found anything about it anywhere. I'm moving my application from Sidekiq to Solid Queue, and I have many background jobs that trigger themselves on a cron schedule. For example:

```yaml
update_all_calls_data:
  every: '1h'
  class: Schedules::UpdateAllCallsDataJob
```

Since I'm moving to Solid Queue, I'd like to control all of these jobs through Solid Queue as well.

Also, you mentioned that cron-like tasks are coming very soon. When will they be available?


rosa commented Jan 30, 2024

Hey @devsaimsohail, you can follow #104 to be notified when there is any news.


nhorton commented May 17, 2024

@rosa - I was looking at what it would take to implement a version of this ourselves to get unblocked, and it seems like we could add a before_enqueue that just did `SolidQueue::Job.where(<search on what we care about>).exists?` and aborted the enqueue if something was already there. Is there any reason we can't do that?
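
A sketch of that approach, keying only on the job class for simplicity (the where clause is whatever "what we care about" means per job; RefreshAccountJob is hypothetical), to show where the abort hooks in:

```ruby
class RefreshAccountJob < ApplicationJob
  before_enqueue do |job|
    # Best effort: another process can pass this check concurrently
    # before either row is written.
    already_queued = SolidQueue::Job.where(
      class_name: job.class.name, finished_at: nil
    ).exists?

    throw :abort if already_queued
  end

  def perform(account_id)
    # ...
  end
end
```

When the callback aborts, perform_later should return false, so callers can tell the enqueue was skipped.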

Note that I think a few code samples of the above, and maybe a couple of convenience scopes on Job, might be enough to close out several of these open issues.


cmoel commented May 18, 2024

I've been looking forward to this feature as well. I wonder if an exists? query might be prone to timing issues, e.g., two processes trying to create the same unique job at once? Would we be able to use a unique index and upsert? Are there any possible issues with that approach?
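
I imagine something like this, where the unique index makes the database arbitrate the race and the loser rescues the violation (a sketch, assuming the hypothetical uniqueness_key column enforced by a unique index, with MyUniqueJob as a stand-in):

```ruby
begin
  MyUniqueJob.perform_later(record_id)
rescue ActiveRecord::RecordNotUnique
  # The other process won the race and inserted the identical job;
  # safe to treat as already enqueued.
end
```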
