Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use rufus job's schedule time to determine if a specific job instance has ran before #181

Open
snmgian opened this issue Apr 27, 2017 · 19 comments
Labels

Comments

@snmgian
Copy link
Contributor

snmgian commented Apr 27, 2017

Job's schedule time is the timestamp at which a rufus job instance was scheduled to be triggered.

rufus-scheduler gets EoTime.now and then triggers each job passing that time to #trigger method.

This could cause problems if Rufus::Scheduler#trigger_jobs is invoked a second after a particular jobs has been triggered.

@snmgian
Copy link
Contributor Author

snmgian commented Apr 27, 2017

@benjaminwood
Copy link

Hi @snmgian - can this cause a job to be scheduled twice in a row? I believe I encountered this recently.

@snmgian
Copy link
Contributor Author

snmgian commented Jul 31, 2017

Hi @benjaminwood, what schedule types (https://github.com/moove-it/sidekiq-scheduler#schedule-types) are you using within your configuration?

@benjaminwood
Copy link

The job that was run twice was using the cron schedule-type. Specifically:

cron: '0 2 * * *'

@snmgian
Copy link
Contributor Author

snmgian commented Aug 5, 2017

Yes, it can happen under some circumstances.

Basically we are saving the timestamps (Time.now, seconds minimal resolution) at which a job is triggered and verifying that the timestamp was not not previously stored (which means the job has already run).

This approach works most of the time unless some runtime condition (cpu load, network latency, redis latency) causes the job to be triggered in the next second that the one was supposed to.

One of the possible approaches is to infer the timestamp at which the job should've exactly triggered ...

@benjaminwood
Copy link

@snmgian thanks for the response. Can you explain this a bit more?

One of the possible approaches is to infer the timestamp at which the job should've exactly triggered

I realize that sidekiq jobs should be idempotent, but the sad truth is that people regularly ignore this rule. So, this bug has the potential to do some damage in certain cases. Until this bug is fixed, I think we should add a notice/warning in the readme that jobs may be run multiple times, one after another. So, it is important (as always) to ensure that your scheduled jobs are idempotent.

@snmgian
Copy link
Contributor Author

snmgian commented Nov 16, 2017

One of the possible approaches is to infer the timestamp at which the job should've exactly triggered

Suppose we have cron: '0 2 * * *' (assuming UTC), and now is 2017-11-16T08:50:30Z, we can infer that next time will be 2017-11-17T02:00:00Z.

rufus-scheduler caculates the next time but from the outside, when the job is triggered, we don't have access to it, and every, interval and in will have different next_times when running in different nodes.

@snmgian
Copy link
Contributor Author

snmgian commented Nov 16, 2017

@benjaminwood #231

@benjaminwood
Copy link

Very good, hopefully the added documentation will provide a clue to new users that are experiencing duplicate execution.

Are you considering this a bug? Are there plans to address the issue so that it does not happen? I understand that it is a fundamental problem and the reliance on rufus-scheduler is at the center of it. Just curious how you view it at this point in time.

@snmgian
Copy link
Contributor Author

snmgian commented Nov 17, 2017

The gem wasn't designed with the idea of running in multiple nodes in mind. Today, I think this is a must have.

But before start to tackling this, the code and the overall design must be published. Some work is being done regarding that refactoring, but it's not finished yet.

@ronval
Copy link

ronval commented Jan 12, 2018

Hi snmgian, has this issue been fixed? My scheduled jobs are used to send out email reminders for work with in my production app. Right now this is causing people to get spammed pretty much. I was wondering if there was a fix for this or if you think I should just use rufus to schedule jobs in sidekiq?
I have 2 workers running sidekiq, not sure what to do its a bit unclear. Thanks for your time.

@snmgian
Copy link
Contributor Author

snmgian commented Jan 12, 2018

Hi @ronval, what do you have in your sidekiq scheduler configuration? Does your users receive email reminders twice?

@ronval
Copy link

ronval commented Jan 12, 2018

Yes They were getting it more than that actually. They were getting them 4 times. I have a pretty basic setup. Heroku 2 web dynos 2 workers dynos. The sidekiq config file is. (Its properly formated too). Thanks again for you help.
`development:
:concurrency: 5
production:
:concurrency: 6
:queues:

  • default
    :enabled: true
    :schedule:
    reminder_queue:
    every: ['3d', first_in: '120s']
    class: ReminderQueueWorker`

@snmgian
Copy link
Contributor Author

snmgian commented Jan 16, 2018

Hi @ronval

Don't use every for your use case, as the README states (https://github.com/moove-it/sidekiq-scheduler/#notes-about-running-on-multiple-hosts):

every, interval and in jobs will be pushed once per host

Why do you specified first_in: '120s'. If you do that, and then restart, the ReminderQueueWorker will be scheduled 120s after sidekiq starts up, then 3 days after start up, and so on.

So if you restart the worker (for any reason) every hour, the job will be scheduler 120 seconds after each startup. I'm not sure if that's what you want ... but I think it's not.

Another reason why every will not work for your use case, is that if you start the Heroku worker today, and you restart it again on Time.now + 2.days, and assuming you removed first_in option, ReminderQueueWorker will be scheduled on Time.now + 2.days + 3.days for the first time.

I think cron could work for you:

cron: '0 0 0 */3 * *'

You can also specify the time zone cron: '0 0 0 */3 * * America/Chicago'

Note: the snippet you provided here has bad formatting, but I think I got the idea. Anyway, you should wrap the code using ```, see this: https://guides.github.com/features/mastering-markdown/#examples

@ronval
Copy link

ronval commented Jan 25, 2018

Hi @snmgian
Thanks for your reply. I was reading into cron a bit more and it seems that when you use cron the schedule slowly deviates. I was reading more of the other thread where people had the same issue and what you wrote was to make one worker be the scheduler and have the others run through the queue
you said to make a initializer and a schedule.yml I have made these 2 things and I have also added it to the proc file which I will list below. But when I run sidekiq nothing happens. Can you let me know what I'm missing? Also you mentioned that you need to add the redis/rails config? But i already have that for the other worker running do I need to do that specifically for the new special worker? If so can you let me know how that would be awesome?
it seems this is the best way to do this but I'm just kinda confused on the setup. Thanks again for your time I know ur doing this for free :)

This is what the sidekiq server said:
INFO: Schedule empty! Set Sidekiq.schedule
INFO: Schedules Loaded

#INITALIZER
require 'sidekiq'
require 'sidekiq-scheduler'

Sidekiq.configure_server do |config|
schedule_file = 'config/schedule.yml'

config.on(:startup) do
if ENV['SCHEDULER'] == 'yes'
Sidekiq.schedule = YAML.load_file(schedule_file)

  Sidekiq::Scheduler.listened_queues_only = false
  Sidekiq::Scheduler.load_schedule!
end

end
end

this is the schedule.yml

reminder_queue:
every: ['2m', first_in: '120s']
class: ReminderQueueWorker

this is the procfile

web: bundle exec puma -C config/puma.rb
worker: bundle exec sidekiq -e production -C config/sidekiq.yml
scheduler: SCHEDULER=yes bundle exec sidekiq
release: rake db:migrate

@Yurokle
Copy link

Yurokle commented Aug 30, 2018

I'm also experiencing the same problem. When you have a heavy thread contention, rufus-scheduler schedule checks might be executed less frequently than 3 times/second – e.g. every other second. Since #idempotent_job_enqueue uses something like Time.now (actually EoTime) that comes from rufus-scheduler when it checks the schedule and uses that Time.now for idempotence protection, you'll experience double-queueing problem if you have thread-contention or simply run very low on CPU.
I think the problem could be solved partially if sidekiq-scheduler was using determenistic "next job run time" for idempotence protection, instead of Time.now. I think that'd solve the problem for CPU-heavy processes with "not-too-frequent" schedule. If you have jobs that run up to every minute – you'd be fine, but if you need to run a job every 1 - 20 seconds, you might still see double enqueueing issue.

Here is an example that shows how rufus-scheduler might deviate from the cron schedule under heavy thread contention:

require 'rufus-scheduler'
require 'time'

scheduler = Rufus::Scheduler.new

scheduler.cron '*/5 * * * * *' do |_, time|
  p time.iso8601(3)
end

10.times do
  # Creating thread-contention – increasing time between thread context switches. 
  Thread.new do
    loop do
      12341234 * 2341234 / 123123
    end
  end
end

scheduler.join
# Output:
"2018-08-29T17:41:00.185-07:00" <= OK
"2018-08-29T17:41:05.098-07:00" <= OK
"2018-08-29T17:41:10.033-07:00" <= OK
"2018-08-29T17:41:16.185-07:00" <= Off by 1 second
"2018-08-29T17:41:21.090-07:00" <= Off by 1 second
"2018-08-29T17:41:26.016-07:00" <= Off by 1 second
"2018-08-29T17:41:30.851-07:00" <= OK

@ronval
Copy link

ronval commented Aug 30, 2018

Thanks for the reply. I ended up making a statemachine. Just keeping track when it should run on the DB. If the date was the right date then it runs. It works fine now. Thanks :)

@snmgian snmgian added the bug label Dec 13, 2018
@Yaroslav-F
Copy link

Hello there,

I've been reading through multiple issues related to this matter and it boiled down to one single question for me: is it unsafe to use sidekiq-scheduler in auto-scalable sidekiq instance group just now? Or would you rather suggest using a lightweight process (rufus-scheduler/clockwork) on a separate instance which sole responsibility would be looking after the schedule and enqueue jobs for now?

PS: Thanks for the gem btw, idea of doing scheduling on redis side feels right.

@ronval
Copy link

ronval commented Dec 27, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants