Bulk Queueing

Mike Perham edited this page Feb 13, 2023 · 12 revisions

The push_bulk method allows us to push a large number of jobs to Redis in a single call, cutting out the per-job network round trip latency. I wouldn't recommend pushing more than 1,000 jobs per call, but YMMV based on network quality, size of job args, etc. A very large batch can add a bit of Redis command processing latency.

This method takes the same arguments as #push except that args is expected to be an Array of Arrays. All other keys are duplicated for each job. Each job is run through the client middleware pipeline and each job gets its own JID as normal.

It returns an array of the pushed jobs' JIDs. The number of JIDs returned can be less than the number of jobs given if the client middleware stopped processing for one or more jobs.
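To illustrate the Array-of-Arrays shape: each inner array is the argument list for one job, so a flat list of ids must be wrapped before being passed as the 'args' value. A minimal sketch with made-up ids:

```ruby
ids = [101, 102, 103]

# Wrap each id so every job gets its own one-element argument list.
# This is what you'd pass as 'args' to Sidekiq::Client.push_bulk.
array_of_args = ids.map { |id| [id] }

array_of_args # => [[101], [102], [103]]
```

A job taking two arguments would use inner arrays of two elements each, e.g. `[[101, "a"], [102, "b"]]`.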

200.times do |idx|
  # each loader job will push 1000 jobs of some other type
  LoaderWorker.perform_async(idx)
end

class LoaderWorker
  include Sidekiq::Job

  SIZE = 1000

  def perform(idx)
    # assume we want to create a job for each of 200,000 database records
    # query for our set of 1000 records
    array_of_args = SomeModel.limit(SIZE).offset(idx * SIZE).pluck(:id).map { |el| [el] }

    # push 1000 jobs in one network call to Redis, saves 999 round trips
    Sidekiq::Client.push_bulk('class' => SomeJob, 'args' => array_of_args)
  end
end

You can reference the push_bulk code in lib/sidekiq/client.rb:79-111.

Sidekiq 6.3.0 introduced the perform_bulk method, which encodes the best practice of pushing 1,000 jobs at a time. The less code you have to write, the better.

class LoaderWorker
  include Sidekiq::Job

  def perform
    # .zip with no arguments wraps each id in its own array: [[1], [2], ...]
    array_of_args = SomeModel.pluck(:id).zip

    # you can change the default batch_size if needed, default is 1000
    SomeJob.perform_bulk(array_of_args, batch_size: 500)
  end
end
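The batch_size option controls how many jobs go to Redis per network call. Roughly speaking, perform_bulk slices the args into batch_size-sized groups and pushes one group per round trip; a sketch of that batching arithmetic (not Sidekiq's actual code), using 200,000 single-id arg arrays:

```ruby
# Hypothetical numbers: 200,000 jobs, pushed 500 at a time.
array_of_args = (1..200_000).map { |id| [id] }
batches = array_of_args.each_slice(500).to_a

batches.size       # => 400 Redis round trips instead of 200,000
batches.first.size # => 500 jobs per push
```

With the default batch_size of 1,000 the same workload would take 200 round trips.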

perform_bulk is only available if you use the Sidekiq::Job API; it does not work with ActiveJob.