Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add push_bulk_in & push_bulk_in! #8

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

martinbottanek
Copy link

Allows you to specify when your jobs should run.

Internally, this works in the same way as Sidekiq’s perform_in.

Allows you to specify when your jobs should run.

Internally, this works in the same way as Sidekiq’s perform_in.
@originalhat
Copy link

Bump. This looks helpful.

@aprescott
Copy link
Owner

My apologies for such a long delay in responding to this PR.

This library is in a somewhat interesting state, given Sidekiq changes since I last did any work on this repo. I thought I'd capture my thoughts here for context. Not everything is directly relevant to the change you're proposing, but once you start pulling on the thread...

The summary is that I'm unlikely to merge this PR as-is because there are some required refactors to catch up with Sidekiq changes. (There would also need to be some tests for the new methods! 😅)

The gist of the situation:

  • This repo pre-dates a few Sidekiq features that have since been added.
  • Sidekiq has added a perform_bulk method on job classes. That makes it harder to see why this library exists at all in 2023, save for a difference in API. (Being able to pass a block, for example.)
  • Later versions of Sidekiq makes Sidekiq::Client batch-aware, which undermines an assumption originally baked into this library's implementation.
  • It's possible that there is a simpler API that incorporates at + batching than what can be done with Sidekiq::Client today. This gem could provide that API.

More detail below.


Sidekiq v6.3.0 introduced perform_bulk (sidekiq/sidekiq#5042) on job classes, with batched pushes. So with v6.3.0 or later, this works:

FooJob.perform_bulk([[1, "a"], [2, "b"], [3, "c"]], batch_size: 2)

Moreover, Sidekiq v7.1.0 included a change (sidekiq/sidekiq@1b2c3e8) which made Sidekiq::Client.push_bulk batch-aware, so this works:

Sidekiq::Client.push_bulk("class" => FooJob, "args" => [[1, "a"], [2, "b"], [3, "c"]], :batch_size => 2)

Without a batch_size specified, a default of 1000 is chosen.

(Later, the job-level perform_bulk was modified in sidekiq/sidekiq@c160719 to use Client.push_bulk.)

And, of course, Sidekiq v6.0.1 made Client.push_bulk aware of at (sidekiq/sidekiq#4243), which this PR relies on, so that this works:

Sidekiq::Client.push_bulk("class" => FooJob, "args" => [[1, "a"], [2, "b"], [3, "c"]], "at" => [1, 2, 3])

So, now what?

  1. If this library is used with Sidekiq v7.1.0 or later, the push_bulk method (from this library) is superseded by Sidekiq::Client.push_bulk's batching into groups of 1000. (This library will split jobs into groups of 10,000, but the underlying Client.push_bulk call will then group into chunks of 1000.)
  2. Similarly, this library'spush_bulk! no longer has the same semantics on v7.1.0 or later. It originally meant "push everything", but it will now batch by 1000.
  3. Since Sidekiq now provides perform_bulk on job classes, this library needs a stronger justification for its own existence. It originally provided what I saw as a fairly valuable operation that required a bunch of boilerplate, but Sidekiq's own perform_bulk is no more complicated than push_bulk. (It doesn't feel necessary to me to make this library exist as a replacement for perform_bulk for anyone on Sidekiq pre-v6.3.0.)
  4. Assuming it continues to exist, this library should be reworked to account for Sidekiq::Client.push_bulk's batch_size option.

With all that in place, at-aware bulk pushing has a fairly simple implementation as a generic version of this:

Sidekiq::Client.push_bulk(
  "class" => FooJob,
  "args" => [[1, "a"], [2, "b"], [3, "c"]],
  # An array of `Numeric`s, or `Time`s, or whatever else Sidekiq would accept.
  "at" => at_values,
  :batch_size => 2
)

A key question is whether this gem can provide a meaningful improvement over just using that approach. It might be able to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants