Sidekiq "Scale to Zero" workers #5571
-
Shouldn't cost-conscious users just embed Sidekiq into Puma until they are comfortable with, and have the budget for, running a worker 24/7? I don't know exactly how Rails Autoscale works, but I imagine it starts a thread within the web process that checks queue latency and size in Redis every N seconds, then issues a scale-up or scale-down command based on those metrics. The Sidekiq API has all the APIs you need to monitor queues programmatically for this type of logic, with the caveat that it's not stateful: it won't tell you whether the queue size has been 0 for the last 30 seconds, so you'd need to implement any stateful logic yourself.
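As a rough illustration of the "stateful logic" mentioned above, here is a minimal sketch of a decider that a polling thread could call every N seconds. `QueueScaleDecider`, its thresholds, and the `:scale_up`/`:scale_down`/`:hold` results are hypothetical names for illustration; the only Sidekiq assumption is that the queue object responds to `size` and `latency`, as `Sidekiq::Queue` does. Actually issuing the scale command is not shown.

```ruby
# Hypothetical scaling decider: keeps the state the Sidekiq API doesn't,
# namely how long the queue has been continuously empty.
class QueueScaleDecider
  # queue: any object responding to #size and #latency
  #        (e.g. Sidekiq::Queue.new("default") in a real app)
  # clock: injectable for testing; defaults to a monotonic clock (seconds)
  def initialize(queue, idle_after: 30, max_latency: 5,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @queue = queue
    @idle_after = idle_after     # seconds the queue must stay empty before scaling down
    @max_latency = max_latency   # seconds of oldest-job wait that triggers scaling up
    @clock = clock
    @empty_since = nil           # when we first saw the queue empty, or nil
  end

  # Call periodically. Returns :scale_up, :scale_down, or :hold.
  def decide
    now = @clock.call
    if @queue.size.zero?
      @empty_since ||= now # start (or keep) the idle timer
      now - @empty_since >= @idle_after ? :scale_down : :hold
    else
      @empty_since = nil # any queued work resets the idle timer
      @queue.latency > @max_latency ? :scale_up : :hold
    end
  end
end
```

The point of the design is that a single "queue is empty" reading never triggers a scale-down on its own; only an uninterrupted run of empty readings lasting `idle_after` seconds does.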
-
Yes, but I'm not thinking about folks who have tiny budgets, like $5/mo for a hobby website. I'm thinking about larger production websites that don't get a lot of traffic during off hours but see lots of activity during the day. A worker scheduler capable of spawning its own machines to process jobs would help ops teams, since they wouldn't have to worry as much about capacity planning and auto-scaling. In this case I see the cost savings applying to ops teams and devs of all sizes. Here's how:
Thanks! Good to know. If I'm understanding the terminology and docs correctly, I could have the
-
This feels backwards to me. Shouldn't you create a Fly Machine when the number of jobs in a queue is > 0? Your approach of creating a machine per job is likely to get incredibly expensive for anyone enqueueing more than a few jobs. Plus, the point of a queue is that you put things in it and they can sit there and wait to be processed. Spinning up a Fly Machine for every job basically makes the queue unnecessary, because a new machine is always available to process the job. Queues have no purpose if the downstream processor is 100% available.
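To make the "create a Machine when the queue is > 0" suggestion concrete, here is a minimal sketch of sizing Machines from backlog rather than one per job. `desired_machines`, `jobs_per_machine`, and `max_machines` are hypothetical names, and the defaults are illustrative; in a real app `queue_size` could come from `Sidekiq::Queue.new("default").size`, and launching or stopping Machines to reach the target is not shown.

```ruby
# Hypothetical sizing rule: the number of Machines follows the backlog,
# capped, instead of one Machine per enqueued job.
#   jobs_per_machine: roughly how many jobs one Machine works concurrently
#   max_machines:     hard cap on spend
def desired_machines(queue_size, jobs_per_machine: 10, max_machines: 5)
  raise ArgumentError, "jobs_per_machine must be positive" unless jobs_per_machine.positive?
  return 0 if queue_size <= 0 # nothing waiting: scale to zero

  # Ceiling division: 1..10 jobs -> 1 Machine, 11..20 -> 2, and so on.
  [(queue_size + jobs_per_machine - 1) / jobs_per_machine, max_machines].min
end
```

With these defaults, 1,000 queued jobs yield 5 Machines rather than 1,000; the remaining jobs simply wait in the queue, which is exactly the job a queue is meant to do.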
-
Context: picking up a Twitter thread from https://twitter.com/getajobmike/status/1574896874674614273 that I started with @mperham
I've been working on "Scale to Zero"/auto-scaling background workers at Fly.io. During my initial research and experimentation phase I was able to get a Rails process to boot a VM, run a job, and shut down, but I discovered I'd need to reinvent a lot of the functionality that Sidekiq (and other Rails worker libraries) provides for handling job failures, retries, job status, etc. The article at https://fly.io/ruby-dispatch/rails-background-jobs-with-fly-machines/ goes into more of the details, particularly around the "how".
My question for the Sidekiq community and maintainers: is there interest in "Scale to Zero" workers in Sidekiq? The benefits are:
What makes Sidekiq particularly interesting and exciting for this problem now is Embedding, introduced in 7.0. I'm imagining a world where an embedded Sidekiq worker handles launching Fly Machines when a job comes in, keeps them around for however long they're needed to process jobs on the queue, and then shuts them down when they're no longer needed. What's great about this approach is that the heavy lifting is done on other machines, so the app server won't have to break much of a sweat.
If there is interest, there are a lot of questions around implementation. It's probably not worth getting into the specifics now, but most of the questions would be around what an adapter API would look like between `Sidekiq::Job` and the process that monitors those job threads. There would also be some discussion around how to make the `Sidekiq::Manager` exit after a certain amount of time has passed without processing jobs, which is the mechanism that would scale down workers on the Machines. Ideally this is implemented in a way that can plug into different processes like threads, VMs, Docker containers, etc. (not just a proprietary integration).

I'm not familiar with how the Sidekiq community operates, so forgive me if I come across as aloof in my approach. I'm assuming most of the terminology I used above is 80% accurate, and I'm not familiar with how features and enhancements make their way into `main`. I assume they go through Mike 😄.
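The idle-exit mechanism described in the post (a worker that exits once it has gone a while without processing jobs, so its Machine can shut down) could look roughly like this sketch. `IdleTracker`, `#track`, and `start_idle_watchdog` are hypothetical names, not Sidekiq APIs. Wrapping each job in `track` (for example from server middleware) is an assumption left open, as is what the callback does; in an embedded setup it might call `stop` on the instance returned by `Sidekiq.configure_embed` (Sidekiq 7), but that wiring isn't shown.

```ruby
# Hypothetical idle tracker: counts in-flight jobs and remembers when the
# last one finished, so a watchdog can tell when the worker has gone idle.
class IdleTracker
  def initialize(idle_timeout:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @idle_timeout = idle_timeout # seconds without activity before "idle"
    @clock = clock
    @mutex = Mutex.new
    @busy = 0
    @last_activity = clock.call
  end

  # Wrap each job's execution in this block (assumed integration point).
  def track
    @mutex.synchronize { @busy += 1 }
    begin
      yield
    ensure
      # Runs even if the job raises, so the busy count can't leak.
      @mutex.synchronize do
        @busy -= 1
        @last_activity = @clock.call
      end
    end
  end

  # True once no job is running and none has finished for idle_timeout seconds.
  def idle?
    @mutex.synchronize { @busy.zero? && @clock.call - @last_activity >= @idle_timeout }
  end
end

# Polls the tracker in a background thread and calls on_idle once
# (e.g. to stop the worker so the Machine can exit). Returns the thread.
def start_idle_watchdog(tracker, interval: 5, &on_idle)
  Thread.new do
    sleep interval until tracker.idle?
    on_idle.call
  end
end
```

A long-running job keeps the busy count above zero, so the worker is never considered idle while it is mid-job; the timeout only starts counting after the last job finishes.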