Ent Rate Limiting

Third-party APIs often enforce a rate limit, meaning you cannot call them faster than your SLA allows. Sidekiq Enterprise contains a rate limiting API supporting various styles of rate limiting.

The rate limiting API works in any Ruby process. It's not specific to Sidekiq jobs or limited to use within perform. For example, you can use this API to rate limit requests within Puma. Rate limiting is shared by ALL processes using the same Redis configuration. If you have 50 Ruby processes connected to the same Redis instance, they will all use the same rate limits.
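
For example, a minimal sketch of rate limiting an arbitrary block of Ruby outside of a Sidekiq job (the FEED_LIMIT constant and fetch_feed method are illustrative):

FEED_LIMIT = Sidekiq::Limiter.concurrent("feed", 10)

def fetch_feed(url)
  FEED_LIMIT.within_limit do
    # call the external service here
  end
rescue Sidekiq::Limiter::OverLimit
  # outside a Sidekiq job there is no middleware to reschedule the work,
  # so the caller must handle the failure itself
  nil
end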

Note: limiters can be expensive to create. Create limiter instances once during startup and reuse them (as with ERP_LIMIT below). They are thread-safe and designed to be shared.

Concurrent

The concurrent style means that only N concurrent operations can happen at any moment in time. For instance, I've used an ERP SaaS which limited each customer to 50 concurrent operations. Use a concurrent rate limiter to ensure your jobs or other processes all stay within that global rate limit:

ERP_LIMIT = Sidekiq::Limiter.concurrent('erp', 50, wait_timeout: 5, lock_timeout: 30)

def perform(...)
  ERP_LIMIT.within_limit do
    # call ERP
  end
end

Each limiter must have a name consisting only of letters, numbers, hyphens, and underscores. The name can be static (e.g. "erp") for a global limiter or dynamic (e.g. "stripe-#{userid}") to create multiple context-specific limiters.

Since concurrent access has to hold a lock, the lock_timeout option ensures a crashed Ruby process does not hold a lock forever. You must ensure that your operations take less than this number of seconds. After lock_timeout seconds, the lock can be reclaimed by another thread wanting to perform an operation.

You can use a concurrent limiter of size 1 to make a distributed mutex, ensuring that only one process can execute a block at a time.
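
For example, a minimal sketch of a cluster-wide mutex (the "nightly-report" name is illustrative):

REPORT_MUTEX = Sidekiq::Limiter.concurrent("nightly-report", 1, wait_timeout: 0, lock_timeout: 300)

def perform(...)
  REPORT_MUTEX.within_limit do
    # only one thread across all connected processes runs this block at a
    # time; it must finish within lock_timeout (300 seconds here) or
    # another thread may reclaim the lock
  end
end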

Concurrent limiters will pause up to wait_timeout seconds for a lock to become available. This API is blocking and as efficient as possible: unlike most other locking or mutex libraries for Redis, it does not poll. Blocking ensures the lock will be made available to a waiter within milliseconds of it being released.

The policy option can be set to:

  • :raise (the default) – raise OverLimit if a lock cannot be obtained within wait_timeout
  • :ignore – silently skip the block if a lock cannot be obtained within wait_timeout (see the sketch below)
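
For example, a minimal sketch using :ignore for optional work that can be skipped when the limiter is saturated (the "cache-warmer" name is illustrative):

WARMER = Sidekiq::Limiter.concurrent("cache-warmer", 5, wait_timeout: 1, policy: :ignore)

def perform(...)
  WARMER.within_limit do
    # if no lock becomes available within wait_timeout, this block is
    # silently skipped instead of raising OverLimit
  end
end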

The concurrent limiter does not limit the number of jobs that will be executed. If you have 20 threads available, 20 jobs will be taken from the queue and executed. But if you have a concurrent limiter set to 5, only 5 of those 20 jobs can execute the code inside the within_limit block at once; the other 15 will wait up to wait_timeout seconds for a lock to become available before erroring and retrying.

Concurrent Metrics

The concurrent rate limiter tracks the following metrics:

  • Held - the number of times this limiter held a lock, meaning the block was executed
  • Held Time - total time locks were held, in seconds
  • Immediate - the number of times a lock was available immediately, without waiting
  • Waited - the number of times a thread had to wait for a lock to become available
  • Wait Time - total time threads waited for a lock
  • Overages - the number of times a block took longer than lock_timeout to execute; this is bad
  • Reclaimed - the number of times another thread reclaimed a lock held past lock_timeout; this is very bad and can lead to rate limit violations

Bucket

Bucket means that each interval is a discrete bucket: you can perform 5 operations at 12:42:51.999 and then another 5 operations at 12:42:52.000 because they fall into different buckets.

Here's an example using a bucket limiter of 30 per second (notice how the name includes the user's ID, making it a user-specific limiter). Let's say we want to call Stripe on behalf of a user:

def perform(user_id)
  user_throttle = Sidekiq::Limiter.bucket("stripe-#{user_id}", 30, :second, wait_timeout: 5)
  user_throttle.within_limit do
    # call stripe with user's account creds
  end
end

The limiter will attempt the operation once per second until wait_timeout has passed or the rate limit is satisfied. It calls sleep to wait, so the thread is paused during that time. If wait_timeout passes, the limiter raises Sidekiq::Limiter::OverLimit; that exception is caught in server middleware, which automatically reschedules the job based on the limiter's config.backoff result. If an individual job is rescheduled by the limiter more than 20 times (approximately one day with the default linear backoff), the OverLimit is re-raised as if it were a job failure and the job is retried as usual.

You can also use :minute, :hour, or :day buckets, but they will not sleep until the next interval and retry the operation; instead they immediately raise Sidekiq::Limiter::OverLimit and the job is rescheduled as above.
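
For example, a per-user hourly bucket (a sketch; the "import-#{user_id}" name and count are illustrative):

def perform(user_id)
  hourly = Sidekiq::Limiter.bucket("import-#{user_id}", 100, :hour)
  hourly.within_limit do
    # once this hour's bucket is exhausted, within_limit raises
    # Sidekiq::Limiter::OverLimit immediately and the job is rescheduled
  end
end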

You can see recent usage history for bucket limiters in the Web UI.

Window

Window means that each interval is a sliding window: you can perform N operations at 12:42:51.999 but can't perform another N operations until 12:42:52.999.

Here's an example using a window limiter of 5 per second (notice how the name includes the user's ID, making it a user-specific limiter). Let's say we want to call Stripe on behalf of a user:

def perform(user_id)
  user_throttle = Sidekiq::Limiter.window("stripe-#{user_id}", 5, :second, wait_timeout: 5)
  user_throttle.within_limit do
    # call stripe with user's account creds
  end
end

In addition to :second, you can also use :minute, :hour, or :day intervals. Regardless of the interval, the limiter calls sleep(0.5) repeatedly until wait_timeout has passed or the rate limit is satisfied, so the thread is paused during that time. If wait_timeout passes, the limiter raises Sidekiq::Limiter::OverLimit; that exception is caught in server middleware, which automatically reschedules the job based on the limiter's config.backoff result. If an individual job is rescheduled by the limiter more than 20 times, the OverLimit is re-raised as if it were a job failure and the job is retried as usual.

Note that if the wait_timeout value is shorter than the interval in seconds, the limiter will immediately raise Sidekiq::Limiter::OverLimit and the job will be rescheduled as above, subject to the limit of 20 reschedules. For example, with an interval of :minute, any wait_timeout value below 60 will cause an immediate OverLimit.
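
You can use this deliberately to fail fast, letting the middleware reschedule the job instead of pausing the thread (a sketch; the name is illustrative):

def perform(user_id)
  # wait_timeout: 0 is below the 60 second interval, so an exhausted
  # window raises OverLimit immediately rather than sleeping
  throttle = Sidekiq::Limiter.window("report-#{user_id}", 10, :minute, wait_timeout: 0)
  throttle.within_limit do
    # generate the report
  end
end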

In addition to the :second, :minute, :hour and :day symbols, window limiters can accept an arbitrary amount of seconds for the window:

# allow 5 operations within a 30 second window
Sidekiq::Limiter.window("stripe-#{user_id}", 5, 30)

Leaky Bucket

Sidekiq Enterprise v2.2 adds the "leaky bucket" rate limiter. The idea is that you can fill the bucket very quickly but then have to slow down further calls to match the pace of the "drip" so the bucket doesn't overflow.

Sidekiq::Limiter.leaky("shopify-#{user_id}", 60, 60)
Sidekiq::Limiter.leaky("shopify-#{user_id}", 60, :minute) # equivalent

Here we've declared a limiter which allows 60 operations in a bucket that drains in one minute. The caller may call the limiter 60 times as fast as they want (the "burst") but after those 60 calls the bucket is full and they will be limited to one call every second (the "drip"). If they wait 5 seconds, they will be able to make 5 calls. After 60 seconds, the bucket will be empty and they can make the full burst of 60 calls again.

Note that you could declare the limiter like this:

Sidekiq::Limiter.leaky("shopify-#{user_id}", 1, 1)
Sidekiq::Limiter.leaky("shopify-#{user_id}", 1, :second) # equivalent

which gets you the same drip rate as before, but the burst size is only one. This is very likely not what you want, as it is caller-unfriendly: it's normal for callers to make occasional bursts of API calls, so it's better to declare 60, 60 rather than 1, 1.

A more complex example from a customer: a bucket size of 40 with a leak rate of 2/sec. The first parameter is the bucket size, so that will be 40. The second parameter is the number of seconds required to empty a full bucket: 40 / 2 = 20. The leaky limiter for this example would be leaky(name, 40, 20).
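
In code (a sketch; the "acme-api" name is illustrative):

# a bucket of 40 tokens; a full bucket drains in 40 / 2 = 20 seconds,
# i.e. a steady drip of 2 operations per second once the burst is spent
Sidekiq::Limiter.leaky("acme-api", 40, 20)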

Leaky limiters default to wait_timeout: 5 and will sleep and call Redis for each drip within that timeout period. If you have a limiter of 60, 60, each drip is one second and the limiter will sleep(1) until there is space in the bucket or 5 seconds have passed. Set wait_timeout: 0 if you don't want any sleep; the limiter will instead raise an OverLimit exception and, if in a Sidekiq job, schedule a retry later.

Sidekiq::Limiter.leaky("shopify-#{user_id}", 60, :minute, wait_timeout: 0)

Leaky limiters track hit and miss counts along with the amount of time the caller spent sleeping while waiting.

Points

Sidekiq Enterprise 7.1 adds a "points-based" leaky bucket rate limiter, which is useful for GraphQL query endpoints at Shopify, GitHub, and others which rate limit based on query complexity. See issue #5757 for a use case and links to real-world rate limited endpoints and docs.

Your bucket has 1000 points initially and refills at 50 points per second, so a full bucket refills in 1000 / 50 = 20 seconds. That looks like this:

lmt = Sidekiq::Limiter.points("shopify", 1000, 20)

Every call must now provide an estimate of the points it requires, along with an optional step to correct the estimate after the fact. The remote service's query result will usually include the number of points consumed by the query. You report that value back to the limiter with points_used(actual) to keep the rate limiting as accurate as possible:

query_estimate = 200 # use endpoint docs for logic to determine needed points
lmt.within_limit(estimate: query_estimate) do |handle|
  # make the actual query here
  actual = 50 # maybe our estimate was pessimistic
  handle.points_used(actual)
end

For instance, here's the documentation for calculating the points required for a Shopify GraphQL query.

Unlimited

The unlimited limiter is a rate limiter which always executes its block. This is useful for conditional rate limiting -- for example, admin users or customers at a certain tier of service don't have a rate limit.

ERP = Sidekiq::Limiter.concurrent("erp", 10)

def perform(...)
  lmtr = current_user.admin? ? Sidekiq::Limiter.unlimited : ERP
  lmtr.within_limit do
    # always executes for admins
  end
end

Limiting is not Throttling

Rate limiters do not slow down Sidekiq's job processing. If you push 1000 jobs to Redis, Sidekiq will run those jobs as fast as possible which may cause many of those jobs to fail with an OverLimit error. If you want to trickle jobs into Sidekiq slowly, the only way to do that is with manual scheduling. Here's how you can schedule 1 job per second to ensure that Sidekiq doesn't run all jobs immediately:

1000.times do |index|
  SomeWorker.perform_in(index, some_args)
end

Remember that Sidekiq's scheduler checks every 5 seconds on average, so you may still get a small clump of jobs running concurrently.

Reschedules and Backoff

If the rate limit is breached and cannot be satisfied within wait_timeout, the Limiter will raise Sidekiq::Limiter::OverLimit.

If you violate a rate limit within a Sidekiq job, Sidekiq will reschedule the job to run again later using a linear backoff policy, growing by approximately five minutes each time. After 20 rate limit failures (approximately one day), the middleware gives up and re-raises the error so the job is handled by Sidekiq's standard retry subsystem.

2015-05-28T23:25:23.159Z 73456 TID-oxf94yioo LimitedWorker JID-41c51a2123eef30dbad4544a INFO: erp over rate limit, rescheduling for later

Advanced Options

Place the Sidekiq::Limiter.configure block in your initializer to configure these options.

Back off

You can configure how the limiter middleware backs off by providing your own custom proc:

Sidekiq::Limiter.configure do |config|
  # limiter is the limiter that raised the OverLimit error
  # job is the job hash; job['overrated'] is the number of times this job
  # has failed due to rate limiting
  # exception is the error that triggered the backoff (Sidekiq::Limiter::OverLimit
  # or one of your configured error classes)
  # By default, back off 5 minutes for each rate limit failure
  config.backoff = ->(limiter, job, exception) do
    (300 * job['overrated']) + rand(300) + 1
  end
end

Redis

Rate limiting is unusually hard on Redis for a Sidekiq feature. For this reason, you might want to use a different Redis instance for the rate limiting subsystem as you scale up.

You can configure the Redis instance used by rate limiting:

Sidekiq::Limiter.configure do |config|
  config.redis = { size: 10, url: 'redis://localhost/15' }
end

By default, the Sidekiq::Limiter API uses Sidekiq's default Redis pool so you don't need to configure anything. As of Sidekiq Enterprise 7.1, the rate limiter data model is Cluster-safe so you can use a cluster of Redis instances to store millions of rate limiters.

Testing

The unlimited limiter does not use Redis, so you can conditionally use it anywhere (like a test suite) where you don't want to require Redis or accidentally trip rate limits.

def test_myworker
  my = MyWorker.new
  my.limiter = Sidekiq::Limiter.unlimited
  my.perform(...)
end

You can also use simple stubs like this in your test_helper.rb file:

class ActiveSupport::TestCase
  def noop_window_limiter(&block)
    Sidekiq::Limiter.stub(:window, Sidekiq::Limiter.unlimited, &block)
  end
end
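
For example, a test can wrap the code under test so any window limiter it creates is replaced with the unlimited one (a sketch; MyWorker is illustrative):

class MyWorkerTest < ActiveSupport::TestCase
  def test_perform
    # any Sidekiq::Limiter.window call inside this block returns the
    # unlimited limiter, so no Redis is touched and no limit can trip
    noop_window_limiter do
      MyWorker.new.perform(123)
    end
  end
end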

Custom Errors

If you have a library which raises a custom exception to signify a rate limit failure, you can add it to the list of errors which trigger backoff:

Sidekiq::Limiter.configure do |config|
  config.errors << SomeLib::SlowDownPlease
end

TTL

By default, Limiter metadata expires after 90 days. If you are creating lots of dynamic limiters and want to minimize the memory overhead of having millions of unused limiters, you can pass in a ttl option with the number of seconds to live. I don't recommend a value lower than 24 hours.

Sidekiq::Limiter.window("stripe-#{user_id}", 5, 30, ttl: 2.weeks)

Reschedules

By default, the limiter middleware will reschedule any job that raises OverLimit up to 20 times with a linear backoff policy. You can configure the reschedule policy:

Sidekiq::Limiter.window("stripe-#{user_id}", 5, 30, reschedule: 20) # default, 20 times
Sidekiq::Limiter.window("stripe-#{user_id}", 5, 30, reschedule: 10) # only 10 times
Sidekiq::Limiter.window("stripe-#{user_id}", 5, 30, reschedule: 0) # don't reschedule at all

If the limiter fails that many times in a row, the middleware will give up and raise the exception to be handled by Sidekiq's standard retry subsystem.

Web UI

The Web UI contains a "Limits" tab which lists all limits configured in the system. Require the Enterprise web extensions in config/routes.rb:

require 'sidekiq-ent/web'

Concurrent limiters track a number of metrics and expose those metrics in the UI.

Notes

  • Limiters are clock-sensitive. All your machines running Sidekiq should use NTP to sync their clocks. (Redis can't be used as a definitive source of time as Lua functions cannot access the clock.)
  • The same concurrent limiter (based on its name) may be created with different lock_timeout values, allowing different blocks of code to lock the same resource with different timeouts; see the sketch below.
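
For example, a sketch where two blocks of code lock the same "erp" resource but hold the lock for different maximum durations:

# short operations may be reclaimed after 10 seconds, long ones after 120
FAST_ERP = Sidekiq::Limiter.concurrent("erp", 50, lock_timeout: 10)
SLOW_ERP = Sidekiq::Limiter.concurrent("erp", 50, lock_timeout: 120)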