This repository has been archived by the owner on Jan 24, 2023. It is now read-only.

Proposal: Distributed API ratelimiting #2195

Open
MSevey opened this issue Aug 9, 2022 · 0 comments


MSevey commented Aug 9, 2022

At the moment we don't ratelimit users across all servers of a portal. A user can exceed their tier's download bandwidth simply by downloading from multiple servers, none of which is aware of the requests the user makes to the others. Since it's not feasible to read from and write to the database for every single request, we propose a point system similar to skyd's ephemeral accounts.

Proposal:

Every user has a certain number of APs (action points), which are generated at a rate tied to their account's tier, e.g. 1 AP/s.

The number of APs a user can hold is capped at 2 minutes' worth of generation, so at 1 AP/s the cap is 120 APs. We refer to this balance as the global balance.
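As a rough illustration of the accrual and cap described above, here is a minimal sketch. The function name and its parameters are hypothetical and not part of the proposal; it assumes the balance is recomputed lazily from the time elapsed since the last update.

```python
def accrue_global_balance(balance: float, rate: float, elapsed_seconds: float,
                          cap_seconds: float = 120.0) -> float:
    """Return the global balance after `elapsed_seconds` of AP generation.

    `rate` is the tier's generation rate in AP/s. The balance never exceeds
    `rate * cap_seconds`, i.e. 2 minutes' worth of generation by default.
    """
    cap = rate * cap_seconds
    return min(cap, balance + rate * elapsed_seconds)
```

With a rate of 1 AP/s, an empty balance reaches the 120 AP cap after 2 minutes and stays there.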

These APs can then be used to "pay" for using the API. Every endpoint is assigned a cost, which depending on the endpoint is either flat (e.g. a registry lookup), dynamic (e.g. keeping a subscription open) or a mix of both (e.g. downloads/uploads).
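One possible way to model the three kinds of cost is a flat part plus a per-unit part. This is only a sketch: the `EndpointCost` type, the endpoint names, the units and all prices below are made up for illustration and are not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointCost:
    flat: float = 0.0      # APs charged once per request
    per_unit: float = 0.0  # APs charged per unit of usage (e.g. per second or per MiB)

    def price(self, units: float = 0.0) -> float:
        """Total cost in APs for one request that consumed `units` of usage."""
        return self.flat + self.per_unit * units


# Hypothetical price table; names, units and values are assumptions.
PRICES = {
    "registry_lookup": EndpointCost(flat=1.0),                # flat cost
    "subscription": EndpointCost(per_unit=0.25),              # dynamic: per second open
    "download": EndpointCost(flat=1.0, per_unit=0.5),         # mix: per request + per MiB
}
```

For example, `PRICES["download"].price(4)` charges the flat part once plus 4 units of the dynamic part.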

Since we want to avoid using our database excessively, payments can't happen through the global balance. Instead we only use the global balance to fill up a user's local balances. A local balance is the balance a user has with a single server and is stored in memory by that server. If a local balance is not used within 2 minutes, the server can prune it or refund it to the global balance. Maybe instead of pruning we can also drain the local balances with the same trickle mechanism we use to fill the global balance. We should allocate some time to think about this and discuss the trade-offs.
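To make the prune-or-refund option concrete, here is a minimal sketch of the refund variant. The function and the two in-memory dictionaries are hypothetical; the trickle-based alternative mentioned above is not shown.

```python
def prune_idle(local_balances: dict, last_used: dict, now: float,
               idle_seconds: float = 120.0) -> dict:
    """Drop local balances that were idle for at least `idle_seconds`.

    `local_balances` maps user -> APs held by this server, `last_used` maps
    user -> timestamp of the last charge. Both are modified in place.
    Returns user -> APs that the caller may refund to the global balance.
    """
    refunds = {}
    for user in list(local_balances):  # copy the keys since we mutate the dict
        if now - last_used[user] >= idle_seconds:
            refunds[user] = local_balances.pop(user)
            del last_used[user]
    return refunds
```

Returning the refunds instead of writing them directly lets the caller batch the database update, or ignore them entirely if plain pruning is preferred.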

When a user requests a resource on a server and the local balance is insufficient, the server withdraws a buffer (e.g. 10% of the global balance) from the global balance into the local one. This requires a database interaction, but it should be rare and only happen every 30 seconds or so.
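The buffer withdrawal could look roughly like the sketch below. It assumes the 10% refers to the current global balance rather than to the cap, which the proposal leaves open, and the function name is hypothetical. In a real server the read and write of the global balance would have to be a single atomic database operation.

```python
def withdraw_buffer(global_balance: float, fraction: float = 0.10):
    """Move a fraction of the global balance into a local balance.

    Returns (new_global_balance, amount_moved). Assumes `fraction` applies
    to the current global balance, not to the tier's cap.
    """
    amount = global_balance * fraction
    return global_balance - amount, amount
```

Because each withdrawal takes a fraction of what is left, a user spreading requests over many servers drains the global balance in ever smaller steps rather than handing every server a full-sized buffer.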

When the global balance is insufficient as well, the server returns an HTTP 429 status, although this should never happen to the average user. We will try to set the prices so that average usage is not enough to hit the limit. Applications should only run into the ratelimit when they are malicious or when they try to optimise their performance by doing things like uploading to multiple servers in parallel. In that case they can detect the 429 and handle it.
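Putting the pieces together, the per-request decision might be sketched as follows. The `charge` function is hypothetical, and one detail is an assumption beyond the proposal: when the usual buffer would not cover the shortfall, the sketch withdraws exactly the shortfall instead.

```python
from http import HTTPStatus


def charge(cost: float, local: float, global_: float, fraction: float = 0.10):
    """Try to pay `cost` APs. Returns (status, new_local, new_global)."""
    if local >= cost:
        # Common case: pay from memory, no database interaction.
        return HTTPStatus.OK, local - cost, global_

    shortfall = cost - local
    if global_ < shortfall:
        # Neither balance can cover the request: ratelimit the user.
        return HTTPStatus.TOO_MANY_REQUESTS, local, global_

    # Refill the local balance with the buffer, or with the shortfall if
    # the buffer alone would be too small (assumption, see lead-in).
    withdrawal = max(shortfall, global_ * fraction)
    return HTTPStatus.OK, local + withdrawal - cost, global_ - withdrawal
```

A rejected request leaves both balances untouched, so a client that backs off after a 429 can retry once the global balance has regenerated.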

Implementation considerations to keep in mind:

should be able to support websocket endpoints

each server within a cluster will one day have dynamic endpoint costs it can set itself

price tables will be exposed at some point for the kernel
