Linux's epoll+accept queue is fundamentally LIFO (see a good writeup at https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/). Because of this, both Unicorn and Pitchfork don't properly balance load between workers: unless the deployment is at capacity, the first workers handle disproportionately more work.
In some ways this behavior can be useful, but in others it may be undesirable. Most notably, it can create a situation where some workers are only used when there is a spike of traffic, and when that spike happens, it hits colder workers.
Pitchfork helps with that cold worker issue thanks to reforking; however, the first few requests after reforking are likely to hit page faults, so they still have a (smaller) cold worker problem.
We could explore opening multiple TCP servers with SO_REUSEPORT to split the load evenly between subgroups of workers. The downside is that this would create a round-robin between each group, so if one group gets multiple much slower requests it may spike latency.
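A minimal sketch of that idea in Ruby, assuming Linux (where SO_REUSEPORT is available) — the helper name and port handling are illustrative, not Pitchfork API:

```ruby
require "socket"

# Open several listeners on the same port; with SO_REUSEPORT the kernel
# partitions incoming connections between them. Linux-only sketch.
def reuseport_listeners(port, count)
  Array.new(count) do
    sock = Socket.new(:INET, :STREAM)
    sock.setsockopt(:SOCKET, :REUSEPORT, true)
    sock.bind(Addrinfo.tcp("127.0.0.1", port))
    sock.listen(Socket::SOMAXCONN)
    sock
  end
end
```

Each subgroup of workers would then accept only from its own listener, so the kernel splits the load between subgroups rather than draining one LIFO queue.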
One way I'd like to explore would be to have intertwined groups, e.g.:
Say 32 workers (0, 1, ...)
8 reused ports (A, B, C, ...)
We could do something where:
Workers 0..7 listen to A
Workers 4..11 listen to B
Workers 8..15 listen to C
This way each worker listens to multiple request pools (2 in the example, but it could be more).
I think such a setup could be a good compromise between fairness and tail latency.
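The proposed assignment can be sketched as a pure mapping from worker index to socket groups. The stride of 4 and the wrap-around at the ends are my extrapolation from the example above, not something the proposal spells out:

```ruby
WORKERS = 32
GROUPS  = 8             # number of SO_REUSEPORT sockets (A, B, C, ...)
STRIDE  = WORKERS / GROUPS  # 4: offset between consecutive groups

# Socket groups a worker should poll: its "own" group plus the previous
# one, so adjacent groups overlap by STRIDE workers (wrapping at the ends).
def groups_for(worker)
  base = worker / STRIDE
  [(base - 1) % GROUPS, base % GROUPS]
end
```

With these values, group 0 is polled by workers 0..7, group 1 by workers 4..11, group 2 by workers 8..15, and so on, and every group ends up polled by exactly 8 workers.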
I did some prior work here, and the key is finding the balance. Here's a bit of a braindump:
If you have one socket per worker, and each worker only polls on one socket, you have the fairest possible load balancing. The kernel will do consistent hashing of the incoming requests, and you'll see that the load is basically equal across workers.
Drawbacks:
Sub-optimal latency. It is possible a request can land on a queue that has work ahead of it while another queue is empty. This is described in the Cloudflare article.
You need to preserve these file descriptors, and the count cannot change: the master process must create them, then share them with the children. Otherwise, if you lose a worker, it messes up the consistent hashing. This is easy enough to work around, but if you have dynamic worker counts, it can be a problem.
If a worker dies or reboots, requests will still pile up in the queue for that worker.
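The one-socket-per-worker pattern described above might look like this — a hedged sketch assuming Linux, with hypothetical helper names, not Pitchfork's actual code:

```ruby
require "socket"

# The master creates one SO_REUSEPORT listener per worker *before*
# forking, so the fd set is fixed and the kernel's consistent hashing
# stays stable even when a worker is replaced.
def create_listeners(port, count)
  Array.new(count) do
    s = Socket.new(:INET, :STREAM)
    s.setsockopt(:SOCKET, :REUSEPORT, true)
    s.bind(Addrinfo.tcp("127.0.0.1", port))
    s.listen(Socket::SOMAXCONN)
    s
  end
end

# Each forked worker blocks on exactly one inherited socket.
def spawn_workers(listeners)
  listeners.map do |sock|
    fork do
      loop do
        conn, _addr = sock.accept  # this worker's private queue
        conn.close                 # request handling would go here
      end
    end
  end
end
```

Because the sockets are created before `fork`, a replacement worker can reuse the dead worker's fd and drain whatever piled up in its queue.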
On the other end, if you have every worker poll every socket, you'll end up basically at the base behaviour of a LIFO.
The key is finding some balance here and it is a bit tricky. Between these two poles, you basically have two sliders you can tune:
The number of fds that each worker polls
The number of fds (specifically, the ratio of fds to workers)
You can end up with "fairer" load balancing if you adjust it such that each worker polls a few sockets. This mitigates some of the issues:
It is less likely you'll have the suboptimal latency case, as you have multiple workers to pick up work from the socket
When one worker restarts, other workers polling the same fd can pick up the slack for it.
I seem to recall that for values like:
32 workers
32 file descriptors
3-4 file descriptors per worker
We got pretty good latency characteristics, and pretty good load balancing. You still get "localized pockets" of LIFO, but the extremes are diminished: i.e., you might have 8 hot workers, 8 colder workers, and 16 "medium warm" workers, but the extremes are much less pronounced than with the default pure LIFO.
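The per-worker side of that middle ground — polling a few sockets and accepting from whichever is ready — can be sketched as below. Names and structure are assumptions for illustration, not Pitchfork's actual accept loop:

```ruby
require "socket"

def handle(conn)
  conn.close  # placeholder for real request handling
end

# A worker blocks in IO.select on its handful of assigned listeners and
# accepts non-blockingly from whichever becomes readable.
def accept_loop(sockets)
  loop do
    ready, = IO.select(sockets)
    ready.each do |sock|
      begin
        conn, _addr = sock.accept_nonblock
      rescue IO::WaitReadable
        next # another worker sharing this fd won the accept race
      end
      handle(conn)
    end
  end
end
```

The `accept_nonblock` + rescue is what lets several workers share an fd safely: whoever loses the race simply goes back to `IO.select`.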