Accurate Backpressure Metrics from Puma Server #1577
Comments
Backing up a little: if formerly we considered (but I'll admit I didn't yet think deeply about your request lifecycle description; I'm first simply comparing what we were looking at before to what we can look at now in the same place)
Some of this is clearly subtle enough that I'm not going to fully understand and reflect it here. So: take my comments with a grain of salt. And some of it I'm restating just to make sure I understand it.

I agree on approach 1 that there doesn't seem to be a good mechanism to get the information you want from TCP (pending connections without accepting them) or other highly relevant metrics (wait time before acceptance, requests that were rejected due to queue size). And while you could maybe reimplement enough to do that: ew, don't. There are some attempts in that direction (Goliath, ZeroMQ), but I don't see them, or that approach, helping you.

Hm. So approach 2 would do more work in the parent process (hard to scale), use more memory, have more of a single point of failure, and then have to pass the full-size request from master process to worker. In return, it would have one obvious place to handle backpressure and slow requests.

Right now, based on #1563, it sounds like you have a choice of a hard-to-reason-about working algorithm, or a simpler, easier-to-explain method that doesn't work well. You tried switching to the simpler method, then reverted it. Am I understanding that correctly?

Later, the parent process of approach #2 might be a good choice for Guilds (multiple threads with multiple VM locks). But of course those aren't available yet.
What we do at Discourse with Unicorn/NGINX is this: we set the HTTP_X_REQUEST_START header and then have a metric of how long requests are waiting between NGINX and Unicorn. This allows us quite an extensive amount of excellent alerting ... we also use raindrops (https://bogomips.org/raindrops/), which gives us slightly cruder metrics (N queued). We use this kind of information for amazingly awesome features like: https://github.com/discourse/discourse/blob/master/lib/middleware/anonymous_cache.rb#L122-L131 We will force clients into an anonymous mode (and to use the blazing fast anonymous cache) if stuff starts queueing on a single route for logged-in users. E.g.: someone posted "OH NO, OUTAGE AT EVE ONLINE" on Reddit and suddenly a herd of logged-in users floods us. I don't know if you require NGINX in your architecture or not, but fronting stuff with it simplifies this so, so much.
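The header trick above can be sketched as a small Rack-style middleware. This is illustrative, not Discourse's actual code: the `reporter` hook and the `request.queue_time` env key are my own names, and it assumes the proxy sets the header in the common `t=<unix seconds>` form (e.g. nginx with `proxy_set_header X-Request-Start "t=${msec}";`).

```ruby
# Sketch of measuring queue time from an X-Request-Start header set by a
# fronting proxy. Hypothetical names: QueueTimeMiddleware, reporter,
# "request.queue_time".
class QueueTimeMiddleware
  def initialize(app, reporter: nil)
    @app = app
    @reporter = reporter # anything responding to #call(seconds)
  end

  def call(env)
    if (raw = env["HTTP_X_REQUEST_START"])
      start = raw.sub(/\At=/, "").to_f     # "t=1700000000.123" -> Float seconds
      queue_time = Time.now.to_f - start   # time spent queued before the app
      env["request.queue_time"] = queue_time
      @reporter&.call(queue_time)
    end
    @app.call(env)
  end
end
```

From there, alerting is just watching the reported queue times: when they start growing, requests are stacking up between the proxy and the app server.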
The issue is in the implementation. @todo only grows when there are threads that are not doing work. The only time @todo should be higher than zero is during the slight race condition where a request has been added to the @todo array but has not yet been picked up by an idle worker. With this patch, @todo should always be small: usually zero, sometimes one. If we use @todo - @waiting then we should always get zero. Previously that metric should have been close to zero, but there was essentially a bug: when the above-mentioned race condition happened, it could let, say, 6 requests in before trying to process them. Basically, even before this patch the metric wasn't actually all that useful. Or at least it wasn't doing what I thought it was.

@SamSaffron Thanks for the info, that's helpful. I also didn't know you used Unicorn. Is Discourse threadsafe?
That's a good question and something to look into. We do not require nginx in front of Puma. We talked about using it as a reactor at last RailsConf (Phoenix) but punted on it. I'm curious if @FooBarWidget has any insights. Is there a way to measure this kind of "backpressure" in Passenger? Can you get any meaningful metrics out of nginx directly?
I hope Discourse is threadsafe... I'm using it configured with a lot of Puma threads...
Another idea: instead of measuring the capacity we don't have (which is what I'm trying to do with "backpressure"), we could instead measure the capacity that we do have. Basically, we could report idle time, and when all servers are reporting zero (or low) idle time for a period, then we know we are at capacity. For example, if I know that I have 2 workers and 5 threads, and my service isn't getting hit much, then maybe it's telling me my idle capacity is 10. If I get hit with a New York Times mention (or whatever), then suddenly I see that my idle capacity is down to zero; if I then scale out by adding another server, I should get some slack re-introduced into my system.
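The idle-capacity idea above is simple arithmetic; here is a minimal sketch. The function name and parameters are illustrative (not Puma API): total slots are `workers * max_threads`, and idle capacity is whatever isn't busy.

```ruby
# Sketch of the "measure the capacity you do have" idea. Hypothetical
# helper, not part of Puma.
def idle_capacity(workers:, max_threads:, busy_threads:)
  total = workers * max_threads
  [total - busy_threads, 0].max
end

# 2 workers x 5 threads on a quiet service: idle capacity is 10.
# Under a traffic spike, busy_threads climbs toward 10 and idle capacity
# hits 0, which is the signal to scale out.
```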
Yes, Discourse is very threadsafe; it has been so since day 1.
Can you point me to the area of code where this happens? Does the Server decide which thread gets the request, or does the threadpool do that? What do you mean by "meter"? If I were to naively design such a system myself, I would make
Are there such structures in Puma? If not, is that for a particular reason?
Okay, I found this in ThreadPool
So I guess you are saying that the size of this queue doesn't matter and there is something further up which indicates lack of capacity. So the answers to my other questions above will lead me in the right direction. Thanks!
A thought/realization I just had: in #1512, I have min threads set to 1 and max threads set to whatever (5 or 6 typically). From my understanding of things, the delta between So if that's true, then the part of Puma that decides to spool up and down threads (when min and max are not identical) also "knows" if capacity is available. Is this interesting?
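The "spool up and down" decision the comment points at can be modeled with a few counters. This is a simplified, single-threaded sketch of the idea (not Puma's real ThreadPool): grow when work is queued, no thread is idle, and we are still under max.

```ruby
# Toy model of the grow decision: the pool "knows" it is out of capacity
# at exactly the moment it decides to spawn. Hypothetical class, not Puma.
class GrowDecision
  attr_reader :spawned

  def initialize(min:, max:)
    @min, @max = min, max
    @spawned = min   # threads created so far
    @waiting = min   # threads currently idle
    @todo = []       # queued work
  end

  def enqueue(work)
    @todo << work
    if @waiting == 0 && @spawned < @max
      @spawned += 1  # out of capacity: spool up a thread
      @waiting += 1
    end
  end

  def start_work
    @waiting -= 1 if @waiting > 0
    @todo.shift
  end
end
```

In other words, the same condition that triggers thread growth is itself a capacity signal, which is what makes the observation interesting.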
[close #1577] Negative Backpressure Metric
Can someone help me understand: if Puma's backlog waits for the full request before putting it into the todo list, and a worker always pops a request before processing it, then how can "There are cases where two slower clients might get their requests accepted and then there is a temporary backlog, otherwise the backlog metric should be zero"? Is it when writing back?
TL;DR: the existing backlog metric is flawed as a backpressure mechanism and may soon be basically meaningless with #1563.
I want to be able to measure whether a Ruby web app is being under-provisioned for resources. One way to do this is to measure the number of fully formed requests that the server is not able to process.
Currently, as a proxy, I'm using Puma's "backlog" (which is different from the TCP backlog).
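For reference, Puma exposes this backlog number through its stats (e.g. the control server's stats endpoint). Below is a sketch of pulling it out of a stats payload; the exact key layout varies across Puma versions, so treat the shape here (top-level `"backlog"` in single mode, per-worker `"last_status"` in cluster mode) as an assumption.

```ruby
require "json"

# Sketch: sum the "backlog" proxy metric from a Puma stats JSON payload.
# The key layout is an assumption based on cluster-mode stats of this era.
def total_backlog(stats_json)
  stats = JSON.parse(stats_json)
  if stats["worker_status"]  # cluster mode: one entry per worker
    stats["worker_status"].sum { |w| w.dig("last_status", "backlog").to_i }
  else                       # single mode: backlog is top-level
    stats["backlog"].to_i
  end
end
```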
Backlog Behavior
For a request to appear in the backlog metric it must go through this flow:
- Puma::Server instance gets notified that there is capacity in its thread pool
- Puma::Server accepts the request from the socket and puts it into the Reactor
- The @todo array size is incremented by one
- If there are requests in @todo that have not yet been popped off to be worked on, they show up as "backlog"

At first this looks like an ideal metric, except that Puma::Server meters requests based on thread pool capacity. In an ideal world it will never try to add a request when there is no capacity. There are cases where two slower clients might get their requests accepted and then there is a temporary backlog; otherwise the backlog metric should be zero.

Possible solutions
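The flow above, and the temporary-backlog race it allows, can be walked through deterministically. This is a toy model of the bookkeeping (plain variables named after the discussion, not Puma's code):

```ruby
# Deterministic walkthrough of the race: the server sees spare capacity
# and accepts two slow clients before the idle thread is scheduled, so
# @todo is momentarily nonzero.
todo = []
waiting = 1                      # one idle thread in the pool

# Server checks capacity (waiting > 0) twice in a row, before the idle
# thread wakes up, so both requests land in todo:
todo << :slow_client_a
todo << :slow_client_b

backlog_during_race = todo.size  # momentarily 2: the "temporary backlog"

# Threads wake and pop the work off; steady state returns to zero:
todo.shift
todo.shift
backlog_after = todo.size
```

This is why the metric reads zero almost all of the time: only the narrow window between enqueue and pop is visible.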
1) TCP backlog metric
We could measure the number of requests that are queued at the TCP socket and report on that.
Unfortunately, there doesn't seem to be an API for getting this data. We can see the max size of the TCP backlog, but not how many clients are actually connected and awaiting processing. Also, even if we could, we would risk weird conditions: for example, if someone is slow-client attacking us, there could be tons of things in the backlog that will never become full requests ready to be processed by the server.
2) 1 Reactor, many servers
Change the architecture so that instead of one reactor per process, we have one reactor in the parent process (or another sidecar process). It tries to buffer requests as fast as possible with no waiting, and can pass the fully buffered requests on via some mechanism, perhaps a Unix domain socket, though I don't know what that would look like for Windows and JRuby (though they can't run in clustered mode anyway).
If we did this, then the backpressure metric would be the number of requests that have been fully buffered by the reactor but not yet accepted by a worker or thread pool, assuming we can measure that somehow.
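One plausible hand-off mechanism for this design is file-descriptor passing over a Unix domain socket, which Ruby supports via `UNIXSocket#send_io` / `#recv_io`. A minimal sketch (Unix-only, which matches the Windows/JRuby caveat above; this is one possible mechanism, not a Puma design):

```ruby
require "socket"

# Sketch: a parent "reactor" hands an already-buffered client connection
# to a worker over a Unix domain socket using fd passing.
reactor_side, worker_side = UNIXSocket.pair

# Stand-in for a fully buffered client connection:
client_r, client_w = IO.pipe
client_w.write("GET / HTTP/1.1\r\n\r\n")
client_w.close

reactor_side.send_io(client_r)    # reactor passes the request's fd along
handed_off = worker_side.recv_io  # worker receives a usable IO object
request = handed_off.read         # worker now owns the buffered request
```

Anything counted as "sent but not yet received" on that socket would be exactly the backpressure number the proposal wants to measure.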
3) Any other ideas?
???
cc/ @jjb