Second Request Very Slow in Clustered Mode on 5.0.3, 5.0.4 and master. #2484
That issue is separately documented in #2398 and is fixed on puma's master branch. The slow-requests problem seems separate, but it is difficult to debug without a reproducible test case. Have you tried running your application with puma on the master branch? It would be good to know whether the issue is present there too, or if it has since been fixed.

```ruby
gem 'puma', github: 'puma/puma', branch: 'master'
```
Thanks for the response, @cjlarose. I tested both issues on master. Just to give you a better idea of what I'm talking about, here's what I'm seeing, without giving anything too private away: Postman on the left, Terminal console log on the right. To summarize: on 5.0.2 the requests are fast, while on master the slow second request still appears.
Ok, so it seems like it might be a regression introduced in 5.0.3. Since we don't have a public reproducible test case, it would be helpful if you could perform a `git bisect`. After cloning the puma repository, in the puma directory:

```
git checkout v5.0.2
bundle install
bundle exec rake compile
```

In your application's Gemfile, point puma at your local clone (a sketch of the line follows):
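The exact Gemfile line was lost from this copy of the comment; a local-path reference of the usual shape, assuming the clone sits at `../puma` relative to your app, would be:

```ruby
# Gemfile -- use the locally checked-out puma instead of the released gem.
# The '../puma' path is an assumption; point it at wherever you cloned puma.
gem 'puma', path: '../puma'
```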
Verify that you get the behavior you expect on v5.0.2, then start the bisect and mark that version good:

```
git bisect start
git bisect good
git checkout v5.0.3
```

Restart your application and verify that it's bad, then run:

```
git bisect bad
```

From that point, `git bisect` will check out a commit between the known-good and known-bad versions. Re-compile, restart your application, test it, and mark each commit good or bad until bisect reports the first bad commit. Let me know if anything's not clear!
Will do! I didn't even know `git bisect` existed. I'll make some time this weekend to run through this and let you know my results. Thanks for the help.
Ah okay!
Hi @cjlarose, sorry for the delay. I used `git bisect` as you described. Here are the results of the investigation:

```
a76d3905d89780840b8cf4407c51c1ba384812c7 is the first bad commit
commit a76d3905d89780840b8cf4407c51c1ba384812c7
Author: Will Jordan <wjordan@users.noreply.github.com>
Date:   Tue Oct 6 06:22:53 2020 -0700

    Reactor refactor (#2279)

    * Refactor Reactor and Client request buffering

    Refactor Reactor into a more generic IO-with-timeout monitor,
    using a Queue to simplify the implementation.
    Move request-buffering logic into Server#reactor_wakeup.
    Fixes bug in managing timeouts on clients.
    Move, update and rewrite documentation to match updated class structure.

    * Fix a few concurrency bugs

    - In `Reactor#shutdown`, `@selector` can be closed before the call to
      `#wakeup`, so catch/ignore the `IOError` that may be thrown.
    - `Reactor#wakeup!` can delete elements from the `@timeouts` array so
      calling it from an `#each` block can cause the array iteration to miss
      elements. Call @block directly instead.
    - Change `Reactor#add` to return `false` if the reactor is already shut
      down instead of invoking the block immediately, so a client-request
      currently being processed can continue, rather than re-adding to the
      thread-pool (which may already be shutting down and unable to accept
      new work).

    Co-authored-by: Nate Berkopec <nate.berkopec@gmail.com>

 History.md              |   2 +
 lib/puma/client.rb      |  27 ++-
 lib/puma/queue_close.rb |  24 +++
 lib/puma/reactor.rb     | 434 ++++++++----------------------------------------
 lib/puma/server.rb      | 202 +++++++++++-----------
 5 files changed, 221 insertions(+), 468 deletions(-)
 create mode 100644 lib/puma/queue_close.rb
```

I know very little about actually fixing this, but I can reproduce it very readily now, so let me know how I can help in that regard. Thanks for looking into this!

Joshua
@joshuapinter looks like great work with the bisect! Have you tried to reproduce this with a stripped-down version of your app that you can share? I wonder what could be causing this, and without a repro I feel we are still in the dark.
Hey @dentarg. I hear you. That refactor commit is fairly large, so it's tough to nail down what might be the issue. Our app is a fairly hefty Rails 5.2 app, tough to break out without some considerable work. Not sure if there's anything else I can test via `git bisect`. If not, then I'll just have to put this on the back burner until I have more time to create a reproducible example.
Paging @wjordan, just in case something about this bug description, combined with having your reactor refactor in mind, immediately jumps out at you.
This will be hard to track down without narrowing the issue down to something we can reproduce locally. Any chance you can repro the actual requests to the rails app with curl at least, to rule out anything unusual going on with the browser-based requests?
Yes, I can definitely do that. FYI, the requests are happening via Postman to hit the API, not in a browser. Not sure what it is using underneath, but I'll confirm the same results occur with cURL.
Thanks for investigating this. I've been working on a test framework rewrite; part of its purpose is to make it easier to write tests for issues like this. I ran Puma with 0:5 threads and two workers, with a 'Hello World' app, and created 20 threads, each creating 100 client sockets, each sending two requests spaced 200 ms apart. I started at 10 threads and all passed, but with 20, macOS was not happy. Anyway, Ubuntu produced normal-looking output (times are in seconds, totaled for all requests).

So, I can't see the long response time issue. If anything, it may indicate that the 'accept' processing may slow down for some reason. Obviously, something is causing your problem. One thing you might try: using ...
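For readers who want to approximate that load pattern, here is a rough Ruby sketch; it is not the actual test-framework code (which was not posted), and the host, port, and single `readpartial` per response are assumptions:

```ruby
# Rough sketch of the load pattern described above, NOT MSP-Greg's framework:
# 20 threads x 100 keep-alive sockets, two requests each, 200 ms apart.
require 'socket'

HOST    = '127.0.0.1'
PORT    = 9292
REQUEST = "GET / HTTP/1.1\r\nHost: #{HOST}\r\nConnection: Keep-Alive\r\n\r\n"

# Send one request on an open socket and return its wall-clock time.
def timed_request(sock)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  sock.write(REQUEST)
  sock.readpartial(16_384) # enough to read a small 'Hello World' response
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

times = Queue.new
20.times.map do
  Thread.new do
    100.times do
      TCPSocket.open(HOST, PORT) do |sock|
        times << timed_request(sock)
        sleep 0.2 # 200 ms between the two requests on the same socket
        times << timed_request(sock)
      end
    end
  end
end.each(&:join)

sorted = Array.new(times.size) { times.pop }.sort
puts format('requests: %d  max: %.4fs  p95: %.4fs',
            sorted.size, sorted.last, sorted[(sorted.size * 0.95).floor])
```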
Thanks for the feedback @MSP-Greg. Some new updates: we were running Ruby 2.6.5, where the issue reproduced consistently. Here's the kicker: after upgrading to Ruby 2.7.2, the slow second request seemed to disappear. So, not sure what to tell you guys from here. I'm happy to close this now and see if anybody else encounters it, since it's not an issue on Ruby 2.7.2. But I'm also happy to test anything else for you if you want to drill deeper. Let me know, and thanks for the consideration. 🙏
Interesting. I ran CI using Ruby 2.5.7, 2.6.5, & 2.7.1, and results were similar to the above. Maybe I'll review the changelogs of the recent Ruby versions... Puma isn't used as an isolated app; it's obviously often used with Rails. If there's an issue with particular versions of Ruby/Rails/nio4r, etc., we're interested in knowing about it. So, close if you'd like. @nateberkopec may have something to add; he's a Rails guru, I am not... Thanks for investigating further.
Okay, scratch that. It is still happening on Ruby 2.7.2. It took a few more requests for the issue to re-appear, which is why I didn't notice it right away, but it's exhibiting the same issue. We'll keep our puma pinned at 5.0.2 for now.
So it is not always the second request that is very slow?
@dentarg It appeared to always be the 2nd request when running Ruby 2.6.5, but with Ruby 2.7.2 it appeared to vary between the 2nd, 3rd, and 4th. It's strange, sorry. Terrible bug report. :/
My 2 cents: I'm also having this problem (I'm using Ruby 2.7.2 and Rails 6.0.3.4). I ended up downgrading to Puma 5.0.2 for now. Sorry for not having more information to offer; I just wanted to share that @joshuapinter isn't the only one having problems with the latest versions 🙂
@joshuapinter just to clarify, does the issue reproduce in 2.7.2 only with Postman/curl? Also, is it a specific request path/route that reproduces the issue and not any other routes? Even if the app is proprietary and you can't narrow down the code into something you can share publicly, is there anything else about the particular route that triggers the issue that is unique or that you're able to share? Finally, what's the specific Mac OS version you're seeing the issue on?

@lucascaton it would be helpful if you could share more specific detail on the problem you're experiencing:

- OS (including specific OS version)
- Server code used
- How the requests were made
- Whether the issue reproduces consistently 100% of the time or if it's intermittent/random, and if it's always on the second request after the server boots
- Any puma/rails logs, browser network-request logs, etc. that you can capture and share
Speaking of logs, adding request logging may help. Today I ran Puma as a separate process (not a sub-process as in CI), and ran three requests per client/socket. No real difference in times, and I was running loops of 800. IOW, I'm trying to repro, but I can't...
We seem to be seeing the same issue with this config (Ruby 2.7.2 and Rails 6.0.x). Being on puma 5.0.2 fixes it and makes puma "just run". Edit: Just to note that we have been facing this issue for a while, but we thought it was something strange or wrong with our own setup, as it was sporadic. Noting this because many teams might be facing it but assuming something is wrong with their own setup and not reporting it, as there are no errors reported from puma's side.
@lucascaton @vipulnsward @joshuapinter does the problem still exist in Puma 5.2? Can you provide any more details? See #2484 (comment).
Hey @dentarg - I've just tested here, and at first everything looked fine. UPDATE: After a few days running, the issue started again 😞
Hey @dentarg. I tested 5.2.1 and I'm still seeing the same issue. Here are the videos:

- 5.0.2 (baseline): 5.0.2.mp4
- 5.2.1: 5.2.1.mp4

The only thing changing between these two videos is the version of puma. I know it's difficult to reproduce with an isolated case, but I can easily reproduce it on our side if there's anything else you want us to test or diagnose. Thanks!
Thanks for the reply. In the mp4s above, is the only 'client' hitting Puma the 'client app' you show? If so, that means the request count is low, so could you please turn on Puma request logging? I've been using Puma quite a bit locally, both testing with 'high volume' code-generated requests and as a local server for the files used for https://msp-greg.github.io/, both TCP & SSL. I have never seen request timing like you're seeing. I don't/can't speak for the other maintainers, but without a means to repro this, or more info, I can't be of any help... JFYI, I've been working on a new test framework which allows generating lots of requests. I just tested 10 threads, each creating 200 clients (all keep-alive), each client sending 10 requests, with both 10 kB & 200 kB response bodies. With the 200 kB bodies, 95% of the responses were complete in under 13 ms. You're seeing timing that's orders of magnitude beyond that...
@MSP-Greg Yes, the only requests being sent are when I hit the "Send" button in Postman. As you can see, the first one is fast on both versions. All subsequent requests are fast on 5.0.2. But subsequent requests on 5.2.1 (it actually started in 5.0.3) are extremely slow. Additionally, shutting down 5.0.2 is fast, whereas 5.2.1 hangs for a while before quitting.
Thanks for the clarification. We don't need an mp4 of it, but if you turn on logging, what times are shown in the Puma logging for the slow requests?
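One way to turn that on, assuming a `config/puma.rb` file is in use, is Puma's `log_requests` config option:

```ruby
# config/puma.rb
# Print a log line for every request handled, so slow responses show up
# with their timing in the server output.
log_requests true
```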
@MSP-Greg This is with 5.2.1, with debug on: 5.2.1.with.debug.mp4
Thanks, that helps. Most of the testing I've been doing is with GET requests; I'll try it with POST requests...
@joshuapinter can you try the same thing again in Puma 5.2, but with max threads set to 2 or more? Thanks :)
@joshuapinter I also think it would be helpful if you can share more about your setup. Also, are you able to reproduce the issue with curl?
Thanks for the responses, I'll do my best to make some time this week for further testing and get you what you need.
I wasn't able to reproduce with curl, but I was able to with some Ruby code using Net::HTTP.
@joshuapinter please share that code using net-http if you can, I think it would be helpful.
@dentarg Definitely will.
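The script itself never made it into the thread; a hypothetical Net::HTTP reproduction of the shape being discussed, with the URL, request count, and spacing all assumed, might look like:

```ruby
# Hypothetical repro sketch (not @joshuapinter's actual code): reuse one
# keep-alive connection and time each request, looking for a slow 2nd/3rd
# request. The endpoint and the 1 s gap between requests are assumptions.
require 'net/http'

uri = URI('http://localhost:3000/api/example') # assumed endpoint
Net::HTTP.start(uri.host, uri.port) do |http|  # one persistent connection
  5.times do |i|
    t0  = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    res = http.get(uri.path)
    dt  = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
    puts format('request %d: HTTP %s in %.3fs', i + 1, res.code, dt)
    sleep 1
  end
end
```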
Hi there. Just to let you know that this is still happening in more recent Puma versions.
@lucascaton can you provide the information that was asked about in #2484 (comment)?
Hey @dentarg, sure!

**OS (including specific OS version)**

Ubuntu via Dokku, which is powered by Docker and herokuish.

```
❯ dokku run uname -a
Linux b94572e9248a 5.4.0-80-generic #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
```

Note: I'm running Puma in Cluster Mode.

**Server code used**

The code isn't public/open-source and unfortunately, I couldn't reproduce it with a minimal test app 😞

**How the requests were made**

Requests via browser by real users.

**Whether the issue reproduces consistently 100% of the time or if it's intermittent/random, and if it's always on the second request after the server boots**

It happens consistently, 100% of the time, but it's never the 2nd request. In fact, for me, it takes 1-2 days for the issue to start happening.

**Any puma/rails logs, browser network-request logs, etc. that you can capture and share**

I believe the reason why the requests get very slow (at least in my case) is that the app uses all available memory on the server (2 GB) plus its swap (4 GB). Using Puma 5.0.2 (the last version that didn't have this issue), this app uses ~750 MB, even after several days running.

Please let me know if I can help with anything else.
@lucascaton your issue sounds very different from the issue discussed in this thread. Further, I can only assume the memory leak is coming from a bug in your application/configuration and not Puma itself until you can reproduce the issue in code you can make public, so others can verify.
Hey @wjordan,

Well, I agree it's different (I wasn't sure when I wrote my first comment here), but it's triggered by the very same versions of Puma.

I don't think so. If I upgrade Puma from version 5.0.2 to 5.0.3+ and change nothing else, the issue starts happening. When downgrading it back to 5.0.2, everything goes back to normal. I've done this a few times now.

I wish I could be more helpful here. I've been trying to reproduce it, but it's not very easy, since it takes 1-2 days (and/or several requests) for the issue to start happening. I was just reporting my case in case it helps somehow, because it's affected by the same versions described in this issue 🙂 I'll keep trying to reproduce it. In the meantime, please let me know if I can help with anything else.
Anyone still having issues? Anyone tried Puma 6.x? |
@dentarg Yes, just upgraded to 6.2.1 and seeing the same thing. First request is lightning quick and subsequent requests take 10+ seconds. |
@joshuapinter able to share how to reproduce? #2484 (comment)
@dentarg Unfortunately not at the moment. Slammed with other things so we're still pinned at 5.0.2 for the time being. I will if things free up but tough to find time these days. |
@joshuapinter any chance you can just dump your Net::HTTP code that you are able to reproduce this with, so others can continue this mystery?
@dentarg I'll do my best, but time is super limited (as always!). This is still something I'd like to get fixed, but it's not a "production" issue, so it gets deprioritized. Will let you know if I can.
Describe the bug
I don't know all the details yet, so I'll try to explain it the best I can. I noticed that the second (and subsequent) requests after launching the `puma` server were very slow compared to the first, e.g. 30 seconds instead of 0.3 seconds. I also noticed that it takes much longer to quit/exit the server: like 15 seconds instead of 0.5.

This was appearing on any kind of cluster, even a cluster of 1. Commenting out `workers` entirely fixed the issue.

I recently upgraded from 4.2.1 to 5.0.4, so I decided to roll back my upgrade and upgrade slowly to see in which version it was introduced. I was able to run things just fine on 5.0.2, but on 5.0.3 and then on 5.0.4, the issue starts appearing.
Puma config:
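The config block itself did not survive in this copy of the issue; a minimal clustered config consistent with the description (cluster mode enabled via `workers`, which reproduced the issue even with a single worker) might look like:

```ruby
# config/puma.rb -- hypothetical stand-in for the lost config block.
# Per the report, any non-zero `workers` value (cluster mode) triggered
# the slowdown, and commenting `workers` out made it disappear.
workers 2
threads 1, 5
port 3000
```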
To Reproduce
I tried reproducing this with the `hello.ru` app and using my `puma.rb` file, but I couldn't reproduce it there; the page requests were always fast. So it may be application-specific, or Rails-specific, but I can definitely reproduce it, and the issue first appears in 5.0.3. 5.0.2 is completely fine.
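For context, puma's `hello.ru` is essentially a one-line rack app, roughly the following (paraphrased, so treat it as approximate):

```ruby
# hello.ru -- minimal rack app of the kind used for puma smoke tests.
run lambda { |env| [200, { 'Content-Type' => 'text/plain' }, ['Hello World']] }
```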
Expected behavior
Quick requests, like on 5.0.2.
Desktop (please complete the following information):
My apologies for the lack of information and reproducibility, but I've narrowed it down the best I could to a single patch release, so hopefully we can dig deeper from here.