[UI] UI workers occasionally stop serving requests #724
Comments
UI04 on 8/20
No interesting journal data during this time. Restarted Puma after a Nagios alert.
Another alert appeared on 8/22, over the weekend. This time, however, the issue cleared up on its own: Nagios noted a recovery roughly 20 minutes after the first alert. Additional notes:
Single mode vs Clustered mode. But what does it all mean? Source: https://www.speedshop.co/2015/07/29/scaling-ruby-apps-to-1000-rpm.html
Please run the following on the worker before restarting the service
Look for how many Puma threads there are and what state they are in:
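The original command is not preserved in this thread. As one possible way to gather the same information, here is a minimal Python sketch that counts a process's threads by state. It assumes a Linux host with `/proc` mounted; the function names `parse_state` and `count_thread_states` are hypothetical and not part of Puma or any existing tooling.

```python
from collections import Counter
from pathlib import Path


def parse_state(status_text):
    """Return the one-letter state code from a /proc/<pid>/task/<tid>/status body."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # Example line: "State:\tS (sleeping)"
            parts = line.split()
            return parts[1] if len(parts) > 1 else "?"
    return "?"


def count_thread_states(pid, proc_root="/proc"):
    """Count the threads of `pid` by state (R running, S sleeping, D uninterruptible, ...)."""
    counts = Counter()
    for task in Path(proc_root, str(pid), "task").iterdir():
        try:
            counts[parse_state((task / "status").read_text())] += 1
        except OSError:
            # The thread exited between listing and reading; skip it.
            continue
    return counts
```

Calling `count_thread_states(<puma pid>)` before the restart gives a per-state tally. Many threads stuck in `D`, or far fewer threads than the configured pool size, would be worth recording in this issue.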
After discussion with Scott and Ryan, Dryad will add a retry for presigned URL requests. We will also request ALB logging, and Ryan will continue to log timeouts to a spreadsheet. We'll match those against any new logging that is enabled.
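For illustration only, a retry around a presigned URL request could look roughly like the sketch below. It is not Dryad's actual change, which is not shown in this thread. The helper name `with_retry`, the default of three attempts, and the exponential backoff are all assumptions.

```python
import time


def with_retry(request, attempts=3, base_delay=0.5,
               retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call request() until it succeeds or `attempts` tries have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Here `request` is any zero-argument callable that performs the presigned URL request and raises on timeout. Keeping the attempt count small means a genuinely unresponsive worker still surfaces as an error, so it can be logged and matched against the ALB logs.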
@elopatin-uc3, I recommend that we break out the issue that Dryad is seeing from this issue. It may be the same root cause, but the symptoms are different.
Root cause of unresponsiveness not found. Logs show activity prior to the problem, but nothing excessive. Librato does not show the host as resource-constrained.
A simple restart of Puma fixes the issue.
Occurrences
Possible fixes
No effect
Next Steps
TMPDIR=...
Future Ideas