Lack of SSL max_version can cause excessively slow negotiation, 502 on AWS w/ELB #1735
If I change only the ALB target protocol to plain HTTP, I get no 502s. Similarly, if I use siege to hit Puma on port 3000 directly, I get no 502s. The problem is the interaction between the AWS ALB and Puma with SSL.
Is there a reason you're using SSL in the container for Puma, as opposed to forwarding from HTTPS to HTTP on the ALB? How are your listeners configured for the ALB? I came across this while trying to determine the best strategy for configuring Puma for a Rails app on ECS, so I'm curious what you're doing. I'm planning on terminating SSL at the ALB and relying on the X-Forwarded-Proto header to determine whether the request was over HTTPS.
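For the terminate-at-the-ALB approach, Rails itself honors `X-Forwarded-Proto`; a minimal sketch using the standard Rails setting, shown only to illustrate the setup being described:

```ruby
# config/environments/production.rb
# With TLS terminated at the ALB, the app receives plain HTTP. Rails
# decides whether the original request was HTTPS from the
# X-Forwarded-Proto header the ALB sets, and force_ssl redirects any
# request that was not originally HTTPS.
Rails.application.configure do
  config.force_ssl = true
end
```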
@amichal

```diff
- persistent_timeout = 75
+ persistent_timeout 75
```

I was facing a similar issue and was thinking setting the `persistent_timeout` would fix it.
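A quick sketch of why the diff above matters (the 75 s value is just the one from this thread): in Puma's config DSL, `persistent_timeout` is a method call, so the assignment form silently creates a local variable and never reaches Puma.

```ruby
# puma.rb
# persistent_timeout = 75   # no effect: only assigns a local Ruby variable
persistent_timeout 75       # correct: Puma keeps idle keep-alive sockets open for 75 s
```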
This is still a relevant issue for us even after setting the right `persistent_timeout` value (greater than the ALB idle timeout).
As you can see above, the patterns of the HTTP 502 errors and the target TLS negotiation error counts are similar and likely correlated. [updated]
What version of OpenSSL is being used?
Hi @MSP-Greg, thanks for responding :) For context, in our dockerized containers:

```
$ bundle exec rails runner "require 'puma/minissl'; require 'puma/puma_http11'; Puma::Server.class; Puma::MiniSSL.check; puts Puma::MiniSSL::OPENSSL_LIBRARY_VERSION"
OpenSSL 1.1.1c  28 May 2019

$ bundle info puma
  * puma (4.3.3)
```
I don't know... I tried to find info about the AWS load balancers, and the docs I found seemed to indicate that they (at present) only support up to TLSv1.2. Apps vary as to what level they participate in OpenSSL's negotiation; Puma (and many other apps) leave it to OpenSSL. But with any version of 1.1.1, the negotiation starts at TLSv1.3 and is negotiated down. There is a way to force OpenSSL to start at TLSv1.2, but it isn't available in Puma, somewhat based on "why would you want to do that?", but now there might be a reason for doing so... Anyway, I'm wondering if somehow OpenSSL is choking on all the (downward) negotiation, but I've never heard of that kind of issue. If so, one option would be building with OpenSSL 1.0.2? I'm basing this on:
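For reference, this is the knob in question at the Ruby OpenSSL level (not exposed through Puma at the time of this thread); a minimal sketch, assuming Ruby's openssl gem ≥ 2.1 (bundled with Ruby 2.5+):

```ruby
require 'openssl'

# Cap negotiation at TLSv1.2 so the handshake never offers TLSv1.3
# and then has to negotiate down for a TLSv1.2-only peer like the ALB.
ctx = OpenSSL::SSL::SSLContext.new
ctx.min_version = OpenSSL::SSL::TLS1_2_VERSION
ctx.max_version = OpenSSL::SSL::TLS1_2_VERSION
```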
@MSP-Greg Indeed, it is only up to TLSv1.2 for now, based on the documentation and the underlying library's roadmap. I am totally not a TLS/SSL expert, but to my understanding it would indeed be a downward negotiation (of versions), so it'd seem the negotiation should be fine (at least most of the time). But yes, the downward negotiation may take time, which AWS ELBs can be strict about. The fact that the HTTP 502 is intermittent makes it hard to debug/reproduce. I could verify with the CloudWatch metrics on the AWS load balancer that prior to the HTTPS setup no HTTP 502s occurred (consistent traffic before and after). I will try forcing TLSv1.2 with Puma to see if it solves the issue 🙇
Update: we took the possible issue up with AWS tech support to see if they could shed some light, and it does hint at extending the app's idle timeout to be greater than the AWS ELB's idle connection timeout setting. Particularly:

Previously, we set the Puma `persistent_timeout` config to be exactly 10 secs more than that of the AWS ALB and still encountered the intermittent 502s. As mentioned in the thread above, the downward TLS negotiation (from TLSv1.3 to TLSv1.2) likely took some time and can cause TLS handshake errors based on thresholds. I've now set a larger buffer than 10 secs and set up alerts on our CloudWatch metrics for any presence of HTTP 502. So far so good for a day of traffic (of course, an absence of HTTP 502 doesn't necessarily mean this fixed it, but I think we can consider it so) :) Thank you @MSP-Greg and @nateberkopec for looking at this 🙇
Thanks for the update. As mentioned, I previously thought the need for having the equivalent of a `max_version` setting was questionable. If one has an app with fast response times, the time for downward negotiation is a more significant part of the total response time. Adding the method is relatively easy assuming OpenSSL 1.1.x, but gets a bit more involved when compatibility with 1.0.2 is desired...
I've ignored this issue (it's very intermittent for us) for a long time, but I just wanted to drop back in and say it still happens in 4.3.5. I'm trying the `persistent_timeout` workaround.
Correctly setting `persistent_timeout` seems to have fixed it for us.
And now I think I understand the mechanism. The default `persistent_timeout` in Puma (20 seconds) is shorter than the ALB's default idle timeout (60 seconds), so Puma can close a keep-alive connection that the ALB still considers open; a request the ALB routes over that connection at just the wrong moment fails with a 502.
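A sketch of the resulting rule of thumb (20 s is Puma's default `persistent_timeout`; 60 s is the ALB's default idle timeout, per this thread):

```ruby
# puma.rb -- make Puma the side that never closes first:
#   persistent_timeout (Puma)  >  idle timeout (ALB)
#
# Out of the box, Puma's 20 s keep-alive window expires 40 s before
# the ALB's 60 s idle timeout, so Puma closes sockets the ALB is
# still willing to route requests to.
persistent_timeout 75  # comfortably above the ALB's 60 s default
```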
I've created PR #2426, which allows a maximum TLS version to be set. If anyone can try it and see if it helps with the AWS issue, it would be appreciated.
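A hedged sketch of what using such an option might look like in a Puma config. The actual option name and accepted values are defined by the PR itself, so the `max_version` keyword and file paths below are hypothetical, not confirmed Puma API:

```ruby
# puma.rb -- hypothetical: 'max_version' is an illustrative option name
# (the real keyword is whatever PR #2426 introduces).
ssl_bind '0.0.0.0', '3000',
  cert: '/etc/ssl/app.crt',  # placeholder paths
  key:  '/etc/ssl/app.key',
  max_version: 'TLSv1.2'     # cap negotiation below TLSv1.3
```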
I am getting a lot of 502s, and after the 502s the container somehow gets restarted. I am using ECS with an AWS ELB and ACM for the SSL certificate. I am also running the Rails server directly on 8080 instead of running it on 3000 and then using nginx to bind port 80 to 3000; will that affect the performance in terms of 502s?
The 502s on AWS ELB seem to be caused by SSL connections being closed by Puma without sending the shutdown alert, as described in #2675. The shutdown alerts were accidentally removed by the changes in #1334 a while ago. Increasing the `persistent_timeout` only reduces how often this happens. Our production case:
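For background on the shutdown alert: in TLS, a graceful close sends a close_notify alert before the TCP connection is torn down; a peer that instead sees an abrupt close may treat the connection as errored. A minimal client-side sketch with Ruby's OpenSSL bindings (the hostname is a placeholder):

```ruby
require 'socket'
require 'openssl'

tcp = TCPSocket.new('example.com', 443)  # placeholder host
ssl = OpenSSL::SSL::SSLSocket.new(tcp, OpenSSL::SSL::SSLContext.new)
ssl.sync_close = true  # closing the SSL layer also closes the TCP socket
ssl.connect

# Graceful teardown: sysclose sends the close_notify alert and then
# (because sync_close is set) closes the underlying TCP socket.
ssl.sysclose
```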
Steps to reproduce

- Rails app running in production
- Puma 3.12.0 (ruby 2.5.3-p105)
- serving a simple text file out of /public
- Puma running as a systemd service

Configure an AWS Application Load Balancer to accept HTTPS traffic on 443 and forward to the Puma instance above via HTTPS (port 3000). Make lots of requests via a browser. This can also be automated, e.g.:

```
siege -b --no-parser -t1m -c30 https://lblb......eu-central-1.elb.amazonaws.com/robots.txt | grep 502
```
Expected behavior
100% 200 OK
Actual behavior
A small percentage (5-10 requests out of the ~4000 completed in 1 minute) of the above test fails with a 502 Bad Gateway.
System configuration
ruby 2.5.3-p105
Rails 5.2.2
puma version 3.12.0
AWS ELB logs for failed requests show the request arriving, taking <1 ms to be read by AWS and <0.5 s to be processed in total. They show the Puma connection closing without a reason, and consistently 216 bytes processed from Puma. Puma's stdout shows nothing, and neither does stderr. I've added X-Amzn-Trace-Id headers to my Rails request logs to confirm that the Rails app never gets a chance to log the failing requests.
ENV

```
RAILS_ENV=production
RAILS_MAX_THREADS=8
WEB_CONCURRENCY=6
```

on a t2.xlarge with nothing else running on it
puma.rb
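The file's contents do not appear above; a hypothetical reconstruction, consistent with the environment variables and SSL-on-3000 setup described in this issue (certificate paths are placeholders):

```ruby
# puma.rb -- hypothetical reconstruction, not the reporter's actual file
workers Integer(ENV.fetch('WEB_CONCURRENCY', 6))

max_threads = Integer(ENV.fetch('RAILS_MAX_THREADS', 8))
threads max_threads, max_threads

environment ENV.fetch('RAILS_ENV', 'production')

# TLS terminated by Puma itself; the ALB forwards HTTPS to port 3000.
ssl_bind '0.0.0.0', '3000',
  cert: '/path/to/server.crt',  # placeholder paths
  key:  '/path/to/server.key'
```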