Use UUIDs as IDs for prepared statements in Postgresql #25827
Conversation
Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @schneems (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
It seems pointlessly complex to me*. If we want to work in the face of arbitrary exceptions, we could just fix the method to increment the counter earlier (I'm pretty sure there's a PR open that does that)... but we haven't, because we don't want to claim that we expect or intend to function in such a hostile environment. cc @tenderlove

\* yes, it's only one line of code, but a randomly generated UUID is substantially more complex than a counter, IMO
Found the PR #17607
My opinion: In theory you can say yes, we can't fix every spot where a Timeout might create unrecoverable problems. However, in practice I think this one spot is the cause of the majority of them. A lot of (most?) timeouts happen when the database is being slow to respond. That one line of code, generating the prepared statement, will be a spot where a lot of the timeouts happen, since it's the first thing that happens when trying to run a query, and a stuck database will freeze the Rails app right there until the timeout fires.

This spot is right in the intersection of the sets "code where a timeout would happen a lot" and "code that generates unrecoverable errors when a timeout happens". Due to this, I would propose that the slippery-slope argument does not hold, since this one fix would solve the problem most of the time, and the other timeout problems that are used to freeze work on this don't have nearly as high a chance of actually happening. The game of whack-a-mole is a silly argument to use if one mole is the cause of 90% of the problems, in my opinion.

Isn't the complexity of generating a UUID insignificant compared to the RTT to the database?
@dv I think we're agreeing on the facts, but disagreeing on the conclusion. There is a slippery-slope argument to be made, but it's not the main one I'm looking at: if we're not going to guarantee that Active Record works in the face of arbitrarily injected exceptions, then it seems more dangerous to have it work almost all the time -- that's how people end up with unsupported configurations deployed in production for a long time, and left utterly confused when it does eventually fail.

As for the slippery slope: there is always going to be a slowest operation that breaks in the face of an unexpected exception, and it will be responsible for most of the observed incidences. This is just a prepare: it involves a network round-trip, but compared to the time spent actually running a query, it's going to be pretty diminutive.
FWIW a much simpler solution would be to attempt to deallocate the existing prepared statement with a given name before preparing it. AFAIK deallocating a prepared statement that doesn't exist is not an error (I'm assuming. We'd have to do it through SQL with a
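The deallocate-before-prepare idea can be sketched with a stand-in for the connection. Everything below is illustrative: `FakeServer` and its methods are hypothetical, not real adapter API, and `deallocate` encodes the comment's (explicitly unverified) assumption that deallocating a missing statement is harmless.

```ruby
# Sketch of the deallocate-before-prepare idea. FakeServer is a hypothetical
# stand-in for the Postgres connection; it encodes the (assumed) behavior
# that deallocating an unknown statement name is a harmless no-op.
class FakeServer
  def initialize
    @prepared = {}
  end

  def deallocate(name)
    @prepared.delete(name) # no error when the name is unknown (assumed)
  end

  def prepare(name, sql)
    raise "prepared statement \"#{name}\" already exists" if @prepared.key?(name)
    @prepared[name] = sql
  end
end

server = FakeServer.new
server.prepare("a1", "SELECT 1")  # an "orphaned" statement Rails forgot about

# Deallocating first makes reusing the same name safe:
server.deallocate("a1")
server.prepare("a1", "SELECT 1")  # succeeds instead of raising
```

Note that real Postgres actually raises an error when you `DEALLOCATE` an unknown name, so an actual implementation would need to rescue or ignore that case rather than rely on the assumption above.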
Does this issue still exist in Rails 5? Looking at the latest source, it seems like the @counter is still being used to generate the keys. Having finally found this year-old thread, which explains our intermittent production issues, I have to say the arguments for not merging this are pretty weak.
I really see no real cost to just fixing it and making it better. Thank you for any reconsideration.
I don't understand the complexity argument at all: the patch removes code, makes it easier to read and reason about, and removes state. @sgrif your solution adds more code and more complexity; in my opinion that's not simpler.
@trey-roadster we're on Rails 5.0.6 (long overdue for an upgrade) and we just ran into this issue. I haven't dug too deeply but I believe the UUID approach proposed in this PR would elegantly solve the issue with minimal complexity / corner cases. |
Almost 3 years later on this PR, and it looks like there's no input from the core team? This seems like an elegant solution to a common problem. |
This is happening to us on a Rails 5.1.6 project as well and when it happens it's pretty devastating. I understand the argument that Rack::Timeout is unsafe, but in our case the alternative to using it means timeouts frequently cascade because they tie up Rails workers until they finish, even if the client will never receive the response. If this isn't the right PR to solve this problem, then what is the recommended solution? |
We still run into it on Rails 5.2, and have decided to try just turning off prepared statements. |
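For anyone considering the same workaround: turning off prepared statements is a per-connection setting in `config/database.yml`. A minimal excerpt (the environment name is just an example):

```yaml
# config/database.yml (excerpt). With prepared_statements: false, Active
# Record sends plain parameterized queries instead of named prepared
# statements, avoiding the name clash entirely at the cost of the server
# re-planning each query.
production:
  adapter: postgresql
  prepared_statements: false
```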
We resolved this by monkey-patching in a Rails initializer. I'm not understanding the resistance to merging a simple fix that does not complicate the implementation while also addressing a legitimate issue being hit by many users.
@mhuggins Would you be willing to share your initializer patch? |
Here's my monkey-patch. Please note that this is for a Rails 4.2.x project, meaning there might be changes for this to work in Rails 5.x or 6-beta. I simply placed this into a Rails initializer:

```ruby
# Monkey-patch to prevent prepared statements from failing due to thread interruption
# caused by rack-timeout.
#
# Issue details:
# * https://github.com/rails/rails/issues/22408
# * https://github.com/rails/rails/pull/17607
#
# Code taken from:
# * https://github.com/rails/rails/pull/25827/files
module ActiveRecord
  module ConnectionAdapters
    class PostgreSQLAdapter < AbstractAdapter
      class StatementPool < ConnectionAdapters::StatementPool
        def next_key
          "a#{SecureRandom.uuid.delete('-')}"
        end
      end
    end
  end
end
```
Thanks! I will see if it can be adapted to Rails 5. |
We also regularly face this problem. Any chance on having a proper fix merged soon? |
@schneems @tenderlove Any thoughts on this? It seems like quite an old issue and would be nice to have a fix for it. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
@rails-bot I think it's still worth fixing this bug
After much consideration I will back @matthewd's argument at #25827 (comment) and close this PR. We don't want to claim that we expect or intend to function in such a hostile environment.
This feels really short-sighted. |
I'm running into this using the AWS Aurora Serverless Postgres offering, where the database autoscales CUs.
Is this also an issue when using multi-AZ databases in AWS? I wonder if Aurora's version of Postgres is doing replication differently. |
Why not just turn it around, and first increment the counter, then prepare the statement? That seems much more interrupt-safe to me. FWIW, that's how Sequel does it. In the code, the counter is first incremented, then the prepared statement name is formed, then the prepared statement is created, and then it's stored in a hash. |
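The increment-first ordering described above can be sketched in a few lines. This is a minimal stand-alone illustration, not the actual Sequel or Rails adapter code; the class and method names are made up for the example, and the block stands in for the network round-trip that can raise.

```ruby
# Minimal sketch of the interrupt-safer ordering: bump the counter and form
# the name *before* any operation that can raise, so an interrupt mid-prepare
# can never cause the same name to be handed out twice.
class StatementPool
  def initialize
    @counter = 0
    @statements = {}
  end

  def prepare(sql)
    # 1. Derive the key first; this cannot raise.
    @counter += 1
    key = "a#{@counter}"

    # 2. Create the prepared statement (the yielded block stands in for the
    #    network round-trip, which may raise on timeout)...
    yield key, sql if block_given?

    # 3. ...and only then record it locally.
    @statements[key] = sql
    key
  end
end

pool = StatementPool.new
pool.prepare("SELECT 1") { |key, sql| }          # => "a1"
begin
  pool.prepare("SELECT 2") { raise "timeout" }   # interrupted mid-prepare
rescue
end
pool.prepare("SELECT 3") { |key, sql| }          # => "a3": the orphaned "a2"
                                                 #    is never reused
```

The orphaned statement still leaks server-side, but the app never collides with it, which is exactly the trade-off described in the next comment.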
I created a PR #41034 that increments the counter before the statement. This would still allow the possibility that a postgres prepared statement could be "orphaned" (i.e., Rails wouldn't know to remove it), but at least that wouldn't cause the app to remain in a state of indefinite failure. 🤞 that Rails team is receptive to getting this fixed these days. |
…he query, so if the query gets interrupted without Rails knowledge we don't end up with a crashed app in a state of perpetual failure to prepare the next query. This commit directly inspired by the conversation at rails#25827 (and by a late night crash of our app due to this issue).
@rafaelfranca a hostile environment? Timeouts happen, and we can attempt to increase performance, but even then they will still happen. This isn't accounting for functioning in a hostile environment; it's accounting for the reality that exceptions and timeouts happen. It seems completely ridiculous that a single request timeout can take down the entire application for all requests. The solution is simple and removes the possibility of this error. Not sure how a counter that needs to be incremented is simpler than a UUID... @matthewd
Summary

Due to certain third-party interrupts (e.g. through Rack::Timeout), it's possible for a prepared statement to be generated in Postgresql but never "stored" in Rails. Since the code was interrupted before storing the statement, the `@counter` variable was never incremented even though it was used to generate a prepared statement. The next time a prepared statement is generated, the name is identical to that of the previous "orphaned" prepared statement, since the same `@counter` value was used. This culminates in hundreds of identical "prepared statement already exists" errors, as Rails is unable to move past this one already-in-use name. This update fixes the problem by using a UUID as the prepared-statement name rather than relying on `@counter`.
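The failure mode in this summary can be reproduced in a self-contained simulation. Everything below is illustrative (the class names and the `interrupt:` flag are made up to model the Rack::Timeout interrupt); it is not the actual adapter code.

```ruby
require "securerandom"

# Stand-in for the Postgres server's view of prepared statements.
class FakeServer
  def initialize; @prepared = {}; end

  def prepare(name, sql)
    raise "prepared statement \"#{name}\" already exists" if @prepared.key?(name)
    @prepared[name] = sql
  end
end

# Buggy ordering from the summary: the name is derived from @counter, but
# @counter is only bumped *after* the server round-trip, so an interrupt
# in between orphans the name on the server.
class CounterPool
  def initialize(server)
    @server, @counter = server, 0
  end

  def prepare(sql, interrupt: false)
    key = "a#{@counter + 1}"
    @server.prepare(key, sql)                 # statement now exists server-side
    raise "Rack::Timeout fired" if interrupt  # ...but Rails never records it
    @counter += 1
    key
  end
end

server = FakeServer.new
pool = CounterPool.new(server)
pool.prepare("SELECT 1")
pool.prepare("SELECT 2", interrupt: true) rescue nil  # "a2" orphaned on server

# Every subsequent prepare now re-derives "a2" and fails, forever:
begin
  pool.prepare("SELECT 3")
rescue => e
  puts e.message # prepared statement "a2" already exists
end

# The PR's fix: a random name cannot collide with an orphan.
key = "a#{SecureRandom.uuid.delete('-')}"
server.prepare(key, "SELECT 3") # succeeds
```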