
Use UUIDs as IDs for prepared statements in Postgresql #25827

Closed
wants to merge 1 commit

Conversation

@dv (Contributor) commented Jul 14, 2016

Summary

Due to certain third-party interrupts (e.g. through Rack::Timeout), it's possible for a prepared statement to be generated in PostgreSQL but never "stored" in Rails. Since the code was interrupted before storing the statement, the @counter variable was never incremented, even though it was already used to generate a prepared statement.

The next time Rails tries to generate a prepared statement, the generated name is identical to that of the previous "orphaned" prepared statement, since the same @counter value was used. This culminates in hundreds of identical "prepared statement already exists" errors, as Rails is unable to move past this one already-in-use name.

This update fixes the problem by using a UUID as the prepared-statement name instead of relying on the @counter.
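The failure mode can be sketched like this (a minimal illustration with hypothetical class and method names, not the actual Rails source; the block stands in for the network round-trip to Postgres):

```ruby
# Illustrative sketch of the bug: the statement name is derived from
# @counter, which is only incremented AFTER the server-side PREPARE
# succeeds and the name is cached. An interrupt between those steps
# orphans the name on the server while Rails keeps regenerating it.
class CounterStatementPool
  def initialize
    @counter = 0
    @cache = {}
  end

  def next_key
    "a#{@counter + 1}"
  end

  # The block stands in for the PG round-trip; it may raise mid-flight
  # (e.g. a Rack::Timeout interrupt) after the server created the statement.
  def prepare(sql)
    key = next_key
    yield key            # server now owns the name...
    @cache[sql] = key    # ...never reached if the yield raised,
    @counter += 1        # so next_key returns the same name forever
    key
  end
end

pool = CounterStatementPool.new
begin
  pool.prepare("SELECT 1") { |key| raise "interrupted after PREPARE #{key}" }
rescue RuntimeError
end
pool.next_key # => "a1" again: "a1" already exists server-side, so the
              # next PREPARE fails with "prepared statement already exists"
```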

@rails-bot

Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @schneems (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rafaelfranca (Member)

This looks good to me.

@sgrif @matthewd thoughts?

@matthewd (Member) commented Jul 17, 2016

It seems pointlessly complex to me*.

If we want to work in the face of arbitrary exceptions, we could just fix the method to increment the counter earlier (I'm pretty sure there's a PR open that does that)... but we haven't, because we don't want to claim that we expect or intend to function in such a hostile environment.

cc @tenderlove

* Yes, it's only one line of code... but a randomly generated UUID is substantially more complex than a counter, IMO.

@rafaelfranca (Member)

Found the PR #17607

@dv (Contributor, Author) commented Jul 18, 2016

My opinion:

In theory, yes: we can't fix every spot where a Timeout might create unrecoverable problems.

However, in practice I think this one spot is the cause of the majority of them. A lot of (most?) timeouts happen when the database is slow to respond. That one line of code, generating the prepared statement, is a spot where a lot of the timeouts will occur, since it's the first thing that happens when trying to run a query, and a stuck database will freeze the Rails app right there until the timeout fires.

This spot sits right in the intersection of the sets "code where a timeout happens a lot" and "code that generates unrecoverable errors when a timeout happens". Because of that, I would argue the slippery-slope argument does not hold: this one fix would solve the problem most of the time, and the other timeout problems cited as reasons to freeze work on this are far less likely to actually happen. Whack-a-mole is a silly argument to make when one mole causes 90% of the problems, in my opinion.

Isn't the complexity of generating a UUID insignificant compared to the RTT to the database?

@matthewd (Member)

@dv I think we're agreeing on the facts, but disagreeing on the conclusion. There is a slippery-slope argument to be made, but it's not the main one I'm looking at:

If we're not going to guarantee that Active Record works in the face of arbitrarily injected exceptions, then it seems more dangerous to have it work almost all the time -- that's how people end up with unsupported configurations deployed in production for a long time, and left utterly confused when it does eventually fail.

As for the slippery-slope: there is always going to be a slowest operation that breaks in the face of an unexpected exception, and it will be responsible for most of the observed incidences. This is just a prepare: it involves a network round-trip, but compared to the time spent actually running a query, it's going to be pretty diminutive.

@sgrif (Contributor) commented Jul 18, 2016

FWIW, a much simpler solution would be to attempt to deallocate any existing prepared statement with a given name before preparing it. AFAIK, deallocating a prepared statement that doesn't exist is not an error. (I'm assuming here: we'd have to do it through SQL with a DEALLOCATE command, since libpq, and therefore the pg gem, doesn't expose a function for this, and the SQL documentation doesn't spell out the behavior when the statement is undefined. However, the PG wire protocol documentation says "It is not an error to issue Close against a nonexistent statement or portal name", and there's no reason to expect the pure SQL form to behave differently.)
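A sketch of that idea (names are illustrative, not actual Rails or pg gem internals; `conn` would be a PG::Connection in real code, and whether SQL-level DEALLOCATE errors on a nonexistent name is exactly the open question above, hence the rescue):

```ruby
# Hypothetical sketch of the deallocate-first suggestion: issue a
# DEALLOCATE for the name before PREPARE so an orphaned server-side
# statement with the same name can't cause a collision.
def prepare_with_deallocate(conn, name, sql)
  begin
    conn.exec("DEALLOCATE #{conn.quote_ident(name)}")
  rescue StandardError
    # Statement didn't exist (or couldn't be deallocated); proceed anyway.
  end
  conn.prepare(name, sql)
end
```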

@trey-roadster

Does this issue still exist in Rails 5? Looking at the latest source, it seems like the @counter is still being used to generate the keys.

Having finally found this year-old thread which explains our intermittent production issues, I have to say the arguments for not merging this are pretty weak:

  1. whack-a-mole - This is basically saying that if we can't fix 100% of the problem, we don't want to fix 95%+ of the problem with a trivial change. This purity is hard to accept for someone trying to serve real customers in the real world with this software. I appreciate the argument could be valid in other cases (like the fix is a 6 page hack), but here we have a dirt-simple, arguably better change with low cost, low risk for immediate large benefit. When perfect isn't on the menu, choose better over worse.

  2. "More complex" - I really don't get this. Big numbers are harder than little numbers? In what cases are people dealing with these keys such that the "complexity" would be an issue? The additional code complexity is a nit. The conceptual complexity is negligible (UUIDs are common and easy to understand). Is this complexity really an issue that weighs well against fixing a confusing, intermittent problem faced by lots of users?

I really see no cost to just fixing it and making things better. Thank you for any reconsideration.

@dv (Contributor, Author) commented Jul 31, 2017

I don't understand the complexity argument at all: the patch removes code, makes it easier to read and reason about, and removes state.

@sgrif, your solution adds more code and more complexity, though; in my opinion, that's not simpler.

@machty (Contributor) commented Sep 12, 2018

@trey-roadster we're on Rails 5.0.6 (long overdue for an upgrade) and we just ran into this issue.

I haven't dug too deeply but I believe the UUID approach proposed in this PR would elegantly solve the issue with minimal complexity / corner cases.

@mhuggins

Almost 3 years later on this PR, and it looks like there's no input from the core team? This seems like an elegant solution to a common problem.

@xtagon commented Apr 24, 2019

This is happening to us on a Rails 5.1.6 project as well, and when it happens it's pretty devastating. I understand the argument that Rack::Timeout is unsafe, but in our case the alternative to using it is that timeouts frequently cascade, because they tie up Rails workers until they finish even if the client will never receive the response. If this isn't the right PR to solve this problem, then what is the recommended solution?

@trey-roadster

We still run into it on Rails 5.2, and have decided to try simply turning off prepared statements.
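For anyone needing that workaround: prepared statements can be turned off per connection in config/database.yml via Active Record's `prepared_statements` option (the database name below is illustrative):

```yaml
# config/database.yml (illustrative): disable server-side prepared
# statements for the PostgreSQL adapter. Queries are then sent as plain
# parameterized SQL, so orphaned-statement name collisions can't occur,
# at the cost of re-planning each query.
production:
  adapter: postgresql
  database: myapp_production   # hypothetical database name
  prepared_statements: false
```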

@mhuggins

We resolved this by monkey-patching in a Rails initializer. I don't understand the resistance to merging a simple fix that doesn't complicate the implementation while also addressing a legitimate issue hit by many users.

@xtagon commented Apr 24, 2019

@mhuggins Would you be willing to share your initializer patch?

@mhuggins

Here's my monkey-patch. Please note that this is for a Rails 4.2.x project, so changes might be needed for it to work in Rails 5.x or the 6 beta. I simply placed this in config/initializers/active_record.rb:

# Monkey-patch to prevent prepared statements from failing due to thread interruption
# caused by rack-timeout.
#
# Issue details:
#  * https://github.com/rails/rails/issues/22408
#  * https://github.com/rails/rails/pull/17607
#
# Code taken from:
#  * https://github.com/rails/rails/pull/25827/files
module ActiveRecord
  module ConnectionAdapters
    class PostgreSQLAdapter < AbstractAdapter
      class StatementPool < ConnectionAdapters::StatementPool
        def next_key
          "a#{SecureRandom.uuid.delete('-')}"
        end
      end
    end
  end
end

@xtagon commented Apr 24, 2019

Thanks! I will see if it can be adapted to Rails 5.

@andreynering (Contributor)

We also regularly face this problem.

Any chance on having a proper fix merged soon?

@gregmolnar (Member)

@schneems @tenderlove Any thoughts on this? It seems like quite an old issue and would be nice to have a fix for it.

@rails-bot bot commented Dec 17, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot bot added the stale label Dec 17, 2019
@andreynering (Contributor)

@rails-bot I think it's still worth fixing this bug.

@rails-bot bot removed the stale label Dec 17, 2019
@rafaelfranca (Member)

After much consideration, I will back @matthewd's argument at #25827 (comment) and close this PR. We don't want to claim that we expect or intend to function in such a hostile environment.

@mhuggins

This feels really short-sighted.

@masterkain (Contributor)

I'm running into this using the AWS Aurora Serverless Postgres offering, where the database autoscales CUs.

@uberllama (Contributor)

> I'm running into this using the AWS Aurora Serverless Postgres offering, where the database autoscales CUs.

Is this also an issue when using multi-AZ databases in AWS? I wonder if Aurora's version of Postgres is doing replication differently.

@janko (Contributor) commented Oct 27, 2020

Why not just turn it around, and first increment the counter, then prepare the statement? That seems much more interrupt-safe to me.

FWIW, that's how Sequel does it. In the code, the counter is first incremented, then the prepared statement name is formed, then the prepared statement is created, and then it's stored in a hash.
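A sketch of that interrupt-safer ordering (hypothetical names, not Sequel's or Rails' actual code; the block stands in for the PG round-trip):

```ruby
# Increment-first ordering: the counter is bumped BEFORE the server
# round-trip, so an interrupt mid-PREPARE merely wastes one name
# instead of reusing an orphaned one.
class IncrementFirstPool
  def initialize
    @counter = 0
    @cache = {}
  end

  # The block stands in for the PG round-trip and may raise.
  def prepare(sql)
    @counter += 1        # claim the name up front
    key = "a#{@counter}"
    yield key            # if this raises, the name is orphaned on the
    @cache[sql] = key    # server, but it will never be generated again
    key
  end
end

pool = IncrementFirstPool.new
begin
  pool.prepare("SELECT 1") { |key| raise "interrupted after PREPARE #{key}" }
rescue RuntimeError
end
pool.prepare("SELECT 1") { |key| key } # => "a2", so no name collision
```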

@wbharding (Contributor)

I created PR #41034, which increments the counter before preparing the statement. This would still allow the possibility that a Postgres prepared statement could be "orphaned" (i.e., Rails wouldn't know to remove it), but at least that wouldn't leave the app in a state of indefinite failure. 🤞 that the Rails team is receptive to getting this fixed these days.

wbharding added a commit to wbharding/rails that referenced this pull request Jan 6, 2021
…he query, so if the query gets interrupted without Rails knowledge we don't end up with a crashed app in a state of perpetual failure to prepare the next query.

This commit directly inspired by the conversation at rails#25827 (and by a late night crash of our app due to this issue).
@jose-airspace commented Mar 17, 2021

@rafaelfranca, a hostile environment? Timeouts happen; we can attempt to improve performance, but even then they will still happen. This wouldn't be accounting for functioning in a hostile environment; it would be accounting for the reality that exceptions and timeouts happen. It seems completely ridiculous that a single request timeout can take down the entire application for all requests. The proposed solution is simple and removes the possibility of this error. I'm not sure how a counter that needs to be incremented is simpler than a UUID... @matthewd
