New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent app crash when postgres prepared query keys become out of sync with Rails #41034
Conversation
I believe we need to add a test for this otherwise someone can just change the code back during a refactoring and the issue is back. Can you take a look? |
Hey @rafaelfranca I agree that it would be great to have a test for this, but figuring out how to write a test that interrupts a Rails query would take much longer than the time I had available this morning ;) I'm not even really sure where to start to simulate the interrupt of Rails during a pg query, do you (or anyone?) have any ideas on that? |
…he query, so if the query gets interrupted without Rails knowledge we don't end up with a crashed app in a state of perpetual failure to prepare the next query. This commit directly inspired by the conversation at rails#25827 (and by a late night crash of our app due to this issue).
8228ada
to
1d940f3
Compare
Can you explain how this error happened and why this is the correct fix? Reading #17607 (comment) I remember that this doesn't fix the problem either. If you kill the query does it means that the statement is not prepared in the database? In that case does it means the statement pool counter is one counter ahead of the postgresql pool? |
Hey @rafaelfranca thanks for the quick attention! Aaron's comment is correct in that if Rails isn't able to complete execution of the statement, there could be an orphaned prepared statement in the postgres session. In spite of that, I still believe this PR will be a improvement in gracefully recovering, because instead of a hard app crash (due to Rails perpetually trying to re-create the same impossible statement) we'll just have an orphaned pg statement in one session, which seems like a less-bad outcome. Per the pg docs,
Thus, it sounds like whatever negative impact could accrue from orphaned prepare statements would be limited to one database session. And given the relative rarity of this situation (we had it crash our site last night -- this was the second time we've observed the issue in a couple years of running the product), in our case at at least, we'd definitely prefer orphaned (until end of db session) pg statements to hard app crashes. |
To test this I believe we can pass stub the pgconnection to raise an error when rails/activerecord/test/cases/adapters/mysql2/mysql2_adapter_test.rb Lines 219 to 232 in 31461d8
|
Hello hello @wbharding and @rafaelfranca, I've taken the liberty of writing a test for this, my version is here: #41356 :) (update: fix wrong PR link) |
Hey @MGPalmer that's awesome, you're my hero. It looks like the link you provided is for this PR, but https://github.com/rails/rails/pull/41356/files shows your test, which looks good to me 👍 I'm not sure about the etiquette of how best to incorporate those changes. I could certainly copy changeset into a commit that I make, but that would wipe @MGPalmer from the commit history except for mentioning him in the changelog. I'm sure there is also a way that I could grab MG's branch and cherry pick those commits into here, but I won't have time to figure out the git incantation for that today, so perhaps we're best off just using that PR, given that it references to this one where people can see the discussion. |
😊 Thanks, fixed my link! Well for me it's fine of course to just use the newer PR. |
Summary
Increment @counter of prepared postgres statements prior to running the query, so if the query gets interrupted without Rails knowledge we don't end up with a crashed app in a state of perpetual failure to prepare the next query.
This commit directly inspired by the conversation at #25827 (and by a late night crash of our app due to this issue).
Other Information
I was unable to get Rails to run the test suite, but it's pretty hard to imagine this would break tests. I'll aim to try running tests again later when I'm in a different dev environment. Would be nice to finally close the issue underlying #25827 after almost five years (!)