Rails 6 RC2 regression: (postgres) prepared statement improperly parameterized #36763

97jaz · 2019-07-25T16:22:08Z

Steps to reproduce

~~I haven't been able to create a minimal example that demonstrates the problem, unfortunately. (I tried.)~~

I put together a simple app that demonstrates the problem: https://github.com/careport/racy

At root, the problem is that, in system tests, a single database connection is shared among all server threads. This interacts badly with uses of unprepared_statement, since the latter modifies the @prepapred_statements instance variable on the connection. That change is visible to other threads sharing the connection. As a consequence, some queries wind up incorrectly parameterized.

I do not yet understand why this problem doesn't manifest before commit 853f568. I'll continue to look into it, but it might be more obvious to someone with more knowledge of AR internals.

Old description below:

When running a particular system test in our test suite, we often (though not always) get the error:

     ActiveRecord::StatementInvalid:
       PG::ProtocolViolation: ERROR:  bind message supplies 0 parameters, but prepared statement "" requires 2

or, less frequently, the reverse:

     ActiveRecord::StatementInvalid:
       PG::ProtocolViolation: ERROR:  bind message supplies 2 parameters, but prepared statement "a21" requires 0

These errors occur at the same point in the program execution. It's an ActiveRecord query that does not use any Arel methods. (I mention this, because previous bug reports with similar PG driver errors were caused by bad interaction between ActiveRecord and Arel. That is not what is going on here.

It appears to be a race between two requests. The problem starts with commit 853f568, which merged #36618.

I should mention that we're using rspec-rails 4.0.0.beta2, so it could be bad interaction between rspec and rails. But the error really does not happen prior to that commit.

Expected behavior

The database error described above should not occur.

Actual behavior

The database error does occur.

System configuration

Rails version: 6.0.0.rc2, but really anything on 6-0-stable starting with 853f568

Ruby version:
ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-darwin17]
(Though this also happens on our CI server, which runs linux, not OS X.)

The text was updated successfully, but these errors were encountered:

rafaelfranca · 2019-07-25T16:38:11Z

@eileencodes any idea of what could be causing this? It seems the query cache is now using the locked thread and maybe it was that?

eileencodes · 2019-07-25T17:01:25Z

Hrm, I'm not sure. Maybe we can't have the query cache in system tests 😢

cc/ @engwan as the original author of the commit that changed behavior. Do you have any ideas what could be going on here?

engwan · 2019-07-26T03:22:03Z

Hmm, I'm not sure how the query cache could cause this problem with bind params.

@97jaz Is it possible for you to give more information regarding the ActiveRecord query you're making here?

Also, to isolate the problem, can you try disabling the query cache for that controller and see if the problem is still there?

I see that the query cache uses a lock to synchronize access to the cache so I think it should be fine even if multiple threads use the same cache.

What web server for Capybara are you using here? Puma? Can you try setting the number of Puma threads to 1 to see if the problem still persists?

rafaelfranca · 2019-07-26T03:28:06Z

Those are questions to @97jaz .

97jaz · 2019-07-26T14:44:06Z

Is it possible for you to give more information regarding the ActiveRecord query you're making here?

@engwan The query is dead simple:

member = Member.find_by!(id: params[:member_id])

It should have two parameters, as a prepared statement, since it will translate into something like:

SELECT members.* FROM members WHERE id = ? LIMIT ?

Also, to isolate the problem, can you try disabling the query cache for that controller and see if the problem is still there?

If I disable the query cache in that controller action, I can no longer reproduce the problem.
Similarly, if I disable prepared statements globally, I can't reproduce it.
Also, if I run the controller action within a mutex, I can't reproduce it.

On the other hand:
If I disable prepared statements in just that controller action, I still can reproduce the problem, and when I also disabled them in another action that I thought might be involved in the race, it didn't help. Judging from log output, the other prepared statement involved in the race might be in authn/authz-related code that runs before the controller action. But the only line that rspec ever identifies as the source of the error is the one I mentioned above.

What web server for Capybara are you using here? Puma? Can you try setting the number of Puma threads to 1 to see if the problem still persists?

Yes, Puma.
However, running Puma on a single thread, I can still reproduce the problem (assuming that the query cache is enabled).

97jaz · 2019-07-26T15:01:33Z

However, running Puma on a single thread, I can still reproduce the problem (assuming that the query cache is enabled).

Oops -- my mistake. Capybara was configured to use webrick. When I changed it to use Puma and specified max 1 thread, I could not reproduce the problem. If I bumped that up to max 5 threads, I could.

(I was previously messing with the settings in config/puma.rb, which weren't being used by Capybara.)

97jaz · 2019-07-26T18:06:25Z

~~Maybe system tests simply aren't safe to run with more than 1 server thread?~~

97jaz · 2019-07-26T19:45:09Z

Huh, okay, I monkey-patched some code in ActiveRecord::ConnectionAdapters::PostgreSQLAdapter, and I see something odd happening.

When I get the error:

     ActiveRecord::StatementInvalid:
       PG::ProtocolViolation: ERROR:  bind message supplies 2 parameters, but prepared statement "a21" requires 0

That query I mentioned above (Member.find_by!(id: params[:member_id])) is being called via #exec_cache and in that method, I'm printing out the statement key, the SQL, and the list of binds:

w/ cache: (a21) [SELECT "members".* FROM "members" WHERE "members"."id" = 2529 LIMIT 1], [#<ActiveRecord::Relation::QueryAttribute:0x00007feeb99c15f8 @name="id", @value_before_type_cast="2529", @type=#<ActiveModel::Type::Integer:0x00007feead628808 @precision=nil, @scale=nil, @limit=8, @range=-9223372036854775808...9223372036854775808>, @original_attribute=nil, @value="2529", @value_for_database=2529>, #<ActiveModel::Attribute::WithCastValue:0x00007fee9c87b198 @name="LIMIT", @value_before_type_cast=1, @type=#<ActiveModel::Type::Value:0x00007feead582b88 @precision=nil, @scale=nil, @limit=nil>, @original_attribute=nil, @value=1>]

The SQL doesn't contain any parameters, but parameter values are being passed anyway.

And when I get the reverse error:

     ActiveRecord::StatementInvalid:
       PG::ProtocolViolation: ERROR:  bind message supplies 0 parameters, but prepared statement "" requires 2

the query is being called via exec_no_cache (which, under the hood, uses postgres's unnamed prepared statement), and, as you probably guessed, the SQL has formal parameters, but no binds are being passed:

w/o cache: [SELECT "members".* FROM "members" WHERE "members"."id" = $1 LIMIT $2], []

I had previously thought that these error messages indicated a race between two different queries, but they don't. It's just one query, and AR is confused about whether it has parameters or not.

97jaz · 2019-07-26T20:40:38Z

I'm now nearly certain of what's causing the problem. The system test in question goes through a code path that makes explicit use of connection.unprepared_statement { ... }. That temporarily changes the value of ActiveRecord::ConnectionAdapters::AbstractAdapter#prepared_statements, which, in turn, confuses queries in other threads that execute concurrently with the one that actually wants its statement not to be prepared.

Happily, though, the reason we're explicitly using unprepared_statement is because of #33702, which has since been fixed. So we don't need to use it any longer, and we can avoid this problem.

However, I think this is still a bug.

rafaelfranca · 2019-07-26T20:50:54Z

Is the same connection being shared between threads? We usually don't allow this, but I guess it changed for system tests.

engwan · 2019-07-26T20:57:17Z

Even before the query cache fix, system tests were already sharing the same connection. That's the main mechanism allowing us to use transactions with system tests.

What was happening then was that all system test threads were sharing a connection but the query cache was enabled on a different connection. So the fix changes this so that query cache is enabled for the correct connection.

I'm still not sure why query cache enabled would affect the logic around prepared statements. I think the statement cache is separate from the query cache? I'd have to take a look at the code again and check

97jaz · 2019-07-26T20:57:18Z

I believe that's the intended effect of the whole @lock_thread thing. Once that gets set, you always get back the same DB connection. (The purpose, I think, is so that you can rollback all changes at the end of the test, including ones that were used for test setup.) @eileencodes, is that right?

And I think I celebrated too early (regarding my ability to work around this problem). It turns out that the fix for #33702 didn't really work for cached queries, leading to #35286, which solves the problem by... using unprepared_statement.

Then again, I can just configure tests to use a single server thread. So still, not difficult to work around.

97jaz · 2019-07-26T21:02:41Z

@engwan Yes, I still don't understand exactly how your commit affects this. When I wrote "However, I think this is still a bug," I meant the interaction between sharing a connection between threads and using unprepared_statement.

engwan · 2019-07-26T21:06:12Z

@engwan Yes, I still don't understand exactly how your commit affects this. When I wrote "However, I think this is still a bug," I meant the interaction between sharing a connection between threads and using unprepared_statement.

Yes, that's definitely a bug. I'm curious to know why it only shows up with query cache enabled

97jaz · 2019-07-29T19:01:56Z

I now have a minimal app that reproduces the problem:
https://github.com/careport/racy

97jaz · 2019-07-29T20:44:54Z

After looking at the code involved, I think that the only way commit 853f568 is involved is that it gives the race condition more opportunity to manifest, but as far as I can see, in principle, the race exists with or without it. (Though, still, I cannot reproduce the issue before that commit, so maybe I'm wrong.)

ActiveRecord::ConnectionAdapters::QueryCache layers a select_all method atop the one provided by ActiveRecord::ConnectionAdapters::DatabaseStatements. It appears that when the query cache is used, DatabaseStatements#arel_from_relation and DatabaseStatements#to_sql_and_binds is called twice: once from the query cache's version of select_all and once from DatabaseStatements' version (since the former calls super).

Three of these methods:

QueryCache#select_all
DatabaseStatements#select_all
DatabaseStatements#to_sql_and_binds

... call ActiveRecord::ConnectionAdapters::AbstractAdapter#prepared_statements, the value of which is altered by calls to unprepared_statement. So there are plenty of opportunities for the race to manifest.

khasinski · 2019-08-01T18:04:33Z

Considering that #unprepared_statement basically temporarily disables prepared statements inside a block an easy solution would be to disable caching in this case.

In this case doing something like:

abstract_adapter.rb#261

      def unprepared_statement
        old_prepared_statements, @prepared_statements = @prepared_statements, false
        uncached { yield }
      ensure
        @prepared_statements = old_prepared_statements
      end

97jaz · 2019-08-01T18:18:30Z

I'll give that a try in racy shortly, but so far no one seems to understand why the query cache has anything to do with this.

khasinski · 2019-08-01T18:32:46Z

It is my understanding (might be wrong), that cache is created with preparable flag, which might lead to cache being read later with a wrong result.

Removing @lock.synchronize in cache creation (as it was before cache was shared between threads) resolves this issue, which leads me to believe it's a race condition.

Here is the relevant line in which cache is created with preparable, but read without this. I've tried adding preparable as a part of cache key, but unfortunately that didn't fix the problem.

query_cache.rb#select_all

cache_sql(sql, name, binds) { super(sql, name, binds, preparable: preparable) }

97jaz · 2019-08-01T21:41:20Z

@khasinski Your patch to AbstractAdapter does seem to fix the problem. (Or to make it so that I can't reproduce it anyhow.)

It looks like there are plenty of opportunities for the value of @prepared_statements to be read inconsistently during the processing of a single query, whether or not the query cache is enabled. That's why I'm still puzzled.

In system tests, a single database connection is shared among all the server threads. A call to unprepared_statement temporarily modifies an instance variable on the connection object, which is then visible to other concurrently running threads. This leads to a situation where prepared statements may end up with the wrong binds. Addresses rails#36763

In system tests, a single database connection is shared among all the server threads. Since connections have per-instance mutable state, this leads to race conditions, as in rails#36763. This patch changes the connection pool to wrap shared connections so that all access to them is synchronized on the connection's monitor lock.

In system tests, a single database connection is shared among all the server threads. A call to unprepared_statement temporarily modifies an instance variable on the connection object, which is then visible to other concurrently running threads. This leads to a situation where prepared statements may end up with the wrong binds. Addresses rails#36763

97jaz · 2019-08-14T19:00:45Z

Anyone have an opinion on the solution offered in #36871?

This fixes a race condition in system tests where prepared statements can be incorrectly parameterized when multiple threads observe the mutation of the @prepared_statements instance variable on the connection. Fixes rails#36763

Per rails#36949 we introduce a race condition fix for rails#36763 This refines the fix to avoid using Concurrent::ThreadLocalVar The implementation in the concurrent lib is rather expensive, culminating in a finalizer per object that spins off a thread to do cleanup work. None of this expense is needed as we can simply implement the desired behavior using Ruby primitives. Additionally this moves to a Fiber bound implementation vs a thread bound implementation, something that is not desired for this particular usage.

Per #36949 we introduce a race condition fix for #36763 This refines the fix to avoid using Concurrent::ThreadLocalVar The implementation in the concurrent lib is rather expensive, culminating in a finalizer per object that spins off a thread to do cleanup work. None of this expense is needed as we can simply implement the desired behavior using Ruby primitives. Additionally this moves to a Fiber bound implementation vs a thread bound implementation, something that is not desired for this particular usage.

rafaelfranca added this to the 6.0.0 milestone Jul 25, 2019

rafaelfranca added the activerecord label Jul 25, 2019

xaviershay mentioned this issue Jul 29, 2019

Postgres: allow enabling prepared statements for specific queries when globally disabled. #36795

Closed

97jaz mentioned this issue Aug 5, 2019

Execute unprepared statements atomically in system tests #36859

Closed

97jaz mentioned this issue Aug 6, 2019

Avoid race condition with unprepared_statement #36871

Closed

rafaelfranca modified the milestones: 6.0.0, 6.0.1 Aug 15, 2019

97jaz mentioned this issue Aug 16, 2019

Make prepared statement status thread and instance-specific #36949

Merged

rafaelfranca closed this as completed in #36949 Aug 16, 2019

SamSaffron mentioned this issue Sep 14, 2019

Avoid expensive tracking objects for prepared statements #37200

Closed

eileencodes mentioned this issue Sep 1, 2020

Ensure Puma runs in 0:1 configuration #40116

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rails 6 RC2 regression: (postgres) prepared statement improperly parameterized #36763

Rails 6 RC2 regression: (postgres) prepared statement improperly parameterized #36763

97jaz commented Jul 25, 2019 •

edited

rafaelfranca commented Jul 25, 2019

eileencodes commented Jul 25, 2019

engwan commented Jul 26, 2019 •

edited

rafaelfranca commented Jul 26, 2019

97jaz commented Jul 26, 2019

97jaz commented Jul 26, 2019 •

edited

97jaz commented Jul 26, 2019 •

edited

97jaz commented Jul 26, 2019

97jaz commented Jul 26, 2019

rafaelfranca commented Jul 26, 2019

engwan commented Jul 26, 2019

97jaz commented Jul 26, 2019

97jaz commented Jul 26, 2019

engwan commented Jul 26, 2019

97jaz commented Jul 29, 2019

97jaz commented Jul 29, 2019

khasinski commented Aug 1, 2019

97jaz commented Aug 1, 2019

khasinski commented Aug 1, 2019

97jaz commented Aug 1, 2019

97jaz commented Aug 14, 2019

Rails 6 RC2 regression: (postgres) prepared statement improperly parameterized #36763

Rails 6 RC2 regression: (postgres) prepared statement improperly parameterized #36763

Comments

97jaz commented Jul 25, 2019 • edited

Steps to reproduce

Old description below:

Expected behavior

Actual behavior

System configuration

rafaelfranca commented Jul 25, 2019

eileencodes commented Jul 25, 2019

engwan commented Jul 26, 2019 • edited

rafaelfranca commented Jul 26, 2019

97jaz commented Jul 26, 2019

97jaz commented Jul 26, 2019 • edited

97jaz commented Jul 26, 2019 • edited

97jaz commented Jul 26, 2019

97jaz commented Jul 26, 2019

rafaelfranca commented Jul 26, 2019

engwan commented Jul 26, 2019

97jaz commented Jul 26, 2019

97jaz commented Jul 26, 2019

engwan commented Jul 26, 2019

97jaz commented Jul 29, 2019

97jaz commented Jul 29, 2019

khasinski commented Aug 1, 2019

97jaz commented Aug 1, 2019

khasinski commented Aug 1, 2019

97jaz commented Aug 1, 2019

97jaz commented Aug 14, 2019

97jaz commented Jul 25, 2019 •

edited

engwan commented Jul 26, 2019 •

edited

97jaz commented Jul 26, 2019 •

edited

97jaz commented Jul 26, 2019 •

edited