Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rails 6 RC2 regression: (postgres) prepared statement improperly parameterized #36763

Closed
97jaz opened this issue Jul 25, 2019 · 21 comments · Fixed by #36949
Closed

Rails 6 RC2 regression: (postgres) prepared statement improperly parameterized #36763

97jaz opened this issue Jul 25, 2019 · 21 comments · Fixed by #36949
Milestone

Comments

@97jaz
Copy link
Contributor

97jaz commented Jul 25, 2019

Steps to reproduce

I haven't been able to create a minimal example that demonstrates the problem, unfortunately. (I tried.)

I put together a simple app that demonstrates the problem: https://github.com/careport/racy

At root, the problem is that, in system tests, a single database connection is shared among all server threads. This interacts badly with uses of unprepared_statement, since the latter modifies the @prepapred_statements instance variable on the connection. That change is visible to other threads sharing the connection. As a consequence, some queries wind up incorrectly parameterized.

I do not yet understand why this problem doesn't manifest before commit 853f568. I'll continue to look into it, but it might be more obvious to someone with more knowledge of AR internals.

Old description below:

When running a particular system test in our test suite, we often (though not always) get the error:

     ActiveRecord::StatementInvalid:
       PG::ProtocolViolation: ERROR:  bind message supplies 0 parameters, but prepared statement "" requires 2

or, less frequently, the reverse:

     ActiveRecord::StatementInvalid:
       PG::ProtocolViolation: ERROR:  bind message supplies 2 parameters, but prepared statement "a21" requires 0

These errors occur at the same point in the program execution. It's an ActiveRecord query that does not use any Arel methods. (I mention this, because previous bug reports with similar PG driver errors were caused by bad interaction between ActiveRecord and Arel. That is not what is going on here.

It appears to be a race between two requests. The problem starts with commit 853f568, which merged #36618.

I should mention that we're using rspec-rails 4.0.0.beta2, so it could be bad interaction between rspec and rails. But the error really does not happen prior to that commit.

Expected behavior

The database error described above should not occur.

Actual behavior

The database error does occur.

System configuration

Rails version: 6.0.0.rc2, but really anything on 6-0-stable starting with 853f568

Ruby version:
ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-darwin17]
(Though this also happens on our CI server, which runs linux, not OS X.)

@rafaelfranca rafaelfranca added this to the 6.0.0 milestone Jul 25, 2019
@rafaelfranca
Copy link
Member

@eileencodes any idea of what could be causing this? It seems the query cache is now using the locked thread and maybe it was that?

@eileencodes
Copy link
Member

Hrm, I'm not sure. Maybe we can't have the query cache in system tests 😢

cc/ @engwan as the original author of the commit that changed behavior. Do you have any ideas what could be going on here?

@engwan
Copy link

engwan commented Jul 26, 2019

Hmm, I'm not sure how the query cache could cause this problem with bind params.

@97jaz Is it possible for you to give more information regarding the ActiveRecord query you're making here?

Also, to isolate the problem, can you try disabling the query cache for that controller and see if the problem is still there?

I see that the query cache uses a lock to synchronize access to the cache so I think it should be fine even if multiple threads use the same cache.

What web server for Capybara are you using here? Puma? Can you try setting the number of Puma threads to 1 to see if the problem still persists?

@rafaelfranca
Copy link
Member

Those are questions to @97jaz .

@97jaz
Copy link
Contributor Author

97jaz commented Jul 26, 2019

Is it possible for you to give more information regarding the ActiveRecord query you're making here?

@engwan The query is dead simple:

member = Member.find_by!(id: params[:member_id])

It should have two parameters, as a prepared statement, since it will translate into something like:

SELECT members.* FROM members WHERE id = ? LIMIT ?

Also, to isolate the problem, can you try disabling the query cache for that controller and see if the problem is still there?

  • If I disable the query cache in that controller action, I can no longer reproduce the problem.
  • Similarly, if I disable prepared statements globally, I can't reproduce it.
  • Also, if I run the controller action within a mutex, I can't reproduce it.

On the other hand:
If I disable prepared statements in just that controller action, I still can reproduce the problem, and when I also disabled them in another action that I thought might be involved in the race, it didn't help. Judging from log output, the other prepared statement involved in the race might be in authn/authz-related code that runs before the controller action. But the only line that rspec ever identifies as the source of the error is the one I mentioned above.

What web server for Capybara are you using here? Puma? Can you try setting the number of Puma threads to 1 to see if the problem still persists?

Yes, Puma.
However, running Puma on a single thread, I can still reproduce the problem (assuming that the query cache is enabled).

@97jaz
Copy link
Contributor Author

97jaz commented Jul 26, 2019

However, running Puma on a single thread, I can still reproduce the problem (assuming that the query cache is enabled).

Oops -- my mistake. Capybara was configured to use webrick. When I changed it to use Puma and specified max 1 thread, I could not reproduce the problem. If I bumped that up to max 5 threads, I could.

(I was previously messing with the settings in config/puma.rb, which weren't being used by Capybara.)

@97jaz
Copy link
Contributor Author

97jaz commented Jul 26, 2019

Maybe system tests simply aren't safe to run with more than 1 server thread?

@97jaz
Copy link
Contributor Author

97jaz commented Jul 26, 2019

Huh, okay, I monkey-patched some code in ActiveRecord::ConnectionAdapters::PostgreSQLAdapter, and I see something odd happening.

When I get the error:

     ActiveRecord::StatementInvalid:
       PG::ProtocolViolation: ERROR:  bind message supplies 2 parameters, but prepared statement "a21" requires 0

That query I mentioned above (Member.find_by!(id: params[:member_id])) is being called via #exec_cache and in that method, I'm printing out the statement key, the SQL, and the list of binds:

w/ cache: (a21) [SELECT "members".* FROM "members" WHERE "members"."id" = 2529 LIMIT 1], [#<ActiveRecord::Relation::QueryAttribute:0x00007feeb99c15f8 @name="id", @value_before_type_cast="2529", @type=#<ActiveModel::Type::Integer:0x00007feead628808 @precision=nil, @scale=nil, @limit=8, @range=-9223372036854775808...9223372036854775808>, @original_attribute=nil, @value="2529", @value_for_database=2529>, #<ActiveModel::Attribute::WithCastValue:0x00007fee9c87b198 @name="LIMIT", @value_before_type_cast=1, @type=#<ActiveModel::Type::Value:0x00007feead582b88 @precision=nil, @scale=nil, @limit=nil>, @original_attribute=nil, @value=1>]

The SQL doesn't contain any parameters, but parameter values are being passed anyway.

And when I get the reverse error:

     ActiveRecord::StatementInvalid:
       PG::ProtocolViolation: ERROR:  bind message supplies 0 parameters, but prepared statement "" requires 2

the query is being called via exec_no_cache (which, under the hood, uses postgres's unnamed prepared statement), and, as you probably guessed, the SQL has formal parameters, but no binds are being passed:

w/o cache: [SELECT "members".* FROM "members" WHERE "members"."id" = $1 LIMIT $2], []

I had previously thought that these error messages indicated a race between two different queries, but they don't. It's just one query, and AR is confused about whether it has parameters or not.

@97jaz
Copy link
Contributor Author

97jaz commented Jul 26, 2019

I'm now nearly certain of what's causing the problem. The system test in question goes through a code path that makes explicit use of connection.unprepared_statement { ... }. That temporarily changes the value of ActiveRecord::ConnectionAdapters::AbstractAdapter#prepared_statements, which, in turn, confuses queries in other threads that execute concurrently with the one that actually wants its statement not to be prepared.

Happily, though, the reason we're explicitly using unprepared_statement is because of #33702, which has since been fixed. So we don't need to use it any longer, and we can avoid this problem.

However, I think this is still a bug.

@rafaelfranca
Copy link
Member

Is the same connection being shared between threads? We usually don't allow this, but I guess it changed for system tests.

@engwan
Copy link

engwan commented Jul 26, 2019

Even before the query cache fix, system tests were already sharing the same connection. That's the main mechanism allowing us to use transactions with system tests.

What was happening then was that all system test threads were sharing a connection but the query cache was enabled on a different connection. So the fix changes this so that query cache is enabled for the correct connection.

I'm still not sure why query cache enabled would affect the logic around prepared statements. I think the statement cache is separate from the query cache? I'd have to take a look at the code again and check

@97jaz
Copy link
Contributor Author

97jaz commented Jul 26, 2019

I believe that's the intended effect of the whole @lock_thread thing. Once that gets set, you always get back the same DB connection. (The purpose, I think, is so that you can rollback all changes at the end of the test, including ones that were used for test setup.) @eileencodes, is that right?

And I think I celebrated too early (regarding my ability to work around this problem). It turns out that the fix for #33702 didn't really work for cached queries, leading to #35286, which solves the problem by... using unprepared_statement.

Then again, I can just configure tests to use a single server thread. So still, not difficult to work around.

@97jaz
Copy link
Contributor Author

97jaz commented Jul 26, 2019

@engwan Yes, I still don't understand exactly how your commit affects this. When I wrote "However, I think this is still a bug," I meant the interaction between sharing a connection between threads and using unprepared_statement.

@engwan
Copy link

engwan commented Jul 26, 2019

@engwan Yes, I still don't understand exactly how your commit affects this. When I wrote "However, I think this is still a bug," I meant the interaction between sharing a connection between threads and using unprepared_statement.

Yes, that's definitely a bug. I'm curious to know why it only shows up with query cache enabled

@97jaz
Copy link
Contributor Author

97jaz commented Jul 29, 2019

I now have a minimal app that reproduces the problem:
https://github.com/careport/racy

@97jaz
Copy link
Contributor Author

97jaz commented Jul 29, 2019

After looking at the code involved, I think that the only way commit 853f568 is involved is that it gives the race condition more opportunity to manifest, but as far as I can see, in principle, the race exists with or without it. (Though, still, I cannot reproduce the issue before that commit, so maybe I'm wrong.)

ActiveRecord::ConnectionAdapters::QueryCache layers a select_all method atop the one provided by ActiveRecord::ConnectionAdapters::DatabaseStatements. It appears that when the query cache is used, DatabaseStatements#arel_from_relation and DatabaseStatements#to_sql_and_binds is called twice: once from the query cache's version of select_all and once from DatabaseStatements' version (since the former calls super).

Three of these methods:

  • QueryCache#select_all
  • DatabaseStatements#select_all
  • DatabaseStatements#to_sql_and_binds

... call ActiveRecord::ConnectionAdapters::AbstractAdapter#prepared_statements, the value of which is altered by calls to unprepared_statement. So there are plenty of opportunities for the race to manifest.

@khasinski
Copy link
Contributor

Considering that #unprepared_statement basically temporarily disables prepared statements inside a block an easy solution would be to disable caching in this case.

In this case doing something like:

abstract_adapter.rb#261

      def unprepared_statement
        old_prepared_statements, @prepared_statements = @prepared_statements, false
        uncached { yield }
      ensure
        @prepared_statements = old_prepared_statements
      end

@97jaz
Copy link
Contributor Author

97jaz commented Aug 1, 2019

I'll give that a try in racy shortly, but so far no one seems to understand why the query cache has anything to do with this.

@khasinski
Copy link
Contributor

It is my understanding (might be wrong), that cache is created with preparable flag, which might lead to cache being read later with a wrong result.

Removing @lock.synchronize in cache creation (as it was before cache was shared between threads) resolves this issue, which leads me to believe it's a race condition.

Here is the relevant line in which cache is created with preparable, but read without this. I've tried adding preparable as a part of cache key, but unfortunately that didn't fix the problem.

query_cache.rb#select_all

cache_sql(sql, name, binds) { super(sql, name, binds, preparable: preparable) }

@97jaz
Copy link
Contributor Author

97jaz commented Aug 1, 2019

@khasinski Your patch to AbstractAdapter does seem to fix the problem. (Or to make it so that I can't reproduce it anyhow.)

It looks like there are plenty of opportunities for the value of @prepared_statements to be read inconsistently during the processing of a single query, whether or not the query cache is enabled. That's why I'm still puzzled.

97jaz added a commit to 97jaz/rails that referenced this issue Aug 5, 2019
In system tests, a single database connection is shared among all
the server threads. A call to unprepared_statement temporarily
modifies an instance variable on the connection object, which is
then visible to other concurrently running threads. This leads to
a situation where prepared statements may end up with the wrong
binds.

Addresses rails#36763
97jaz added a commit to 97jaz/rails that referenced this issue Aug 6, 2019
In system tests, a single database connection is shared among all
the server threads. A call to unprepared_statement temporarily
modifies an instance variable on the connection object, which is
then visible to other concurrently running threads. This leads to
a situation where prepared statements may end up with the wrong
binds.

Addresses rails#36763
97jaz added a commit to 97jaz/rails that referenced this issue Aug 6, 2019
In system tests, a single database connection is shared among
all the server threads. Since connections have per-instance
mutable state, this leads to race conditions, as in rails#36763.

This patch changes the connection pool to wrap shared
connections so that all access to them is synchronized on
the connection's monitor lock.
97jaz added a commit to 97jaz/rails that referenced this issue Aug 6, 2019
In system tests, a single database connection is shared among all
the server threads. A call to unprepared_statement temporarily
modifies an instance variable on the connection object, which is
then visible to other concurrently running threads. This leads to
a situation where prepared statements may end up with the wrong
binds.

Addresses rails#36763
@97jaz
Copy link
Contributor Author

97jaz commented Aug 14, 2019

Anyone have an opinion on the solution offered in #36871?

@rafaelfranca rafaelfranca modified the milestones: 6.0.0, 6.0.1 Aug 15, 2019
97jaz added a commit to 97jaz/rails that referenced this issue Aug 16, 2019
This fixes a race condition in system tests where prepared
statements can be incorrectly parameterized when multiple
threads observe the mutation of the @prepared_statements
instance variable on the connection.

Fixes rails#36763
SamSaffron added a commit to SamSaffron/rails that referenced this issue Sep 15, 2019
Per rails#36949 we introduce a race condition fix for rails#36763

This refines the fix to avoid using Concurrent::ThreadLocalVar

The implementation in the concurrent lib is rather expensive, culminating in
a finalizer per object that spins off a thread to do cleanup work.

None of this expense is needed as we can simply implement
the desired behavior using Ruby primitives. Additionally this moves to a Fiber bound implementation vs a thread bound implementation, something that is not desired for this particular usage.
kaspth pushed a commit that referenced this issue Sep 15, 2019
Per #36949 we introduce a race condition fix for #36763

This refines the fix to avoid using Concurrent::ThreadLocalVar

The implementation in the concurrent lib is rather expensive, culminating in
a finalizer per object that spins off a thread to do cleanup work.

None of this expense is needed as we can simply implement
the desired behavior using Ruby primitives. Additionally this moves to a Fiber bound implementation vs a thread bound implementation, something that is not desired for this particular usage.
rafaelfranca pushed a commit that referenced this issue Sep 16, 2019
Per #36949 we introduce a race condition fix for #36763

This refines the fix to avoid using Concurrent::ThreadLocalVar

The implementation in the concurrent lib is rather expensive, culminating in
a finalizer per object that spins off a thread to do cleanup work.

None of this expense is needed as we can simply implement
the desired behavior using Ruby primitives. Additionally this moves to a Fiber bound implementation vs a thread bound implementation, something that is not desired for this particular usage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants