Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgres: allow enabling prepared statements for specific queries when globally disabled. #36795

Closed

Conversation

xaviershay
Copy link
Contributor

@xaviershay xaviershay commented Jul 29, 2019

Summary

When prepared statements are disabled globally for postgres, the prepare argument to exec_query (and any others) is now respected to enable them to be selectively enabled.
I don't believe there is a way to do this otherwise. I believe this is a bug in the current method, given the expectations implied by its method signature.

Per discussion in #24893 this is a private API, so I didn't want to add
extra test coverage for it to imply otherwise. (Separate to this PR, I would probably make an argument that this API is effectively public given widespread usage of exec_query.)

Other Information

We are currently using a monkey patch to apply this in our code base. To provide more context on this PR, and also for the benefit of anyone who finds this issue, I've include below our justification and specific patch for 5.2.3:

# AFAICT, the only way to use prepared statements directly is to use what is
# technically a private API. That API accepts a `prepare` argument to disabled
# prepared statements when enabled globally, but does _not_ enable them when
# disabled globally. We want to have it disabled globally because in general it
# doesn't give much benefit for our use cases but causes issues on deploy/migrate.
# 
# Below I've monkey patched the offending method to fix the bug and enable us
# to opt-in to prepared statements. Since this method is private, we can't
# subclass PostgreSQLAdapter to override it, which may hvae been a nicer
# approach. To mitigate future risk, I've explicitly locked it to our current
# version of Rails. Upgraders will need to verify it is still valid for future
# versions.
# 
# We could also run off our own fork of Rails, but that seems bad.
# 
# An alternative solution could possibly be to retry any transaction that fails
# with ActiveRecord::PreparedStatementCacheExpired. If we were building from
# scratch I would advocate for this, but given our existing code base I fear
# that we may be doing non-database things inside database transactions, so
# auto-retrying may cause unexpected side effects. A more rigorous audit would
# be required.
# 
# Another alternative solution would be to use direct interpolation rather than
# binds. Putting aside the security concerns of doing it this way, when I tried
# it it did not result in identical behaviour (particularly around date times).

if Rails::VERSION::STRING != "5.2.3"
  raise "Rails version was upgraded, need to verify that our monkey patch still applies."
end

# lib/active_record/connection_adapters/postgresql_adapter.rb
class ActiveRecord::ConnectionAdapters::PostgreSQLAdapter < ActiveRecord::ConnectionAdapters::AbstractAdapter
  private

  def execute_and_clear(sql, name, binds, prepare: false)
    # Next two lines are new
    if prepare
      result = exec_cache(sql, name, binds)
    elsif without_prepared_statement?(binds)
      result = exec_no_cache(sql, name, [])
    elsif !prepare
      result = exec_no_cache(sql, name, binds)
    else
      result = exec_cache(sql, name, binds)
    end
    ret = yield result
    result.clear
    ret
  end
end

@rafaelfranca
Copy link
Member

exec_query is public API, can we write the test using it?

@xaviershay
Copy link
Contributor Author

xaviershay commented Jul 29, 2019

exec_query is public API, can we write the test using it?

oh sorry, I must have misinterpreted the comments on the other thread. I will endeavour to write a test! Will probably look something like #24893 (comment)

@matthewd
Copy link
Member

I think the three-argument form of exec_query might be effectively private API, because IIRC there's no public way to construct the argument it's expecting (and I think we've also previously changed its structure between releases).


If prepare: true now means prepare is forced, doesn't that invalidate the rest of the conditions? e.g. how is the else ever reached? Should prepare really override the existing "zero binds = no prepare" logic from without_prepared_statement??


Since this method is private, we can't subclass PostgreSQLAdapter to override it

Huh? 😕

@xaviershay
Copy link
Contributor Author

I think the three-argument form of exec_query might be effectively private API, because IIRC there's no public way to construct the argument it's expecting (and I think we've also previously changed its structure between releases).

ah that's helpful, thanks.

If prepare: true now means prepare is forced, doesn't that invalidate the rest of the conditions?

Yes, that seems kind of the point? Maybe see response below about zero binds = no prepare question.

e.g. how is the else ever reached?

Good catch, it isn't anymore, can be removed I think.

Should prepare really override the existing "zero binds = no prepare" logic from without_prepared_statement??

I thought about this and came out on the side of "yes". I feel like if I'm using such a low level API, I'd expect it to "do what I say, I know what I'm doing".

Since this method is private, we can't subclass PostgreSQLAdapter to override it

Huh? 😕

oh look at that, it totally works. Sorry, no idea why I thought that wasn't possible!

@matthewd
Copy link
Member

I feel like if I'm using such a low level API, I'd expect it to "do what I say, I know what I'm doing".

While not necessarily unreasonable, is that what our own existing caller(s) are expecting?

AFAICS, this change flips the method from having two modes "use prepare if you like" and "definitely don't use prepare", to instead "definitely do use prepare" and "maybe use prepare". While I see how that might fit better with a mental model that has prepare defaulted to off, I'm not sure it's obviously superior, nor obviously what the method should always have done. (Indeed, cbcdecd makes the "definitely don't" sound rather important.)

It feels like we're heading towards a point where we end up with a much more low level API than we had before, where its purpose seemed to be encapsulating the config check. I do see the callers look to be doing a fair bit of that work themselves these days... but that just leaves me wondering if it's even worth having such a method, whose only real job is to call one of two other methods, chosen based on a caller-supplied boolean argument. 🤷🏻‍♂️

@xaviershay
Copy link
Contributor Author

I feel like if I'm using such a low level API, I'd expect it to "do what I say, I know what I'm doing".

While not necessarily unreasonable, is that what our own existing caller(s) are expecting?

Good question. I tried to find existing uses.

I think select_prepared is the only case that would pass prepared: true here:
https://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb#L68

It appears the intent here though was that for any query that shouldn't be prepared, it is caught by the visitor and setting of preparable variable to determine calling. The caller is also checking the config level prepared_statements. If I've read that right (which I very well might not have!), then I believe this change is safe.

This usage in postgresql/connection_test I think is a noop, because of the surrounding conditional:
https://github.com/rails/rails/blob/master/activerecord/test/cases/adapters/postgresql/connection_test.rb#L132

Interestingly, the sqlite3 adapter appears to respect prepare: true as "always prepare":
https://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/sqlite3/database_statements.rb#L45

AFAICS, this change flips the method from having two modes "use prepare if you like" and "definitely don't use prepare", to instead "definitely do use prepare" and "maybe use prepare".

I started confusing myself in trying to respond, so let's make a table:

Before

Global Config prepare arg binds result
false false empty no prepare
false false non-empty no prepare
false true empty no prepare
false true non-empty no prepare
true false empty no prepare
true false non-empty no prepare
true true empty no prepare
true true non-empty prepare

After

Global Config prepare arg binds result
false false empty no prepare
false false non-empty no prepare
false true empty prepare
false true non-empty prepare
true false empty no prepare
true false non-empty no prepare
true true empty prepare
true true non-empty prepare

I think it's quite possible that this behaviour is fine (see above analysis), but for "smallest possible change" I strictly only need the (false, true, non-empty) case to change for my needs. Seems unsatisfying though - maybe someone actually does want to prep bind-less SQL? The docs talk about querying parsing and analysis as use case for preparation, not binds.

While I see how that might fit better with a mental model that has prepare defaulted to off, I'm not sure it's obviously superior, nor obviously what the method should always have done. (Indeed, cbcdecd makes the "definitely don't" sound rather important.)

per above

It feels like we're heading towards a point where we end up with a much more low level API than we had before, where its purpose seemed to be encapsulating the config check. I do see the callers look to be doing a fair bit of that work themselves these days... but that just leaves me wondering if it's even worth having such a method, whose only real job is to call one of two other methods, chosen based on a caller-supplied boolean argument. 🤷🏻‍♂️

I don't feel I have enough context to comment intelligently on this point.

So where does that leave us? I'm honestly not sure :( Some other alternatives that may be worth considering:

  • unprepared_statement exists as method that takes a block, with no counterpart prepared_statement - could try adding that? Since it toggles the "global" flag (should be global to connection, but see Rails 6 RC2 regression: (postgres) prepared statement improperly parameterized #36763 for a possibly relevant bug) it could be used to address strictly the (false, true, non-empty) case above, and could be doc'ed to say "but actually bind-less statements are never prepared". That feels unsatisfying for the same reasons as above.
  • A new method that isn't exec_query, something like exec_prepared_query.

Thank you all for your engagement 🙏

@matthewd
Copy link
Member

Quick note because I only just re-connected these dots: while I haven't dug through history to confirm, I'm pretty sure the "never prepare empty binds" behaviour is an earlier attempt at cbcdecd. Given that, and the newer more complete behaviour in that subsequent commit, I think ditch the empty-bind check & special case entirely.

As we've noted, the only caller that passes prepare: true has already checked the connection's prepared_statements value, so I agree it's fine for this method to always prepare when given a true.

Combining those two points, I think the condition does in fact collapse to:

if prepare
  result = exec_cache(sql, name, binds)
else
  result = exec_no_cache(sql, name, binds)
end

With that done, I'm strongly tempted to inline those two methods -- when they were split they were very different, but over time they've accumulated a lot of identical parallel content -- but that can be a thought for another day.

@matthewd
Copy link
Member

Thanks for writing out the tables, btw -- I'd managed to confuse myself into thinking that [true, false, non-empty] was changing to prepare, despite having earlier noted that the else was no longer a factor, so the explicit enumeration helped me think it through properly and see where I'd gone wrong.

@xaviershay xaviershay force-pushed the selective-prepared-statements branch from 2303074 to 8c19327 Compare August 2, 2019 10:07
…n disabled globally.

There is no way to do this otherwise.
@xaviershay xaviershay force-pushed the selective-prepared-statements branch from 8c19327 to a3b355e Compare August 2, 2019 10:11
@xaviershay
Copy link
Contributor Author

Updated with test coverage. It doesn't quite simplify down to two branches because of ActiveRecord::StatementCacheTest#test_unprepared_statements_dont_share_a_cache_with_prepared_statements, which fails if you pass binds through (rather than []). I spent a little bit of time trying to trace what's going on - it appears to be interpolating the bind params before getting to this method, but still passing the bind params along. That causes an error if sent to PG: ActiveRecord::StatementInvalid: PG::IndeterminateDatatype: ERROR: could not determine data type of parameter $1.

@97jaz
Copy link
Contributor

97jaz commented Aug 20, 2019

After #36949, would it be sufficient merely to add a ActiveRecord::ConnectionAdapters::AbstractAdapter#prepared_statement method (analogous to unprepared_statement), like:

def prepared_statement
  @prepared_statement_status.bind(true) { yield }
end

@97jaz
Copy link
Contributor

97jaz commented Aug 20, 2019

I think my suggestion will work, with an additional change. Right now, we have this code in AbstractAdapter:

        if self.class.type_cast_config_to_boolean(config.fetch(:prepared_statements) { true })
          @prepared_statement_status = Concurrent::ThreadLocalVar.new(true)
          @visitor.extend(DetermineIfPreparableVisitor)
        else
          @prepared_statement_status = Concurrent::ThreadLocalVar.new(false)
        end

Specifically, DetermineIfPreparableVisitor -- which is what adds the preparable property to the visitor -- is only available if the default is true. I think it can be unconditionally added. (But I'm not really sure why it's conditionally added now. I recently changed that code, but I preserved the fact that it was only conditionally added.)

To test this, I added a test to prepared_statements_disabled_test.rb:

  def test_prepared_statements_can_be_forced
    conn = Computer.connection
    rel = Computer.where(id: 1)

    unprepared_sql = conn.to_sql(rel.arel)
    prepared_sql = conn.prepared_statement do
      conn.to_sql(rel.arel)
    end

    assert_not unprepared_sql.include?('$1')
    assert prepared_sql.include?('$1')
  end

This seems like a pretty janky way to detect if a statement was prepared or not, but I'm not sure what the proper way is to do this. But the real issue here is that this test will pass even if DetermineIfPreparableVisitor isn't added to the arel_visitor. The reason is that connection.to_sql doesn't inspect the value of visitor.preparable. It just looks at prepared_statements. This seems inconsistent, but I'm also curious: why is visitor.preparable necessary, at all? Why not just use the value of prepared_statements?

@xaviershay
Copy link
Contributor Author

Your logic sounds good to me, but I don't have a lot of context here. (Nor do I know the answers to your questions.)

@rails-bot
Copy link

rails-bot bot commented Dec 17, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot rails-bot bot added the stale label Dec 17, 2019
@talpava
Copy link

talpava commented Dec 17, 2019 via email

@rails-bot rails-bot bot removed the stale label Dec 17, 2019
@rails-bot
Copy link

rails-bot bot commented Mar 16, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot rails-bot bot added the stale label Mar 16, 2020
@rails-bot rails-bot bot closed this Mar 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants