Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move advisory lock to it's own connection #38235

Merged
merged 1 commit into from
Jan 23, 2020

Conversation

eileencodes
Copy link
Member

This PR moves advisory lock to it's own connection instead of
ActiveRecord::Base to fix #37748. As a note the issue is present on
both mysql and postgres. We don't see it on sqlite3 because sqlite3
doesn't support advisory locks.

The underlying problem only appears if:

  1. the app is using multiple databases, and therefore establishing a new
    connection in the abstract models
  2. the app has a migration that loads a model (ex Post.update_all)
    which causes that new connection to get established.

This is because when Rails runs migrations the default connections are
established, the lock is taken out on the ActiveRecord::Base
connection. When the migration that calls a model is loaded, a new
connection will be established and the lock will automatically be
released.

When Rails goes to release the lock in the ensure block it will find
that the connection has been closed. Even if the connection wasn't
closed the lock would no longer exist on that connection.

We originally considered checking if the connection was active, but
ultimately that would hide that the advisory locks weren't working
correctly because there'd be no lock to release.

We also considered making the lock more granular - that it only blocked
on each migration individually instead of all the migrations for that
connection. This might be the right move going forward, but right now
multi-db migrations that load models are very broken in Rails 6.0 and
master.

John and I don't love this fix, it requires a bit too much knowledge of
internals and how Rails picks up connections. However, it does fix the
issue, makes the lock more global, and makes the lock more resilient to
changing connections.

Co-authored-by: John Crepezzi john.crepezzi@gmail.com

cc/ @rafaelfranca @tenderlove @jhawthorn @matthewd @seejohnrun

@eileencodes
Copy link
Member Author

Ok I moved the AdvisoryLockBase into its own class, added _internal, and abstract_class. This fixed the test failures in Railties due to framework loading.

@eileencodes eileencodes force-pushed the fix-advisory-lock branch 2 times, most recently from 2557dba to 39704bd Compare January 14, 2020 22:09
@matthewd
Copy link
Member

My grasp of the connection model is slipping (I guess I don't understand how ApplicationRecord's connection-twiddling would affect a connection belonging to its superclass).. but does this literally create an additional TCP connection to the database server? If so, that seems Rather Undesirable.. a database in maintenance mode might only allow a single connection (or a single connection from a maintainer-level user).

IIRC there was a specific reason it was important for the lock to wrap the full set of migrations, but I don't recall what it was -- we might have to find the corresponding PR to check that.

@eileencodes
Copy link
Member Author

My grasp of the connection model is slipping (I guess I don't understand how ApplicationRecord's connection-twiddling would affect a connection belonging to its superclass)

A default connection is opened by https://github.com/rails/rails/blob/master/activerecord/lib/active_record/railtie.rb#L204 on ActiveRecord::Base. Then the advisory lock code takes out a lock https://github.com/rails/rails/blob/master/activerecord/lib/active_record/migration.rb#L1379. In multi-db we need a connects_to in order to connect the replicas in ApplicationRecord. When a migration runs that calls a model like Post it will load ApplicationRecord and call connects_to. connects_to eventually calls to establish_connection in https://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L1042. establish_connection will look up the pool by the connection_specification_name. It will find that and remove it here https://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L1049

Once it's removed it establishes a new connection, which means that we cannot hold onto the lock. Even worse, the way migrations work we need a lock for each connection so if the lock was meant to be global, even if it wasn't throwing an exception due to a closed connection, we're still not protecting all migrations from a concurrent migration.

but does this literally create an additional TCP connection to the database server

Yea it does, and I don't see any other way around this. I'm open to other ideas but everything we explored was not possible.

At the moment the only fix for this is saying we do not support advisory locks for multiple databases. We have no way of knowing if an app is multi-db so we can't even check for that and warn the user.

@eileencodes
Copy link
Member Author

@matthewd do you have any ideas how we can fix the problem? We either need a more global lock (that protects all migrations regardless of connection) or a more granular lock (only protects the current running migration).

@matthewd
Copy link
Member

I don't understand how ApplicationRecord's connection-twiddling would affect a connection belonging to its superclass

Okay, it turns out my confusion is because that behaviour's special-cased here:

self == Base || defined?(ApplicationRecord) && self == ApplicationRecord

Given that we're re-configuring the same-named connection, I wonder if we could/should recognise that the new connection config matches the already-active one, and therefore we don't need to dis/reconnect at all?

Otherwise I don't think I can suggest anything that wouldn't involve some sort of change in connection management, so if that's off the table, I guess it's just a matter of choosing the less-bad trade-off (which I agree is this PR -- we have to support multi-DB migrations), and apologising to any future reader who's here because migrations started taking two connections. 🤷🏻‍♂️

@eileencodes
Copy link
Member Author

Given that we're re-configuring the same-named connection, I wonder if we could/should recognise that the new connection config matches the already-active one, and therefore we don't need to dis/reconnect at all?

So don't remove the connection if it already exists? I did wonder why we do that in the first place. I think it's because you should establish a new connection with a different config but the same name. Eventually maybe that shouldn't be possible. Now that we're using objects everywhere it might be a lot easier for us to inspect the connection db config and figure out if it is the same and if it is the same don't remove it.

I think that's a little more involved and probably an improvement for 6.1. I'm thinking we should merge this for 6.0 and 6.1 because migrations are effectively broken and then work on not removing the connection in 6.1 and undoing this change if we're successful. Does that sound like a fair compromise? I'll add a CHANGELOG as well.

This PR moves advisory lock to it's own connection instead of
`ActiveRecord::Base` to fix rails#37748. As a note the issue is present on
both mysql and postgres. We don't see it on sqlite3 because sqlite3
doesn't support advisory locks.

The underlying problem only appears if:

1) the app is using multiple databases, and therefore establishing a new
connetion in the abstract models
2) the app has a migration that loads a model (ex `Post.update_all`)
which causes that new connection to get established.

This is because when Rails runs migrations the default connections are
established, the lock is taken out on the `ActiveRecord::Base`
connection. When the migration that calls a model is loaded, a new
connection will be established and the lock will automatically be
released.

When Rails goes to release the lock in the ensure block it will find
that the connection has been closed. Even if the connection wasn't
closed the lock would no longer exist on that connection.

We originally considered checking if the connection was active, but
ultimately that would hide that the advisory locks weren't working
correctly because there'd be no lock to release.

We also considered making the lock more granular - that it only blocked
on each migration individually instead of all the migrations for that
connection. This might be the right move going forward, but right now
multi-db migrations that load models are very broken in Rails 6.0 and
master.

John and I don't love this fix, it requires a bit too much knowledge of
internals and how Rails picks up connections. However, it does fix the
issue, makes the lock more global, and makes the lock more resilient to
changing connections.

Co-authored-by: John Crepezzi <john.crepezzi@gmail.com>
@eileencodes eileencodes merged commit 59d54b3 into rails:master Jan 23, 2020
@eileencodes eileencodes deleted the fix-advisory-lock branch January 23, 2020 20:08
eileencodes added a commit that referenced this pull request Jan 23, 2020
Move advisory lock to it's own connection
@trammel
Copy link

trammel commented Mar 20, 2020

This fix got merged to master, did it also make it in to the 6.0.2.2 branch/release?

@eileencodes
Copy link
Member Author

did it also make it in to the 6.0.2.2 branch/release?

No, 6.0.2.2 was only security and 6.0.2.1 was also only security. This will make it into the 6.0.3 release.

eileencodes added a commit to eileencodes/rails that referenced this pull request Oct 12, 2022
In rails#38235 I moved advisory locks to their own named connections, then
in rails#39758 the advisory lock was left on Base.connection but then moved
it it's own connection handler. I believe with rails#45450 that this change
was made obsolete and can be returned to the prior behavior without
having to open an additional connection. The tests added pass and I also
tested this in my local demo to ensure that this is working correctly.

When I originally changed the behavior here Matthew noted that this
could be surprising for some setups that expect only one connection for
a running migration. I thought there was an issue related to this but I
can't find it.
stevschmid added a commit to hakuna/kari that referenced this pull request May 13, 2023
1) Faster connection retrieval to establish advisory lock:

Migration of multiple tenants (schemas) has slowed down considerably in Rails 6.1+, see following comment: rails-on-services/apartment#147 (comment)
Rails 6.0 -> 6.1 introduced a change in how the advisory lock to prevent concurrent migrations is established, see PR: rails/rails@45add34
The PR seems to address the issue that the previous solution using a global ActiveRecord::Base AdvisoryLockBase changes the internal ActiveRecord::Base internal state, leaking into the Rails application, messing with the sharding configuration.
Reverting back to AdvisoryLockBase (as outlined in the first link) improves migration performance significantly but comes with the earlier mentioned side-effect. The issue with the new Rails 6.1 solution stems from the fact that a new connection pool is created for every migration.
If we "cache" this connection pool, similar how the default ActiveRecord connection pool is cached, we can re-use is in the same thread and thus for all tenant schemas. Rails 7 offers ActiveSupport::IsolatedExecutionState (thread/fiber state), for Rails 6.1 we fall back to Thread.current.

Why a new, separate connection/connection pool for this advisory lock is required in the first place, is explained in the PR introducing the original AdvisoryLockBase solution: rails/rails#38235

2) Tenant-specific lock_id for advisory lock:

Improves the performance quite a bit. A side bonus is that parallel migrations of tenant should be possible (not tested).

Separate note: A follow-up PR added `pool&.disconnect` to the `with_advisory_lock_connection` function: rails/rails#40101
I cannot reproduce the original issue, connections are cleaned up with this solution after the migration ends, even without the explicit disconnect
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiple databases and migrations using models problem
4 participants