Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Postgres] StandardLockManager does not work for parallel machines / Liquibase run with multiple instances in parallel lead to error messages (e.g. "already exists") #2315

Open
eaglerainbow opened this issue Jan 4, 2022 · 6 comments

Comments

@eaglerainbow
Copy link

eaglerainbow commented Jan 4, 2022

Environment

Liquibase Version: 4.5.0

Liquibase Integration & Version: spring-boot 2.6.2

Liquibase Extension(s) & Version: none

Database Vendor & Version: Postgres 12

Operating System Type & Version: Linux Ubuntu

Description

We have a PostgreSQL 12 server which contains a lot of schemas. Each schema is of the same structure (e.g. changelog).
On startup of the application, it needs to be ensured that the schemas are on the latest state. There may be multiple machines (!) starting up in parallel. Each machine then will have its own liquibase instance running.
During the process, it may happen that two machines try to work on the same schema (however, they will have the same changelog then - ensured by deployment&configuration) .

As of today, we observe that error messages may be raised because the first machine has already performed all updates of the changelog, whilst the second machine tries to apply them a second time.
Typical symptoms are:

  • Table is created, but is reported to be already existent.
  • Table is attempted to be deleted, but was already dropped
  • Index shall be created, but already exists
  • ...

Looking at

we are wondering why only for Oracle databases (introduced with 04e902d#diff-dd5bdbd0ad22e0eb0f99da0acdb40102ce01c5ecf383d35f1138e0c6cad8f047R32), an Exclusive Explicit locking is performed (using "FOR UDPATE"). Postgres also provides such a feature for a long time (proof). I would assume several other databases also support this option (perhaps with different dialects).
The code-review currently tells us that with only shared-locking the record on the table databasechangelog , two machines might enter the critical path at
Boolean locked = Scope.getCurrentScope().getSingleton(ExecutorService.class).getExecutor("jdbc", database).queryForObject(
at the same point in time. This would be one possible reason why we see that two liquibase servers are able to try updating the same schema at the same point in time.

Steps To Reproduce

Code Review, see links above

Actual Behavior

Two parallel liquibase machines may run the same changes in the changelog on the same schema (hence failing).

Expected/Desired Behavior

Locking works as expected (parallel execution on the same schema is prohibited) and subsequent executions detect that nothing has to be done.

@kataggart
Copy link
Contributor

Thank you @eaglerainbow for all the detail you have provided here. We are reviewing.

By the way if you haven't yet visited https://forum.liquibase.org/ we would love to have you join in the conversations over there. I know other members would clearly benefit from your experience.

Thanks again!

@eaglerainbow
Copy link
Author

Thank you eaglerainbow for all the detail you have provided here. We are reviewing.

Thanks for your response. The coding looks strangely asymmetric.

By the way if you haven't yet visited https://forum.liquibase.org/ we would love to have you join in the conversations over there. I know other members would clearly benefit from your experience.

Will have a look, if I can help there somewhere...

@schrieveslaach
Copy link
Contributor

I was looking into liquibase again because I wanted to double check if the changes of #2327 still work so that I'm able to spin up multiple OCI containers that all try to update the database schema on a PostgreSQL database.

I think that the locking works fine but ChangeLogHistoryService might be invalid because Liquibase doesn't update it correctly when the lock has been acquired. Here are some log entries, logs of a failing service and logs of the surviving service (I replaced some company internal names, so the logs do not contain all details):

Failing

info   liquibase.executor.jvm.JdbcExecutor                 SELECT COUNT(*) FROM public.service_changelog…
info   liquibase.executor.jvm.JdbcExecutor                 SELECT COUNT(*) FROM public.service_changelog_lock…
info   liquibase.executor.jvm.JdbcExecutor                 SELECT COUNT(*) FROM public.service_changelog_lock…
info   liquibase.executor.jvm.JdbcExecutor                 SELECT LOCKED FROM public.service_changelog_lock WHERE ID=1…
info   liquibase.lockservice.StandardLockService           Waiting for changelog lock…
info   liquibase.executor.jvm.JdbcExecutor                 SELECT LOCKED FROM public.service_changelog_lock WHERE ID=1…
info   liquibase.lockservice.StandardLockService           Waiting for changelog lock…
info   liquibase.executor.jvm.JdbcExecutor                 SELECT LOCKED FROM public.service_changelog_lock WHERE ID=1…
info   liquibase.lockservice.StandardLockService           Successfully acquired change log lock…
info   liquibase.executor.jvm.JdbcExecutor                 SELECT MD5SUM FROM public.service_changelog WHERE MD5SUM IS NOT NULL LIMIT 1…
info   liquibase.executor.jvm.JdbcExecutor                 CREATE TABLE …
error  liquibase.changelog.ChangeSet                       Change Set liquibase/migrations/db.changelog-service-0001.xml::1::service failed.  Error: ERROR: relation "…" already exists
info   liquibase.lockservice.StandardLockService           Successfully released change log lock…
error  Database Migration FAILED                           Could not apply changeset. …
2023-02-24 09:17:46.425  13144b1b| liquibase.exception.MigrationFailedException: Migration failed for change set liquibase/migrations/db.changelog-service-0001.xml::1::service:
2023-02-24 09:17:46.425  13144b1b|      Reason: liquibase.exception.DatabaseException: ERROR: relation "…" already exists [Failed SQL: (0) CREATE TABLE public.…
2023-02-24 09:17:46.425  13144b1b|  at liquibase.changelog.ChangeSet.execute(ChangeSet.java:646)
2023-02-24 09:17:46.425  13144b1b|  at liquibase.changelog.visitor.UpdateVisitor.visit(UpdateVisitor.java:53)
2023-02-24 09:17:46.425  13144b1b|  at liquibase.changelog.ChangeLogIterator.run(ChangeLogIterator.java:83)
2023-02-24 09:17:46.425  13144b1b|  at liquibase.Liquibase.update(Liquibase.java:202)
2023-02-24 09:17:46.425  13144b1b|  at liquibase.Liquibase.update(Liquibase.java:179)
2023-02-24 09:17:46.425  13144b1b|  at liquibase.Liquibase.update(Liquibase.java:175)

Surviving

info   liquibase.executor.jvm.JdbcExecutor                 SELECT COUNT(*) FROM public.service_changelog…
info   liquibase.executor.jvm.JdbcExecutor                 SELECT COUNT(*) FROM public.service_changelog_lock…
info   liquibase.executor.jvm.JdbcExecutor                 CREATE TABLE public.service_changelog_lock (ID INTEGER NOT NULL, LOCKED BOOLEAN NOT NULL, LOCKGRANTED TIMESTAMP WITHOUT TIME ZONE, LOCKEDBY VARCHAR(255
info   liquibase.executor.jvm.JdbcExecutor                 SELECT COUNT(*) FROM public.service_changelog_lock…
info   liquibase.executor.jvm.JdbcExecutor                 DELETE FROM public.service_changelog_lock…
info   liquibase.executor.jvm.JdbcExecutor                 INSERT INTO public.service_changelog_lock…
info   liquibase.executor.jvm.JdbcExecutor                 SELECT LOCKED FROM public.service_changelog_lock WHERE ID=1…
info   liquibase.lockservice.StandardLockService           Successfully acquired change log lock…
info   liquibase.changelog.StandardChangeLogHistoryServic  Creating database history table with name: public.service_changelog…
info   liquibase.executor.jvm.JdbcExecutor                 CREATE TABLE public.service_changelog (ID VARCHAR(255) NOT NULL, AUTHOR VARCHAR(255) NOT NULL, FILENAME VARCHAR(255) NOT NULL, DATEEXECUTED TIMESTAMP W
info   liquibase.executor.jvm.JdbcExecutor                 SELECT COUNT(*) FROM public.service_changelog_lock…
info   liquibase.executor.jvm.JdbcExecutor                 CREATE TABLE public.… (ID UUID NOT NULL, …
info   liquibase.changelog.ChangeSet                       Table … created…
info   liquibase.changelog.ChangeSet                       ChangeSet liquibase/migrations/db.changelog-service-0001.xml::1::service ran successfully in 40ms…

In the logs you can observe that both services try to apply the same changelog liquibase/migrations/db.changelog-service-0001.xml::1::service but the failing should have looked into the table again. But the locking in PostgreSQL works just fine.

Unfortunately, the test I introdubce in #2327 does not catch that case but I can reliable reproduce this issue.

@nvoxland, @kataggart, please, tell us what to do next? Create a new issue?

@filipelautert
Copy link
Collaborator

Hello @schrieveslaach - those logs were generated using which liquibase version? 4.20.0? Cause if I'm right we added some more resets to the locking mechanisms.

@schrieveslaach
Copy link
Contributor

Hi @filipelautert, these logs were generated with Liquibase 4.19.0 (sorry, I forgot to include the version number). Do you want me test it again with 4.20.0? I skimmed through the changelog and I'm not sure if you are referring to #3775 that should improve the locking mechanism.

@filipelautert
Copy link
Collaborator

Hello @schrieveslaach - it was #3396 , merged on 4.18.0 - time flies! So it should be fine on 4.19.0, and it means there is another point were we need to reset it.
I wonder if maybe in PR #2327 we should have used ChangeLogHistoryServiceFactory.getInstance().resetAll(); instead of chanlogService.reset() ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

4 participants