Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[postgres] Issue rollback after migration error #1

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

bclarkx2
Copy link

@bclarkx2 bclarkx2 commented Sep 20, 2023

Dependencies

None

Documentation

Aims to close golang-migrate#581 on the upstream repository.

Description

Read the attached issue for far more details. Long story short, this PR tweaks the postgres, pgx/v4 and pgx/v5 migrate database drivers to issue a ROLLBACK on the db connection any time a migration statement returns an error. If operating in multi-statement mode, this allows the connection to leave the aborted state that the explicit transaction was left in by the error running the migration statement. If operating in simple query mode, the additional rollback will be a no-op since the implicit transaction created will have already been automatically been rolled back on error.

Any error generated by the rollback itself will be appended to the root migration error as a multierror and returned to the Migrate instance for reporting to the user.

Testing Considerations

Validate that the postgres/pgx/pgx5 tests pass:
make test-short DATABASE='postgres pgx pgx5'

This archive contains a set of dockerized Golang test environments using the pgx/v5 and postgres migrate database drivers (pgx/v4 is assumed to work similarly to pgx/v5). Each one provides a self-contained docker compose environment in which we attempt to run a migration containing an error against a Postgres database, and logs the results to the console.

tests.zip

The run.sh script in each directory will handle orchestrating the test. In each scenario, the goal is to observe the following behavior:

  1. No error similar to the following appears: "current transaction is aborted, commands ignored until end of transaction block in line 0: SELECT pg_advisory_unlock($1)". This is a sign that the connection responsible for executing the migrations has been returned to a non-aborted state and is able to also relinquish the advisory lock protecting the migration table.
  2. After migrations have been attempted, no advisory locks are found in the pg_locks view. This is a sign that both (A) the migrate instance was able to successfully clean up its advisory lock and (B) (at least in the case of the pq driver, where a WithConnection method exists so I can force the pool to use the same connection after migrations have finished) the connection is able to be used successfully to issue new queries against the database after the migration work is complete.

Versioning

I think this represents a patch bump version increment. However, this decision will ultimately be made on the upstream repo so there's not much to discuss at this point on the matter.


@bclarkx2 bclarkx2 self-assigned this Sep 20, 2023
@bclarkx2 bclarkx2 changed the title Issue rollback after migration err [postgres] Issue rollback after migration error Sep 20, 2023
@bclarkx2 bclarkx2 force-pushed the bugfix/issue-rollback-after-migration-err branch from e69d0ee to 723584e Compare September 20, 2023 15:43
// back.
if _, rollbackErr := p.conn.ExecContext(ctx, "ROLLBACK"); rollbackErr != nil {
rollbackErr = fmt.Errorf("failed to rollback migration tx: %w", rollbackErr)
return multierror.Append(migrationErr, rollbackErr)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a simple example of what this output might look like from a basic console logger in the case where both the migration statement and this explicit rollback generated an error:

2023/09/20 15:20:30 Database connection established
2023/09/20 15:20:30 Error running migrations: 2 errors occurred:
        * migration failed: division by zero in line 0:
SELECT 1/0; -- Divide by zero error!

 (details: pq: division by zero)
        * failed to rollback migration tx: pq: column "doesnt-exist" does not exist

2023/09/20 15:20:30 Closing database connection

I aimed to make this message clear that what failed wasn't a "rollback of the migrate schema version itself" , but just a rollback of the transaction being used in the up migration that was being attempted. Happy to take feedback on this error to make that clearer, though!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only nit on the wrapped error message is that failed to is kinda noise, as suggested by https://github.com/uber-go/guide/blob/master/style.md#error-wrapping, but it looks like this repo uses Failed to... already https://github.com/search?q=repo%3Anicheinc%2Fmigrate+Errorf+%25w&type=code

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me! In bfdd2f0 I changed the wording from "failed to rollback migration tx: " to "rolling back migration tx: ".

@bclarkx2 bclarkx2 force-pushed the bugfix/issue-rollback-after-migration-err branch 2 times, most recently from 240dfaf to 8d90b84 Compare September 20, 2023 20:13
@bclarkx2 bclarkx2 marked this pull request as ready for review September 20, 2023 20:14
@bclarkx2 bclarkx2 force-pushed the bugfix/issue-rollback-after-migration-err branch from 8d90b84 to 0b3b38c Compare September 20, 2023 20:16
// back.
if _, rollbackErr := p.conn.ExecContext(ctx, "ROLLBACK"); rollbackErr != nil {
rollbackErr = fmt.Errorf("failed to rollback migration tx: %w", rollbackErr)
return multierror.Append(migrationErr, rollbackErr)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do enough CFA-by-eye to know that migrationErr can't be nil here, but I wonder if we could protect against that my moving migrationErr = database.Error{OrigErr: err, Err: "migration failed", Query: statement} to the initialization of migrationErr - or if we'd even want to?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing of note is that hashicorp (in their infinite wisdom) has made Append work such that if migrationErr were nil here it would still at least generate a multierror with a single error inside of it: https://github.com/hashicorp/go-multierror/blob/v1.1.1/append.go#L36-L39

à la https://go.dev/play/p/HOmHw7-3Dbf

That being said, defining migrationErr from the get-go as the version derived from a non-pgconn error to start with and overwriting with a more specific database.Error when possible is fine by me as well!

Switched over accordingly in 4813cdb

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing of note is that hashicorp (in their infinite wisdom) has made Append work such that if migrationErr were nil here it would still at least generate a multierror with a single error inside of it: https://github.com/hashicorp/go-multierror/blob/v1.1.1/append.go#L36-L39

à la https://go.dev/play/p/HOmHw7-3Dbf

That being said, defining migrationErr from the get-go as the version derived from a non-pgconn error to start with and overwriting with a more specific database.Error when possible is fine by me as well!

Switched over accordingly in 4813cdb

// back.
if _, rollbackErr := p.conn.ExecContext(ctx, "ROLLBACK"); rollbackErr != nil {
rollbackErr = fmt.Errorf("failed to rollback migration tx: %w", rollbackErr)
return multierror.Append(migrationErr, rollbackErr)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only nit on the wrapped error message is that failed to is kinda noise, as suggested by https://github.com/uber-go/guide/blob/master/style.md#error-wrapping, but it looks like this repo uses Failed to... already https://github.com/search?q=repo%3Anicheinc%2Fmigrate+Errorf+%25w&type=code

@nathanjcochran
Copy link

🚀

@bclarkx2
Copy link
Author

@wmarshall made both tweaks you suggested! I'll squash the extra commits if you don't have any further items to discuss so we just propose one commit upstream 👍

@wmarshall
Copy link

@wmarshall made both tweaks you suggested! I'll squash the extra commits if you don't have any further items to discuss so we just propose one commit upstream 👍

:shipit: 🐸

@wmarshall
Copy link

Looks like the test failures here are running out of GHA disk space?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants