Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error "cannot start a transaction within a transaction" on cached connection #764

Open
azavorotnii opened this issue Dec 13, 2019 · 3 comments

Comments

@azavorotnii
Copy link
Contributor

azavorotnii commented Dec 13, 2019

Assuming we run this piece of code:

			ctx, cancel := context.WithCancel(ctx)
			tx, err := db.BeginTx(ctx, nil)

it is possible to get at the same time context.Cancelled error and started transaction. This will happen only if we cancel context very fast.

This happens because context cancellation is inherently racy as performed not in same go-routine where statement is executed.

Proposed fix #765

@rittneje
Copy link
Collaborator

rittneje commented Dec 16, 2019

@azavorotnii If I understand your bug report correctly, this is not a bug with BeginTx specifically, but rather some faulty logic in (*SQLiteStmt).exec.

go-sqlite3/sqlite3.go

Lines 1893 to 1905 in 590d44c

select {
case rv := <- resultCh:
return rv.r, rv.err
case <-ctx.Done():
select {
case <-resultCh: // no need to interrupt
default:
// this is still racy and can be no-op if executed between sqlite3_* calls in execSync.
C.sqlite3_interrupt(s.c.db)
<-resultCh // ensure goroutine completed
}
return nil, ctx.Err()
}

The existing code erroneously concludes that if we called sqlite3_interrupt, then the pending operation will definitely fail so it doesn't bother looking at the result. This is flawed (1) because of the inherent raciness, and (2) because the SQLite docs themselves say that sqlite3_interrupt will not necessarily interrupt a pending operation. "If an SQL operation is very nearly finished at the time when sqlite3_interrupt() is called, then it might not have an opportunity to be interrupted and might continue to completion." https://www.sqlite.org/c3ref/interrupt.html

I believe the correct fix would be to not just discard what gets pulled from resultCh and instead only return ctx.Err() if we receive SQLITE_INTERRUPT from the actual operation.

I have also sent a question to the SQLite mailing list about the behavior when BEGIN gets interrupted to see if we need to explicitly roll back in that case or not.

ETA: The SQLite maintainers say that if you interrupt a call to BEGIN, the transaction is aborted. So calling ROLLBACK in this case is unnecessary (though harmless).

@azavorotnii
Copy link
Contributor Author

azavorotnii commented Dec 16, 2019

I believe the correct fix would be to not just discard what gets pulled from resultCh and instead only return ctx.Err() if we receive SQLITE_INTERRUPT from the actual operation.

@rittneje that was my initial thought, but for some reason instead of SQLITE_INTERRUPT I got SQLITE_DONE. That could be race in my test setup though (not same test as in this PR). Let me check again.

@azavorotnii
Copy link
Contributor Author

@rittneje done as you proposed. I did mistake in my initial test, now it should be stable.

neilalexander added a commit to matrix-org/dendrite that referenced this issue Aug 28, 2020
neilalexander added a commit to matrix-org/dendrite that referenced this issue Aug 28, 2020
…1363)

* Add some error wrapping to sync API

* Don't use request context for BeginTx until mattn/go-sqlite3#764 is fixed
mattn pushed a commit that referenced this issue Aug 28, 2020
)

* Fix "cannot start a transaction within a transaction" issue

[why]
If db.BeginTx(ctx, nil) context is cancelled too fast, "BEGIN" statement can be
completed inside DB, but we still try to cancel it with sqlite3_interrupt.
In such case we get context.Cancelled or context.DeadlineExceeded from exec(),
but operation really completed. Connection returned into pool, and returns "cannot
start a transaction within a transaction" error for next db.BeginTx() call.

[how]
Handle status code returned from cancelled operation.

[testing]
Added unit-test which reproduces issue.

* Reduce TestQueryRowContextCancelParallel concurrency

[why]
Tests times out in travis-ci when run with -race option.
anandv96 pushed a commit to anandv96/dendrite that referenced this issue Sep 1, 2020
…atrix-org#1363)

* Add some error wrapping to sync API

* Don't use request context for BeginTx until mattn/go-sqlite3#764 is fixed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants