Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "cannot start a transaction within a transaction" issue (#764) #765

Merged
merged 2 commits into from Aug 28, 2020

Conversation

azavorotnii
Copy link
Contributor

[why]
If db.BeginTx(ctx, nil) context is cancelled too fast, "BEGIN" statement can be
completed inside DB, but we still try to cancel it with sqlite3_interrupt.
In such case we get context.Cancelled or context.DeadlineExceeded from exec(),
but operation really completed. Connection returned into pool, and returns "cannot
start a transaction within a transaction" error for next db.BeginTx() call.

[how]
If we get context cancelled on "BEGIN" statement, call "ROLLBACK" to clean-up
connection state. Don't return cancellation error from exec() if operation completed
without sqlite3_interrupt.

[testing]
Added unit-test which reproduces issue.

@coveralls
Copy link

coveralls commented Dec 14, 2019

Coverage Status

Coverage increased (+0.8%) to 52.262% when pulling 5c1abba on azavorotnii:ctx_cancel into 9bdaffc on mattn:master.

@mattn
Copy link
Owner

mattn commented Dec 16, 2019

Thanks your contribution.

@rittneje Could you please give me comment about this?

@azavorotnii
Copy link
Contributor Author

azavorotnii commented Dec 16, 2019

https://travis-ci.org/mattn/go-sqlite3/jobs/624858930?utm_medium=notification&utm_source=github_status

Doesn't look related to changes in PR at all:

The command "$HOME/gopath/bin/goveralls -repotoken 3qJVUE0iQwqnCbmNcDsjYu1nh4J4KIFXx" exited with 1.

It looks like happens randomly, now 3 jobs failed of that with no test failures.

@azavorotnii
Copy link
Contributor Author

@mattn @rittneje ping here.


for i := 0; i < 1000; i++ {
ctx, cancel := context.WithCancel(context.Background())
go cancel() // make it cancel concurrently with exec("BEGIN");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder this canceliing always works as you expected. Sleep is needed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the code being tested here is inherently non-deterministic. @azavorotnii Did this test as written fail consistently prior to your changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it failed every time I tested it without fix in code, but I agree that "passed" status doesn't prove issue absence.

@rittneje
Copy link
Collaborator

rittneje commented Feb 7, 2020

Your implementation looks good. Unfortunately, I cannot think of a great way of dealing with the test deterministically.


wg := sync.WaitGroup{}
// create several go-routines to expose racy issue
for i := 0; i < 10; i++ {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the need for this outer for loop? Why is the inner one insufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inner loop is inside go-routine. running loop in single go-routine gave me more false-positive results without fix in code

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the only thing I can think of is that the extra goroutines cause the scheduler to behave slightly differently. For example, if it causes the cancel goroutine to consistently complete before the execSync goroutine, that would explain the discrepancy. (The old code would have consistently hit the "no need to interrupt" case.)

That does give me an idea for something more deterministic here. If we were to call BeginTx directly on *SQliteConn (rather than going through the database/sql package) with an already canceled context, that would have almost always caused the failure in the old code, but now will always work as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did as you suggested and it really much more deterministic (still not 100% as we want to interrupt statement with sqlite3_interrupt.

From tests I see that Conn.Raw() method was introduced only in go 1.13. Will move that test separately.

[why]
If db.BeginTx(ctx, nil) context is cancelled too fast, "BEGIN" statement can be
completed inside DB, but we still try to cancel it with sqlite3_interrupt.
In such case we get context.Cancelled or context.DeadlineExceeded from exec(),
but operation really completed. Connection returned into pool, and returns "cannot
start a transaction within a transaction" error for next db.BeginTx() call.

[how]
Handle status code returned from cancelled operation.

[testing]
Added unit-test which reproduces issue.
[why]
Tests times out in travis-ci when run with -race option.
@azavorotnii
Copy link
Contributor Author

@mattn , @rittneje ping here again.

@mattn
Copy link
Owner

mattn commented May 15, 2020

@rittneje any thought?

@rittneje
Copy link
Collaborator

Sorry for the delay. Looks good to me.

@azavorotnii
Copy link
Contributor Author

@mattn can it be merged now?

@djoyner
Copy link

djoyner commented Jul 19, 2020

I stumbled on this same bug and created a small reproduce (https://gist.github.com/djoyner/0133a23000d3ebd3b5421f975c4c2fbb) before I found this issue. FWIW, this PR works.

@kegsay
Copy link

kegsay commented Aug 28, 2020

Also encountering this problem, it looks like this PR has passed code review so please can it be merged so we can benefit from it downstream without having to fork?

@mattn
Copy link
Owner

mattn commented Aug 28, 2020

will look this in later.

@mattn mattn merged commit 862b959 into mattn:master Aug 28, 2020
@mattn
Copy link
Owner

mattn commented Aug 28, 2020

Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants