Fix "cannot start a transaction within a transaction" issue (#764) #765
Force-pushed from 6c3ce7a to 04a819e.
Thanks for your contribution. @rittneje Could you please comment on this?
This doesn't look related to the changes in this PR at all:
It looks like it happens randomly; 3 jobs have now failed because of it, with no test failures.
Force-pushed from 04a819e to 0cc6d20.
sqlite3_go18_test.go (outdated)
    for i := 0; i < 1000; i++ {
        ctx, cancel := context.WithCancel(context.Background())
        go cancel() // make it cancel concurrently with exec("BEGIN");
I wonder whether this cancelling always works as you expect. Is a sleep needed?
Unfortunately, the code being tested here is inherently non-deterministic. @azavorotnii Did this test as written fail consistently prior to your changes?
Yes, it failed every time I tested it without the fix in the code, but I agree that a "passed" status doesn't prove the issue is absent.
Your implementation looks good. Unfortunately, I cannot think of a great way of dealing with the test deterministically.
sqlite3_go18_test.go (outdated)
    wg := sync.WaitGroup{}
    // create several go-routines to expose racy issue
    for i := 0; i < 10; i++ {
What is the need for this outer for loop? Why is the inner one insufficient?
The inner loop runs inside a goroutine. Running the loop in a single goroutine gave me more false-positive results without the fix in the code.
Hmm, the only thing I can think of is that the extra goroutines cause the scheduler to behave slightly differently. For example, if it causes the cancel goroutine to consistently complete before the execSync goroutine, that would explain the discrepancy. (The old code would have consistently hit the "no need to interrupt" case.)
That does give me an idea for something more deterministic here. If we were to call BeginTx directly on *SQLiteConn (rather than going through the database/sql package) with an already canceled context, that would have almost always caused the failure in the old code, but now will always work as expected.
I did as you suggested and it really is much more deterministic (still not 100%, as we want to interrupt the statement with sqlite3_interrupt).
From the tests I see that the Conn.Raw() method was only introduced in Go 1.13. I will move that test out separately.
Force-pushed from 0cc6d20 to a74c2fd.
[why]
If the context passed to db.BeginTx(ctx, nil) is cancelled too quickly, the "BEGIN" statement can complete inside the DB while we still try to cancel it with sqlite3_interrupt. In that case we get context.Canceled or context.DeadlineExceeded from exec(), even though the operation really completed. The connection is returned to the pool and returns a "cannot start a transaction within a transaction" error on the next db.BeginTx() call.
[how]
Handle the status code returned from the cancelled operation.
[testing]
Added a unit test which reproduces the issue.

[why]
Tests time out in Travis CI when run with the -race option.
Force-pushed from a74c2fd to 5c1abba.
@rittneje any thoughts?
Sorry for the delay. Looks good to me.
@mattn can it be merged now?
I stumbled on this same bug and created a small reproduction (https://gist.github.com/djoyner/0133a23000d3ebd3b5421f975c4c2fbb) before I found this issue. FWIW, this PR works.
I am also encountering this problem. It looks like this PR has passed code review, so could it please be merged so we can benefit from it downstream without having to fork?
I will look into this later.
Thank you.
[why]
If the context passed to db.BeginTx(ctx, nil) is cancelled too quickly, the "BEGIN"
statement can complete inside the DB while we still try to cancel it with sqlite3_interrupt.
In that case we get context.Canceled or context.DeadlineExceeded from exec(),
even though the operation really completed. The connection is returned to the pool and
returns a "cannot start a transaction within a transaction" error on the next db.BeginTx() call.
[how]
If we get a context cancellation on the "BEGIN" statement, call "ROLLBACK" to clean up
the connection state. Don't return a cancellation error from exec() if the operation
completed without sqlite3_interrupt.
[testing]
Added a unit test which reproduces the issue.