Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad connection is never removed from the connection pool #606

Open
dududko opened this issue Sep 28, 2020 · 1 comment · May be fixed by #607
Open

Bad connection is never removed from the connection pool #606

dududko opened this issue Sep 28, 2020 · 1 comment · May be fixed by #607

Comments

@dududko
Copy link

dududko commented Sep 28, 2020

Describe the bug

There is a stored procedure which is executed every 5 seconds. The stored procedure returns the result of the join of two tables. Data is stored in one database on a single server. It is NOT a distributed query. It is a read only query.

Once in a blue moon(once a day) something bad happen, and my program break. After that on each request I see the recurrent error:

"error": "mssql: New request is not allowed to start because it should come with valid transaction descriptor."

The thing is, that I get this error only on Commit, which mean that the connection is established, I even can scan the results, but on commit I receive this error. So it looks like the bad connection is stuck in the pool, and the driver does not recognize that the connection is broken.

Digging into the problem

I added sql hooks, increased the logging level, and filled my code with bunch of log statements. Here is what I found:

  1. after the transaction is open and before it is closed driver prints the error (note: ip addresses is removed)
Intercepted panic read tcp *:46326->*:1433: i/o timeout
  1. after that the row scan returns the error
"logger": "sqlhook.Rows.Next"
"query": "exec myStoredProcedure",
"error": "unknown token type returned: token(0)"
  1. the transaction is cancelled
"The transaction active in this session has been committed or aborted by another session."
"ROLLBACK TRANSACTION 5100000d66\n""The transaction active in this session has been committed or aborted by another session."
"ROLLBACK TRANSACTION 5100000d66\n"
  1. then after 5 seconds the procedure is called again, and now it fails on commit
    "msg": "detected mssql error",
    "message": "The server failed to resume the transaction. Desc:3a0000005d.",
    "number": 3971,
    "state": 1,
    "class": 16,
    "server_name": "MY_SERVER",
    "line_no": 1,
  1. Then I see the recurrent error
"New request is not allowed to start because it should come with valid transaction descriptor."v

Conclusions

I reviewed the code of the driver and I think that there is no proper error handling in the row scan logic. Errors are mostly ignored.
Also there is no way to tell the driver which errors are unrecoverable. The only way to do this is to patch the driver.

I was able to provide the patch for the error (5), but ideally it would be great to find the source of error (1) and remove the broken connection from the pool.

Further technical details

Microsoft SQL Server 2012 - 11.0.2100.60 (X64)

@denisenkom
Copy link
Owner

Is it possible to reproduce this error?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants