Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hang with SQLClient close connection #886

Open
hungdqht opened this issue Jan 16, 2021 · 12 comments
Open

Hang with SQLClient close connection #886

hungdqht opened this issue Jan 16, 2021 · 12 comments

Comments

@hungdqht
Copy link

hungdqht commented Jan 16, 2021

We sometime experienced the application hang on IIS, 60% of the threads has the same stack as below and all of them seems stuck when trying to release the connection to the pool. After restart the w3wp process it is back to normal.
We use System.Data.SqlClient 4.8.4121
Could you please check if this is a bug and how we can avoid that happen? Thanks in advance

0:065> !clrstack
OS Thread Id: 0x1ec (65)
        Child SP               IP Call Site
000000e649cbbb98 00007ffe73ba59b4 [GCFrame: 000000e649cbbb98] 
000000e649cbbc68 00007ffe73ba59b4 [HelperMethodFrame_1OBJ: 000000e649cbbc68] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
000000e649cbbd80 00007ffe61083e54 System.Threading.SemaphoreSlim.WaitUntilCountOrTimeout(Int32, UInt32, System.Threading.CancellationToken)
000000e649cbbdd0 00007ffe61083beb System.Threading.SemaphoreSlim.Wait(Int32, System.Threading.CancellationToken)
000000e649cbbe60 00007ffe61083982 System.Threading.SemaphoreSlim.Wait()
000000e649cbbe90 00007ffe5d16e1c3 System.Data.SqlClient.SqlInternalConnectionTds+SyncAsyncLock.Wait(Boolean)
000000e649cbbed0 00007ffe5cf88a8f System.Data.SqlClient.SqlInternalConnectionTds.ObtainAdditionalLocksForClose()
000000e649cbbf10 00007ffe5cf86048 System.Data.ProviderBase.DbConnectionInternal.CloseConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory)
000000e649cbbf60 00007ffe5caba7da System.Data.SqlClient.SqlConnection.Close()
000000e649cbbfd0 00007ffe043a23a2 System.Data.Entity.Infrastructure.Interception.InternalDispatcher`1[[System.__Canon, mscorlib]].Dispatch[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.__Canon, System.Action`2, System.__Canon, System.Action`3, System.Action`3)
000000e649cbc080 00007ffe04f49c5d System.Data.Entity.Infrastructure.Interception.DbConnectionDispatcher.Close(System.Data.Common.DbConnection, System.Data.Entity.Infrastructure.Interception.DbInterceptionContext)
000000e649cbc0f0 00007ffe04f49279 System.Data.Entity.Core.EntityClient.EntityConnection.StoreCloseHelper()
000000e649cbc140 00007ffe04f48dc3 System.Data.Entity.Core.Objects.ObjectContext.ReleaseConnection()
000000e649cbc170 00007ffe04f488dd System.Data.Entity.Core.Common.Internal.Materialization.Shaper`1[[System.__Canon, mscorlib]].Finally()
000000e649cbc1b0 00007ffe05955f19 System.Data.Entity.Core.Common.Internal.Materialization.Shaper`1+ObjectQueryNestedEnumerator[[System.__Canon, mscorlib]].Dispose()
000000e649cbc1e0 00007ffe04fcf98c System.Data.Entity.Internal.LazyAsyncEnumerator`1[[System.__Canon, mscorlib]].Dispose()
000000e649cbc210 00007ffe04f851e1 System.Data.Entity.Infrastructure.IDbAsyncEnumerableExtensions+d__5`1[[System.__Canon, mscorlib]].MoveNext()
000000e649cbc260 00007ffe04f8516f System.Data.Entity.Infrastructure.IDbAsyncEnumerableExtensions+d__5`1[[System.__Canon, mscorlib]].MoveNext()
000000e649cbc2d0 00007ffe606a674e System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000e649cbc3a0 00007ffe606a65e7 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000e649cbc3d0 00007ffe606f29fd System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.Run()
000000e649cbc420 00007ffe04fcf47b System.Data.Entity.Utilities.TaskExtensions+CultureAwaiter`1+c__DisplayClass1[[System.Boolean, mscorlib]].b__0()
000000e649cbc470 00007ffe606f9c62 System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action, Boolean, System.Threading.Tasks.Task ByRef)
000000e649cbc4c0 00007ffe606c0244 System.Threading.Tasks.Task.FinishContinuations()
000000e649cbc550 00007ffe606eee5e System.Threading.Tasks.Task`1[[System.Boolean, mscorlib]].TrySetResult(Boolean)
000000e649cbc590 00007ffe606eef68 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Boolean, mscorlib]].SetResult(Boolean)
000000e649cbc5d0 00007ffe04f89fd2 System.Data.Entity.Internal.LazyAsyncEnumerator`1+d__0[[System.__Canon, mscorlib]].MoveNext()
000000e649cbc640 00007ffe606a674e System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000e649cbc710 00007ffe606a65e7 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000e649cbc740 00007ffe606f29fd System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.Run()
000000e649cbc790 00007ffe04fccd4b System.Data.Entity.Utilities.TaskExtensions+CultureAwaiter`1+c__DisplayClass1[[System.__Canon, mscorlib]].b__0()
000000e649cbc7e0 00007ffe606f9c62 System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action, Boolean, System.Threading.Tasks.Task ByRef)
000000e649cbc830 00007ffe606c0244 System.Threading.Tasks.Task.FinishContinuations()
000000e649cbc8c0 00007ffe606bfdb7 System.Threading.Tasks.Task`1[[System.__Canon, mscorlib]].TrySetResult(System.__Canon)
000000e649cbc900 00007ffe606f1d5c System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].SetResult(System.__Canon)
000000e649cbc950 00007ffe04f8d8f8 System.Data.Entity.Core.Objects.ObjectQuery`1+d__e[[System.__Canon, mscorlib]].MoveNext()
000000e649cbc9c0 00007ffe606a674e System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000e649cbca90 00007ffe606a65e7 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000e649cbcac0 00007ffe606f29fd System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.Run()
000000e649cbcb10 00007ffe04fccd4b System.Data.Entity.Utilities.TaskExtensions+CultureAwaiter`1+c__DisplayClass1[[System.__Canon, mscorlib]].b__0()
000000e649cbcb60 00007ffe606f9c62 System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action, Boolean, System.Threading.Tasks.Task ByRef)
000000e649cbcbb0 00007ffe606c0244 System.Threading.Tasks.Task.FinishContinuations()
000000e649cbcc40 00007ffe606bfdb7 System.Threading.Tasks.Task`1[[System.__Canon, mscorlib]].TrySetResult(System.__Canon)
000000e649cbcc80 00007ffe606f1d5c System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].SetResult(System.__Canon)
000000e649cbccd0 00007ffe04f91e9e System.Data.Entity.SqlServer.DefaultSqlExecutionStrategy+d__9`1[[System.__Canon, mscorlib]].MoveNext()
000000e649cbcd50 00007ffe606a674e System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000e649cbce20 00007ffe606a65e7 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000e649cbce50 00007ffe606f29fd System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.Run()
000000e649cbcea0 00007ffe606f9c62 System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action, Boolean, System.Threading.Tasks.Task ByRef)
000000e649cbcef0 00007ffe606c0244 System.Threading.Tasks.Task.FinishContinuations()
000000e649cbcf80 00007ffe606bfdb7 System.Threading.Tasks.Task`1[[System.__Canon, mscorlib]].TrySetResult(System.__Canon)
000000e649cbcfc0 00007ffe606f1d5c System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].SetResult(System.__Canon)
000000e649cbd010 00007ffe04f9608d System.Data.Entity.Core.Objects.ObjectContext+d__3d`1[[System.__Canon, mscorlib]].MoveNext()
000000e649cbd080 00007ffe606a674e System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000e649cbd150 00007ffe606a65e7 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000e649cbd180 00007ffe606f29fd System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.Run()
000000e649cbd1d0 00007ffe04fccd4b System.Data.Entity.Utilities.TaskExtensions+CultureAwaiter`1+c__DisplayClass1[[System.__Canon, mscorlib]].b__0()
000000e649cbd220 00007ffe606f9c62 System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action, Boolean, System.Threading.Tasks.Task ByRef)
000000e649cbd270 00007ffe606c0244 System.Threading.Tasks.Task.FinishContinuations()
000000e649cbd300 00007ffe606bfdb7 System.Threading.Tasks.Task`1[[System.__Canon, mscorlib]].TrySetResult(System.__Canon)
000000e649cbd340 00007ffe606f1d5c System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].SetResult(System.__Canon)
000000e649cbd390 00007ffe04fc501e System.Data.Entity.Core.Objects.Internal.ObjectQueryExecutionPlan+d__0`1[[System.__Canon, mscorlib]].MoveNext()
000000e649cbd430 00007ffe606a674e System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000e649cbd500 00007ffe606a65e7 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000e649cbd530 00007ffe606f29fd System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.Run()
000000e649cbd580 00007ffe04fccd4b System.Data.Entity.Utilities.TaskExtensions+CultureAwaiter`1+c__DisplayClass1[[System.__Canon, mscorlib]].b__0()
000000e649cbd5d0 00007ffe606f9c62 System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action, Boolean, System.Threading.Tasks.Task ByRef)
000000e649cbd620 00007ffe606c0244 System.Threading.Tasks.Task.FinishContinuations()
000000e649cbd6b0 00007ffe606bfdb7 System.Threading.Tasks.Task`1[[System.__Canon, mscorlib]].TrySetResult(System.__Canon)
000000e649cbd6f0 00007ffe606f1d5c System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].SetResult(System.__Canon)
000000e649cbd740 00007ffe04fc5a81 System.Data.Entity.Core.EntityClient.Internal.EntityCommandDefinition+d__c.MoveNext()
000000e649cbd790 00007ffe606a674e System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000e649cbd860 00007ffe606a65e7 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000e649cbd890 00007ffe606f29fd System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.Run()
000000e649cbd8e0 00007ffe04fccd4b System.Data.Entity.Utilities.TaskExtensions+CultureAwaiter`1+c__DisplayClass1[[System.__Canon, mscorlib]].b__0()
000000e649cbd930 00007ffe606f9c62 System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action, Boolean, System.Threading.Tasks.Task ByRef)
000000e649cbd980 00007ffe606c0244 System.Threading.Tasks.Task.FinishContinuations()
000000e649cbda10 00007ffe606efe44 System.Threading.Tasks.Task.Finish(Boolean)
000000e649cbda70 00007ffe606f0c00 System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
000000e649cbdb20 00007ffe606f0303 System.Threading.Tasks.Task.ExecuteEntry(Boolean)
000000e649cbdb60 00007ffe606f46f4 System.Threading.Tasks.ThreadPoolTaskScheduler.TryExecuteTaskInline(System.Threading.Tasks.Task, Boolean)
000000e649cbdbb0 00007ffe606f45bb System.Threading.Tasks.TaskScheduler.TryRunInline(System.Threading.Tasks.Task, Boolean)
000000e649cbdc50 00007ffe606f4245 System.Threading.Tasks.TaskContinuation.InlineIfPossibleOrElseQueue(System.Threading.Tasks.Task, Boolean)
000000e649cbdce0 00007ffe606bff5b System.Threading.Tasks.Task.FinishContinuations()
000000e649cbdd70 00007ffe606bfdb7 System.Threading.Tasks.Task`1[[System.__Canon, mscorlib]].TrySetResult(System.__Canon)
000000e649cbddb0 00007ffe606fadc0 System.Threading.Tasks.UnwrapPromise`1[[System.__Canon, mscorlib]].TrySetFromTask(System.Threading.Tasks.Task, Boolean)
000000e649cbde10 00007ffe606fabdc System.Threading.Tasks.UnwrapPromise`1[[System.__Canon, mscorlib]].ProcessInnerTask(System.Threading.Tasks.Task)
000000e649cbde50 00007ffe606fa9ea System.Threading.Tasks.UnwrapPromise`1[[System.__Canon, mscorlib]].Invoke(System.Threading.Tasks.Task)
000000e649cbdea0 00007ffe606c0287 System.Threading.Tasks.Task.FinishContinuations()
000000e649cbdf30 00007ffe606efe44 System.Threading.Tasks.Task.Finish(Boolean)
000000e649cbdf90 00007ffe606f0c00 System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
000000e649cbe040 00007ffe606f0303 System.Threading.Tasks.Task.ExecuteEntry(Boolean)
000000e649cbe080 00007ffe606afa10 System.Threading.ThreadPoolWorkQueue.Dispatch()
000000e649cbe518 00007ffe61936a53 [DebuggerU2MCatchHandlerFrame: 000000e649cbe518] 
000000e649cbe6a8 00007ffe61936a53 [ContextTransitionFrame: 000000e649cbe6a8] 
000000e649cbe8d8 00007ffe61936a53 [DebuggerU2MCatchHandlerFrame: 000000e649cbe8d8] 
@ErikEJ
Copy link
Contributor

ErikEJ commented Jan 16, 2021

Driver version?

@hungdqht
Copy link
Author

We still use System.Data.SqlClient not Microsoft.Data..SqlClient which is shiped along with the .net framework
image

@cheenamalhotra
Copy link
Member

Hi @hungdqht

Could you explain more about under what circumstances does this occur?
Do you have a repro app to reproduce issue?

@cheenamalhotra cheenamalhotra added this to Needs triage in SqlClient Triage Board via automation Jan 18, 2021
@cheenamalhotra cheenamalhotra moved this from Needs triage to Needs More Info in SqlClient Triage Board Jan 18, 2021
@hungdqht
Copy link
Author

Hi @cheenamalhotra ,
sometime it happens on our production and we still try to figure it out. I will add more information when we find something. At the moment we try to review the code and do load test on staging try to reproduce that with hope.

@hungdqht
Copy link
Author

hungdqht commented Jan 19, 2021

image
It seems that has a deadlock within the System.Data.SqlClient.SqlInternalConnectionTds.
I found this bug which happen to Microsoft.Data.SqlClient and will it happen to the System.Data.SqlClient too? I'm not sure it is our case but it seems similar
#755

@cheenamalhotra
Copy link
Member

Hi @hungdqht

#755 was fixed in v2.1.0, however System.Data.SqlClient is not receiving Bug Fixes.
Could you try testing your load test app with the latest Microsoft.Data.SqlClient (v2.1.1) driver and let us know?

@hungdqht
Copy link
Author

unfortunately we ccan't reproduce the problem with load test. However we noticed that the problem happend when we upgrade the server from 2012 to 2016. The avg cpu is higher on 2016. We moved it back to 2012 and continue to monitor the servers. I will update if it happen again.

Thank you

@hungdqht
Copy link
Author

hungdqht commented Feb 1, 2021

Hi @cheenamalhotra ,
We can't reproduce the problem but still have it on production. Could you please check my finding below and give me advices on this. I review our code and have no clue for this. Thank you in advance

I Study the code and the dump file and it looks like the thread deadlock with itself at the code below

protected override object ObtainAdditionalLocksForClose() {
bool obtainParserLock = !ThreadHasParserLockForClose;
Debug.Assert(obtainParserLock || _parserLock.ThreadMayHaveLock(), "Thread claims to have lock, but lock is not taken");
if (obtainParserLock) {
_parserLock.Wait(canReleaseFromAnyThread: false);
ThreadHasParserLockForClose = true;
}
return obtainParserLock;
}

When close the connection thread try to obtain the _parserLock which it already have it. Look into the class SqlInternalConnectionTds I found this comments confirmed that deadlock issue
// FOR SYNCHRONIZATION IN TdsParser
// How to use these locks:
// 1. Whenever writing to the connection (with the exception of Cancellation) the _parserLock MUST be taken
// 2. _parserLock will also be taken during close (to prevent closing in the middle of a write)
// 3. Whenever you have the _parserLock and are calling a method that would cause the connection to close if it failed (with the exception of any writing method), you MUST set ThreadHasParserLockForClose to true
// * This is to prevent the connection deadlocking with itself (since you already have the _parserLock, and Closing the connection will attempt to re-take that lock)
// * It is safe to set ThreadHasParserLockForClose to true when writing as well, but it is unneccesary
// * If you have a method that takes _parserLock, it is a good idea check ThreadHasParserLockForClose first (if you don't expect _parserLock to be taken by something higher on the stack, then you should at least assert that it is false)

// 4. ThreadHasParserLockForClose is thread-specific - this means that you must set it to false before returning a Task, and set it back to true in the continuation
// 5. ThreadHasParserLockForClose should only be modified if you currently own the _parserLock
// 6. Reading ThreadHasParserLockForClose is thread-safe

So I think the thread somehow take _parserLock and not/fail to update ThreadHasParserLockForClose to true and when it close the connection it deadlock with itself.

I search the usages of the _parserLock.Wait methods and see that it used several places and some of them set the ThreadHasParserLockForClose to true when it has _parserLock but not all.
Method WriteRowSourceToServerAsync in SqlBulkCopy doesn't do that.
private Task WriteRowSourceToServerAsync(int columnCount, CancellationToken ctoken) {
...
Debug.Assert(_parserLock == null, "Previous parser lock not cleaned");
_parserLock = internalConnection._parserLock;
_parserLock.Wait(canReleaseFromAnyThread: _isAsyncBulkCopy);
....
}

I wonder if we must set ThreadHasParserLockForClose to true when we have _parserLock why we don't do that in _parserLock.Wait method

internal void Wait(bool canReleaseFromAnyThread)
{
Monitor.Enter(semaphore); // semaphore is used as lock object, no relation to SemaphoreSlim.Wait/Release methods
if (canReleaseFromAnyThread || semaphore.CurrentCount==0) {
semaphore.Wait();
ThreadHasParserLockForClose = true; //add this line to avoid deadlock
if (canReleaseFromAnyThread) {
Monitor.Exit(semaphore);
}
else {
semaphore.Release();
}
}
}

also this method is not release the lock on semaphore. It not call Monitor.Exit in every cases but always Enter.
Monitor.Enter without Monitor.Exit can cause deadlock?

The problem still happen under heavy load and we still stuck at finding a solution

@cheenamalhotra
Copy link
Member

That's interesting. We will take a closer look and get back to you!

@crydante
Copy link

crydante commented Jul 5, 2022

@hungdqht Do you have any news ? Your hang is still present ?

I had exactly the same problem. My app is on Net Core 3.1 with Entity Framework 6.

Hangs happened sometimes in IIS in production and 70-80% of the threads are waiting here :

" Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlInternalConnectionTds.SyncAsyncLock.Wait(bool canReleaseFromAnyThread = false)"

I had the problem with System.Data.SQLClient 4.7.0 so i upgrade to Microsoft.Data.SqlClient with ErikEJ.EntityFramework.SqlServer but the problem is still here.

In ErikEJ.EntityFramework.SqlServer, Microsoft.Data.SqlClient version is 2.1.4. Maybe i can try to update to 4.1 and test ?

@ErikEJ
Copy link
Contributor

ErikEJ commented Jul 5, 2022

@crydante yes just add an explicit reference to 4.1

@crydante
Copy link

My hang is resolved with upgrading Microsoft.Data.SqlClient to 4.1 AND disable MARS for all SqlConnections

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
SqlClient Triage Board
  
Needs More Info
Development

No branches or pull requests

4 participants