Application not closing socket stuck in CLOSE_WAIT state #36293

Closed
jogoertzen-stantec opened this issue May 12, 2020 · 14 comments

Comments

@jogoertzen-stantec

jogoertzen-stantec commented May 12, 2020

Our ASP.NET Core application (RHEL 7; ASP.NET Core 2.2.7 runtime) frequently stops responding immediately after restarting a remote SQL Server database hosted on Windows Server.

[Image: htop]

The application has a thread that is repeatedly calling recvmsg on a socket in CLOSE_WAIT state.

[Image: strace]

The socket was originally ESTABLISHED with the database and transitioned to CLOSE_WAIT as soon as the database host was restarted. The socket remains in CLOSE_WAIT indefinitely until the application is restarted.

[Image: lsof]
[Image: netstat]

Attaching a debugger reveals the following stack trace.

[Attachment: stack.txt]

I'm not exactly sure, but this is my best guess as to the line of code that produces the recvmsg calls.

If the TCP state transition diagram below is to be believed, then the socket correctly transitioned to CLOSE_WAIT when the database went down, but the application seems to think it is still ESTABLISHED or something.

[Image: TCP state transition diagram]
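The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not taken from the application: the class and method names are hypothetical, and it assumes a loopback listener stands in for the database host. Once the peer closes its end, the local socket sits in CLOSE_WAIT and every `Receive` call should return 0 rather than throw, so it is up to the caller to treat 0 as end of stream.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class CloseWaitSketch
{
    // Hypothetical helper: connects to a loopback listener, lets the peer
    // close its end, then calls Receive `attempts` times and returns the
    // byte count reported by each call.
    public static int[] ReceiveAfterPeerClose(int attempts)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            using (var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
            {
                client.Connect(IPAddress.Loopback, port);

                using (Socket accepted = listener.AcceptSocket())
                {
                    // Leaving this block disposes the accepted socket. Its FIN
                    // moves the client side from ESTABLISHED to CLOSE_WAIT.
                }

                var buffer = new byte[16];
                var results = new int[attempts];
                for (int i = 0; i < attempts; i++)
                {
                    // Blocks until data or the FIN arrives; a return value
                    // of 0 means the peer has closed its end of the stream.
                    results[i] = client.Receive(buffer);
                }
                return results;
            }
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

The client socket only leaves CLOSE_WAIT when the application closes it, which in this sketch happens at the end of the `using` block.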

Given the information above, does this look like a dotnet bug?

Thanks for reading!

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Net.Sockets untriaged New issue has not been triaged by the area owner labels May 12, 2020
@ghost

ghost commented May 12, 2020

Tagging subscribers to this area: @dotnet/ncl
Notify danmosemsft if you want to be subscribed.

@karelz
Member

karelz commented May 12, 2020

@jogoertzen-stantec does it use high CPU for calling recvmsg in a loop?
Given that 2.2 is not supported anymore, would you be able to upgrade to 3.1 and test it out there if it is still a problem?

@jogoertzen-stantec
Author

> @jogoertzen-stantec does it use high CPU for calling recvmsg in a loop?
> Given that 2.2 is not supported anymore, would you be able to upgrade to 3.1 and test it out there if it is still a problem?

@karelz It sure does! 😅 Is this a known issue?

Unfortunately, we cannot upgrade to 3.1 at this time because OData is not supported on it.

@scalablecory
Contributor

scalablecory commented May 12, 2020

Do you have a small code sample that reproduces this? I don't think we have enough here to diagnose the problem. At first glance it appears someone is calling Socket.Receive in a loop and not handling an exception, but that could be in many places.

edit: actually, I missed your stack trace. This appears to be something in SqlClient:

@544
SocketPal.Receive() in System.Net.Sockets, System.Net.Sockets.dll
SocketPal.TryCompleteReceiveFrom() in System.Net.Sockets, System.Net.Sockets.dll
SocketAsyncContext.ReceiveFrom() in System.Net.Sockets, System.Net.Sockets.dll
SocketPal.Receive() in System.Net.Sockets, System.Net.Sockets.dll
Socket.Receive() in System.Net.Sockets, System.Net.Sockets.dll
NetworkStream.Read() in System.Net.Sockets, System.Net.Sockets.dll
SslOverTdsStream.<ReadInternal>d__11.MoveNext() in System.Data.SqlClient.SNI, System.Data.SqlClient.dll
AsyncMethodBuilderCore.Start<System.Data.SqlClient.SNI.SslOverTdsStream.<ReadInternal>d__11>() in System.Runtime.CompilerServices, System.Private.CoreLib.dll
SslOverTdsStream.ReadInternal() in System.Data.SqlClient.SNI, System.Data.SqlClient.dll
SslOverTdsStream.Read() in System.Data.SqlClient.SNI, System.Data.SqlClient.dll
FixedSizeReader.ReadPacket() in System.Net, System.Net.Security.dll
SslState.StartReceiveBlob() in System.Net.Security, System.Net.Security.dll
SslState.CheckCompletionBeforeNextReceive() in System.Net.Security, System.Net.Security.dll
SslState.StartSendBlob() in System.Net.Security, System.Net.Security.dll
SslState.ForceAuthentication() in System.Net.Security, System.Net.Security.dll
SslState.ProcessAuthentication() in System.Net.Security, System.Net.Security.dll
SslStream.AuthenticateAsClient() in System.Net.Security, System.Net.Security.dll
SslStream.AuthenticateAsClient() in System.Net.Security, System.Net.Security.dll
SslStream.AuthenticateAsClient() in System.Net.Security, System.Net.Security.dll
SNITCPHandle.EnableSsl() in System.Data.SqlClient.SNI, System.Data.SqlClient.dll
SNIProxy.EnableSsl() in System.Data.SqlClient.SNI, System.Data.SqlClient.dll
TdsParserStateObjectManaged.EnableSsl() in System.Data.SqlClient.SNI, System.Data.SqlClient.dll
TdsParser.ConsumePreLoginHandshake() in System.Data.SqlClient, System.Data.SqlClient.dll
TdsParser.Connect() in System.Data.SqlClient, System.Data.SqlClient.dll
SqlInternalConnectionTds.AttemptOneLogin() in System.Data.SqlClient, System.Data.SqlClient.dll
SqlInternalConnectionTds.LoginNoFailover() in System.Data.SqlClient, System.Data.SqlClient.dll
SqlInternalConnectionTds.OpenLoginEnlist() in System.Data.SqlClient, System.Data.SqlClient.dll
new SqlInternalConnectionTds() in System.Data.SqlClient, System.Data.SqlClient.dll
SqlConnectionFactory.CreateConnection() in System.Data.SqlClient, System.Data.SqlClient.dll
DbConnectionFactory.CreatePooledConnection() in System.Data.ProviderBase, System.Data.SqlClient.dll
DbConnectionPool.CreateObject() in System.Data.ProviderBase, System.Data.SqlClient.dll
DbConnectionPool.UserCreateRequest() in System.Data.ProviderBase, System.Data.SqlClient.dll
DbConnectionPool.TryGetConnection() in System.Data.ProviderBase, System.Data.SqlClient.dll
DbConnectionPool.TryGetConnection() in System.Data.ProviderBase, System.Data.SqlClient.dll
DbConnectionFactory.TryGetConnection() in System.Data.ProviderBase, System.Data.SqlClient.dll
DbConnectionInternal.TryOpenConnectionInternal() in System.Data.ProviderBase, System.Data.SqlClient.dll
DbConnectionClosed.TryOpenConnection() in System.Data.ProviderBase, System.Data.SqlClient.dll
SqlConnection.TryOpen() in System.Data.SqlClient, System.Data.SqlClient.dll
SqlConnection.Open() in System.Data.SqlClient, System.Data.SqlClient.dll
SqlServerStorage.CreateAndOpenConnection() in Hangfire.SqlServer, Hangfire.SqlServer.dll
SqlServerStorage.UseConnection<int>() in Hangfire.SqlServer, Hangfire.SqlServer.dll
SqlServerConnection.RemoveTimedOutServers() in Hangfire.SqlServer, Hangfire.SqlServer.dll
ServerWatchdog.Execute() in Hangfire.Server, Hangfire.Core.dll
BackgroundProcessDispatcherBuilder.ExecuteProcess() in Hangfire.Server, Hangfire.Core.dll
BackgroundExecution.Run() in Hangfire.Processing, Hangfire.Core.dll
BackgroundDispatcher.DispatchLoop() in Hangfire.Processing, Hangfire.Core.dll
Thread.ThreadMain_ThreadStart() in System.Threading, System.Threading.Thread.dll
ExecutionContext.RunInternal() in System.Threading, System.Private.CoreLib.dll
[Native to Managed Transition]

@wfurt
Member

wfurt commented May 12, 2020

I think it is essentially a dup of #29327, fixed by dotnet/corefx#38499 in 3.0.

@wfurt
Member

wfurt commented May 12, 2020

BTW, some background: we could leave the OS descriptor open if the socket is not disposed explicitly.
An explicit Dispose() should avoid the issue. The missing Dispose() can still be hidden inside SqlClient.
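As a rough illustration of the explicit-disposal point, here is a minimal sketch with hypothetical names. It does not reproduce the runtime bug or the SqlClient code path. It only shows that a `using` block releases the socket deterministically and that later use fails fast with `ObjectDisposedException`.

```csharp
using System;
using System.Net.Sockets;

public static class ExplicitDisposeSketch
{
    // Hypothetical helper: returns true if Receive on a disposed socket
    // fails fast with ObjectDisposedException.
    public static bool ReceiveFailsFastAfterDispose()
    {
        var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        using (socket)
        {
            // Connect and use the socket here.
        } // Dispose() runs here and releases the OS descriptor, even if the body throws.

        try
        {
            socket.Receive(new byte[1]);
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```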

@Wraith2
Contributor

Wraith2 commented May 13, 2020

Given the presence of SslOverTdsStream.ReadInternal() in the stack trace, it looks like this could be a known potential issue in SslOverTdsStream in SqlClient:

                    while (readBytes < TdsEnums.HEADER_LEN)
                    {
                        readBytes += async ?
                            await _stream.ReadAsync(packetData, readBytes, TdsEnums.HEADER_LEN - readBytes, token).ConfigureAwait(false) :
                            _stream.Read(packetData, readBytes, TdsEnums.HEADER_LEN - readBytes);
                    }

Note that the value returned by each read is not checked for 0, so readBytes never advances and you can sit in an infinite loop that makes no progress if the connection is closed but reads do not throw. In Microsoft.Data.SqlClient I've got an open PR that changes that behaviour: dotnet/SqlClient#541
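For illustration only, a read loop that guards against a zero-byte read might look like the sketch below. This is not the code from the linked PR. `ReadExactlyOrThrow` is a hypothetical helper, and the `count` parameter stands in for `TdsEnums.HEADER_LEN`.

```csharp
using System;
using System.IO;

public static class HeaderReadSketch
{
    // Hypothetical helper: reads exactly `count` bytes into `buffer`, or
    // throws if the stream ends first.
    public static void ReadExactlyOrThrow(Stream stream, byte[] buffer, int count)
    {
        int readBytes = 0;
        while (readBytes < count)
        {
            int n = stream.Read(buffer, readBytes, count - readBytes);
            if (n == 0)
            {
                // A zero-byte read means end of stream. Without this check
                // the loop would spin forever, because readBytes would
                // never reach count.
                throw new EndOfStreamException(
                    $"Stream ended after {readBytes} of {count} bytes.");
            }
            readBytes += n;
        }
    }
}
```

Calling it on a `MemoryStream` holding fewer than `count` bytes throws `EndOfStreamException` instead of looping. This makes it quick to check without a database.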

@karelz
Member

karelz commented May 13, 2020

So, we have a few suspicions. Is there any easy way to confirm which one it is?
Is there anything actionable left on this issue?

@Wraith2
Contributor

Wraith2 commented May 13, 2020

If the runtime fix for later versions makes it impossible to return 0 on a closed socket that'll fix it for most people. In this particular case my upcoming change in Microsoft.Data.SqlClient will fix it on the older runtime.

So I don't think there's anything that needs to be done here. My suggestion would be that @jogoertzen-stantec probably needs to track the PR I linked and once it gets merged try the Microsoft.Data.SqlClient preview that contains it to see if the issue goes away.

@jogoertzen-stantec
Author

jogoertzen-stantec commented May 13, 2020

@Wraith2 Unless I am missing something, the potential change to Microsoft.Data.SqlClient will not make a difference in this particular situation, as the application currently uses System.Data.SqlClient. Are you also suggesting the application should switch to Microsoft.Data.SqlClient?

@Wraith2
Contributor

Wraith2 commented May 13, 2020

Yes.

@jogoertzen-stantec
Author

Great! I am subscribed to dotnet/SqlClient#541 as @Wraith2 suggested. We are planning to upgrade the application to ASP.NET Core 3.1 anyways in the coming weeks, but in case that plan falls through this is a nice alternative.

Also, thank you all very much for jumping on this issue so deliciously fast. 😄 Honestly, I wasn't even sure I was in the right place to begin with, so all this attention is greatly appreciated!

@jogoertzen-stantec
Author

jogoertzen-stantec commented May 13, 2020

Actually, should I close this issue at this point, or should it remain open until I can confirm one of the proposed approaches actually solves the issue?

@karelz
Member

karelz commented May 13, 2020

Given that it is not actionable now (#36293 (comment)), we can close it.
If you find out it is not addressed with upcoming changes, we can reopen and then dig deeper to make it actionable.

@karelz karelz closed this as completed May 13, 2020
@karelz karelz added this to the 5.0.0 milestone Aug 18, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 9, 2020
@karelz karelz removed the untriaged New issue has not been triaged by the area owner label Oct 20, 2022