
the maximum size of connection pool is 100 connections but the pending queue keeps growing #816

Closed
bahadirbal opened this issue Nov 23, 2020 · 7 comments
Labels
⏳ Waiting for Customer We're waiting for your response :)

Comments

@bahadirbal

Describe the bug

This issue happens intermittently in our production environment. When the problem occurs, our service starts timing out when attempting to open a SQL connection to a SQL Azure Database. Other instances of the service are able to open connections to the same SQL DB without problems, and the same instance of the service is able to establish connections to other SQL Azure Databases. The only mitigation we have found is to restart the instance of the service that is failing to open the connection.

Given that the problematic service instance is able to establish connections to other databases, one of our theories was that we might be exhausting the SQL connection pool. So we added monitoring around the state of the SQL connection pool, falling back to reflection to get at the internal state of the DbConnectionPool objects.

Here is our logging code and the parameters we are logging:

// In addition to whatever the surrounding class already uses, this snippet needs
// System.Collections, System.Linq, System.Reflection and Microsoft.Data.SqlClient,
// plus our own helpers: GetPrivateFieldValue / GetPrivatePropertyValue (reflection)
// and AsSystemMetadata (logging extension).
var assembly = typeof(SqlConnection).Assembly;
var factoryType = assembly.GetTypes().FirstOrDefault(t => t.Name == "SqlConnectionFactory");
var singletonField = factoryType.GetField("SingletonInstance", BindingFlags.Static | BindingFlags.Public);
var sqlConnectionFactoryInstance = singletonField.GetValue(null);

var connectionPoolGroups = GetPrivateFieldValue<IDictionary>(sqlConnectionFactoryInstance, "_connectionPoolGroups", baseType: true);

foreach (var connectionPoolGroupKey in connectionPoolGroups.Keys)
{
    var connectionPoolGroup = connectionPoolGroups[connectionPoolGroupKey];
    var poolCollection = GetPrivateFieldValue<IDictionary>(connectionPoolGroup, "_poolCollection").Values;

    foreach (var pool in poolCollection)
    {
        var poolGroupIdHash = connectionPoolGroupKey?.GetHashCode();
        var poolIdHash = GetPrivateFieldValue<object>(pool, "_identity")?.GetHashCode();
        var totalConnectionsCount = GetPrivateFieldValue<int>(pool, "_totalObjects");
        var minPoolSize = GetPrivatePropertyValue<int>(pool, "MinPoolSize");
        var maxPoolSize = GetPrivatePropertyValue<int>(pool, "MaxPoolSize");
        var availableConnectionsNewCount = GetPrivateFieldValue<ICollection>(pool, "_stackNew")?.Count;
        var availableConnectionsOldCount = GetPrivateFieldValue<ICollection>(pool, "_stackOld")?.Count;
        var pendingOpensCount = GetPrivateFieldValue<ICollection>(pool, "_pendingOpens")?.Count;
        var waitCount = GetPrivateFieldValue<int>(pool, "_waitCount"); // number of threads waiting to open a connection; seems similar to the previous metric

        Logger.LogInformation(
            "SQL pool stats. " +
            "poolGroupIdHash: {poolGroupIdHash}, " +
            "poolIdHash: {poolIdHash}, " +
            "totalConnectionsCount: {totalConnectionsCount}, " +
            "minPoolSize: {minPoolSize}, " +
            "maxPoolSize: {maxPoolSize}, " +
            "availableConnectionsNewCount: {availableConnectionsNewCount}, " +
            "availableConnectionsOldCount: {availableConnectionsOldCount}, " +
            "pendingOpensCount: {pendingOpensCount}, " +
            "waitCount: {waitCount}",
            poolGroupIdHash.AsSystemMetadata(),
            poolIdHash.AsSystemMetadata(),
            totalConnectionsCount.AsSystemMetadata(),
            minPoolSize.AsSystemMetadata(),
            maxPoolSize.AsSystemMetadata(),
            availableConnectionsNewCount.AsSystemMetadata(),
            availableConnectionsOldCount.AsSystemMetadata(),
            pendingOpensCount.AsSystemMetadata(),
            waitCount.AsSystemMetadata());
    }
}

Basically, we are logging the following fields and properties from each DbConnectionPool object: _totalObjects, MinPoolSize, MaxPoolSize, _stackNew.Count, _stackOld.Count, _pendingOpens.Count, _waitCount.
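
For completeness, GetPrivateFieldValue / GetPrivatePropertyValue above are our own reflection helpers (and AsSystemMetadata our own logging extension). Below is a minimal sketch of what the two reflection helpers could look like, shown as a standalone static class; it is simplified and may differ from what actually runs in our service:

using System.Reflection;

// Sketch only: reflection helpers assumed by the logging snippet above.
// Production versions may differ (e.g. they could cache FieldInfo/PropertyInfo lookups).
internal static class ReflectionHelpers
{
    public static T GetPrivateFieldValue<T>(object instance, string fieldName, bool baseType = false)
    {
        // Some fields (e.g. _connectionPoolGroups) are declared on the base class
        // (DbConnectionFactory) rather than on SqlConnectionFactory itself.
        var type = baseType ? instance.GetType().BaseType : instance.GetType();
        var field = type?.GetField(fieldName, BindingFlags.Instance | BindingFlags.NonPublic);
        var value = field?.GetValue(instance);
        return value is T typed ? typed : default;
    }

    public static T GetPrivatePropertyValue<T>(object instance, string propertyName)
    {
        var property = instance.GetType().GetProperty(
            propertyName, BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public);
        var value = property?.GetValue(instance);
        return value is T typed ? typed : default;
    }
}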

Below are the log details from when the problem starts occurring. We see that _pendingOpens keeps growing without bound, and the growth started right after _waitCount flipped from 0 to 1.

[screenshot: SQL pool stats log entries showing _pendingOpens growing after _waitCount changed from 0 to 1]

We suspect there is a concurrency bug of some sort in SqlClient that leads to the behavior described above: one connection appears to block all new ones and keep them in a waiting state. There is no solution other than restarting our service, and the restart wipes everything out of the pool.

Expected behavior

We expect the client to use all available connections from the pool rather than block new connection requests.

Further technical details

Microsoft.Data.SqlClient version: 1.1.1
.NET target: .NET Core 2.1
SQL Server version: Microsoft SQL Azure (RTM) - 12.0.2000.8
Operating system: Ubuntu - 16.04.6 LTS (Xenial Xerus)

@ErikEJ
Contributor

ErikEJ commented Nov 23, 2020

That is a very old version of SqlClient you are using - have you tried 2.1.0?

@bahadirbal
Author

The version we use last got an update this past June, so I don't think it is really old. That said, we tried a couple of times to upgrade SqlClient (not to 2.1.0 - I see it was released only 4 days ago) and reverted each time due to other bugs. We can give the upgrade another try.

But do you know whether there were changes to connection creation between these versions, or any related bug fix that points at this kind of issue?

@cheenamalhotra
Member

cheenamalhotra commented Nov 23, 2020

@bahadirbal

What issues have you been facing with v2.1.0?
Please provide details so we can help you get past them.

You can view all release notes here: Release Notes and most importantly v2.0.0 Breaking changes.

This issue also looks similar to #422 to me; please provide us a repro if possible, along with the connection string.
Also, let us know whether you have MARS enabled and whether disabling it fixes your issues.
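
For reference, MARS is controlled entirely by the connection string, so toggling it for a test run is a one-line change. A minimal sketch with illustrative (not real) connection values:

using Microsoft.Data.SqlClient;

// Illustrative values only - substitute the real server/database/credentials.
var builder = new SqlConnectionStringBuilder(
    "Server=tcp:myserver.database.windows.net,1433;Initial Catalog=mydb;User ID=myuser;Password=mypassword")
{
    // Equivalent to "MultipleActiveResultSets=False" in the raw connection string.
    MultipleActiveResultSets = false
};

using (var connection = new SqlConnection(builder.ConnectionString))
{
    connection.Open();
    // ... run queries as usual; only the MARS setting has changed.
}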

@cheenamalhotra cheenamalhotra added this to Needs triage in SqlClient Triage Board via automation Nov 23, 2020
@cheenamalhotra cheenamalhotra moved this from Needs triage to Under Investigation in SqlClient Triage Board Nov 23, 2020
@bahadirbal
Author

@cheenamalhotra

We have not tried upgrading to v2.1.0 yet; I will try that. The issues we mentioned were with previous versions.

Regarding your point about MARS and #422: we explicitly enabled MARS as a workaround for issue #54. We hit that one in production, which is why disabling MARS is not ideal for now, especially since I still see recent activity/comments on the "The connection is closed" issue. Can you confirm that #54 is resolved in v2.1.0?

Regarding repro steps, we are not hitting this issue all the time; that is why we added logging of these connection pool statistics and are trying to catch something in the numbers. Do you have any other suggestions for enhancing our investigation, such as listening to some events?
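
For example, something along these lines is what we had in mind - a rough sketch that assumes the driver exposes an EventSource named "Microsoft.Data.SqlClient.EventSource" (that name comes from documentation for newer releases; we have not verified it against v1.1.1):

using System;
using System.Diagnostics.Tracing;

// Rough sketch: subscribe to SqlClient's EventSource, if the installed version exposes one.
// The source name below is an assumption based on docs for newer driver releases.
internal sealed class SqlClientEventListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name == "Microsoft.Data.SqlClient.EventSource")
        {
            // Start broad; narrow the level/keywords once the interesting events are known.
            EnableEvents(eventSource, EventLevel.Informational, EventKeywords.All);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        var payload = eventData.Payload == null ? string.Empty : string.Join(", ", eventData.Payload);
        Console.WriteLine($"{eventData.EventName}: {payload}");
    }
}

An instance of such a listener would just need to be created once at startup and kept alive (e.g. in a static field) for the lifetime of the process.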

@cheenamalhotra
Member

cheenamalhotra commented Nov 24, 2020

Yes, #54 is now fixed in v2.1.0.

Issue #422 affects all MARS scenarios on Unix platforms when connecting in parallel at high volume, so I would recommend keeping MARS "false" for now (until we fix it), unless you can ensure requests don't go to the driver in parallel.

@JRahnama
Member

@bahadirbal is the issue resolved?

@JRahnama JRahnama added the ⏳ Waiting for Customer We're waiting for your response :) label Nov 25, 2021
@JRahnama
Member

Closing the issue due to inactivity. Feel free to comment here or open a new issue.

SqlClient Triage Board automation moved this from Under Investigation to Closed Jan 17, 2024