
Reading large data (binary, text) asynchronously is extremely slow #593

Open
roji opened this issue Jun 5, 2020 · 155 comments
Labels: 📈 Performance

Comments

@roji (Member) commented Jun 5, 2020

Reading a 10MB VARBINARY(MAX) value asynchronously takes around 5 seconds on my machine, while doing the same synchronously takes around 20ms. Increasing the data size to 20MB increases the running time to around 52 seconds.

These numbers were with Microsoft.Data.SqlClient 1.1.3, but 2.0.0-preview4.20142.4 has the same issue (it actually seems slightly slower). Note that I'm not posting final BDN results because the benchmark ran extremely slowly.

Note that this looks very similar to #245, but that issue was fixed for 1.1.0. The difference may be that here we're reading binary data - or possibly the bigger size.

Benchmark code:

```csharp
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.Data.SqlClient;

[MemoryDiagnoser]
public class Benchmarks
{
    const string ConnectionString = "Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0";


    [GlobalSetup]
    public void Setup()
    {
        using var conn = new SqlConnection(ConnectionString);
        conn.Open();

        using var cmd = conn.CreateCommand();
        cmd.CommandText = @"
IF OBJECT_ID('dbo.data', 'U') IS NOT NULL
DROP TABLE data; 
CREATE TABLE data (id INT, foo VARBINARY(MAX))
";
        cmd.ExecuteNonQuery();

        cmd.CommandText = "INSERT INTO data (id, foo) VALUES (@id, @foo)";
        cmd.Parameters.AddWithValue("id", 1);
        cmd.Parameters.AddWithValue("foo", new byte[1024 * 1024 * 10]);
        cmd.ExecuteNonQuery();
    }

    [Benchmark]
    public async ValueTask<int> Async()
    {
        using var conn = new SqlConnection(ConnectionString);
        using var cmd = new SqlCommand("SELECT foo FROM data", conn);
        await conn.OpenAsync();

        return ((byte[])await cmd.ExecuteScalarAsync()).Length;
    }

    [Benchmark]
    public async ValueTask<int> Sync()
    {
        using var conn = new SqlConnection(ConnectionString);
        using var cmd = new SqlCommand("SELECT foo FROM data", conn);
        conn.Open();

        return ((byte[])cmd.ExecuteScalar()).Length;
    }

    static void Main(string[] args)
        => BenchmarkRunner.Run<Benchmarks>();
}
```

Originally raised in dotnet/efcore#21147

roji added the 📈 Performance label Jun 5, 2020
@ErikEJ (Contributor) commented Jun 5, 2020

Cannot see much relation to #245 really.

DavoudEshtehari added this to Needs triage in SqlClient Triage Board Jun 5, 2020
@DavoudEshtehari (Member) commented:

Here are the results of the test on .NET Framework and .NET Core:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18362.657 (1903/May2019Update/19H1)
Intel Core i7-8665U CPU 1.90GHz (Coffee Lake), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4121.0), X64 RyuJIT  [AttachedDebugger]
  Job-TUQHOY : .NET Framework 4.8 (4.8.4121.0), X64 RyuJIT

IterationCount=10  LaunchCount=1  WarmupCount=10  

| Method | Mean        | Error      | StdDev     | Gen 0       | Gen 1       | Gen 2       | Allocated   |
|------- |------------:|-----------:|-----------:|------------:|------------:|------------:|------------:|
| Async  | 3,170.53 ms | 222.636 ms | 132.487 ms | 635000.0000 | 634000.0000 | 633000.0000 | 13155.71 MB |
| Sync   |    37.88 ms |   2.139 ms |   1.415 ms |    333.3333 |    333.3333 |    333.3333 |       20 MB |

  • ManagedSNI = false
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18362.657 (1903/May2019Update/19H1)
Intel Core i7-8665U CPU 1.90GHz (Coffee Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.202
  [Host]     : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.22101), X64 RyuJIT
  Job-LXIUHK : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.22101), X64 RyuJIT

IterationCount=10  LaunchCount=1  WarmupCount=10  

| Method | Mean        | Error      | StdDev     | Gen 0     | Gen 1     | Gen 2     | Allocated |
|------- |------------:|-----------:|-----------:|----------:|----------:|----------:|----------:|
| Async  | 4,731.62 ms | 228.407 ms | 151.077 ms | 3000.0000 | 2000.0000 | 1000.0000 |   30.5 MB |
| Sync   |    36.68 ms |   2.590 ms |   1.713 ms |  500.0000 |  500.0000 |  500.0000 |     20 MB |

  • ManagedSNI = true
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18362.657 (1903/May2019Update/19H1)
Intel Core i7-8665U CPU 1.90GHz (Coffee Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.202
  [Host]     : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.22101), X64 RyuJIT
  Job-OTAQAT : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.22101), X64 RyuJIT

IterationCount=10  LaunchCount=1  WarmupCount=10  

| Method | Mean        | Error      | StdDev     | Gen 0     | Gen 1     | Gen 2     | Allocated |
|------- |------------:|-----------:|-----------:|----------:|----------:|----------:|----------:|
| Async  | 4,054.54 ms | 474.155 ms | 282.162 ms | 3000.0000 | 2000.0000 | 1000.0000 |   30.5 MB |
| Sync   |    34.23 ms |   1.365 ms |   0.903 ms |  357.1429 |  357.1429 |  357.1429 |     20 MB |

cheenamalhotra moved this from Needs triage to Under Investigation in SqlClient Triage Board Jun 5, 2020
@Wraith2 (Contributor) commented Jun 9, 2020

Working from memory here. Reading a MAX-defined field means using chunked returns, and the complete length of the field may not be known ahead of time, so the optimization I put in place for reading won't help. If that's right, you'll see the same horrific escalating performance with increased data sizes as the reader continuously allocates and releases the temporary buffers; those GC numbers seem to support that.

It might be possible to adapt the cached buffer we've got for multi-packet known lengths to work like a list and auto-expand as we go. What I can't do is prevent the repeated rescanning of the async packet queue.

@Wraith2 (Contributor) commented Jun 10, 2020

I've had a look into the profiles and I was wrong. The optimization from #245 is being used because it's reading as PLP data. The excessive memory usage is caused mostly by input packet buffers on the TDS side being lost into the snapshot packet chain. The time escalation is caused by the snapshot chain repeatedly being replayed.

The only currently available mitigations are to increase the packet size to as close to 16K as you can get, or to use a stream reader mode, which should work on a per-packet basis and not buffer the entire contents (I'm a bit hazy on that for async, it probably needs checking).
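
For reference, a rough sketch of combining those two mitigations, reusing the table and connection string from the benchmark above (Packet Size is a standard connection-string keyword; the streaming part is essentially what gets benchmarked later in this thread):

```csharp
using System.Data;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

static class LargeValueWorkaround
{
    // Larger packets mean fewer packets per value, and so fewer snapshot replays.
    const string ConnectionString =
        "Server=localhost;Database=test;User=SA;Password=Abcd5678;Packet Size=16384";

    // Streams the value instead of materializing it via ExecuteScalarAsync.
    public static async Task<long> ReadFooAsync()
    {
        using var conn = new SqlConnection(ConnectionString);
        await conn.OpenAsync();
        using var cmd = new SqlCommand("SELECT foo FROM data", conn);
        using var reader = await cmd.ExecuteReaderAsync(CommandBehavior.SequentialAccess);
        await reader.ReadAsync();

        using var stream = reader.GetStream(0); // reads per packet in sequential mode
        using var buffer = new MemoryStream();
        await stream.CopyToAsync(buffer);
        return buffer.Length;
    }
}
```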

The long term fix would be to rewrite async reads to avoid replaying the entire packet chain on every receipt. This is akin to brain surgery. I've looked into this while reviewing other performance issues around async and it isn't something to attempt lightly; if I managed it solo and dropped a PR for it, the team may all justifiably quit in protest.

@Wraith2 (Contributor) commented Jun 10, 2020

Oh, and this in System.Data.Common causes about 1/3 of the allocations and really isn't helpful.

https://github.com/dotnet/runtime/blob/7a57b927c3bdc31b91b7605f25f8598b8e6e58b8/src/libraries/System.Data.Common/src/System/Data/SQLTypes/SQLBinary.cs#L61-L74

We already use type workarounds with an overlay type to access the single value field directly to create SqlBinary instances, so we could do the same to get it out, but we'd then be handing the user a direct reference to a mutable array instead of a copy, and I've no doubt that some idiot somewhere in the world is relying on the copy semantics to make their code work.

@roji (Member, Author) commented Jun 11, 2020

@Wraith2

Oh, and this in System.Data.Common causes about 1/3 of the allocations and really isn't helpful.

You're referring to the fact that the binary data is copied out into an array that's given to the user, right? If so, isn't that the same between sync and async?

In any case, that's indeed a lot of allocations, but the alternative would be to return memory that's owned by SqlClient, right? If I'm understanding things right, that would mean that if the user reads the next row, the reference they get would suddenly start pointing to a totally different value... I did think about this at some point as an advanced/high-perf/"unsafe" API in dotnet/runtime#24899 - but that's tricky because of the lifetime issue, and also because the entire binary data isn't necessarily in memory if CommandBehavior.Sequential is used (as it should be if values are big).

Anyway, all that stuff is less important than making async at least perform similar to sync...

@Wraith2 (Contributor) commented Jun 11, 2020

You're referring to the fact that the binary data is copied out into an array that's given to the user, right? If so, isn't that the same between sync and async?

Yes, and yes.

In any case, that's indeed a lot of allocations, but the alternative would be to return memory that's owned by SqlClient, right?

Each row of data is stored in an array of SqlBuffer instances held by the reader. The SqlBuffer type is a discriminated union and it'll hold a reference to an SqlBinary. When you move to the next row, or to the next field in sequential mode, the buffer instance is dropped to the GC. The backing byte[] array inside SqlBinary is not re-used.

If we returned the array directly it would only have an effect on multiple reads of the same field in the same row and the only way to observe a change of behaviour would be to change the contents of the array and then re-request it from the reader. It's a pretty clear optimization that I'd like to be able to make but I'm pretty sure I can't because someone will have relied on it returning a new array each time.

As you suggest, the ideal approach would be to use a ReadOnlyMemory<byte>; then it'd all be safe, but that'll require a runtime API change and for all providers to consume it.

also because the entire binary data isn't necessarily in memory if CommandBehavior.Sequential is used (as it should if values are big).

The entire array is skipped in Sequential mode if the field is not requested, as you say.
In standard mode it will be read into memory and kept until Read() is called to move to the next row if it or any field after it in the column order is read.

Anyway, all that stuff is less important than making async at least perform similar to sync...

I can sort of understand how it might be done but it's daunting and difficult.

@roji (Member, Author) commented Jun 11, 2020

As you suggest, the ideal approach would be to use a ReadOnlyMemory<byte>; then it'd all be safe, but that'll require a runtime API change and for all providers to consume it.

Wouldn't it be possible for a provider to simply support GetFieldValue<ReadOnlyMemory<byte>>(), without introducing a new API?

I can sort of understand how it might be done but it's daunting and difficult.

I can certainly understand (and appreciate) that, and I'm definitely not suggesting anyone do anything specific (was just reporting the problem). Along with #547 and possibly other async-related issues, it seems that at least documenting these limitations could help users avoid pitfalls.

BTW do you think there are any workarounds to this, besides switching to sync?
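
For concreteness, a rough sketch of what that GetFieldValue<ReadOnlyMemory<byte>>() idea could look like from the caller's side. This is hypothetical: GetFieldValueAsync<T> exists on DbDataReader today, but ReadOnlyMemory<byte> support is exactly the part the provider would need to add; reader and Process are stand-ins.

```csharp
// Hypothetical: assumes SqlClient adds ReadOnlyMemory<byte> support to
// GetFieldValueAsync<T>. 'reader' is a SqlDataReader in sequential mode.
while (await reader.ReadAsync())
{
    // A view over SqlClient's internal buffer instead of a fresh copy;
    // only valid until the next ReadAsync() call.
    ReadOnlyMemory<byte> payload =
        await reader.GetFieldValueAsync<ReadOnlyMemory<byte>>(0);
    Process(payload.Span); // stand-in consumer; must not retain the memory
}
```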

@Wraith2 (Contributor) commented Jun 11, 2020

Wouldn't it be possible for a provider to simply support GetFieldValue<ReadOnlyMemory<byte>>(), without introducing a new API?

I hadn't considered that, nice idea.

Currently the only workaround I can think of is to use a stream read overload, because I think those drive the parser per-packet directly so you don't have to replay the packet queue; that'll need verifying.

@roji (Member, Author) commented Jun 11, 2020

Currently the only workaround I can think of is to use a stream read overload, because I think those drive the parser per-packet directly so you don't have to replay the packet queue; that'll need verifying.

That at least sounds like a good workaround if it works! Definitely worth benchmarking that.

@Wraith2 (Contributor) commented Jun 11, 2020

Unless I've got something very wrong it won't work because there's a bug.

```csharp
using var conn = new SqlConnection(ConnectionString);
using var cmd = new SqlCommand("SELECT foo FROM data", conn);
await conn.OpenAsync();
using var reader = await cmd.ExecuteReaderAsync(
    System.Data.CommandBehavior.SequentialAccess
);
await reader.ReadAsync();
using var stream = reader.GetStream(0);
using var memory = new MemoryStream(16 * 1024);

await stream.CopyToAsync(memory);
return (int)memory.Length;
```

This should work but freezes in a task wait after a single read cycle; the second read never completes. If you change it to standard mode it'll work, but it does so by fetching the entire field and giving you a reader over the byte[]. So no workaround; this needs fixing if the team agrees that it's a bug.

@roji (Member, Author) commented Jun 11, 2020

Thanks for testing this... Definitely looks like a bug to me.

If you change it to standard mode it'll work but it does so by fetching the entire field and giving you a reader over the byte[]

Is this at least fast - in the sense that it resembles the sync perf rather than the async? If so it's at least a workaround, even if not ideal...

@Wraith2 (Contributor) commented Jun 11, 2020

| Method      | Mean        | Error     | StdDev    | Gen 0     | Gen 1     | Gen 2     | Allocated |
|------------ |------------:|----------:|----------:|----------:|----------:|----------:|----------:|
| Async       | 2,181.93 ms | 42.299 ms | 39.567 ms | 2000.0000 | 1000.0000 | 1000.0000 |   30.5 MB |
| StreamAsync | 2,162.94 ms | 30.055 ms | 28.114 ms | 2000.0000 | 1000.0000 | 1000.0000 |  40.52 MB |
| Sync        |    18.40 ms |  0.359 ms |  0.414 ms |  406.2500 |  406.2500 |  406.2500 |     20 MB |

So no. The only way to get the speed benefit is to put the reader in sequential mode.

// For non-variant types with sequential access, we support proper streaming

There's also an interesting comment elsewhere in the file that says streaming isn't supported when using encryption. I think that makes streaming unusable for a generic consumer like EF Core, because you'd have to special-case it per driver.

We need to fix the sequential stream issue and then we need to rewrite async reads.

@roji (Member, Author) commented Jun 11, 2020

Thanks for the details @Wraith2.

@Wraith2 (Contributor) commented Jun 14, 2020

With the fix in #603 the numbers are better:

| Method      | Mean        | Error     | StdDev    | Gen 0     | Gen 1     | Gen 2     | Allocated |
|------------ |------------:|----------:|----------:|----------:|----------:|----------:|----------:|
| Async       | 2,096.68 ms | 14.390 ms | 13.460 ms | 2000.0000 | 1000.0000 | 1000.0000 |   30.5 MB |
| StreamAsync |    25.42 ms |  0.499 ms |  0.534 ms | 2031.2500 | 1968.7500 | 1968.7500 |  32.36 MB |
| Sync        |    16.67 ms |  0.041 ms |  0.036 ms |  406.2500 |  406.2500 |  406.2500 |     20 MB |

@roji (Member, Author) commented Jun 17, 2020

That looks quite good - at the very least you can point people to a workaround.

@ErikEJ (Contributor) commented Dec 1, 2023

I have confirmed that the major improvement will be in 5.2 preview 4 and later.

@tibitoth commented Dec 2, 2023

@ErikEJ do you mean this?

| size | sync  | async-cm31 | async-cm32 |
|-----:|------:|-----------:|-----------:|
|    1 |  5.692 |     10.382 |      9.228 |
|    5 | 19.78  |    293.99  |     96.88  |
|   10 | 39.5   |  2,297.18  |    495.26  |
|   15 | 53.92  |  9,409.01  |  1,401.61  |
|   20 | 80.46  | 28,579.30  |  2,867.15  |

Yeah, async will be much faster, but sync is still 20x or 30x faster than async.

@ErikEJ (Contributor) commented Dec 2, 2023

@tibitoth Yes, I mean this, and I can read the numbers.

@Wraith2 (Contributor) commented Dec 2, 2023

The scaling is much better. The absolute values are still not equal and it requires a lot more merge work before I can implement the changes that allow them to have the same speed. We're getting there slowly.

@ErikEJ (Contributor) commented Dec 10, 2023

@roji I ran your original benchmark on 5.1.2 and on 5.2 preview 4:

Using 5.1.2:

BenchmarkDotNet v0.13.11, Windows 11 (10.0.22631.2715/23H2/2023Update/SunValley3)
11th Gen Intel Core i7-1165G7 2.80GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 6.0.25 (6.0.2523.51912), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.25 (6.0.2523.51912), X64 RyuJIT AVX2

| Method | Mean        | Error      | StdDev     | Median      | Gen0      | Gen1      | Gen2      | Allocated |
|------- |------------:|-----------:|-----------:|------------:|----------:|----------:|----------:|----------:|
| Async  | 3,256.31 ms | 203.078 ms | 579.392 ms | 3,029.00 ms | 2000.0000 | 1000.0000 | 1000.0000 |  30.67 MB |
| Sync   |    46.09 ms |   1.463 ms |   4.314 ms |    45.25 ms |  454.5455 |  454.5455 |  454.5455 |     20 MB |

Using 5.2 preview 4:

BenchmarkDotNet v0.13.11, Windows 11 (10.0.22631.2715/23H2/2023Update/SunValley3)
11th Gen Intel Core i7-1165G7 2.80GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 6.0.25 (6.0.2523.51912), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.25 (6.0.2523.51912), X64 RyuJIT AVX2

| Method | Mean      | Error     | StdDev    | Gen0      | Gen1      | Gen2      | Allocated |
|------- |----------:|----------:|----------:|----------:|----------:|----------:|----------:|
| Async  | 636.92 ms | 12.118 ms | 15.325 ms | 2000.0000 | 1000.0000 | 1000.0000 |  30.68 MB |
| Sync   |  45.00 ms |  1.513 ms |  4.269 ms |  333.3333 |  333.3333 |  333.3333 |     20 MB |

@roji (Member, Author) commented Dec 11, 2023

@ErikEJ thanks for running the benchmarks and posting the results - it's great to see these improvements.

I'm still hoping to see a more fundamental refactor for sync/async support that will make them have comparable performance (and use async/await), but this is already very significant.

@Wraith2 (Contributor) commented Dec 11, 2023

I've already shown I can get time perf the same as sync. A total refactor is unlikely to happen because the level of complication required to meet all the current behaviours is high and the benefit low.

@roji (Member, Author) commented Dec 11, 2023

I respectfully disagree. Modernizing to async/await and away from the current callback-based async code - with its different (and sometimes duplicate) codepaths - is IMHO vital for the maintainability and reliability of this foundational library.

@panoskj (Contributor) commented Dec 11, 2023

I agree with @roji: the current asynchronous implementation is fundamentally wrong, and this is why the performance is so bad. It is essentially wrapping a synchronous parsing implementation, repeating it each time a packet is received until it runs successfully.

Instead of reinventing a state machine to deal with the problem of stopping and resuming the execution of the parsing algorithm between packets, the natural C# solution is to use async/await where the end-result will be more maintainable and most likely more performant as well.
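
As a toy illustration (not SqlClient code) of why async/await fits this problem: the compiler-generated state machine suspends at each await while waiting for more data and resumes exactly where it left off, so nothing has to be re-parsed when the next packet arrives. A minimal sketch, assuming a .NET 7+ Stream for ReadExactlyAsync:

```csharp
using System.Buffers.Binary;
using System.IO;
using System.Threading.Tasks;

// Toy parser for a length-prefixed value. Each await can suspend until more
// data arrives and resumes mid-parse - O(N) over N packets, instead of
// replaying all earlier packets on every receipt.
static async Task<byte[]> ReadLengthPrefixedAsync(Stream source)
{
    var lengthBytes = new byte[4];
    await source.ReadExactlyAsync(lengthBytes);   // may suspend here
    int length = BinaryPrimitives.ReadInt32LittleEndian(lengthBytes);

    var payload = new byte[length];
    await source.ReadExactlyAsync(payload);       // or here - no replay needed
    return payload;
}
```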

On the other hand this refactoring is not trivial and will take a lot of time.

I would propose a middle-ground modular solution: use async/await where it is easy to do so. I made such a proof-of-concept and my notes were:

  1. Some parts can be trivially modernized to use async/await.
  2. Some parts will take an unreasonably long time to refactor but can be worked around as @Wraith2 suggests.
  3. Using async/await where possible reduces allocations and makes the code shorter.
  4. The end result has the same performance as the sync version, with potential to be refactored into running faster than the sync version.

But then there is also the problem that there are two codebases, one for .NET Framework and another for .NET Core. That's why I am trying to merge the files required for optimizing the parsing algorithms (#2254, #2168, #1844, #1520), but it is taking years. We cannot simply make such drastic changes only for .NET Core (the codebases would become significantly different, making it very hard to merge them in the future), but making the changes for both codebases amounts to a lot more work and PRs to be reviewed/merged, and as a result more time (it is more efficient to merge before refactoring).

It is unclear to me whether the owners of the repository would like to prioritize merging the parser logic source code, or whether they would accept PRs fixing this issue for the .NET Core version only. @David-Engel, @DavoudEshtehari and @JRahnama, what are your thoughts on this? For the record, I am willing to help you merge the required files as fast as you can review/accept/merge them, as you have seen.

@Wraith2 (Contributor) commented Dec 11, 2023

The replay functionality is a bad idea but the parser works in general. If you want to replace it you'll have to abstract it, make it testable, write full test coverage and then rewrite it. I think you know that the amount of work required for that isn't going to happen in any sensible timeframe. Would it be nice? Sure. Is it going to happen? No. It's taking years, as you said, to simply merge two very similar codebases. Getting changes reviewed and merged is the bottleneck and that isn't going to change.

@panoskj (Contributor) commented Dec 11, 2023

@Wraith2 I'm not talking about replacing the parser or the replay functionality, but about refactoring them gradually. I think the replay functionality can be completely removed, eventually. The async code could look like the sync code, with minor differences such as the occasional async/await keywords in some places. That's the idea of async/await - writing the code as if it were executing synchronously.

I have to admit in my proof-of-concept I only refactored some of the calling functions to use async/await. I did not attempt to remove the replay functionality - this would be the next step.

@David-Engel (Contributor) commented:

FWIW, I agree with roji, too. I'm onboard with anything that can help the situation. I dislike the original async implementation, particularly with the way it switches in and out of its syncOverAsync mode in various places. I would support anyone who wants to dedicate time to this. It's just a huge change to core code. It would be important to ensure no regressions, so testing is vital. I feel like we have pretty good test coverage, but there's always that worry. I also understand trying to address this incrementally.

It is unclear to me whether the owners of the repository would like to prioritize merging the parser logic source code or whether they would accept PRs fixing this issue for the .net core version only,

.NET Framework seems like it's going to be around for a long time and the changes will benefit both sides. So, I favor merging as much of the code as possible rather than leaving .NET Framework behind and just improving the .NET side. If we did .NET only, I feel like the code would never get merged.

For the record, I am willing to help you merge the required files as fast as you can review/accept/merge them, as you have seen.

Yes, and this has been much appreciated. I've been encouraging the team to not let code merge PRs sit too long. For example, I want to get your latest one into a 5th preview early/mid Jan for some exercise before a Feb GA (my estimates). The easier you can make reviewing them, the less likely people will stress out about having to review them and it'll happen faster. I also appreciate @Wraith2 chiming in on the PRs with his reviews and summaries to help clarify what's changed.

@Wraith2 (Contributor) commented Dec 11, 2023

@Wraith2 I'm not talking about replacing the parser or the replay functionality, but about refactoring them gradually. I think the replay functionality can be completely removed, eventually.

The reason that the replay functionality exists is that the parser cannot advance without consuming; it can't suspend mid-parse because it never needed to in the original implementation. It also has other effects, like not being able to determine if a string is wholly within the current packet; see #533. I know how to make it possible to not replay, which improves speed, but it doesn't resolve the underlying cause.

The parser, and more importantly the locking around it, are hopelessly complicated and not testable in-situ, so you can't replace them and get the same behaviour. For example, look at the locking problems with the multiplexing around MARS connections. I rewrote it very, very carefully and yet was still unable to get exactly the same behaviour, causing it to be reverted.

If you want to try and rewrite it then go ahead. I think you're likely to find it's very hard, very time consuming and has a long bug tail, and I also don't think you'll improve performance very much by doing so. From a development perspective it's nice to be able to say we should throw the mess we have away and make a new, better one, but from a management perspective it's a lot of work with no provable upside.

@julienFlexsoft commented:

EF Core defaults to nvarchar(max) for strings. We encountered some performance issues when querying those strings with ToListAsync/FirstOrDefaultAsync, especially when there were no results - it kept searching for too long.
If I understand correctly, the issue discussed here could be the cause of this, no?
And could simply changing the string columns that fit into an nvarchar(4000) (the maximum, but not max) to that type fix those performance issues?
All columns that are bigger than 4000 chars have to remain nvarchar(max).

@JelleHissink commented Jan 15, 2024

@julienFlexsoft yes, but 4000 is based on Unicode, so 2 bytes per nchar. However, SQL Server has a maximum row size of 8060 bytes, so as far as I know this would cause issues for tables with 2 such string fields. A better default would be nvarchar(1), or an exception if there is no max length given, so you are forced to think about sizes. You can also do this in startup code, or as a unit test in your solution: just walk through the model and force all developers to give a length for string fields (see the sketch below).
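
A minimal sketch of that model walk, assuming an EF Core DbContext and using EF Core's model metadata APIs (the helper name and wiring are hypothetical; run it from startup code or a unit test):

```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

static class ModelConventions
{
    // Throws if any string property was left without an explicit max length,
    // i.e. would default to nvarchar(max).
    public static void AssertAllStringsHaveMaxLength(DbContext context)
    {
        var offenders =
            (from entityType in context.Model.GetEntityTypes()
             from property in entityType.GetProperties()
             where property.ClrType == typeof(string)
                   && property.GetMaxLength() is null
             select $"{entityType.DisplayName()}.{property.Name}")
            .ToList();

        if (offenders.Count > 0)
            throw new InvalidOperationException(
                "String properties without a max length: " + string.Join(", ", offenders));
    }
}
```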

@julienFlexsoft commented:

@JelleHissink Thank you! Not to divert too much from the topic of this issue, but places where we store large strings (up to 30k chars) will have to remain nvarchar(max), or is there a better type?

@Wraith2 (Contributor) commented Jan 15, 2024

nvarchar(max) is the correct type to use.

@JonathanLydall commented:

EF Core defaults to nvarchar(max) for strings. We encountered some performance issues when querying those strings with ToListAsync (FirstOrDefaultAsync). Especially if there were no results, like it kept searching for too long. If I understand correctly the issue discussed here could be the cause of this, no? And simply changing the string columns that can fit in an nvarchar(4000) (the maximum, but not max) to it could fix those performance issue? All columns that are bigger than 4000 chars has to remain nvarchar(max).

My understanding of this particular GitHub issue is that one experiences very slow asynchronous result processing associated with high CPU usage of the .NET process when a very large result is returned by SQL Server, i.e. the queries may be near instant when executed using "synchronous" methods in .NET or when done in SSMS, but take hundreds or thousands of milliseconds in .NET when done with Async methods.

If one is having performance issues even with no results, then it's probably something about the schema and/or query and requires investigation using tools like SSMS's query analyzer.

@panoskj (Contributor) commented Jan 16, 2024

@julienFlexsoft you can use varchar(max) for all columns; as @JonathanLydall correctly pointed out, the schema doesn't matter - it is the contents of the column that matter. Unless, as they said, you have other issues as well (for example, a varchar(max) column can mess with the SQL optimizer, resulting in suboptimal query performance), in which case those issues will persist even if you switch to the sync implementation. It is also likely you are experiencing both async performance issues and SQL optimizer issues at the same time.

As for the async parsing algorithm for columns, it takes O(N^2) time, where N is the number of packets needed to transfer the whole column from server to client, whereas the sync algorithm takes O(N) time. (For example, a 10 MB value at the default 8 KB packet size spans roughly 1,280 packets, so the async path re-scans on the order of 1,280^2/2 ≈ 800,000 packets' worth of data.)

Therefore, I highly recommend setting the maximum 32KB packet size in your connection string if you haven't done so already.
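
For example, a sketch using SqlConnectionStringBuilder (32767 is the documented maximum for PacketSize; note the cap for encrypted connections discussed a few comments below):

```csharp
using Microsoft.Data.SqlClient;

// Larger packets mean fewer replays for large values. 32767 is the maximum;
// encrypted connections are further capped at 16383 bytes.
var builder = new SqlConnectionStringBuilder(
    "Server=localhost;Database=test;User=SA;Password=Abcd5678")
{
    PacketSize = 32767
};
using var conn = new SqlConnection(builder.ConnectionString);
```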

@rizi commented Jan 17, 2024

@panoskj
This documentation states that using a value greater than 8KB could hurt performance - what do you think, is this still true for EF Core/SQL Server?

From the documentation:
Setting the default value to a number greater than 8000 will cause the packets to use the MultiPage allocator on the instance of SQL Server instead of the much more efficient SinglePage allocator, reducing the overall scalability of the SQL Server.

@panoskj (Contributor) commented Jan 17, 2024

@panoskj This documentation states that using a value greater than 8KB could hurt performance - what do you think, is this still true for EF Core/SQL Server?

From the documentation: Setting the default value to a number greater than 8000 will cause the packets to use the MultiPage allocator on the instance of SQL Server instead of the much more efficient SinglePage allocator, reducing the overall scalability of the SQL Server.

I have tested the performance in real-world projects, and a larger packet size greatly improves the performance of the SQL client when reading a lot of data (e.g. strings larger than 8KB).

The issue you are referring to is about SQL Server's performance. The documentation you mentioned links to this documentation, which says:

Changes to memory management starting with SQL Server 2012 (11.x)

In older versions of SQL Server, memory allocation was done using five different mechanisms:

  • Single-Page Allocator (SPA), including only memory allocations that were less than, or equal to 8 KB in the SQL Server process. The max server memory (MB) and min server memory (MB) configuration options determined the limits of physical memory that the SPA consumed. The Buffer Pool was simultaneously the mechanism for SPA, and the largest consumer of single-page allocations.
  • Multi-Page Allocator (MPA), for memory allocations that request more than 8 KB.
  • CLR Allocator, including the SQL CLR heaps and its global allocations that are created during CLR initialization.
  • Memory allocations for thread stacks in the SQL Server process.
  • Direct Windows allocations (DWA), for memory allocation requests made directly to Windows. These include Windows heap usage and direct virtual allocations made by modules that are loaded into the SQL Server process. Examples of such memory allocation requests include allocations from extended stored procedure DLLs, objects that are created by using Automation procedures (sp_OA calls), and allocations from linked server providers.

Starting with SQL Server 2012 (11.x), Single-Page allocations, Multi-Page allocations and CLR allocations are all consolidated into an "Any size" Page Allocator, and included in memory limits controlled by max server memory (MB) and min server memory (MB) configuration options. This change provided a more accurate sizing ability for all memory requirements that go through the SQL Server memory manager.

My understanding is that there is no difference for the server starting with SQL Server 2012, and a negligible difference for earlier versions, which would only become noticeable if many clients connected to the server with a large packet size in their connection string. But I am no SQL Server expert. As always, benchmark before making any decisions.

Also, let's not forget that RAM capacities have become much larger since 2012 too (the documentation you linked was probably written before that, since it doesn't mention the changes made in 2012).

@m-gasser commented:

According to https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-network-packet-size-server-configuration-option?view=sql-server-ver16#Restrictions
the maximum network packet size for encrypted connections is 16,383 bytes.

So for encrypted connections we can't use the recommended 32KB; async queries are limited to packet sizes under 16KB.

@ismarCopilot commented:

So this has been open for a number of years now - do we have any indication of if and when it will be addressed?

@Wraith2 (Contributor) commented Jan 30, 2024

No. For details read the whole thread.
