
Add ValueTask stream overloads on SNI streams and Waits #902

Merged
merged 5 commits into dotnet:main from Wraith2:perf13 on Jul 13, 2021

Conversation

Wraith2
Contributor

@Wraith2 Wraith2 commented Feb 7, 2021

Some more perf improvements from investigating #659

This adds overloads on netcore builds, which have ValueTask available, so waits that can succeed synchronously avoid allocating tasks and completion sources. This eliminates or reduces the highlighted items below. In my benching run it increases throughput by ~30%.

[screenshot: profiling results with the affected allocations highlighted]
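
A minimal sketch of the fast path being described, with hypothetical names (FastPathSketch and ReadWithFastPathAsync are not the PR's code): when a ValueTask-returning read completes synchronously, the result is taken directly and no Task or TaskCompletionSource is allocated.

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

internal static class FastPathSketch
{
    // Hypothetical helper: consume a ValueTask-returning read so that
    // synchronous completion allocates nothing.
    public static async ValueTask<int> ReadWithFastPathAsync(Stream stream, Memory<byte> buffer, CancellationToken ct)
    {
        ValueTask<int> pending = stream.ReadAsync(buffer, ct);
        return pending.IsCompletedSuccessfully
            ? pending.Result                       // completed synchronously: no Task allocated
            : await pending.ConfigureAwait(false); // genuinely asynchronous: await as usual
    }
}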

@cheenamalhotra
Member

You might want to run locally the tests that are not completing and are hanging in the pipelines; they don't look like the usual errors from the test lab.

@Wraith2
Contributor Author

Wraith2 commented Feb 10, 2021

Tests run fine locally; can someone with access re-run them?

@Wraith2 Wraith2 closed this Feb 10, 2021
@Wraith2 Wraith2 reopened this Feb 10, 2021
@cheenamalhotra
Member

Could you run with Managed SNI enabled on Windows? Just enable that property in config.json.

@Wraith2
Contributor Author

Wraith2 commented Feb 10, 2021

I ran the tests with managed mode hardcoded in the stateobject factory. Is there anything specific you see of concern? I checked a few errors and didn't see anything unusual.

@Wraith2
Contributor Author

Wraith2 commented Feb 22, 2021

/azp help

@azure-pipelines

Supported commands
  • help:
    • Get descriptions, examples and documentation about supported commands
    • Example: help "command_name"
  • list:
    • List all pipelines for this repository using a comment.
    • Example: "list"
  • run:
    • Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run.
    • Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
  • where:
    • Report back the Azure DevOps orgs that are related to this repository and org
    • Example: "where"

See additional documentation.

@Wraith2
Contributor Author

Wraith2 commented Feb 22, 2021

/azp run

@azure-pipelines

Commenter does not have sufficient privileges for PR 902 in repo dotnet/SqlClient

@cheenamalhotra
Member

/azp run

@azure-pipelines

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@Wraith2
Contributor Author

Wraith2 commented Mar 2, 2021

I'm not sure what's going on with the tests. I rebased on master and force pushed, which caused the CI to re-run and greened a lot of the previously failing legs. There are still some that stop executing but give no indication why. Importantly, there are managed SNI tests that complete successfully.

On the legs that didn't complete the next test should have been: Passed Microsoft.Data.SqlClient.ManualTesting.Tests.AsyncTimeoutTest.TestDelayedAsyncTimeout

@cheenamalhotra
Member

If you check here, this is actually using Managed SNI for all tests (UseManagedSNIOnWindows = true): https://dev.azure.com/sqlclientdrivers-ci/sqlclient/_build/results?buildId=25670&view=results

The netcoreapp2.1 jobs are "hung up" after running a few tests:

[screenshot: netcoreapp2.1 test jobs stalled after a few tests]

The same happens with other pipelines.
I suspect this is due to some changes here.

@cheenamalhotra
Member

cheenamalhotra commented Mar 2, 2021

The hang is captured by the recently added AsyncTimeoutTests on .NET Core 2.1.
More tests were possibly also capturing it before (e.g. BeginExecAsyncTest).

You can use the scripts below to reproduce:

#build
msbuild /p:configuration=Release

#hangs forever
msbuild /p:configuration=Release /t:BuildTestsNetCore
dotnet test "src\Microsoft.Data.SqlClient\tests\ManualTests\Microsoft.Data.SqlClient.ManualTesting.Tests.csproj" /p:Configuration="Release" /p:TestTargetOS="Windowsnetcoreapp" --no-build -v n --filter "AsyncTimeoutTest"

#works
msbuild /p:configuration=Release /t:BuildTestsNetCore /p:TargetNetCoreVersion=netcoreapp3.1
dotnet test "src\Microsoft.Data.SqlClient\tests\ManualTests\Microsoft.Data.SqlClient.ManualTesting.Tests.csproj" /p:Configuration="Release" /p:TestTargetOS="Windowsnetcoreapp" --no-build -v n --filter "AsyncTimeoutTest" /p:TargetNetCoreVersion=netcoreapp3.1

Please ensure your config.json contains: "UseManagedSNIOnWindows": true

@Wraith2
Contributor Author

Wraith2 commented Mar 2, 2021

Yup, I saw that's where it hung, but I can't work out why, and locally it doesn't hang. There are legs which use the managed SNI that are succeeding:

CI-Win10-SQL19W-ManagedSNI (Win10-SQL19 AnyCPU,Release,netcoreapp,false,netcoreapp3.1) Successful in 18m
CI-Ub20-SQL17L-Core31 (Ubuntu 16 AnyCPU,Release,netcoreapp,true) Successful in 20m

The changes I've made are working fine for those. The netcore files are included in both the 2.1 and 3.1 builds, so if there's a difference it's in the runtime and not in this code. Given that 2.1 is out of support I could make these changes netcoreapp3.1-and-later only, but I'd really like to know what the problem is rather than just avoiding it.

Is there a way to get some sort of diagnostic information out of the CI machines so I can see what they were hung on?

@Wraith2
Contributor Author

Wraith2 commented Mar 2, 2021

I've found what I think is the problem, but I'm going to need some help. The manual tests project can be debugged in Visual Studio if you change the build configuration to netcoreapp2.1-Debug, and by doing that I can see what's happening. What I need is to be able to see the same situation with netcoreapp3.1-Debug, but that configuration won't build and I can't work out what's wrong with it. Is there any chance you could take a look at the project config for me?

@Wraith2
Contributor Author

Wraith2 commented Mar 2, 2021

I've found it and it's annoying. .NET Core 2.1 implemented the ValueTask overrides by calling the Task versions, so you can end up with a cycle. I've changed the 2.1 builds to use the Task-based strategy and put a note at the top of the file explaining why. Let's see what the CI thinks.
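
For illustration, here is the shape of that cycle with a made-up stream (CycleStream is not the PR's code). The inherited ValueTask overload forwards to the virtual Task overload, which is what 2.1's streams did, so a subclass that implements its Task overload by calling the ValueTask one ends up with the two methods calling each other.

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

internal sealed class CycleStream : Stream
{
    public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken ct)
        // The base ReadAsync(Memory<byte>) unwraps the array and calls the
        // virtual ReadAsync(byte[], ...), which dispatches back into this
        // override: infinite mutual recursion (StackOverflowException).
        => ReadAsync(new Memory<byte>(buffer, offset, count), ct).AsTask();

    // Minimal Stream plumbing so the sketch compiles.
    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => 0;
    public override long Position { get => 0; set => throw new NotSupportedException(); }
    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) => 0;
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}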

@Wraith2
Contributor Author

Wraith2 commented Jun 10, 2021

@cheenamalhotra @David-Engel can this be reviewed now, please? It's been open for 4 months now, and if it was too risky for 3.0 for some reason, 3.0 has now shipped.

@Wraith2 Wraith2 mentioned this pull request Jun 11, 2021
@JRahnama
Member

@Wraith2 can you address the conflict here please?

@JRahnama JRahnama added this to the 4.0.0-preview1 milestone Jun 11, 2021
@Wraith2
Contributor Author

Wraith2 commented Jun 11, 2021

Done.

@cheenamalhotra cheenamalhotra added this to In progress in SqlClient v4.0 via automation Jun 14, 2021
@cheenamalhotra cheenamalhotra moved this from In progress to Review in progress in SqlClient v4.0 Jun 14, 2021
}
else
{
    return valueTask.AsTask();
Member


What are the benefits of starting with a ValueTask and returning a Task? Can we just start with a Task and return a Task?

Contributor Author


The only version that does the real work is the ValueTask version, so everything must call that one and then convert the result to whatever upstream type it needs. I agree that it looks wasteful, but it's more complex to have two complete implementations, and this is the pattern used in /runtime.

In an ideal world all callers will use ValueTask directly and avoid the Task. In cases where the caller somehow can't do that (isn't aware of ValueTask), it incurs the same task allocation penalty it was already going to pay, and the only change is a couple of additional stack copy instructions. Overall it's a net win.

Since we (SqlClient) are the only caller of these functions, this PR completes the vertical call chain that uses ValueTask instead of Task, and we'll see the benefits; this method might only get called through sequential reader streams.
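
As a sketch of that pattern (illustrative names, not the PR's exact code): the ValueTask method holds the single real implementation, and the Task overload is a thin adapter over it.

using System;
using System.Threading;
using System.Threading.Tasks;

internal sealed class Sketch
{
    public ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken ct)
    {
        // ... the one real implementation lives here ...
        return new ValueTask<int>(0);
    }

    public Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken ct)
        // Task-based callers pay the allocation they always paid;
        // ValueTask-aware callers skip the adapter entirely.
        => ReadAsync(new Memory<byte>(buffer, offset, count), ct).AsTask();
}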

Member


In several cases like this, the consumer is not awaiting them directly; they are awaiting a Task, and that makes me wonder whether ValueTask is a good choice.

In an ideal world all callers will use ValueTask directly and avoid the Task.

This is a bit controversial. Tasks are still preferred over ValueTasks. Micro-benchmarking shows a bit of a perf cost when using ValueTasks, and Tasks are easier to use correctly.

According to this article, a ValueTask should only be used if:
a) you expect consumers of your API to only await them directly,
b) allocation-related overhead is important to avoid for your API, and
c) either you expect synchronous completion to be a very common case, or you’re able to effectively pool objects for use with asynchronous completion.

I would ask @roji's help/comment on this as well.

Contributor Author


This fits cases (b) and (c). The fact that we can't easily change the external API surface from Task to ValueTask prevents it from being (a) as well.

This is a bit controversial. Tasks are still preferred over ValueTasks. Micro-benchmarking shows a bit of a perf cost when using ValueTasks, and Tasks are easier to use correctly.

One of the things I do for fun is listen to the framework design council meetings when they're streamed on YouTube, and I've heard the long conversations about when ValueTask is suitable and when it is not. I investigated this because I had good reason to believe that using ValueTask is appropriate here and would sometimes allow us to have allocation-free sync calls. The items I highlighted in the original post show the reduction in allocations from these changes. There will be speed increases as well, because we spend less time in GC, but I didn't think it necessary to quantify them since they weren't the reason for making the changes.

Note that this works best on netcore3, because the runtime did the work to make SslStream more efficient by adding ValueTask overloads; we're just following the same pattern to allow the most efficient reads we can.
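
For example (a sketch; SslReadSketch and ReadPacketAsync are made-up names), a netcoreapp3.x caller can await SslStream's Memory-based overload directly:

using System;
using System.Net.Security;
using System.Threading;
using System.Threading.Tasks;

internal static class SslReadSketch
{
    // On netcoreapp3.x SslStream overrides ReadAsync(Memory<byte>, CancellationToken),
    // so awaiting the ValueTask directly avoids a Task allocation whenever the
    // read completes synchronously (e.g. the data is already buffered).
    public static async ValueTask<int> ReadPacketAsync(SslStream ssl, Memory<byte> buffer, CancellationToken ct)
        => await ssl.ReadAsync(buffer, ct).ConfigureAwait(false);
}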

Member


I do agree it's worth introducing ValueTask implementations here - this seems to be a Stream implementation that all driver traffic goes through, and it could indeed reduce lots of allocations when the call completes synchronously (and possibly even asynchronously, depending on how optimized the rewritten SslStream is). There's indeed a lot of discussions around whether to expose ValueTask vs. Task when designing APIs, but when consuming an API that exposes ValueTask it's generally good practice to use that (unless you're sure the invocation will return asynchronously, and that no asynchronous-completion optimizations are in place in the implementation).

In my benching run it increases throughput by ~30%.

That seems like quite a high number for this change... FWIW I always find it helpful to post the actual BenchmarkDotNet results and code, to show what exactly is being tested etc.

Member


The change to ConcurrentQueueSemaphore is a big part of it, and I think I already split that out into another PR because it was so valuable. It prevents the allocation of a TCS, a Task, and two continuation objects each time we want to check a lock (which is usually uncontested) for every single packet receipt.

It's great that you're making progress. I haven't reviewed this PR thoroughly, though specifically for making ConcurrentQueueSemaphore not allocate in the uncontended case, that seems to already be the case as far as I can tell. Also see my comment on the non-utility of non-generic ValueTask there.

The other changes allow ValueTask paths to be used further down. Without this PR I can't use them in SNIPacket, so I've got to start somewhere and I started here. If I'd tried to use them in SNIPacket first, you'd complain that it was no use doing that because SslStream didn't have them.

@Wraith2 I'm not complaining about anything. My opinion was asked for, and I'm pointing out that this change, as it currently is, doesn't improve performance in any way, and indeed may slightly regress it. I very much support improving how async is implemented in SqlClient, though I think the real problems really are elsewhere, and not in Task vs. ValueTask.

Contributor Author


specifically for making ConcurrentQueueSemaphore not allocate in the uncontended case, that seems to already be the case as far as I can tell

I split that one out after talking with Cheena a few months back. It has a big impact and I wanted to make sure it got in as soon as possible. I don't keep rebasing my open PRs until there's some reason to do so; there's no point in me taking the time if no one is ready to review them. I used to do that, but it took ages and the number of branches I'd have to review got out of hand.

Outside of a microbenchmark designed to show the difference between ValueTask and Task implementations, do you think there is likely to be a noticeable performance difference between the versions? I accept and agree with your statements but is it either harmful or wrong to have a ValueTask version?

If you can point me at the other problems I can look at them. Until then I'll keep investigating and addressing the problems I find. I still hope that one day we'll be able to get SQL Server included in the TE benchmarks. The TE Fortunes benchmark is also a simple and easily understood optimization scenario to work on so I'll come back to it often to see if there are new optimization opportunities.

Member


I accept and agree with your statements but is it either harmful or wrong to have a ValueTask version?

No - I don't think it's either harmful or wrong. As I wrote above, I'd implement the Task-returning versions by calling the base SslStream counterparts (rather than the ValueTask), but that's a really small suggestion.

The TE Fortunes benchmark is also a simple and easily understood optimization scenario to work on so I'll come back to it often to see if there are new optimization opportunities.

I agree that TE Fortunes is a great way to find optimization opportunities (regardless of whether results are officially published on techempower.com or not). But I'll go out on a limb here, and say that IMHO a few micro-optimizations here and there simply aren't going to have a substantial impact given the larger problems in the codebase. At some point these need to be addressed.

PS I'm still unclear on where the aforementioned allocation reductions and throughput improvements come from in this PR, since the new Stream ValueTask methods aren't called, and I can't see how the ConcurrentQueueSemaphore changes improve anything.

Contributor Author


Ignore the semaphore stuff; that got merged separately. It was the perf improvement.
The stream ValueTasks just enable me to use them in a future PR.

Member


OK, though consider #902 (comment).

public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
{
    ValueTask<int> valueTask = ReadAsync(new Memory<byte>(buffer, offset, count), cancellationToken);
    if (valueTask.IsCompletedSuccessfully)
Member


This seems redundant - ValueTask.AsTask already knows to efficiently return a completed Task.

Though I'd personally just call the corresponding Task-returning method on the base class directly, and apply the semaphore check here. At least theoretically, the Task-returning method could be doing something slightly different since it knows it's returning a Task (e.g. no need to deal with IValueTaskSource etc.).

Contributor Author


So you think it would be better to just do return ReadAsync(new Memory<byte>(buffer, offset, count), cancellationToken).AsTask(); ?

Member


Yeah, it's basically the same. Take a look at the source for AsTask.


{
    internal sealed partial class ConcurrentQueueSemaphore
    {
        public ValueTask WaitAsync(CancellationToken cancellationToken)
Member


I'm not sure what the point of returning ValueTask is - is this better than returning a Task? For the first path below, you can simply return Task.CompletedTask instead of new ValueTask(), and for the second one, just return tcs.Task...

Non-generic ValueTask is generally useful only in quite rare circumstances. It only really makes sense if it's going to be backed with IValueTaskSource (i.e. optimizing for asynchronous completion). It doesn't help for optimizing for synchronous completion, since you already have the option of returning Task.CompletedTask; that option doesn't exist in generic contexts, which is why ValueTask<T> is far more useful than non-generic ValueTask.
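
A sketch of the Task-returning shape being suggested (SemaphoreSketch is illustrative, not the merged code): Task.CompletedTask already makes the uncontended path allocation-free, so non-generic ValueTask buys nothing here.

using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

internal sealed class SemaphoreSketch
{
    private readonly ConcurrentQueue<TaskCompletionSource<bool>> _waiters = new ConcurrentQueue<TaskCompletionSource<bool>>();
    private int _available = 1;

    public Task WaitAsync(CancellationToken cancellationToken)
    {
        if (cancellationToken.IsCancellationRequested)
            return Task.FromCanceled(cancellationToken);

        // Uncontended fast path: a cached singleton, no allocation at all.
        if (Interlocked.CompareExchange(ref _available, 0, 1) == 1)
            return Task.CompletedTask;

        // Contended path: allocating a waiter is unavoidable here.
        // (Release, which restores _available or completes a queued waiter,
        // is omitted for brevity.)
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        _waiters.Enqueue(tcs);
        return tcs.Task;
    }
}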

@Wraith2
Contributor Author

Wraith2 commented Jun 17, 2021

Issues addressed. The concurrent queue semaphore has been simplified to a Task-only version for all builds.

@Wraith2
Contributor Author

Wraith2 commented Jul 5, 2021

What is blocking this?

Member

@cheenamalhotra cheenamalhotra left a comment


Overall LGTM.
The PR has also been tested with the repro from #262, as deadlocks occurred in the past from similar changes.

SqlClient v4.0 automation moved this from Review in progress to Reviewer approved Jul 6, 2021
@cheenamalhotra cheenamalhotra merged commit 2325ddd into dotnet:main Jul 13, 2021
SqlClient v4.0 automation moved this from Reviewer approved to Done Jul 13, 2021
@Wraith2 Wraith2 deleted the perf13 branch July 18, 2021 22:25
DavoudEshtehari pushed a commit to Wraith2/SqlClient that referenced this pull request Sep 1, 2021