
Add ValueTask stream overloads on SNI streams and Waits #902

Merged
merged 5 commits into dotnet:main from Wraith2:perf13 on Jul 13, 2021

Conversation

Wraith2
Contributor

@Wraith2 Wraith2 commented Feb 7, 2021

Some more perf improvements from investigating #659

This adds overloads on netcore builds, which have ValueTask available, so waits that can succeed synchronously avoid allocating tasks and completion sources. This eliminates or reduces the highlighted items below. In my benching run it increases throughput by ~30%.

[screenshot: profiling results with the affected allocations highlighted]
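
A minimal sketch of the fast path being described, with hypothetical names (FastPathSketch and ReadWithFastPathAsync are not the PR's code): when a ValueTask-returning read completes synchronously, the result is taken directly and no Task or TaskCompletionSource is allocated.

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

internal static class FastPathSketch
{
    // Hypothetical helper: consume a ValueTask-returning read so that
    // synchronous completion allocates nothing.
    public static async ValueTask<int> ReadWithFastPathAsync(Stream stream, Memory<byte> buffer, CancellationToken ct)
    {
        ValueTask<int> pending = stream.ReadAsync(buffer, ct);
        return pending.IsCompletedSuccessfully
            ? pending.Result                       // completed synchronously: no Task allocated
            : await pending.ConfigureAwait(false); // genuinely asynchronous: await as usual
    }
}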

@cheenamalhotra
Member

You might want to run locally the tests that are not completing and are hanging in the pipelines; they don't look like the usual errors from the test lab.

@Wraith2
Contributor Author

Wraith2 commented Feb 10, 2021

Tests run fine locally; can someone with access re-run them?

@Wraith2 Wraith2 closed this Feb 10, 2021
@Wraith2 Wraith2 reopened this Feb 10, 2021
@cheenamalhotra
Member

Could you run with Managed SNI enabled on Windows? Just enable that property in config.json.

@Wraith2
Contributor Author

Wraith2 commented Feb 10, 2021

I ran the tests with managed mode hardcoded in the stateobject factory. Is there anything specific you see of concern? I checked a few errors and didn't see anything unusual.

@Wraith2
Contributor Author

Wraith2 commented Feb 22, 2021

/azp help

@azure-pipelines

Supported commands
  • help:
    • Get descriptions, examples and documentation about supported commands
    • Example: help "command_name"
  • list:
    • List all pipelines for this repository using a comment.
    • Example: "list"
  • run:
    • Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run.
    • Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
  • where:
    • Report back the Azure DevOps orgs that are related to this repository and org
    • Example: "where"

See additional documentation.

@Wraith2
Contributor Author

Wraith2 commented Feb 22, 2021

/azp run

@azure-pipelines

Commenter does not have sufficient privileges for PR 902 in repo dotnet/SqlClient

@cheenamalhotra
Member

/azp run

@azure-pipelines

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@Wraith2
Contributor Author

Wraith2 commented Mar 2, 2021

I'm not sure what's going on with the tests. I rebased on master and force pushed, which caused the CI to re-run and greened a lot of the previously failing legs. There are still some that stop executing but give no indication why. Importantly, there are managed SNI tests that complete successfully.

On the legs that didn't complete the next test should have been: Passed Microsoft.Data.SqlClient.ManualTesting.Tests.AsyncTimeoutTest.TestDelayedAsyncTimeout

@cheenamalhotra
Member

If you check here, this is actually using Managed SNI for all tests (UseManagedSNIOnWindows = true): https://dev.azure.com/sqlclientdrivers-ci/sqlclient/_build/results?buildId=25670&view=results

The netcoreapp2.1 jobs are "hung up" after running a few tests:

[screenshot: netcoreapp2.1 test jobs stalled after a few tests]

The same happens with other pipelines.
I suspect this is due to some changes here.

@cheenamalhotra
Member

cheenamalhotra commented Mar 2, 2021

The hang is captured by the recently added AsyncTimeoutTests on .NET Core 2.1.
More tests were possibly also capturing it before (e.g. BeginExecAsyncTest).

You can use the scripts below to reproduce:

#build
msbuild /p:configuration=Release

#hangs forever
msbuild /p:configuration=Release /t:BuildTestsNetCore
dotnet test "src\Microsoft.Data.SqlClient\tests\ManualTests\Microsoft.Data.SqlClient.ManualTesting.Tests.csproj" /p:Configuration="Release" /p:TestTargetOS="Windowsnetcoreapp" --no-build -v n --filter "AsyncTimeoutTest"

#works
msbuild /p:configuration=Release /t:BuildTestsNetCore /p:TargetNetCoreVersion=netcoreapp3.1
dotnet test "src\Microsoft.Data.SqlClient\tests\ManualTests\Microsoft.Data.SqlClient.ManualTesting.Tests.csproj" /p:Configuration="Release" /p:TestTargetOS="Windowsnetcoreapp" --no-build -v n --filter "AsyncTimeoutTest" /p:TargetNetCoreVersion=netcoreapp3.1

Please ensure your config.json contains: "UseManagedSNIOnWindows": true

@Wraith2
Contributor Author

Wraith2 commented Mar 2, 2021

Yup, I saw that's where it hung, but I can't work out why, and locally it doesn't hang. There are legs which use the managed SNI that are succeeding:

CI-Win10-SQL19W-ManagedSNI (Win10-SQL19 AnyCPU,Release,netcoreapp,false,netcoreapp3.1) Successful in 18m
CI-Ub20-SQL17L-Core31 (Ubuntu 16 AnyCPU,Release,netcoreapp,true) Successful in 20m

The changes I've made are working fine for those. The netcore files are included in both the 2.1 and 3.1 builds, so if there's a difference it's in the runtime and not in this code. Given that 2.1 is out of support I could make these changes netcoreapp3.1-and-later only, but I'd really like to know what the problem is rather than just avoiding it.

Is there a way to get some sort of diagnostic information out of the CI machines so I can see what they were hung on?

@Wraith2
Contributor Author

Wraith2 commented Mar 2, 2021

I've found what I think is the problem, but I'm going to need some help. The manual tests project can be debugged in Visual Studio if you change the build configuration to netcoreapp2.1-Debug, and by doing that I can see what's happening. What I need is to be able to see the same situation with netcoreapp3.1-Debug, but that configuration won't build and I can't work out what's wrong with it. Is there any chance you could take a look at the project config for me?

@Wraith2
Contributor Author

Wraith2 commented Mar 2, 2021

I've found it and it's annoying. .NET Core 2.1 implemented the ValueTask overrides by calling the Task versions, so you can end up with a cycle. I've changed the 2.1 builds to use the Task-based strategy and put a note at the top of the file explaining why. Let's see what the CI thinks.
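
For illustration, here is the shape of that cycle with a made-up stream (CycleStream is not the PR's code). The inherited ValueTask overload forwards to the virtual Task overload, which is what 2.1's streams did, so a subclass that implements its Task overload by calling the ValueTask one ends up with the two methods calling each other.

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

internal sealed class CycleStream : Stream
{
    public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken ct)
        // The base ReadAsync(Memory<byte>) unwraps the array and calls the
        // virtual ReadAsync(byte[], ...), which dispatches back into this
        // override: infinite mutual recursion (StackOverflowException).
        => ReadAsync(new Memory<byte>(buffer, offset, count), ct).AsTask();

    // Minimal Stream plumbing so the sketch compiles.
    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => 0;
    public override long Position { get => 0; set => throw new NotSupportedException(); }
    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) => 0;
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}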

@Wraith2
Contributor Author

Wraith2 commented Jun 10, 2021

@cheenamalhotra @David-Engel can this be reviewed now, please? It's been open for 4 months now, and if it was too risky for 3.0 for some reason, 3.0 has now shipped.

@Wraith2 Wraith2 mentioned this pull request Jun 11, 2021
@JRahnama
Member

@Wraith2 can you address the conflict here please?

@JRahnama JRahnama added this to the 4.0.0-preview1 milestone Jun 11, 2021
@Wraith2
Contributor Author

Wraith2 commented Jun 11, 2021

Done.

@cheenamalhotra cheenamalhotra added this to In progress in SqlClient v4.0 via automation Jun 14, 2021
@cheenamalhotra cheenamalhotra moved this from In progress to Review in progress in SqlClient v4.0 Jun 14, 2021
}
else
{
    return valueTask.AsTask();
Member


What are the benefits of starting with a ValueTask and returning a Task? Can we just start with a Task and return a Task?

Contributor Author


The only version that does the real work is the ValueTask version, so everything must call that one and then convert the result to whatever upstream type it needs. I agree that it looks wasteful, but it's more complex to have two complete implementations, and this is the pattern used in /runtime.

In an ideal world all callers will use ValueTask directly and avoid the Task. In cases where the caller somehow can't do that (isn't aware of ValueTask), it incurs the same task allocation penalty it was already going to pay, and the only change is a couple of additional stack copy instructions. Overall it's a net win.

Since we (SqlClient) are the only caller of these functions, this PR completes the vertical call chain that uses ValueTask instead of Task, and we'll see the benefits; this method might only get called through sequential reader streams.
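
As a sketch of that pattern (illustrative names, not the PR's exact code): the ValueTask method holds the single real implementation, and the Task overload is a thin adapter over it.

using System;
using System.Threading;
using System.Threading.Tasks;

internal sealed class Sketch
{
    public ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken ct)
    {
        // ... the one real implementation lives here ...
        return new ValueTask<int>(0);
    }

    public Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken ct)
        // Task-based callers pay the allocation they always paid;
        // ValueTask-aware callers skip the adapter entirely.
        => ReadAsync(new Memory<byte>(buffer, offset, count), ct).AsTask();
}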

Member


In several cases like this, the consumer is not awaiting them directly; they are awaiting a Task, and that makes me wonder whether ValueTask is a good choice.

In an ideal world all callers will use ValueTask directly and avoid the Task.

This is a bit controversial. Tasks are still preferred over ValueTasks. Micro-benchmarking shows a bit of a perf cost when using ValueTasks, and Tasks are easier to use correctly.

According to this article, a ValueTask should only be used if:
a) you expect consumers of your API to only await them directly,
b) allocation-related overhead is important to avoid for your API, and
c) either you expect synchronous completion to be a very common case, or you’re able to effectively pool objects for use with asynchronous completion.

I would ask @roji's help/comment on this as well.

Contributor Author


This fits cases (b) and (c). The fact that we can't easily change the external API surface from Task to ValueTask prevents it from being (a) as well.

This is a bit controversial. Tasks are still preferred over ValueTasks. Micro-benchmarking shows a bit of a perf cost when using ValueTasks, and Tasks are easier to use correctly.

One of the things I do for fun is listen to the framework design council meetings when they're streamed on YouTube, and I've heard the long conversations about when ValueTask is suitable and when it is not. I investigated this because I had good reason to believe that using ValueTask is appropriate here and would sometimes allow us to have allocation-free sync calls. The items I highlighted in the original post show the reduction in allocations from these changes. There will be speed increases as well, because we spend less time in GC, but I didn't think it necessary to quantify them since they weren't the reason for making the changes.

Note that this works best on netcore3, because the runtime did the work to make SslStream more efficient by adding ValueTask overloads; we're just following the same pattern to allow the most efficient reads we can.
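
For example (a sketch; SslReadSketch and ReadPacketAsync are made-up names), a netcoreapp3.x caller can await SslStream's Memory-based overload directly:

using System;
using System.Net.Security;
using System.Threading;
using System.Threading.Tasks;

internal static class SslReadSketch
{
    // On netcoreapp3.x SslStream overrides ReadAsync(Memory<byte>, CancellationToken),
    // so awaiting the ValueTask directly avoids a Task allocation whenever the
    // read completes synchronously (e.g. the data is already buffered).
    public static async ValueTask<int> ReadPacketAsync(SslStream ssl, Memory<byte> buffer, CancellationToken ct)
        => await ssl.ReadAsync(buffer, ct).ConfigureAwait(false);
}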

Member


I do agree it's worth introducing ValueTask implementations here - this seems to be a Stream implementation that all driver traffic goes through, and it could indeed reduce lots of allocations when the call completes synchronously (and possibly even asynchronously, depending on how optimized the rewritten SslStream is). There's indeed a lot of discussions around whether to expose ValueTask vs. Task when designing APIs, but when consuming an API that exposes ValueTask it's generally good practice to use that (unless you're sure the invocation will return asynchronously, and that no asynchronous-completion optimizations are in place in the implementation).

In my benching run it increases throughput by ~30%.

That seems like quite a high number for this change... FWIW I always find it helpful to post the actual BenchmarkDotNet results and code, to show what exactly is being tested etc.

Member


The change to ConcurrentQueueSemaphore is a big part of it, and I think I already split that out into another PR because it was so valuable. It prevents the allocation of a TCS, a Task, and two continuation objects each time we want to check a lock (which is usually uncontested) for every single packet receipt.

It's great that you're making progress. I haven't reviewed this PR thoroughly, though specifically for making ConcurrentQueueSemaphore not allocate in the uncontended case, that seems to already be the case as far as I can tell. Also see my comment on the non-utility of non-generic ValueTask there.

The other changes allow ValueTask paths to be used further down. Without this PR I can't use them in SNIPacket, so I've got to start somewhere and I started here. If I'd tried to use them in SNIPacket first, you'd complain that it was no use doing that because SslStream didn't have them.

@Wraith2 I'm not complaining about anything. My opinion was asked for, and I'm pointing out that this change, as it currently is, doesn't improve performance in any way, and indeed may slightly regress it. I very much support improving how async is implemented in SqlClient, though I think the real problems really are elsewhere, and not in Task vs. ValueTask.

Contributor Author


specifically for making ConcurrentQueueSemaphore not allocate in the uncontended case, that seems to already be the case as far as I can tell

I split that one out after talking with Cheena a few months back. It has a big impact and I wanted to make sure it got in as soon as possible. I don't keep rebasing my open PRs until there's some reason to do so; there's no point in me taking the time if no one is ready to review them. I used to do that, but it took ages and the number of branches I'd have to review got out of hand.

Outside of a microbenchmark designed to show the difference between ValueTask and Task implementations, do you think there is likely to be a noticeable performance difference between the versions? I accept and agree with your statements but is it either harmful or wrong to have a ValueTask version?

If you can point me at the other problems I can look at them. Until then I'll keep investigating and addressing the problems I find. I still hope that one day we'll be able to get SQL Server included in the TE benchmarks. The TE Fortunes benchmark is also a simple and easily understood optimization scenario to work on so I'll come back to it often to see if there are new optimization opportunities.

Member


I accept and agree with your statements but is it either harmful or wrong to have a ValueTask version?

No - I don't think it's either harmful or wrong. As I wrote above, I'd implement the Task-returning versions by calling the base SslStream counterparts (rather than the ValueTask), but that's a really small suggestion.

The TE Fortunes benchmark is also a simple and easily understood optimization scenario to work on so I'll come back to it often to see if there are new optimization opportunities.

I agree that TE Fortunes is a great way to find optimization opportunities (regardless of whether results are officially published on techempower.com or not). But I'll go out on a limb here, and say that IMHO a few micro-optimizations here and there simply aren't going to have a substantial impact given the larger problems in the codebase. At some point these need to be addressed.

PS I'm still unclear on where the aforementioned allocation reductions and throughput improvements come from in this PR, since the new Stream ValueTask methods aren't called, and I can't see how the ConcurrentQueueSemaphore changes improve anything.

Contributor Author


Ignore the semaphore stuff; that got merged separately. It was the perf improvement.
The stream ValueTasks just enable me to use them in a future PR.

Member


OK, though consider #902 (comment).

public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
{
    ValueTask<int> valueTask = ReadAsync(new Memory<byte>(buffer, offset, count), cancellationToken);
    if (valueTask.IsCompletedSuccessfully)
Member


This seems redundant - ValueTask.AsTask already knows to efficiently return a completed Task.

Though I'd personally just call the corresponding Task-returning method on the base class directly, and apply the semaphore check here. At least theoretically, the Task-returning method could be doing something slightly different since it knows it's returning a Task (e.g. no need to deal with IValueTaskSource etc.).

Contributor Author


So you think it would be better to just do return ReadAsync(new Memory<byte>(buffer, offset, count), cancellationToken).AsTask(); ?

Member


Yeah, it's basically the same. Take a look at the source for AsTask.


{
    internal sealed partial class ConcurrentQueueSemaphore
    {
        public ValueTask WaitAsync(CancellationToken cancellationToken)
Member


I'm not sure what the point of returning ValueTask is - is this better than returning a Task? For the first path below, you can simply return Task.CompletedTask instead of new ValueTask(), and for the second one, just return tcs.Task...

Non-generic ValueTask is generally useful only in quite rare circumstances. It only really makes sense if it's going to be backed with IValueTaskSource (i.e. optimizing for asynchronous completion). It doesn't help for optimizing for synchronous completion, since you already have the option of returning Task.CompletedTask; that option doesn't exist in generic contexts, which is why ValueTask<T> is far more useful than non-generic ValueTask.
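
A sketch of the Task-returning shape being suggested (SemaphoreSketch is illustrative, not the merged code): Task.CompletedTask already makes the uncontended path allocation-free, so non-generic ValueTask buys nothing here.

using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

internal sealed class SemaphoreSketch
{
    private readonly ConcurrentQueue<TaskCompletionSource<bool>> _waiters = new ConcurrentQueue<TaskCompletionSource<bool>>();
    private int _available = 1;

    public Task WaitAsync(CancellationToken cancellationToken)
    {
        if (cancellationToken.IsCancellationRequested)
            return Task.FromCanceled(cancellationToken);

        // Uncontended fast path: a cached singleton, no allocation at all.
        if (Interlocked.CompareExchange(ref _available, 0, 1) == 1)
            return Task.CompletedTask;

        // Contended path: allocating a waiter is unavoidable here.
        // (Release, which restores _available or completes a queued waiter,
        // is omitted for brevity.)
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        _waiters.Enqueue(tcs);
        return tcs.Task;
    }
}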

@Wraith2
Contributor Author

Wraith2 commented Jun 17, 2021

Issues addressed. The concurrent queue semaphore has been simplified to a Task-only version for all builds.

@Wraith2
Contributor Author

Wraith2 commented Jul 5, 2021

What is blocking this?

Member

@cheenamalhotra cheenamalhotra left a comment


Overall LGTM.
The PR has also been tested with the repro from #262, as deadlocks occurred in the past from similar changes.

SqlClient v4.0 automation moved this from Review in progress to Reviewer approved Jul 6, 2021
@cheenamalhotra cheenamalhotra merged commit 2325ddd into dotnet:main Jul 13, 2021
SqlClient v4.0 automation moved this from Reviewer approved to Done Jul 13, 2021
@Wraith2 Wraith2 deleted the perf13 branch July 18, 2021 22:25
DavoudEshtehari pushed a commit to Wraith2/SqlClient that referenced this pull request Sep 1, 2021