Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf: Static delegates #1060

Merged
merged 5 commits into from Jun 11, 2021
Merged

Perf: Static delegates #1060

merged 5 commits into from Jun 11, 2021

Conversation

Wraith2
Copy link
Contributor

@Wraith2 Wraith2 commented May 3, 2021

Now we've got c#9 i've audited all lambda expressions in the library and changed some of them over to using static delegates which allows the compiler to cache the delegate on first use.

I also adjusted some other callsites to use state parameters to reduce the size of the closure slightly or allow onFailure, or onCancellation delegates to be static.

I tried to apply a consistent style to all the lambda usage layout spacing named params etc. It feels better to me overall but let me know if you want a different style.

@jinek
Copy link
Contributor

jinek commented May 5, 2021

I think you are confusing anonymous methods and expression trees both of which can be created using lambda expressions.

Anonymous methods are compiled by c# compiler once at compilation time, while expression trees can be compiled to a delegate at runtime.
I've created small example to demonstrate how anonymous methods are compiled:
image

You can also investigate this using windbg tool which shows the method table information for a given type-object.

Static anonymous function has been introduced to restrict the access to instance and local variables if it is not intended so by the design.
However, one of the main benefits of anonymous delegates and local functions is the possibility to capture the context of a method so you don't have to pass it via parameters or explicitely define fields of a class to contain that context.
And this is my point here: you are increasing the code base to remove the use of the feature which has been introduced to decrease the codebase:
image

@Wraith2
Copy link
Contributor Author

Wraith2 commented May 5, 2021

Captures contribute to the size and number of hidden classes generated by the compiler. Passing using existing state parameter placement decreases or obviates the need for captures at all. Using static lambdas allows the compiler to cache the delegate again avoiding per-call allocations.

I do know what I'm doing here. Try running a memory profile of an async test case with and without the PR and see what difference it makes.

@jinek
Copy link
Contributor

jinek commented May 5, 2021

Captures contribute to the size and number of hidden classes generated by the compiler. Passing using existing state parameter placement decreases or obviates the need for captures at all. Using static lambdas allows the compiler to cache the delegate again avoiding per-call allocations.

If you are talking about delegate object itself then does it make sense to do optimization in this code? Because probably even the average connection string is several times bigger than a delegate and captured state together.

I do know what I'm doing here. Try running a memory profile of an async test case with and without the PR and see what difference it makes.

Could you please share the result of your experiment?
(because from your PR the only metric seen at the moment has negative impact)

@Wraith2
Copy link
Contributor Author

Wraith2 commented May 5, 2021

If you are talking about delegate object itself then does it make sense to do optimization in this code? Because probably even the average connection string is several times bigger than a delegate and captured state together.

Each connection string is allocated once and re-used. The paths through these methods allocate on each call. If you call them once you'll see a very minor improvement, if you call them thousands or millions of times that adds up considerably. It's a question of scale.

Results from running the https://github.com/aspnet/DataAccessPerformance which is an implementation of the TE Fortunes benchmark.

Test Type Threads Througput StdDev branch
ado-sqlclient+async+16 async 16 26411 1040 perf-staticdelegates
ado-sqlclient+async+16 async 16 26165 1101 main

So a 0.9% throughput increase from some very minor changes.

These changes barely impact the size of the code in comparison to the overall size of the codebase. In general the size of the code is far less relevant than the correctness and speed. If your main concern is code size at the driver level I think your priorities are wrong.

@Wraith2
Copy link
Contributor Author

Wraith2 commented May 5, 2021

@cmeyertons there are a number of changes in bulk copy here and they may improve the performance enough to outweigh the problem in #1048 if you want to give them a try.

@jinek
Copy link
Contributor

jinek commented May 7, 2021

Each connection string is allocated once and re-used. The paths through these methods allocate on each call. If you call them once you'll see a very minor improvement, if you call them thousands or millions of times that adds up considerably. It's a question of scale.

Sure, I completely agree scaling is important as well as performance overall. It's just, even if we run simple non-query request with new connection, it consumes about 25Kb, and empty delegate is 32B . Compiler does optimization and caches the delegate if possible whether it's static or not, it not necessary creates hidden classes if there is no need to capture the stack, right?

Results from running the https://github.com/aspnet/DataAccessPerformance which is an implementation of the TE Fortunes benchmark.
Test Type Threads Througput StdDev branch
ado-sqlclient+async+16 async 16 26411 1040 perf-staticdelegates
ado-sqlclient+async+16 async 16 26165 1101 main

So a 0.9% throughput increase from some very minor changes.

These changes barely impact the size of the code in comparison to the overall size of the codebase. In general the size of the code is far less relevant than the correctness and speed. If your main concern is code size at the driver level I think your priorities are wrong.

That interesting tool uses DateTime.UtcNow to measure time intervals, Task.Wait for synchronization etc.
But there are methodologies of measurements, this topic is not new. I'd suggest you to measure with something like https://benchmarkdotnet.org/ and check if your improvements do more than the measurement error.
Also I believe, you, while being an expirienced dev, could investigate some of the allocations which you can find in mem profiler tools, for example, I would check why it allocates many bytes, may be it does some redundant IO. Or, for example, what is going on in _SqlMetadata
image

@Wraith2
Copy link
Contributor Author

Wraith2 commented May 7, 2021

If you believe that there is work that can be done on the entries in that profile then please go ahead, I've already made some changes in #521 and dotnet/corefx#34393 but any additional improvements you can make are welcome.

I didn't write that tool. The ASP net core team did. I was pointed to it as a reliable perf measure by one of the maintainers of this library when I started working on it back when it was System.Data.SqlClient in corefx. I use a variety of tools including profilers, benchmarks, user provided and synthetic scenario tests to identify functional and performance issues. Have a look through my PR history and you'll find all sort of things.

This code demonstrably and measurably produces a small performance increase. It may be a small increase but scale is important as we've agreed. We can go all the way down to IL and ASM but I really don't see the need to do that. Do you have any specific problem with the code change as written other than it slightly increases code complexity and size?

@jinek
Copy link
Contributor

jinek commented May 7, 2021

Do you have any specific problem with the code change as written other than it slightly increases code complexity and size?

no

@DavoudEshtehari DavoudEshtehari self-assigned this Jun 8, 2021
Copy link
Member

@DavoudEshtehari DavoudEshtehari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to double-check; The following methods have used lambda expression. Please look at them if they're worth being static:

  • Net Core:
    • SqlBulkCopy.WriteToServerInternalRestAsync
    • SqlCommand.GetParameterEncryptionDataReaderAsync
    • SqlConnection.OnError

@Wraith2
Copy link
Contributor Author

Wraith2 commented Jun 9, 2021

  • SqlBulkCopy.WriteToServerInternalRestAsync : Closes over 4 parameters including this. We can't avoid the allocation and a 4 tuple is less readable than the closure. I haven't profiled SqlBulkCopy scenarios so I don't have information that makes me think that changing this is beneficial at the moment.
  • GetParameterEncryptionDataReaderAsync: generally the same as WriteToServerInternalRestAsync . The result is cached so it isn't per-call. I haven't started profiling always encrypted scenarios yet so I don't know if this is hot, the cache means it should not be.
  • OnError: only 2 parameters so it could be tupled easily and fairly clearly but the allocation still can't be avoided. We're about to throw an exception which is going to allocation the exception, it's message, error collection, error collection contents, stacktrace. Attempting to save memory at this point only hurts code clarity it does nothing to benefit the user or us.

In general the more parameters are captured by a closure the less likely it is that it is worth converting. Tuples hurt code clarity and we can deal with some of that for improved performance of users but there is a limit somewhere and more than 3 parameters is where I currently feel readability suffers more than users benefit. This can in change (and may have done since I've been working on perf) with new information like profiling results and discussions.

The changes made to convert to static delegates in this PR were not driven by profile results so they should retain clarity as much as possible while improving performance. I think I would only reduce clarity further if I had a very strong argument that it would improve performance.

Let me know what you think and whether I've gone too far in reducing clarity. There may be cases where I've converted to tuples in the past that it might be better to use closures for clarity.

@DavoudEshtehari
Copy link
Member

Generally, I agree with keeping code legible; I think except the number of parameters in a tuple, it's important to avoid using it in a medium or large block of code. If a block of code is less than 10 lines, it seems reasonable to use a tuple with more than three parameters. And, definitely, it's a trade off among performance, memory allocation, and readability which should be considered. I would say go ahead with your best judgement. Let's see if anyone has other idea.

);
}
}, ctoken); // We do not need to propagate exception, etc, from reconnect task, we just need to wait for it to finish.
return tcs.Task;
}
else
{
AsyncHelper.WaitForCompletion(reconnectTask, BulkCopyTimeout, () => { throw SQL.CR_ReconnectTimeout(); }, rethrowExceptions: false);
AsyncHelper.WaitForCompletion(reconnectTask, BulkCopyTimeout, static () => throw SQL.CR_ReconnectTimeout(), rethrowExceptions: false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not quiet sure or probably did not understand what benefit we get from changing this particular line to static? Isn't the purpose of static expression to prevent unintentional capture of local variables or instance state by the lambda?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is the purpose. There is also a useful side effect of the way that the compiler implements it because it's a static delegate means that the delegate can be cached. This sharplab example shows that you get a hidden static class generated and that the action delegate is created the first time it is needed and re-used every time it is called after that. So we get lazy single allocation for free with a nice syntax.

    public void B()
    {
       AsyncHelper.WaitForCompletion(null, 0, static () => throw SQL.CR_ReconnectTimeout(), rethrowExceptions: false);
    }

is lowered to:

    [Serializable]
    [CompilerGenerated]
    private sealed class <>c
    {
        public static readonly <>c <>9 = new <>c();

        public static Action <>9__0_0;

        internal void <B>b__0_0()
        {
            throw SQL.CR_ReconnectTimeout();
        }
    }

    public void B()
    {
        AsyncHelper.WaitForCompletion(null, 0, <>c.<>9__0_0 ?? (<>c.<>9__0_0 = new Action(<>c.<>9.<B>b__0_0)), false);
    }

@DavoudEshtehari
Copy link
Member

DavoudEshtehari commented Jun 11, 2021

/azp run CI-Ub18-Enclave-NetStd

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@JRahnama
Copy link
Member

@Wraith2 Can you trigger a build by making a small change in your PR and make the tests to run once more? I have tried from our side and somethings are missing...

@Wraith2
Copy link
Contributor Author

Wraith2 commented Jun 11, 2021

Fixed up a couple of anonymous typed lambdas.

@JRahnama
Copy link
Member

JRahnama commented Jun 11, 2021

@Wraith2 LGTM, I am waiting for the Enclave tests to finish and see if the failures are fixed after I cleaned up the databases. I am jumping to other PRs. Do you have any blocking PRs that needs to be done before other steps to come in?

@JRahnama JRahnama added this to the 4.0.0-preview1 milestone Jun 11, 2021
@Wraith2
Copy link
Contributor Author

Wraith2 commented Jun 11, 2021

I'd like to get #934 #925 and #902 finished to unblock other work if possible.

@JRahnama JRahnama merged commit f17bba2 into dotnet:main Jun 11, 2021
@cheenamalhotra cheenamalhotra added this to In progress in SqlClient v4.0 via automation Jun 14, 2021
@cheenamalhotra cheenamalhotra moved this from In progress to Done in SqlClient v4.0 Jun 14, 2021
@Wraith2 Wraith2 deleted the perf-staticdelegates branch June 14, 2021 23:13
panoskj added a commit to panoskj/SqlClient that referenced this pull request Mar 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants