Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance and async improvements in SSRP, connection establishment and MARS connections #2451

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

edwardneal
Copy link

@edwardneal edwardneal commented Apr 7, 2024

This PR eliminates sync-over-async in SSRP, starts to simplify TCP connection establishment in the managed SNI provider by using async/await and slightly improves performance when building the pre-login packet. It lays some infrastructure down for PR #2384, although TdsParser would probably need to be merged before opening a connection can be made fully asynchronous.

  • Updated the SSRP utility class to make this end-to-end async (with a synchronous path for current compatibility.)
  • Adjusted DNS resolution for connections to named instances: rather than perform a second DNS lookup, we reuse the results of the DNS lookup which we use to send a SQL Browser request.
  • Small update to SQLFallbackDNS cache to make the IP addresses and port number strongly typed, eliminating a layer of ToString and Parse.
  • The MARS connection header (SNISMUXHeader) is now a ref struct, which slightly reduces memory usage and should hopefully reduce GC overhead for heavily-used connections.
  • Removed unnecessary cast of SNIMarsHandle._callbackObject on the hot path - declared it as TdsParserStateObject.
  • Conditionally removed one layer of locking on the TDS stream: when SSL is enabled and the underlying network stream is wrapped in an SNISslStream, there's no need to lock on both the inner and the outer streams.
  • Removed an instance of sync-over-async-over-sync in SNITcpHandle.Connect: .NET 6.0 introduced an overload of Socket.ConnectAsync which accepted a cancellation token, so there's no need to simulate a connection timeout period by waiting on a task which synchronously invokes Socket.Connect. Also simplified SNITcpHandle.ParallelConnectAsync by removing an instance of manually setting up task callbacks.
  • Building the pre-login packet now uses a stack-allocated byte array and writes it in one instance to the stream. This packet is fairly small - 51 or 305 bytes, depending upon whether we use .NET Core or Framework.

I also noticed while benchmarking that the benchmarking project didn't run on .NET 8.0, so I've bumped the version of BenchmarkDotNet to one which supports this.

Edit: I can see a number of test failures - will re-run them locally and resolve.

This isn't in use yet, but once TCP connection establishment is fully async, it'll be available.
Also incremented the version of BenchmarkDotNet (so that the benchmarks can run with .NET 8.0.)
Made the various fields strongly-typed to eliminate casting/parsing on the hot path.
This removes an explicit task continuation, slightly reduces memory usage and makes use of the newer .NET 6.0+ APIs to simplify socket connection cancellation
* Forcing the flush of all streams before they get switched over to SSL/TLS streams.
* Forcing the task sending the pre-login packet to be sent before we try to process the response.
Minor tweaks here:
* Removed two copies from SSRP.
* Switched a few instances of manual bit-shifting to .NET intrinsics.
If an SNINetworkStream is wrapped by an SNISslStream, there's already a layer of synchronisation - we don't need to keep the inner one enabled.
* The SNI SMUX header is now a ref struct
* Now using intrinsics to write out the message header
Also updated the reading of header lengths to use intrinsics.
Also corrected runnerconfig to allow it to bypass SSL verification by default (for default localhost usage.)
…SNIMarsHeader struct rather than reading into it.

Also caching UdpClient instances on a per-AddressFamily basis to prevent reallocating them for the same family.

Removed the instantiation of one delegate instance to reduce memory usage slightly.
One exception to this is that we're writing a byte array rather than a span to the packet - TdsParserStateObject doesn't support it there
Removing one implicit string allocation if a UDP request fails and tracing is disabled.
This commit lays the groundwork to eliminate the second DNS lookup required after SSRP.
If a DNS lookup was needed for SSRP, forward its results to the TCP connection creation
@ErikEJ
Copy link
Contributor

ErikEJ commented Apr 7, 2024

Can you share some benchmark results?

SSRP timeout returns an OperationCanceledException inside an AggregateException
@edwardneal
Copy link
Author

The absolute difference in result times is pretty minor - it's around a 10% reduction in an execution time of 2-2.5ms. On average memory usage has dropped by around 3-4KB.

No multi-subnet failover

Baseline

Method MARS Pooling Mean Error StdDev Completed Work Items Lock Contentions Allocated
OpenConnection False False 2,252.1 μs 54.82 μs 109.48 μs 5.0000 0.0333 115.51 KB
OpenAsyncConnection False False 2,336.7 μs 10.13 μs 18.79 μs 7.0000 - 163.46 KB
OpenConnection False True 161.1 μs 79.17 μs 159.92 μs - - 1.4 KB
OpenAsyncConnection False True 163.1 μs 80.39 μs 162.40 μs - - 1.59 KB
OpenConnection True False 2,159.7 μs 7.81 μs 15.24 μs 8.8667 - 142.08 KB
OpenAsyncConnection True False 2,414.4 μs 12.39 μs 23.87 μs 10.9000 0.0333 181.78 KB
OpenConnection True True 159.1 μs 78.41 μs 158.39 μs - - 1.39 KB
OpenAsyncConnection True True 158.8 μs 78.30 μs 158.17 μs - - 1.58 KB

Post changes

Method MARS Pooling Mean Error StdDev Completed Work Items Lock Contentions Allocated
OpenConnection False False 2,050.8 μs 48.74 μs 97.34 μs 7.0000 - 112.39 KB
OpenAsyncConnection False False 2,129.9 μs 10.90 μs 21.51 μs 9.0000 0.0333 160.35 KB
OpenConnection False True 160.9 μs 79.03 μs 159.64 μs - - 1.4 KB
OpenAsyncConnection False True 161.6 μs 79.50 μs 160.59 μs - - 1.56 KB
OpenConnection True False 1,972.0 μs 7.61 μs 14.48 μs 10.8667 - 138.28 KB
OpenAsyncConnection True False 2,219.9 μs 5.83 μs 11.78 μs 12.9000 - 178.06 KB
OpenConnection True True 160.5 μs 79.17 μs 159.93 μs - - 1.39 KB
OpenAsyncConnection True True 157.7 μs 77.71 μs 156.97 μs - - 1.58 KB
Multi-subnet failover

Baseline

Method MARS Pooling Mean Error StdDev Completed Work Items Lock Contentions Allocated
OpenConnection False False 2,394.4 μs 52.83 μs 103.05 μs 12.0000 0.0333 116.99 KB
OpenAsyncConnection False False 2,453.3 μs 25.08 μs 45.86 μs 14.0000 0.0667 164.95 KB
OpenConnection False True 161.8 μs 79.58 μs 160.75 μs - - 1.4 KB
OpenAsyncConnection False True 162.0 μs 79.60 μs 160.79 μs - - 1.59 KB
OpenConnection True False 2,318.6 μs 7.94 μs 14.91 μs 15.9000 0.1000 141.87 KB
OpenAsyncConnection True False 2,562.4 μs 7.69 μs 14.81 μs 17.9000 0.1333 181.72 KB
OpenConnection True True 160.0 μs 78.88 μs 159.34 μs - - 1.4 KB
OpenAsyncConnection True True 158.4 μs 78.06 μs 157.69 μs - - 1.59 KB

Post changes

Method MARS Pooling Mean Error StdDev Completed Work Items Lock Contentions Allocated
OpenConnection False False 2,157.4 μs 61.82 μs 120.57 μs 10.0000 - 115.43 KB
OpenAsyncConnection False False 2,197.0 μs 14.70 μs 27.26 μs 12.0000 0.1000 163.27 KB
OpenConnection False True 157.1 μs 77.06 μs 155.66 μs - - 1.4 KB
OpenAsyncConnection False True 159.2 μs 78.31 μs 158.20 μs - - 1.59 KB
OpenConnection True False 2,060.5 μs 11.54 μs 21.95 μs 13.8000 0.0333 141.14 KB
OpenAsyncConnection True False 2,288.7 μs 8.37 μs 16.72 μs 15.8000 0.0333 182.08 KB
OpenConnection True True 156.3 μs 77.14 μs 155.83 μs - - 1.4 KB
OpenAsyncConnection True True 156.2 μs 76.96 μs 155.46 μs - - 1.59 KB

I have the other benchmarks across local instances (named and default), and Azure SQL instances (within the same region and in a distant region), but they're broadly similar: small memory usage improvements when connecting, and minor variations in execution times which are within the specified margin of error. I've not included these because of volume: there's one for every combination of [baseline/updated, multi-subnet failover enabled/disabled, managed/native SNI] on each of the SQL instances.

One pre-existing issue with a resource string missing a space.
Several instances where the SQL DNS cache property names were hardcoded and accessed via reflection.
The current header is now an instance-level variable once again. Not sure why this would make a difference, but without this, some of the data bytes are left in the packet buffer and interpreted as the header bytes of the next packet.
…gle IP address.

Correcting a deadlock in MARS connection tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants