
Pool commands #1590

Open · wants to merge 1 commit into main

Conversation

@benaadams (Contributor)

Rather than creating them per request

@sebastienros (Member)

No change on fortunes, so tried on updates:

db                       updates-baseline   updates-pool   delta
CPU Usage (%)            96                 97             +1.04%
Raw CPU Usage (%)        2,698.85           2,706.41       +0.28%
Working Set (MB)         524                524            0.00%
Build Time (ms)          1,856              1,762          -5.06%
Start Time (ms)          326                322            -1.23%

application              updates-baseline   updates-pool   delta
CPU Usage (%)            70                 70             0.00%
Raw CPU Usage (%)        1,952.18           1,960.89       +0.45%
Working Set (MB)         481                480            -0.21%
Build Time (ms)          4,750              4,964          +4.51%
Start Time (ms)          1,595              1,561          -2.13%
Published Size (KB)      98,006             98,007         +0.00%

load                     updates-baseline   updates-pool   delta
CPU Usage (%)            2                  2              0.00%
Raw CPU Usage (%)        43.98              45.07          +2.49%
Working Set (MB)         7                  7              0.00%
Build Time (ms)          3,316              3,317          +0.03%
Start Time (ms)          0                  0
Published Size (KB)      76,389             76,389         0.00%
First Request (ms)       81                 81             0.00%
Requests/sec             12,737             12,913         +1.38%
Requests                 191,923            194,986        +1.60%
Mean latency (ms)        22.53              22.23          -1.33%
Max latency (ms)         383.46             285.45         -25.56%
Bad responses            0                  0
Socket errors            0                  0
Read throughput (MB/s)   9.20               9.33           +1.41%
Latency 50th (ms)        17.54              17.24          -1.71%
Latency 75th (ms)        28.31              27.93          -1.34%
Latency 90th (ms)        43.44              43.06          -0.87%
Latency 99th (ms)        91.41              91.37          -0.04%

@sebastienros (Member)

@roji I think we have tested this approach in the past

@benaadams (Contributor Author) commented Sep 29, 2020

@roji does the command reparse the command string (to convert from ADO to Postgres format, e.g. @param to $1) on each execute, or only if it has changed?

@benaadams (Contributor Author)

Raised issue npgsql/npgsql#3200; it reduces allocations, but the query is reparsed and a new one is generated in PG format on each execution.

@roji (Member) left a comment

Sorry for disappearing here, personal issues.

Yeah, at some point I tested an approach of pooling the ADO.NET facade objects (we can do the same for NpgsqlConnection BTW), and didn't get convincing results - the way I understood it, the overhead of pooling synchronization isn't worth it for very light-weight objects. It's true that skipping the SQL parsing is a more convincing argument, but see npgsql/npgsql#3200 (comment) about improvements in 6.0 which should obviate all that anyway. Also, when I last benchmarked fortunes, SQL parsing was pretty negligible (SQL is very small and doesn't even contain parameters).

But we can definitely revisit all this.

@@ -237,7 +259,55 @@ private async Task<World> ReadSingleRow(NpgsqlCommand cmd)
}
}

private static readonly object[] _cacheKeys = Enumerable.Range(0, 10001).Select((i) => new CacheKey(i)).ToArray();
internal class SqlFortuneCommand : IDisposable
Member

Are these wrapper classes actually necessary? Why not pool NpgsqlCommand directly? As far as I can tell they're mainly there to enqueue themselves back when disposing, but that could just be done by the code using the command instead of disposing, no?

Contributor Author

Mainly so the call sites are just a using block and don't have to worry about pooling, e.g.:

using (var cmd = CreateReadCommand())
{
    cmd.Connection = db;
    // do something with pooled command
}
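
For illustration, a minimal sketch of the kind of wrapper being described, under the assumption of a simple static ConcurrentQueue pool; the names, SQL and details below are illustrative, not the PR's exact code:

using System.Collections.Concurrent;
using Npgsql;

// Sketch: the wrapper owns an NpgsqlCommand, is rented from a static pool, and
// returns itself to the pool on Dispose, so call sites stay a plain using block.
internal sealed class PooledReadCommand : IDisposable
{
    private static readonly ConcurrentQueue<PooledReadCommand> _pool = new();

    public NpgsqlCommand Command { get; }
    public NpgsqlParameter<int> Parameter { get; }

    public NpgsqlConnection Connection { set => Command.Connection = value; }

    private PooledReadCommand()
    {
        Parameter = new NpgsqlParameter<int>("id", 0);
        Command = new NpgsqlCommand("SELECT id, randomnumber FROM world WHERE id = @id");
        Command.Parameters.Add(Parameter);
    }

    public static PooledReadCommand Rent()
        => _pool.TryDequeue(out var pooled) ? pooled : new PooledReadCommand();

    public void Dispose()
    {
        Command.Connection = null;   // detach before returning to the pool
        _pool.Enqueue(this);
    }
}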

Member

OK. Moving the pooling logic here and removing the wrappers might make a tiny bit of difference too.

@benaadams (Contributor Author)

> SQL is very small and doesn't even contain parameters

All the SQL other than the fortunes benchmark contains parameters?

{
    cmd.Connection = db;
    var param = cmd.Parameter;
Contributor Author

In npgsql/npgsql#3200 (comment) you say

> The typical scenario of reexecuting the same command with the same SQL also does it on the same connection, in which case explicit preparation is the right choice and bypasses everything.

Since this is executed 20 times just changing the value, should it do a prepare here?

@roji (Member) commented Oct 15, 2020

Should definitely give it a try... I don't pay as much attention as I should to the non-Fortunes benchmarks.

Automatic preparation still has the advantage of doing one less roundtrip - the first execution prepares and executes in the same go, whereas with explicit preparation they're split. But as you add more executions for a single initial prepare, the impact of that goes down.
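
For concreteness, a hedged sketch of what explicit preparation would look like for the updates-style loop discussed above; the connection, command and parameter names are assumptions, and PrepareAsync/ExecuteReaderAsync are the standard ADO.NET async APIs:

using System;
using System.Threading.Tasks;
using Npgsql;

// Sketch: prepare once explicitly, then reuse the prepared statement for the
// 20 executions that only change the parameter value. "db" is assumed to be an
// open NpgsqlConnection and "cmd" to have a parameter named "id".
static async Task ReadLoopAsync(NpgsqlConnection db, NpgsqlCommand cmd, Random random)
{
    cmd.Connection = db;
    await cmd.PrepareAsync();                 // one extra roundtrip up front

    for (var i = 0; i < 20; i++)
    {
        cmd.Parameters["id"].Value = random.Next(1, 10001);
        await using var reader = await cmd.ExecuteReaderAsync();
        await reader.ReadAsync();
        // ... consume the row ...
    }
}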

Contributor Author

Ye olde score multiplier for composite scores
[image: composite score multipliers]

> Automatic preparation still has the advantage of doing one less roundtrip - the first execution prepares and executes in the same go, whereas with explicit preparation they're split. But as you add more executions for a single initial prepare, the impact of that goes down.

Which is kinda why I want to bypass the parse; then it's auto prep + parse once, rather than parse 20 times with auto prep?

Member

OK, I'll try to take some time this weekend to play around with bypassing the parse. In any case I need to run (and update) the benchmarks for the newest Npgsql 5.0.0 (just released preview1) - will look into the parsing thing as part of that.

@roji (Member) commented Oct 15, 2020

> All the SQL other than the fortunes benchmark contains parameters?

IIRC yeah (though whether a parameter exists or not doesn't matter that much for SQL parsing, just a little bit).

@benaadams (Contributor Author)

Every little helps; top queries per second is 1.1M and aspnet is 472k:

1,185,480 = 20 * 59,274
472,120 = 20 * 23,606

[image]

Base automatically changed from master to main March 8, 2021 18:29
@DamianEdwards (Member)

@benaadams @roji is this old PR still relevant?

@benaadams (Contributor Author)

> @benaadams @roji is this old PR still relevant?

@roji said he was introducing a better way of doing it in a newer version of the driver; not sure of the status of that.

@roji (Member) commented Mar 18, 2023

Things have changed quite a lot since this was done... Here are some thoughts.

Re SQL parsing, Npgsql 6.0 did introduce support for using (native) positional parameters and not parsing SQL (@p -> $1); for more details, see this write-up. This is automatically triggered when the command parameters are unnamed, but when there are no parameters, we do have to parse for backwards compat. IIRC the only benchmark that doesn't have parameters is fortunes, so we currently do parse there, which is unneeded overhead.
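
For reference, the difference described here looks roughly like this; the conn variable and the SQL are illustrative placeholders:

using Npgsql;

// Named placeholder: Npgsql has to parse the SQL and rewrite "@id" to "$1".
using var namedCmd = new NpgsqlCommand(
    "SELECT id, randomnumber FROM world WHERE id = @id", conn);
namedCmd.Parameters.AddWithValue("id", 42);

// Native positional placeholder with an unnamed parameter: the SQL is sent
// as-is, with no parsing/rewriting (Npgsql 6.0+).
using var positionalCmd = new NpgsqlCommand(
    "SELECT id, randomnumber FROM world WHERE id = $1", conn);
positionalCmd.Parameters.Add(new NpgsqlParameter { Value = 42 });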

We do have an app context switch which allows disabling SQL parsing/rewriting globally in the application, even where there are no parameters. Since that switch is global, and since Dapper and EF don't work with positional parameters, turning it on would break them. But unlike our implementation here, our TechEmpower platforms implementation only has the raw ADO.NET variant (no Dapper/EF), so I'm doing that there (in TechEmpower/FrameworkBenchmarks#8005). Though we'll have to figure out what to do if we unify the implementations (#1815).
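
To the best of my knowledge the switch being referred to is the one below; treat the name as an assumption and verify it against the Npgsql docs. It must be set at startup, before the first command executes, and as noted above it would break Dapper/EF code that still uses named placeholders:

// Believed switch name for globally disabling SQL parsing/rewriting.
AppContext.SetSwitch("Npgsql.EnableSqlRewriting", false);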

@roji (Member) commented Mar 18, 2023

The 2nd thing here is pooling the ADO.NET objects (e.g. NpgsqlCommand). With the introduction of NpgsqlDataSource, we'll soon be switching to creating commands directly from that instead of instantiating connections (https://github.com/aspnet/Benchmarks/pull/1816/files#r1139401127):

// instead of:
using var connection = new NpgsqlConnection(...);
using var command = new NpgsqlCommand("SQL", connection);
// we'll just do this:
using var command = dataSource.CreateCommand();

(We can do this since the benchmarks don't involve any connection state (e.g. transactions), and this models multiplexing much more correctly, i.e. just execute a command against the database without needing to care about which connection it goes through or how. This will also likely bring some optimizations later.)
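
Putting the pieces together, a sketch of the data-source pattern end to end; the connection string and SQL are placeholder assumptions:

using Npgsql;

// Build the data source once at startup and reuse it for the application's lifetime.
await using var dataSource = NpgsqlDataSource.Create(connectionString);

// Per request: create a command straight from the data source; the connection
// is handled internally.
await using var cmd = dataSource.CreateCommand(
    "SELECT id, randomnumber FROM world WHERE id = $1");
cmd.Parameters.Add(new NpgsqlParameter { Value = 42 });

await using var reader = await cmd.ExecuteReaderAsync();
await reader.ReadAsync();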

Currently, NpgsqlDataSource.CreateCommand() doesn't pool. If it's really beneficial to do so, this is an optimization we can and should implement inside Npgsql itself; opened npgsql/npgsql#5001 to track this.

/cc @vonzshik @NinoFloris

@roji (Member) commented Mar 18, 2023

So beyond the above two things, I think this can be closed... We should definitely experiment with command pooling in Npgsql and see what happens.

@DamianEdwards (Member)

@roji

> We do have an app context switch which allows disabling SQL parsing/rewriting globally

Are there plans to enable setting this via a property on NpgsqlCommand directly?

@roji (Member) commented Mar 19, 2023

Not really... This whole thing is tricky and somewhat complex, and comes from the fact that someone in Npgsql's history decided to accept named parameter placeholders (@p) instead of positional ones ($1), and also to support batching by parsing the command's SQL for semicolons and splitting it into multiple batched statements at the wire protocol level. Neither of these is natively supported by PG, so Npgsql has to parse/rewrite in order to support them (here's a writeup).

Now, if the command has parameters, we check whether they're named or not (is DbParameter.ParameterName set). If they're unnamed, we take that as a signal that SQL parsing isn't required, i.e. positional parameters are being used. Since we already have a user gesture here (unset parameter name), we don't need an additional flag on NpgsqlCommand. Note that if you're using unnamed positional parameters, Npgsql also doesn't support semicolons for batching: you must instead use the DbBatch abstraction we introduced in Npgsql 6.0 (partially for this).
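
For completeness, a sketch of what batching looks like with DbBatch and positional parameters instead of semicolon splitting; the conn variable and the SQL are illustrative:

using Npgsql;

// Two statements sent in one roundtrip via DbBatch, with no SQL splitting.
// "conn" is assumed to be an open NpgsqlConnection.
using var batch = new NpgsqlBatch(conn);

var first = new NpgsqlBatchCommand("UPDATE world SET randomnumber = $1 WHERE id = $2");
first.Parameters.Add(new NpgsqlParameter { Value = 123 });
first.Parameters.Add(new NpgsqlParameter { Value = 1 });
batch.BatchCommands.Add(first);

var second = new NpgsqlBatchCommand("UPDATE world SET randomnumber = $1 WHERE id = $2");
second.Parameters.Add(new NpgsqlParameter { Value = 456 });
second.Parameters.Add(new NpgsqlParameter { Value = 2 });
batch.BatchCommands.Add(second);

await batch.ExecuteNonQueryAsync();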

The only corner case is when there are no parameters at all. For this case, there's still the problem of semicolons inside the SQL (batching) - we must parse since there's no user gesture here. We could in theory introduce a bool property on the command just to skip parsing/rewriting in the no-parameters case, but that seems really excessive... I'd rather we made EF (and Dapper) compatible with positional parameters and DbBatch (yet another thing on my list...)

For now we can probably have #if FORTUNES or similar to enable this AppContext switch only when running fortunes...
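
That could look something like the following; FORTUNES is a hypothetical compilation symbol defined only for the fortunes build, and the switch name is the same believed one as above:

#if FORTUNES
// Only the fortunes build skips SQL parsing/rewriting; the Dapper/EF paths
// elsewhere keep the default behaviour.
AppContext.SetSwitch("Npgsql.EnableSqlRewriting", false);
#endif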

(it's all been quite a long journey...)

@benaadams (Contributor Author)

The idea was that the parsed state would remain in the command if the command text didn't change, so reusing the command would skip the reparsing. Alas, that isn't what happens: it reparses each time, even though it's the same command object with unchanged command text.

@roji (Member) commented Mar 19, 2023

@benaadams right. Parsing/rewriting was already disabled in all benchmarks with parameters, since we switched to positional placeholders a while ago; TechEmpower/FrameworkBenchmarks#8005 (comment) does that for Fortunes as well. So I don't think we need to worry about that part any more.

There's another kind of "parsing" which happens every time: to look up the PostgreSQL prepared statement in an internal dictionary. We're planning to add a data-source level API for "globally-available" prepared statements, that would skip this step (npgsql/npgsql#4509).

In the meantime, we could in theory pool commands and assume that a command rented from the pool already has the correct SQL. That assumption would hold only in a single-statement benchmark, so it seems a bit unrealistic/problematic.

@NinoFloris (Contributor) commented Mar 19, 2023

@benaadams Any reuse that we might introduce on the DbDataSource will likely release all query-specific resources when the command is returned.
I would personally like to see ADO.NET support for concurrent executions on a DbDataSource/connectionless command, so that one could be stored in a static field (this would need new ExecuteReader methods accepting parameters as an argument and quite some rework in drivers, so it's not realistic any time soon).

If we really need fast pooling we could store relevant instances on the kestrel connection and pass them down.
