sqlite3: reduce C to Go string conversions in SQLiteConn.{query,exec} #1133

charlievieth · 2023-02-07T15:23:32Z

NOTE: This PR is dependent on #1128 being merged

This commit fixes an issue where SQLiteConn.{exec,query} would use an
exponential amount of memory when processing a prepared statement that
consisted of multiple SQL statements ("stmt 1; stmt 2; ...").

Previously both exec/query used SQLiteConn.prepare() which converts the
"tail" pointer set by sqlite3_prepare_v2() back into a Go string, which
then has to be converted back to a C string for the next call to
prepare() (assuming there are multiple SQL statements in the provided
query).

This commit fixes this by changing both exec and query to use the
returned "tail" pointer for the next call to exec/query.

It also changes prepare() to use pointer arithmetic to calculate the
offset of the remaining "tail" portion of the query as a substring of
the original query. This saves a call to C.GoString() and an allocation.

Benchmarks:

goos: darwin
goarch: arm64
pkg: github.com/mattn/go-sqlite3
                            │ base.10.txt  │             new.10.txt              │
                            │    sec/op    │   sec/op     vs base                │
Suite/BenchmarkExec-10         1.351µ ± 1%   1.247µ ± 1%   -7.74% (p=0.000 n=10)
Suite/BenchmarkQuery-10        3.830µ ± 1%   3.558µ ± 1%   -7.11% (p=0.000 n=10)
Suite/BenchmarkParams-10       4.221µ ± 0%   4.228µ ± 1%        ~ (p=1.000 n=10)
Suite/BenchmarkStmt-10         2.906µ ± 1%   2.864µ ± 1%   -1.45% (p=0.001 n=10)
Suite/BenchmarkRows-10         149.1µ ± 4%   148.2µ ± 1%   -0.61% (p=0.023 n=10)
Suite/BenchmarkStmtRows-10     147.3µ ± 1%   145.6µ ± 0%   -1.16% (p=0.000 n=10)
Suite/BenchmarkExecStep-10    1898.9µ ± 3%   889.0µ ± 1%  -53.18% (p=0.000 n=10)
Suite/BenchmarkQueryStep-10   1848.0µ ± 1%   894.6µ ± 1%  -51.59% (p=0.000 n=10)
geomean                        38.56µ        31.30µ       -18.84%

                            │  base.10.txt   │               new.10.txt               │
                            │      B/op      │     B/op      vs base                  │
Suite/BenchmarkExec-10            184.0 ± 0%     176.0 ± 0%   -4.35% (p=0.000 n=10)
Suite/BenchmarkQuery-10           864.0 ± 0%     856.0 ± 0%   -0.93% (p=0.000 n=10)
Suite/BenchmarkParams-10        1.289Ki ± 0%   1.281Ki ± 0%   -0.61% (p=0.000 n=10)
Suite/BenchmarkStmt-10          1.078Ki ± 0%   1.078Ki ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkRows-10          34.45Ki ± 0%   34.45Ki ± 0%   -0.02% (p=0.000 n=10)
Suite/BenchmarkStmtRows-10      34.40Ki ± 0%   34.40Ki ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkExecStep-10    5334.61Ki ± 0%   70.41Ki ± 0%  -98.68% (p=0.000 n=10)
Suite/BenchmarkQueryStep-10    5397.4Ki ± 0%   133.2Ki ± 0%  -97.53% (p=0.000 n=10)
geomean                         17.06Ki        6.208Ki       -63.62%
¹ all samples are equal

                            │ base.10.txt  │              new.10.txt               │
                            │  allocs/op   │  allocs/op   vs base                  │
Suite/BenchmarkExec-10          13.00 ± 0%    12.00 ± 0%   -7.69% (p=0.000 n=10)
Suite/BenchmarkQuery-10         46.00 ± 0%    45.00 ± 0%   -2.17% (p=0.000 n=10)
Suite/BenchmarkParams-10        54.00 ± 0%    53.00 ± 0%   -1.85% (p=0.000 n=10)
Suite/BenchmarkStmt-10          49.00 ± 0%    49.00 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkRows-10         2.042k ± 0%   2.041k ± 0%   -0.05% (p=0.000 n=10)
Suite/BenchmarkStmtRows-10     2.038k ± 0%   2.038k ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkExecStep-10    13.000k ± 0%   8.004k ± 0%  -38.43% (p=0.000 n=10)
Suite/BenchmarkQueryStep-10   11.013k ± 0%   6.017k ± 0%  -45.36% (p=0.000 n=10)
geomean                         418.6         359.8       -14.04%
¹ all samples are equal

rittneje · 2023-02-11T22:19:23Z

sqlite3.go

@@ -79,6 +80,16 @@ _sqlite3_exec(sqlite3* db, const char* pcmd, long long* rowid, long long* change
  return rv;
 }

+static const char *
+_trim_leading_spaces(const char *str) {


Remove, trimming leading spaces is not required. (I don't know why the old code did it, but there's no point as SQLite itself ignores them.)
I believe that the general condition for knowing we are done is that the returned stmt is null, and the tail is the empty string.

If pzTail is not NULL then *pzTail is made to point to the first byte past the end of the first SQL statement in zSql. These routines only compile the first statement in zSql, so *pzTail is left pointing to what remains uncompiled.

From: https://www.sqlite.org/c3ref/prepare.html

We actually do need to trim spaces since we use *pzTail == 0 to check if the entire query has been consumed and trailing whitespace will cause an infinite loop (omitting this will cause the tests to fail / loop forever and timeout).

Technically, we only need to trim trailing spaces, but it's simpler to trim leading spaces from pzTail if done in C. This could be done in Go (pquery := C.CString(strings.TrimSpace(query))) or C,but I figured it would be best to locate this logic nearest to where it's needed instead of having to replicate it wherever _sqlite3_prepare_v2_internal() is called.

I think this table may help illustrate:

zSql ppStmt pzTail

"" null ""

" " null ""

"/*abc*/" null ""

"--abc" null ""

";" null ""

"; VALUES(1) non-null ""

"; VALUES(1); VALUES(2)" non-null " VALUES(2)"

" ;;; VALUES(1)" non-null ""

" VALUES(1)" non-null ""

"VALUES (1) " non-null ""

"VALUES (1) /*abc*/" non-null ""

"VALUES (1) --abc" non-null ""

"VALUES (1); " non-null " "

"VALUES (1); /*abc*/" non-null " /*abc*/"

"VALUES (1); --abc" non-null " --abc"

"VALUES (1); VALUES (2); VALUES (3);" non-null " VALUES (2); VALUES (3);"

In other words, if sqlite3_prepare_v2 returns a null sqlite3_stmt (with SQLITE_OK), then it means the input string did not contain any statements at all, but rather only some combination of whitespace, comments, and semicolons. This is supported by the docs.

If pzTail is not NULL then *pzTail is made to point to the first byte past the end of the first SQL statement in zSql. These routines only compile the first statement in zSql, so *pzTail is left pointing to what remains uncompiled. [...] If the input text contains no SQL (if the input is an empty string or a comment) then *ppStmt is set to NULL.

Consequently, if the sqlite3_stmt is null, that implies that pzTail is the empty string. So rather than attempting to trim whitespace (which will not account for comments or superfluous semicolons), we should just let sqlite3_prepare_v2 do its job. If it returns a null sqlite3_stmt, then we are definitely done. And as an optimization (or potential alternative), at the top of the loop, we could also break if the current string is empty.

Thanks and I'll look into this, but the tests do fail if the white space is not trimmed.

@rittneje I was able to finally look into this and the issue is deeper than just trimming whitespace. With the current logic (i.e. master) any query with a trailing comment will result in an infinite loop.

This commit charlievieth@6bc917b modifies TestQueryer to reproduce the error.

patch

--- a/sqlite3_test.go +++ b/sqlite3_test.go @@ -1115,7 +1115,7 @@ func TestQueryer(t *testing.T) { t.Error("Failed to call db.Exec:", err) } rows, err := db.Query(` - select id from foo order by id; + select id from foo order by id; -- #1 harmless comment `) if err != nil { t.Error("Failed to call db.Query:", err) @@ -1128,10 +1128,13 @@ func TestQueryer(t *testing.T) { if err != nil { t.Error("Failed to db.Query:", err) } - if id != n + 1 { + if id != n+1 { t.Error("Failed to db.Query: not matched results") } n = n + 1 + if n > 3 { + t.Fatal("Too many errors:", n) + } } if err := rows.Err(); err != nil { t.Errorf("Post-scan failed: %v\n", err)

Fixing this is likely outside the scope of this PR and I'm not sure I'll have the cycles to address it, but will let you know if that changes. Additionally, let me know if I should create an issue for this.

The issue appears to be due to the code not checking if sqlite3_prepare_v2 sets its sqlite3_stmt** argument to NULL because the input text contains no SQL, which can occur if there are comments after a valid SQL statement (see: the sqlite3_prepare_v2 docs).

A possible solution would be to put more of this logic into the returned SQLiteRows struct and let it work through the provided arguments and statements (this would also allow for multiple SQL statements with args in a single query - it looks like pgx does this).

Below is a simple example using the sqlite3 CLI to show that leading/trailing comments are allowed around a SQL QUERY statement.

#!/bin/sh set -eu DB="$(mktemp)" # test database TMP1="$(mktemp)" # want TMP2="$(mktemp)" # got trap 'rm -f -- ${DB} ${TMP1} ${TMP2}' EXIT # init/seed table echo ' drop table if exists foo; create table foo (id integer); insert into foo(id) values(1); insert into foo(id) values(2); insert into foo(id) values(3);' | sqlite3 "${DB}" # desired output echo '1 2 3' >"${TMP1}" # test "SELECT" statement with leading/trailing comments echo ' -- FOO select id from foo order by id; -- BAR /* BAZ */ ' | sqlite3 "${DB}" >"${TMP2}" if ! diff --color "${TMP1}" "${TMP2}"; then echo 'FAIL' exit 1 fi echo 'PASS'

Additionally, with the current logic, if two SELECT statements are provided then the returned Rows are only for the second statement ("select 1; select 2" returns only 2).

I believe this is also the root cause of: #950

@rittneje just pushed 5880fdc which addresses the above 2 bugs. This might be breaking change since in the case of SELECT 1; SELECT 2; an error is returned instead of SELECT 2; being the query executed.

Also, great call out on not needing to trim the query strings - these bugs would not have been identified without it!

rittneje · 2023-02-11T22:26:12Z

sqlite3.go

-		s, err := c.prepare(ctx, query)
-		if err != nil {
-			return nil, err
+		*s = SQLiteStmt{c: c, cls: true} // reset


You never set the finalizer, which means we could end up leaking memory.
I think it might be best not to attempt to be clever here and revert these changes. SQLiteStmt itself is pretty small (a few bytes), and overwriting something with a mutex fields is potentially dangerous.

You never set the finalizer, which means we could end up leaking memory.

runtime.SetFinalizer was never needed here. The only way for this to leak a prepared statement would be for this code to panic thus preventing s.finalize() from being called. But in that case the code is unreasonable and the program should exit. Additionally, there is some overhead to runtime.SetFinalizer.

I think it might be best not to attempt to be clever here and revert these changes. SQLiteStmt itself is pretty small (a few bytes), and overwriting something with a mutex fields is potentially dangerous.

Fair and the number of allocations only really add up when the query contains multiple SQL statements (see below). The overwrite here is safe because we never copy the SQLiteStmt struct by value (which go vet via golang/tools/go/analysis/passes/copylock ).

Happy to change if you feel strongly about this.

Comparison between reusing the SQLiteStmt and initializing one for each iteration.

benchmark old ns/op new ns/op delta BenchmarkSuite/BenchmarkExecStep-10 1146170 1171202 +2.18% benchmark old allocs new allocs delta BenchmarkSuite/BenchmarkExecStep-10 8004 9003 +12.48% benchmark old bytes new bytes delta BenchmarkSuite/BenchmarkExecStep-10 72104 120059 +66.51%

But in that case the code is unreasonable and the program should exit.

We have no control over what the caller chooses to do. They could have a deferred recover(), even if it is ill-advised, and we should do our best to handle it.

codecov-commenter · 2023-02-14T14:12:49Z

Codecov Report

Patch coverage: 70.90% and project coverage change: +0.68 🎉

Comparison is base (edc3bb6) 46.72% compared to head (1c1a89a) 47.40%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1133      +/-   ##
==========================================
+ Coverage   46.72%   47.40%   +0.68%     
==========================================
  Files          12       12              
  Lines        1526     1542      +16     
==========================================
+ Hits          713      731      +18     
  Misses        671      671              
+ Partials      142      140       -2

Impacted Files	Coverage Δ
sqlite3.go	`53.89% <70.90%> (+1.03%)`	⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

This commit fixes an issue where SQLiteConn.{exec,query} would use an exponential amount of memory when processing a prepared statement that consisted of multiple SQL statements ("stmt 1; stmt 2; ..."). Previously both exec/query used SQLiteConn.prepare() which converts the "tail" pointer set by sqlite3_prepare_v2() back into a Go string, which then has to be converted back to a C string for the next call to prepare() (assuming there are multiple SQL statements in the provided query). This commit fixes this by changing both exec and query to use the returned "tail" pointer for the next call to exec/query. It also changes prepare() to use pointer arithmetic to calculate the offset of the remaining "tail" portion of the query as a substring of the original query. This saves a call to C.GoString() and an allocation. Benchmarks: ``` goos: darwin goarch: arm64 pkg: github.com/mattn/go-sqlite3 │ base.10.txt │ new.10.txt │ │ sec/op │ sec/op vs base │ Suite/BenchmarkExec-10 1.351µ ± 1% 1.247µ ± 1% -7.74% (p=0.000 n=10) Suite/BenchmarkQuery-10 3.830µ ± 1% 3.558µ ± 1% -7.11% (p=0.000 n=10) Suite/BenchmarkParams-10 4.221µ ± 0% 4.228µ ± 1% ~ (p=1.000 n=10) Suite/BenchmarkStmt-10 2.906µ ± 1% 2.864µ ± 1% -1.45% (p=0.001 n=10) Suite/BenchmarkRows-10 149.1µ ± 4% 148.2µ ± 1% -0.61% (p=0.023 n=10) Suite/BenchmarkStmtRows-10 147.3µ ± 1% 145.6µ ± 0% -1.16% (p=0.000 n=10) Suite/BenchmarkExecStep-10 1898.9µ ± 3% 889.0µ ± 1% -53.18% (p=0.000 n=10) Suite/BenchmarkQueryStep-10 1848.0µ ± 1% 894.6µ ± 1% -51.59% (p=0.000 n=10) geomean 38.56µ 31.30µ -18.84% │ base.10.txt │ new.10.txt │ │ B/op │ B/op vs base │ Suite/BenchmarkExec-10 184.0 ± 0% 176.0 ± 0% -4.35% (p=0.000 n=10) Suite/BenchmarkQuery-10 864.0 ± 0% 856.0 ± 0% -0.93% (p=0.000 n=10) Suite/BenchmarkParams-10 1.289Ki ± 0% 1.281Ki ± 0% -0.61% (p=0.000 n=10) Suite/BenchmarkStmt-10 1.078Ki ± 0% 1.078Ki ± 0% ~ (p=1.000 n=10) ¹ Suite/BenchmarkRows-10 34.45Ki ± 0% 34.45Ki ± 0% -0.02% (p=0.000 n=10) Suite/BenchmarkStmtRows-10 34.40Ki ± 0% 34.40Ki ± 0% ~ (p=1.000 n=10) ¹ Suite/BenchmarkExecStep-10 5334.61Ki ± 0% 70.41Ki ± 0% -98.68% (p=0.000 n=10) Suite/BenchmarkQueryStep-10 5397.4Ki ± 0% 133.2Ki ± 0% -97.53% (p=0.000 n=10) geomean 17.06Ki 6.208Ki -63.62% ¹ all samples are equal │ base.10.txt │ new.10.txt │ │ allocs/op │ allocs/op vs base │ Suite/BenchmarkExec-10 13.00 ± 0% 12.00 ± 0% -7.69% (p=0.000 n=10) Suite/BenchmarkQuery-10 46.00 ± 0% 45.00 ± 0% -2.17% (p=0.000 n=10) Suite/BenchmarkParams-10 54.00 ± 0% 53.00 ± 0% -1.85% (p=0.000 n=10) Suite/BenchmarkStmt-10 49.00 ± 0% 49.00 ± 0% ~ (p=1.000 n=10) ¹ Suite/BenchmarkRows-10 2.042k ± 0% 2.041k ± 0% -0.05% (p=0.000 n=10) Suite/BenchmarkStmtRows-10 2.038k ± 0% 2.038k ± 0% ~ (p=1.000 n=10) ¹ Suite/BenchmarkExecStep-10 13.000k ± 0% 8.004k ± 0% -38.43% (p=0.000 n=10) Suite/BenchmarkQueryStep-10 11.013k ± 0% 6.017k ± 0% -45.36% (p=0.000 n=10) geomean 418.6 359.8 -14.04% ¹ all samples are equal ```

This commit fixes *SQLiteConn.Query to properly handle trailing comments after a SQL query statement. Previously, trailing comments could lead to an infinite loop. It also changes Query to error if the provided SQL statement contains multiple queries ("SELECT 1; SELECT 2;") - previously only the last query was executed ("SELECT 1; SELECT 2;" would yield only 2). This may be a breaking change as previously: Query consumed all of its args - despite only using the last query (Query now only uses the args required to satisfy the first query and errors if there is a mismatch); Query used only the last query and there may be code using this library that depends on this behavior. Personally, I believe the behavior introduced by this commit is correct and any code relying on the prior undocumented behavior incorrect, but it could still be a break.

charlievieth · 2023-04-20T04:33:09Z

It would also be great to get #1130 merged since it makes working on this library with go1.19+ much nicer.

rittneje reviewed Feb 11, 2023

View reviewed changes

charlievieth force-pushed the cev/multi-stmt-allocs branch from 58452f3 to e5d3f72 Compare February 14, 2023 14:07

rittneje mentioned this pull request Mar 23, 2023

Slow performance on bulk insert #1145

Open

charlievieth force-pushed the cev/multi-stmt-allocs branch 2 times, most recently from e430326 to c816a29 Compare April 20, 2023 04:31

charlievieth force-pushed the cev/multi-stmt-allocs branch from c816a29 to 1c1a89a Compare April 20, 2023 04:32

rittneje mentioned this pull request May 5, 2023

"SELECT 1;;" means (rc *SQLiteRows) Next() enters an infinite loop (SQLITE_MISUSE) #1161

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sqlite3: reduce C to Go string conversions in SQLiteConn.{query,exec} #1133

sqlite3: reduce C to Go string conversions in SQLiteConn.{query,exec} #1133

charlievieth commented Feb 7, 2023 •

edited

rittneje Feb 11, 2023 •

edited

charlievieth Feb 14, 2023

rittneje Feb 15, 2023 •

edited

charlievieth Feb 15, 2023

charlievieth Apr 19, 2023 •

edited

charlievieth Apr 20, 2023

charlievieth Apr 20, 2023 •

edited

rittneje Feb 11, 2023

charlievieth Feb 14, 2023

rittneje Feb 15, 2023

codecov-commenter commented Feb 14, 2023 •

edited

charlievieth commented Apr 20, 2023

`zSql`	`ppStmt`	`pzTail`
`""`	null	`""`
`" "`	null	`""`
`"/abc/"`	null	`""`
`"--abc"`	null	`""`
`";"`	null	`""`
`"; VALUES(1)`	non-null	`""`
`"; VALUES(1); VALUES(2)"`	non-null	`" VALUES(2)"`
`" ;;; VALUES(1)"`	non-null	`""`
`" VALUES(1)"`	non-null	`""`
`"VALUES (1) "`	non-null	`""`
`"VALUES (1) /abc/"`	non-null	`""`
`"VALUES (1) --abc"`	non-null	`""`
`"VALUES (1); "`	non-null	`" "`
`"VALUES (1); /abc/"`	non-null	`" /abc/"`
`"VALUES (1); --abc"`	non-null	`" --abc"`
`"VALUES (1); VALUES (2); VALUES (3);"`	non-null	`" VALUES (2); VALUES (3);"`

sqlite3: reduce C to Go string conversions in SQLiteConn.{query,exec} #1133

Are you sure you want to change the base?

sqlite3: reduce C to Go string conversions in SQLiteConn.{query,exec} #1133

Conversation

charlievieth commented Feb 7, 2023 • edited

rittneje Feb 11, 2023 • edited

Choose a reason for hiding this comment

charlievieth Feb 14, 2023

Choose a reason for hiding this comment

rittneje Feb 15, 2023 • edited

Choose a reason for hiding this comment

charlievieth Feb 15, 2023

Choose a reason for hiding this comment

charlievieth Apr 19, 2023 • edited

Choose a reason for hiding this comment

charlievieth Apr 20, 2023

Choose a reason for hiding this comment

charlievieth Apr 20, 2023 • edited

Choose a reason for hiding this comment

rittneje Feb 11, 2023

Choose a reason for hiding this comment

charlievieth Feb 14, 2023

Choose a reason for hiding this comment

rittneje Feb 15, 2023

Choose a reason for hiding this comment

codecov-commenter commented Feb 14, 2023 • edited

Codecov Report

charlievieth commented Apr 20, 2023

charlievieth commented Feb 7, 2023 •

edited

rittneje Feb 11, 2023 •

edited

rittneje Feb 15, 2023 •

edited

charlievieth Apr 19, 2023 •

edited

charlievieth Apr 20, 2023 •

edited

codecov-commenter commented Feb 14, 2023 •

edited