
Named query parameters implemented #71

Closed

Conversation


@anton7r anton7r commented Dec 3, 2021

I've integrated my pgxe library into scany.

Note:
What this PR still lacks is the possibility to run named queries that do not return anything (NamedExec()).

Also, test cases for the new API methods need to be added and the documentation needs to be improved.

The query parser currently only supports : as the delimiter; if we want to support more delimiters, I need to update the code.

@anton7r anton7r mentioned this pull request Dec 3, 2021
@anton7r
Author

anton7r commented Dec 4, 2021

By caching the index of a field we make querying values by field name faster. However, when we only query one field the optimization doesn't help performance, as seen in the _2-suffixed benchmarks.

_3-suffixed benchmarks query 3 fields
_4-suffixed benchmarks query 3 fields 5 times

We don't want to use reflect.Value.FieldByName because it is O(n^2), whereas the IndexMap implementations are roughly O(n).

goos: windows
goarch: amd64
pkg: github.com/georgysavva/scany/queryparser/utils
cpu: AMD Ryzen 7 2700X Eight-Core Processor         
BenchmarkGetNamedField-16      	 6453988	       184.1 ns/op	      32 B/op	       3 allocs/op
BenchmarkIndexMap-16           	 3876772	       309.1 ns/op	      88 B/op	       3 allocs/op
BenchmarkIndexMapV2-16         	 4150629	       291.5 ns/op	      88 B/op	       3 allocs/op
BenchmarkGetNamedField_2-16    	 6405780	       186.7 ns/op	      32 B/op	       3 allocs/op
BenchmarkIndexMap_2-16         	 3698179	       316.2 ns/op	      88 B/op	       3 allocs/op
BenchmarkIndexMapV2_2-16       	 4254351	       283.4 ns/op	      88 B/op	       3 allocs/op
BenchmarkGetNamedField_3-16    	 2653401	       451.5 ns/op	      64 B/op	       7 allocs/op
BenchmarkIndexMap_3-16         	 2809712	       463.2 ns/op	      96 B/op	       5 allocs/op
BenchmarkIndexMapV2_3-16       	 2951157	       404.9 ns/op	      96 B/op	       5 allocs/op
BenchmarkGetNamedField_4-16    	  545352	      2290 ns/op	     320 B/op	      35 allocs/op
BenchmarkIndexMap_4-16         	  922863	      1353 ns/op	     224 B/op	      21 allocs/op
BenchmarkIndexMapV2_4-16       	  922898	      1297 ns/op	     224 B/op	      21 allocs/op
PASS
ok  	github.com/georgysavva/scany/queryparser/utils	17.691s

@anton7r
Author

anton7r commented Dec 4, 2021

Performance is no longer a concern

goos: windows
goarch: amd64
pkg: github.com/georgysavva/scany/queryparser/utils
cpu: AMD Ryzen 7 2700X Eight-Core Processor         
BenchmarkGetNamedField-16      	 6281607	       187.7 ns/op	      32 B/op	       3 allocs/op
BenchmarkIndexMap-16           	 3786922	       352.3 ns/op	      88 B/op	       3 allocs/op
BenchmarkIndexMapV2-16         	 6816387	       175.7 ns/op	      24 B/op	       2 allocs/op
BenchmarkGetNamedField_2-16    	 6460810	       188.0 ns/op	      32 B/op	       3 allocs/op
BenchmarkIndexMap_2-16         	 3744350	       323.1 ns/op	      88 B/op	       3 allocs/op
BenchmarkIndexMapV2_2-16       	 6756135	       180.1 ns/op	      24 B/op	       2 allocs/op
BenchmarkGetNamedField_3-16    	 2528263	       473.2 ns/op	      64 B/op	       7 allocs/op
BenchmarkIndexMap_3-16         	 2742052	       437.9 ns/op	      96 B/op	       5 allocs/op
BenchmarkIndexMapV2_3-16       	 4174024	       289.0 ns/op	      32 B/op	       4 allocs/op
BenchmarkGetNamedField_4-16    	  524474	      2358 ns/op	     320 B/op	      35 allocs/op
BenchmarkIndexMap_4-16         	  844498	      1393 ns/op	     224 B/op	      21 allocs/op
BenchmarkIndexMapV2_4-16       	 1000000	      1175 ns/op	     160 B/op	      20 allocs/op
PASS
ok  	github.com/georgysavva/scany/queryparser/utils	17.387s

@georgysavva
Owner

Hello! Thank you for your effort and for drafting this PR. I've gone through the code and it seems to me that you are inserting data inside the SQL query text directly, so you are escaping the data yourself. It might be dangerous (SQL injections) and error-prone. We shouldn't take this job from the underlying database library.

Instead, the new named-query logic should compile the SQL query in a different way:

"INSERT INTO users (id, name, age) VALUES (:id, :name, :age)", userStruct ---> "INSERT INTO users (id, name, age) VALUES ($1, $2, $3)", userID, userName, userAge

So it replaces our name placeholders with the default placeholders for the database library and passes data from the struct argument as corresponding positional arguments to the database library function call.

Here is an example of how it's implemented for the sqlx library: https://github.com/jmoiron/sqlx/blob/a62bc6088664eb854c3d5ebbd08a1ffd59662ef1/named.go#L441
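The rewrite described above can be sketched as follows. This is a hypothetical, simplified compiler, not scany's code or this PR's lexer, and it deliberately ignores SQL comments and string literals:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// compileNamed rewrites ":name" placeholders into PostgreSQL-style "$1, $2, ..."
// positional placeholders and returns the parameter names in order of appearance.
func compileNamed(query string) (string, []string) {
	var out strings.Builder
	var names []string
	i := 0
	for i < len(query) {
		c := query[i]
		if c == ':' && i+1 < len(query) && isNameRune(rune(query[i+1])) {
			// Consume the parameter name following the ':' delimiter.
			j := i + 1
			for j < len(query) && isNameRune(rune(query[j])) {
				j++
			}
			names = append(names, query[i+1:j])
			out.WriteString(fmt.Sprintf("$%d", len(names)))
			i = j
			continue
		}
		out.WriteByte(c)
		i++
	}
	return out.String(), names
}

// isNameRune permits letters, digits, '_' and '.' (for nested fields like post.title).
func isNameRune(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_' || r == '.'
}

func main() {
	q, names := compileNamed("INSERT INTO users (id, name, age) VALUES (:id, :name, :age)")
	fmt.Println(q)     // INSERT INTO users (id, name, age) VALUES ($1, $2, $3)
	fmt.Println(names) // [id name age]
}
```

The caller would then look each returned name up in the struct and pass the field values as positional arguments to the database library.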

@anton7r
Author

anton7r commented Dec 10, 2021

Sure, when I wrote the lexer 6 months ago I just wanted to learn how they work.

It is not only error-prone, but it also doesn't support every data type.

@anton7r
Author

anton7r commented Dec 10, 2021

Having looked at the way sqlx does it, I noticed it's not done in the most optimal way, as it converts the entire struct into a map[string]interface{}, which would be slower on larger structs.

Instead of doing exactly what sqlx has done, I would probably make a map that stores the indices of already-mapped fields, where the field name is the key and the value is the positional index the field is mapped to.
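That dedup idea can be sketched like this (a hypothetical helper: repeated names reuse the positional index they were first assigned, so :name appearing twice binds to a single $1):

```go
package main

import "fmt"

// bindIndexes assigns each distinct parameter name a positional index,
// reusing the index when a name repeats.
func bindIndexes(params []string) map[string]int {
	seen := make(map[string]int)
	for _, p := range params {
		if _, ok := seen[p]; !ok {
			seen[p] = len(seen) + 1 // 1-based, matching $1, $2, ...
		}
	}
	return seen
}

func main() {
	fmt.Println(bindIndexes([]string{"name", "age", "name"}))
	// map[age:2 name:1]
}
```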

@anton7r
Author

anton7r commented Dec 10, 2021

Allocating that much memory for the strings still sounds kind of bad in my opinion.

Maybe I could use the "ordinal" value of the field as the key, which would reduce memory usage but might increase processing time a bit.

@anton7r
Author

anton7r commented Dec 11, 2021

@georgysavva should the NamedGet and NamedSelect actually be GetNamed and SelectNamed since it would probably be easier when autocompleting in IDEs?

@anton7r anton7r marked this pull request as ready for review December 11, 2021 21:32
@georgysavva
Owner

Yes, we don't need to stick to the sqlx implementation completely; I was talking more about the overall design.
Yes, GetNamed and SelectNamed look better to me.
A few more observations:

  1. Right now you hardcode the $ sign at the dbscan package level, but this sign is only valid for the PostgreSQL database, so it should be set in the pgxscan package. For sqlscan we don't know which database is actually being used, so it should be configurable with no default.
  2. I see that named parameters in the SQL query are in camel case, matching the struct fields. I think they should look like the column names in the SELECT statement. For example:
type User struct {
    FirstName string
    Post struct {
        Title string
    }
}

For such a struct, valid SQL queries using scany would be:

SELECT first_name as "first_name", post as "post.title" FROM users
----
INSERT INTO users(first_name, post) VALUES( :first_name, :post.title)

The default columns scany would look for are "first_name" and "post.title". The same named parameters should be used in named queries (snake case, separated by a dot for nested structs). More specifically, I think we should reuse the getColumnToFieldIndexMap() function to map struct fields to columns (named parameters in our case).

@anton7r
Author

anton7r commented Dec 13, 2021

If we do not give the sqlscan interface a default value for the compile delimiter, would we then need to get rid of the package-level API that it has?

@georgysavva
Owner

Sorry, I don't quite understand your message. Could you please elaborate?

The way I see it:
In the dbscan package, we have an optional field in the API object for the delimiter value which isn't set by default and if someone tries to use Named queries logic with no delimiter set, dbscan should return an error.
In the sqlscan package, we don't set any default delimiter to the dbscan API object.
In the pgxscan package, we set '$' as the delimiter to the dbscan API object by default.

// Option A: DefaultAPI is the default instance of API, wrapped around the dbscan.DefaultAPI instance.
var DefaultAPI = NewAPI(dbscan.DefaultAPI)

// Option B: DefaultAPI is wrapped around a dbscan.API instance with a lexer configured, or no default value at all.
var DefaultAPI = NewAPI(dbscan.NewAPI(dbscan.WithLexer(lexer.NewLexer(':', '$'))))
Author

I mean that if we don't define the DefaultAPI with default parameters...
(1/2)

Owner

You define DefaultAPI as before; you just don't pass the delimiter sign '$' (or the whole lexer object) to dbscan.

Author

Then we need to remove the package-level SelectNamed, GetNamed and ExecNamed; if we didn't, it would lead to unexpected behaviour. In my opinion a partly exposed DefaultAPI does not look very good, so it would be better to get rid of the default API and the package-level functions altogether, as that would reduce confusion between the two interfaces (pgxscan and sqlscan).

Author

I.e. when the user calls sqlscan.SelectNamed it would probably run into a nil pointer dereference because the lexer is not defined.

Owner

I see. Yes, you are right. Since there is no default delimiter for sqlscan we can't offer package-level functions like SelectNamed(), etc. And yeah, let's not add them for the pgxscan package either, for consistency reasons.
But I would like to keep package level exported DefaultAPI object since some users might already depend on it and it's convenient to have it. We just need to document that one cannot call *Named() methods on DefaultAPI in the sqlscan package.

(Review comment on sqlscan/sqlscan.go, outdated and resolved.)
@georgysavva
Owner

A few more observations:

  1. The delimiter isn't always a constant character, so the substitution logic must be smarter. Consider: for MySQL it's always '?', but for PostgreSQL it's '$1', '$2', etc. See the sqlx code for details: https://github.com/jmoiron/sqlx/blob/a62bc6088664eb854c3d5ebbd08a1ffd59662ef1/named.go#L373-L392
  2. I think all this named-query compile logic should live in the dbscan API object. No need to introduce a separate package like lexer and internal utils; just put it in a separate named.go file inside the dbscan package.
  3. In the case of named parameters for structs we should use the existing getColumnToFieldIndexMap() function.
  4. In the pgxscan/sqlscan packages we should also introduce a fourth named method, QueryNamed(...) (pgx/sql.Rows, error), because a user might want to benefit from our named-parameter logic but handle the rows themselves.
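The per-database placeholder difference from point 1 can be isolated into a tiny helper, along the lines of sqlx's bindvar handling (a sketch with assumed names, not scany's code):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// bindStyle selects the placeholder syntax for a target database.
type bindStyle int

const (
	question bindStyle = iota // MySQL: ?
	dollar                    // PostgreSQL: $1, $2, ...
)

// placeholder renders the n-th positional placeholder in the given style.
// Note that '?' carries no position, while '$n' does.
func placeholder(style bindStyle, n int) string {
	if style == dollar {
		return "$" + strconv.Itoa(n)
	}
	return "?"
}

func main() {
	names := []string{"id", "name", "age"}
	var mysql, pg []string
	for i := range names {
		mysql = append(mysql, placeholder(question, i+1))
		pg = append(pg, placeholder(dollar, i+1))
	}
	fmt.Println(strings.Join(mysql, ", ")) // ?, ?, ?
	fmt.Println(strings.Join(pg, ", "))   // $1, $2, $3
}
```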

@anton7r
Author

anton7r commented Dec 20, 2021

I don't think we should use getColumnToFieldIndexMap() with the same casing as the database table names etc., because having the named params camel cased increases readability in my opinion:

INSERT INTO users(first_name, post) VALUES( :first_name, :post.title)

INSERT INTO users(first_name, post) VALUES( :FirstName, :Post.Title)

But I can for sure use getColumnToFieldIndexMap() if I just add a camel-case option to it, if it doesn't have one already.

@georgysavva
Owner

The point here is that we should have the same name mapping in both directions: SQL -> Go and Go -> SQL. It is just simpler for users to reason about names that way. The name you use for the column alias to scan data from the database is the same one you use to pass data into the database. Consider this:

type User struct {
    ID string
    FullName string
    ...
}

type Post struct {
    ID string
    UserID string
    Title string
    ...
}

type UserPost struct {
   User *User
   Post *Post
}

up := &UserPost{User: &User{FullName: "foo"}, Post: &Post{Title: "bar"}}
GetNamed(ctx, db, up /* dst */, `SELECT u.full_name AS "user.full_name", p.title AS "post.title", ... FROM users u JOIN posts p ON u.id = p.user_id WHERE u.full_name = :user.full_name AND p.title = :post.title`, up /* arg */)

The struct always translates to the same names: "user.full_name", "post.title", etc. And you use them for both column aliases and named parameters.

@anton7r
Author

anton7r commented Dec 21, 2021

I understand your point, but in this case simple may not be the best solution if we want scany to be a high-productivity tool. We also do not want to overcomplicate the library, as that brings its own new set of problems.

So should we then add an option to change the named-query casing, defaulting to the way that you want, so that users can configure it however they like? We could also ask users which way they like best.

@anton7r
Author

anton7r commented Dec 21, 2021

Also, it seems like getColumnToFieldIndexMap() may need some caching, as it does the expensive reflect field-name computations on every call.

@georgysavva
Owner

georgysavva commented Dec 22, 2021

  1. Users already have two ways to override the default mapping. Firstly, via struct tags. Secondly, providing a custom option WithFieldNameMapper() to API object to override the default one (snake case) to camel case or whatever they want. And those overriding methods are applied to both cases: scanning into a destination and named parameters.
  2. sqlx library uses exactly the same approach, they expect the same mapping for both, destination type scanning and named parameters. So it means that users that come from sqlx will find our implementation familiar.
  3. You are right, getColumnToFieldIndexMap() needs some caching. It's a separate task that's on my list. Now when we have the global API object it can be used to store the cached data.

@anton7r
Author

anton7r commented Dec 22, 2021

  1. Oh yeah, that seems to be the case with sqlx; I had just forgotten about it since I have not used it in a while.
  2. The caching is already done in my implementation using cmap. I could check whether or not it is faster than Go's native sync.Map, as they serve a very similar purpose and it would be preferable to use native implementations in order to reduce the number of third-party dependencies.

@anton7r
Author

anton7r commented Dec 22, 2021

As I have separated lexer.Compile() and the function that binds the field names to args, it would currently be pretty simple to add an option to prepare queries beforehand. But if we don't want to add prepared queries, we should combine them to reduce unnecessary allocations.

@anton7r
Author

anton7r commented Dec 22, 2021

// cmap is a third-party concurrent map; sync.Map from the standard
// library serves a similar purpose.
var typeMap cmap.Cmap = cmap.Cmap{}

// getStructIndex tries to find the struct's index in typeMap;
// if it is not found, it builds and caches one.
func getStructIndex(e reflect.Value) structIndex {
	t := e.Type()

	val, found := typeMap.Load(t)
	if found {
		return val.(structIndex)
	}

	return indexStruct(t)
}

// indexStruct creates an index of the struct's fields and stores it in typeMap.
func indexStruct(t reflect.Type) structIndex {
	indexMap := structIndex{
		mappings: map[string]int{},
	}

	for i := 0; i < t.NumField(); i++ {
		indexMap.mappings[t.Field(i).Name] = i
	}

	typeMap.Store(t, indexMap)
	return indexMap
}

This is basically how the caching of struct reflection works in getStructIndex(), which works pretty similarly to getColumnToFieldIndexMap().

@anton7r
Author

anton7r commented Dec 22, 2021

why does getColumnToFieldIndexMap() return an integer array?

@georgysavva
Owner

  1. The standard library also has a concurrent map, sync.Map{}; you can check that one as well.

> The caching is already done in my implementation using cmap, i could check whether or not it is faster than go's native sync.Map as they serve a very similar purpose and it would be preferable to use native implementations in order to reduce the number of 3rd party dependencies.

  2. Yeah, adding prepared named queries would be a desirable thing, go ahead!

  3. getColumnToFieldIndexMap() returns a map where the values are indexes of struct fields. Every struct field has an index, and since we can have nested structs and we visit all of them, we get an array that uniquely identifies and finds the field. Consider this example:

type User struct {
    Name string
    Post *Post
}

type Post struct {
    Title string
    Date time.Time
}

getColumnToFieldIndexMap(&User{})
// returns map:
// "name" -> [0]
// "post" -> [1]
// "post.title" -> [1, 0]
// "post.date" -> [1, 1]

@anton7r
Author

anton7r commented Dec 23, 2021

Prepared queries still need tests and documentation; other than that, everything should function as intended.

@anton7r
Author

anton7r commented Jan 22, 2022

Yeah, it is safe to say that we should go with sync.Map from the standard library, as its docs pretty clearly state that it is intended for this sort of cache-like scenario.

https://pkg.go.dev/sync#Map

@fuadarradhi

Any updates?

@georgysavva
Owner

Hi @fuadarradhi. I am waiting for the final notice from @anton7r that PR is done and ready for review. It seems to be still WIP.

@anton7r
Author

anton7r commented Feb 19, 2022

This is ready feature-wise, but some of my changes still may need better documentation and tests.

@georgysavva
Owner

@anton7r understood. I will review it soon. Thank you for your work!

@arp242

arp242 commented Apr 1, 2022

Similar PR for sqlx, which uses https://github.com/muir/sqltoken: jmoiron/sqlx#775

Personally I'm not a huge fan of having a separate ExecNamed() etc. interface for this; it's a lot of "API clutter", and would prefer if the regular Exec() would work with named parameters. I've been using sqltoken in my own library for a while and the performance impact is very small (though not zero). Could maybe add a knob to enable/disable this, if desired.


More importantly, will your current implementation work with things like:

-- :site refers to the ID
select * from sites where id = :site

Will it replace the :site in the comment? I didn't run the code, but reading through it, I think it might replace :site here. This is an even bigger issue in strings (select * from sites where id = ':site') or if you use some word for which there isn't a parameter (resulting in an error).

I also spent quite some time trying to find failures in sqltoken and couldn't find any, so I think that's probably a better way to go(?) It would also allow things like rebinding ? placeholders to $1.
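The pitfall described above is easy to reproduce with a naive substitution pass (a hypothetical sketch, not this PR's lexer):

```go
package main

import (
	"fmt"
	"regexp"
)

// namedParam matches ":name" tokens with no awareness of SQL comments
// or string literals, which is precisely the pitfall raised above.
var namedParam = regexp.MustCompile(`:[A-Za-z_][A-Za-z0-9_]*`)

// naiveReplace swaps every ":name" token for a positional "$n" placeholder.
func naiveReplace(query string) string {
	n := 0
	return namedParam.ReplaceAllStringFunc(query, func(string) string {
		n++
		return fmt.Sprintf("$%d", n)
	})
}

func main() {
	q := "-- :site refers to the ID\nselect * from sites where id = :site"
	fmt.Println(naiveReplace(q))
	// The ":site" inside the comment is rewritten too:
	// -- $1 refers to the ID
	// select * from sites where id = $2
}
```

A tokenizer that recognizes comments and string literals (as sqltoken does) avoids this by only substituting inside ordinary SQL text.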

@georgysavva
Owner

Hello! @anton7r Thank you for your work and such a huge contribution. I am sorry, I need to put this feature on hold for now; I don't have enough time to dig deeper for a review, etc.

@anton7r
Author

anton7r commented Jul 28, 2022

> Similar PR for sqlx, which uses https://github.com/muir/sqltoken: jmoiron/sqlx#775
>
> Personally I'm not a huge fan of having a separate ExecNamed() etc. interface for this; it's a lot of "API clutter", and would prefer if the regular Exec() would work with named parameters. I've been using sqltoken in my own library for a while and the performance impact is very small (though not zero). Could maybe add a knob to enable/disable this, if desired.
>
> More importantly, will your current implementation work with things like:
>
> -- :site refers to the ID
> select * from sites where id = :site
>
> Will it replace the :site in the comment? I didn't run the code, but reading through it, I think it might replace :site here. This is an even bigger issue in strings (select * from sites where id = ':site') or if you use some word for which there isn't a parameter (resulting in an error).
>
> I also spent quite some time trying to find failures in sqltoken and couldn't find any, so I think that's probably a better way to go(?) It would also allow things like rebinding ? placeholders to $1.

Indeed, it will replace the :site in the comment. However, I'd like to know in which circumstances it is beneficial to keep comments inside the query string, when they could be left outside of it.

The sqltoken library could definitely be used, but I would have to dive deeper into it in order to know more about its pros and cons. I would prefer that the sqltoken library also have a callback function, in order to avoid (in this case) unnecessary array creation and modification operations.

@anton7r
Author

anton7r commented Jul 28, 2022

Managed to fix the git history on this one.

@Terminator637

Any updates here?

@georgysavva
Owner

This feature requires a thorough review and design decisions. Unfortunately, I don't have time at this moment to do that. I will get back to it as soon as I can. I hope you understand!

@anton7r
Author

anton7r commented Dec 23, 2022

After all, it may or may not be a good idea to add named query parameters to scany, as scany has focused on scanning and mapping query results and, in that regard, on being lightweight and easily extendable. If we were to add named query parameters, it could hinder the extendability of scany and likely its maintainability, which wouldn't be desirable. Especially since the current implementation of named query parameters is pretty opinionated and may not be able to support some specific database drivers, even though it is customisable.

I'd also like to hear other opinions on the matter. But right now it looks like the best option would be to create another library that uses scany for the data-mapping part and a custom implementation for the named queries, so that it could be more specific towards supporting PostgreSQL/MySQL etc.

@georgysavva
Owner

@anton7r, thank you so much for such an excellent summary. I am aligned with everything you've written. I think it's the best course for now.

Thank you for your effort and for trying to tackle that.
