RFC: nullability (and other metadata) for input arguments / parameters to a query #1525

skabbes · 2022-04-02T16:27:11Z

What do you want to change?

Kudos
Before I start, I wanted to give a huge kudos for sqlc - I love it. As a backend developer who knows SQL very well, I fight every single ORM I have used trying to do any SQL more complicated than CRUD. Thank you for creating this.

Background
I use sqlc alongside Postgres, as sort of a foreign function interface to SQL. I manage a PostgreSQL database with about 100 different tables and working in Typescript and Go to support every stage of solar operations for Bright. We do not currently use sqlc at Bright - but I do use it personally, but I am evaluating it Bright as well.

The types of problems I work on are not always CRUD-y (though sqlc works really well if they are). And the challenge I am facing is that SQL is a little less-expressive than the target languages in some cases, and therefore using sqlc as a FFI is a little clunky. A quick glance through the some open issues, shows that "nullability" is one of those deficiencies. sqlc does its best to always "do the right thing" and that has gotten it very far indeed. But I think a simple change can give developers full control, and allow for sqlc to keep getting smarter (better interred mappings / types).

Say for example, instead of finding a really good GeoJSON library in my target language, I wanted to rely on PostGIS.

- name: WKBToGeoJSON :one
select ST_AsGeoJSON(ST_GeomFromWKB(sqlc.arg('wkb_bytes'))::text as output;

I would love for this Go code to be generated:

func (q *Queries) WKBToGeoJSON(ctx context.Context, db DBTX, wkbBytes []byte) (string, error) {
    // ...
}

Or perhaps a more common use case, partial updates

-- name: EditThing :exec
update thing
set
  name = coalesce(sqlc.arg('name'), name),
  description = coalesce(sqlc.arg('description'), description),
  status = coalesce(sqlc.arg('status'), status)
where thing.id = sqlc.arg(id)

-- name, description and status are all nullable arguments, id is not.

Proposal
I think sqlc should expose a way (like overrides) in the query itself to control aspects of the generated code in the target languages.

Allow sqlc.arg(name) to accept more variadic arguments (similar to jsonb_build_object) to build up string key, and json value pairs of "configuration" for that arguments. I think sqlc itself should not really know too much about the semantics of these key/value pairs, and just pass them along to the code generators in the target language.

-- name: EditThing :exec
update thing
set
  name = coalesce(sqlc.arg('new_name', 'nullable', true, 'type', 'string'), name),
  description = coalesce(sqlc.arg('description', nullable, true), description),
  status = coalesce(sqlc.arg('status', 'nullable', true), status)
where thing.id = sqlc.arg(id)

This also allows the developer to control the API interface much better, even if the database is a little wacky. Imagine if there was a column name that was nullable for historical reasons, but any new insert to name it should be non null. sqlc should support that use case too.

I am not a super-polyglot, but if there were a C++ backend for sqlc, you probably wouldn't just want nullable but also maybe an option for the type of pointer that should be generated raw pointer? std::shared_ptr? std::unique_ptr? boost::? std::optional ?

-- name: GetThingForCpp :one
select
  id,
  first_name,
  last_name,
  sqlc.output(concat(first_name, ' ', last_name), 'nullable', false, 'ref_counted', true) as display_name
where ....

This is very similar to overrides, but on a query-by-query basis, and expressed directly in SQL instead of away from the code in a config file.

Implementation
I got pretty far along in the implementation, and I'd be happy to push it all the way through - but I hit a big snag at the SQL rewriting. Rewriting the AST is no big deal (and I had the code working for that). But the way in which sqlc rewrites the original source seems to be engine independent, and a little hacky. Performing the source rewrites with an AST "deparse" (like in pg_query_go) would solve this with almost no problem. Or if the AST has more source location info in it (not just a start location, but also an end location), that might work too. But this seemed like a bigger refactor, and I wanted to get feedback on it first.

I'd love to know your thoughts @kyleconroy and from the general sqlc community too. I'd be happy to help and make this a reality.

What database engines need to be changed?

PostgreSQL, MySQL

What programming language backends need to be changed?

No response

The text was updated successfully, but these errors were encountered:

skabbes · 2022-04-03T16:30:24Z

I just learned about named parameters in postgres thanks to #1446

I think that would be a significantly better API, but I think the rest can remain the same.

go-mez · 2022-04-04T15:03:02Z

@skabbes
I think this would be a very helpful enhancement. And, have you thought of using the comments for configuring the args?
Something like this (just an idea):

-- name: EditThing :exec
-- args: [{nullable: true, type: string},{nullable: true},{}]
update thing
set
  name = coalesce(sqlc.arg('new_name'), name),
  status = coalesce(sqlc.arg('status'), status)
where thing.id = sqlc.arg(id)

skabbes · 2022-04-04T15:33:49Z

ya, I think "arg json" could be nice. I don't know if I'd want to refer to them by their position like in your example (since that is something I had been trying to get away from with using sqlc in the first place).

But conceptually, that's a good option to consider - maybe something like this:

-- name: EditThing :exec
-- args.new_name: {nullable: true, type: string}
-- args.status: {nullable: true}
-- args.id could be omitted, since the default behavior is OK
update thing
set
  name = coalesce(sqlc.arg('new_name'), name),
  status = coalesce(sqlc.arg('status'), status)
where thing.id = sqlc.arg(id)

Alternatively I was thinking that maybe an easier change, that solves 95% of the issue people are having would be to skip the extra metadata, and simply focus on nullability only at first.

sqlc.arg('new_name')  -- new_name's type is inferred and nullability is inferred
sqlc.arg('new_name!') -- new_name's type is inferred, and is non-null
sqlc.arg('new_name?') -- new_name's type is inferred, and is nullable

kyleconroy · 2022-04-06T02:23:04Z

@skabbes Thanks for such a detailed proposal. There's a lot to unpack, so I'll tackle each part

I think sqlc itself should not really know too much about the semantics of these key/value pairs

One of the goals of sqlc is to be a compiler. Compilers enforce rules and prevent users from making mistakes, so I'm going to shy away from features like this. You can always cast the sqlc.arg() function to a type and sqlc will choose that type.

I am not a super-polyglot, but if there were a C++ backend for sqlc,

C++ is strictly out of scope for now. I'd love to support more languages in the future (#296, #373), but we should only consider the three languages sqlc supports today. It's difficult to speculate what future language support might look like.

But the way in which sqlc rewrites the original source seems to be engine independent, and a little hacky.

Indeed it is :(. We'll be moving to an AST-based rewriter at some point in the future, just not there yet.

And, have you thought of using the comments for configuring the args?

I've come to regret using comments to power sqlc. They work great at the start, but quickly make it difficult to support more structured data. In the future, we'll implement more features using macros (design very TBD) without having to touch comments.

Alternatively I was thinking that maybe an easier change, that solves 95% of the issue people are having would be to skip the extra metadata, and simply focus on nullability only at first.

Agreed! I think the biggest impact we can have is to implement #1155. Syntactic sugar can always come later, so I think a named nullable parameter is the way to go.

skabbes · 2022-04-06T02:57:19Z

One of the goals of sqlc is to be a compiler [...]

Great appreciate you sticking to your vision.

You can always cast the sqlc.arg() function to a type and sqlc will choose that type.

Ya, I think I was worried about a theoretical case (particularly with like jsonb) where you might want a different concrete type depending on the query (for 1 database type (jsonb) to generate different target language types). I think the column overrides probably cover the 99% use case here.

We'll be moving to an AST-based rewriter [...]

I think implementing sqlc.arg(name, nullable => true) will itself be difficult without the AST serialization. I think something like sqlc.narg(name) (nullable arg) would be a lot simpler to inject into the codebase as it is now.

@kyleconroy do you have any thoughts between those 2?

And if the first one - do you have a branch started for AST serialization I can take a look at?

And if not... Is your vision for rewritten AST serialization to be based on the individual AST's of each target database (PostgreSQL, MySQL, SQLite?), or do you intend to transform them into some kind of sqlc intermediate representation?

DavidLarsKetch · 2022-04-28T15:17:51Z

In light of #1536, @skabbes did you consider exposing a new function like sqlc.narg(name)? Does not having AST serialization specifically prevent it?

I personally find a new function simpler to understand. It would be more discoverable for an end user and avoids needing to keep in mind additional syntax (sure, ?! is simple enough).

Lastly, thank you for the detailed write up and first pass! Really excited to see this move forward as I'd love to leverage the coalesce pattern to have flexible, partial UPDATE parameters, as others have shown.

skabbes · 2022-04-28T19:11:11Z

I think at this point - just waiting to here back from @kyleconroy on what his preferred next step would be, in the meantime - we can have a good discussion here.

!? has the added benefit that it allows for 2 different options, inferred nullability, explicitly nullable, explicitly NOT nullable.

sqlc.arg kind of has to be the "inferred nullability" option for historical reasons.

sqlc.narg is great, but it doesn't allow for specifying NOT nullable. So I think there need to be a sibling to narg if that path is chosen:

sqlc.rarg required arg
sqlc.nnarg - non-null arg (probably too easy to typo from narg).

Some other options that have come to mind as I've been thinkingn about this are:

using casts to define nullability

select sqlc.arg('arg_name')::sqlc.null;
select sqlc.arg('arg_name')::sqlc.nonnull;

-- define arg_name to be nullable string / non-null string
select sqlc.arg('arg_name')::text::sqlc.null;
select sqlc.arg('arg_name')::text::sqlc.nonnull;

using modifier functions to define nullability

select sqlc.nonnull(sqlc.arg('arg_name'));
select sqlc.null(sqlc.arg('arg_name'));
select sqlc.narg('arg_name'); -- equivalent to select sqlc.null(sqlc.arg('arg_name'));

Both of the above have the added benefit that you should be able to use them for outputs as well. For example

create table authors (name text );

-- change authors.name from nullable string to non-null string
select name::sqlc.nonnull as name from authors;

-- change authors.name from nullable string to non-null string
select sqlc.nonnull(name) as name from authors;

kyleconroy · 2022-04-30T06:25:21Z

I've found that it's best to add new features slowly, especially when they're related to SQL syntax. While using !? inside the name is clever, it will cause issues for people that have those characters in their column names. PostgreSQL will let you put almost anything in a column name if it's in double quotes.

I'm not convinced of the need for a non-nullable argument type. Going forward, let's add support for sqlc.narg which has the same signature as sqlc.arg but always generated a nullable parameter.

skabbes · 2022-04-30T17:04:45Z

Going forward, let's add support for sqlc.narg which has the same signature as sqlc.arg but always generated a nullable parameter.

Cool cool cool. This PR is 95% of the way there, I'll just need some time over the coming week(s) to finish it. Or anyone else can pick up where I left off if they'd like. No need to block on the below "=" comments, but I wanted to see if I could articulate the cases I had in mind better for user-defined-non-nullability.

====================================================================

I'm not convinced of the need for a non-nullable argument type.

Let me see if I can convince you with a few different scenarios. If one of these strikes you as "oh yea, I'd probably want that". Then I'd argue that non-nullability is a good addition too (and we should incorporate that into the design, so the sqlc.* api is consistent and cohesive).

Scenario 1
You are a SaaS product company, and you want to start sending daily digests to your users of activity on their account at 8AM. In order to implement that feature, you realize you need to track the user's timezone. You will populate this column lazily, since you need them to login to get that info from their Browser / mobile device.

-- first, add the nullable timezone column
ALTER TABLE "user" ADD COLUMN "timezone" TEXT;

-- next, write an an endpoint and sqlc query to update it.
-- name: UserUpdateTimezone :exec
UPDATE "user"
SET timezone = sqlc.arg('timezone')
WHERE id = sqlc.arg('id');

As an API designer, I'd want UserUpdateTimezone.timezone to be non-nullable, even if the database column is (temporarily) nullable. Of course any select queries on the user table would still need a nullable timezone.

Scenario 2
You are still a SaaS company, and in the past you have received requests to delete personally identifiable info. You did this by nulling out those columns, and adding a __deleted flag. Doing this instead of deleting the data outright maintained the referential integrity of other systems.

You wanted to maintain that name and email were not nullable for active users, so you added an additional check constraint to ensure that.

-- I didn't double check my SQL here, it is just to illustrate.
CREATE TABLE "user" (
  id primary key,
  name text, -- nullable
  email text, -- nullable
  __deleted boolean default false
)

ALTER TABLE "user" ADD CONSTRAINT nullable_only_when_deleted CHECK (
  __deleted = true OR (name is not null and email is not null)
);

Your "update profile" sqlc query, should only allow non-null name and email.

-- name: UserUpdateProfile :exec
UPDATE "user" SET
  name = sqlc.arg('name'),  -- I personally would want this to be non-null
  email = sqlc.arg('email') -- I personally would want this to be non-null too
WHERE id = sqlc.arg('id')
and __deleted = false;

This example also kind of shows why non-nullable output might be desirable too (I know that it is completely out of scope right now, but I think it helps inform a good design thinking about nullability in sqlc holistically).

-- name: ActiveUsers :many
select
  id,
  name, -- i want this to be non-null
  email -- I want this to be non-null
from "user"
where __deleted = false;

kyleconroy · 2022-04-30T19:57:54Z

Thanks for writing out a few different scenarios. I'm still not convinced due to how Go handles default values for structs.

-- name: UserUpdateTimezone :exec
UPDATE "user"
SET timezone = sqlc.arg('timezone')
WHERE id = sqlc.arg('id');

In this query, if you force the timezone and id arg to be non-null, but use struct-based params, the values will be the zero value of the field. I'd argue that you want these to default to NULL if not passed so that you don't add empty strings to your database. COALESCE can also be used to pick default values if none are provided.

skabbes · 2022-04-30T21:02:55Z

So, are you advocating that every single struct param generated by sqlc be nullable? That way we can protect then end users from accidentally missing them in struct params? 🤔 I think the "Go way" here is that it is up to the developer to make the 0 value have meaning. "meaning" is an application specific thing, so I feel it should be the up to the developer, not sqlc.

I kind of see the question not as

Should we allow user-defined non-nullable arguments?

But more like

We have decided to allow developers to specify nullable parameters, but sqlc believes that specifying non-nullable parameters is the source of many errors - we have decided that sqlc is a better product by disallowing these.

So anyyyyyyway....
I think 2 is what I am hearing you say. But I probably that it needs more time and feedback to get confidence one way or another. I think that's cool and exactly what these discussions are for.

FWIW - Typescript has first-class support for "struct optionality" so maybe if sqlc gets a Typescript generator, this discussion will resurface.

skabbes · 2022-04-30T22:00:54Z

Alright - "weeks' turned into hours. PR for sqlc.narg available here #1536

skabbes · 2022-05-03T01:49:24Z

Closing this, after we all use narg for a while, we'll see if it is necessary to revisit any of these.

DavidLarsKetch · 2022-05-03T17:32:43Z

Coming back around to extend my gratitude to @skabbes & @kyleconroy for landing this. Thanks! Not having this functionality was the last major tradeoff we faced for migrating to sqlc 🎉.

talksik · 2022-11-08T22:33:16Z

same here, this is super useful, I don't have to change my schema, and am still able to let sql error out on inserts for NOT NULL columns

skabbes added enhancement New feature or request triage New issues that hasn't been reviewed labels Apr 2, 2022

This was referenced Apr 2, 2022

NULLIF returns bool instead of correct type #133

Closed

Overwrites do not work on query only fields #1346

Open

Left Join not generating correct Null* Types #1334

Closed

skabbes mentioned this issue Apr 9, 2022

UPDATED: sqlc.narg() - allow user-specified nullability #1536

Merged

skabbes closed this as completed May 3, 2022

This was referenced Nov 27, 2022

Min/Max on Datetimes result in interface{} as type #1574

Closed

feat: add sqlc.embed to allow model re-use #1615

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: nullability (and other metadata) for input arguments / parameters to a query #1525

RFC: nullability (and other metadata) for input arguments / parameters to a query #1525

skabbes commented Apr 2, 2022

skabbes commented Apr 3, 2022

go-mez commented Apr 4, 2022

skabbes commented Apr 4, 2022 •

edited

kyleconroy commented Apr 6, 2022

skabbes commented Apr 6, 2022 •

edited

DavidLarsKetch commented Apr 28, 2022

skabbes commented Apr 28, 2022

kyleconroy commented Apr 30, 2022

skabbes commented Apr 30, 2022

kyleconroy commented Apr 30, 2022

skabbes commented Apr 30, 2022 •

edited

skabbes commented Apr 30, 2022

skabbes commented May 3, 2022

DavidLarsKetch commented May 3, 2022

talksik commented Nov 8, 2022

RFC: nullability (and other metadata) for input arguments / parameters to a query #1525

RFC: nullability (and other metadata) for input arguments / parameters to a query #1525

Comments

skabbes commented Apr 2, 2022

What do you want to change?

What database engines need to be changed?

What programming language backends need to be changed?

skabbes commented Apr 3, 2022

go-mez commented Apr 4, 2022

skabbes commented Apr 4, 2022 • edited

kyleconroy commented Apr 6, 2022

skabbes commented Apr 6, 2022 • edited

DavidLarsKetch commented Apr 28, 2022

skabbes commented Apr 28, 2022

kyleconroy commented Apr 30, 2022

skabbes commented Apr 30, 2022

kyleconroy commented Apr 30, 2022

skabbes commented Apr 30, 2022 • edited

skabbes commented Apr 30, 2022

skabbes commented May 3, 2022

DavidLarsKetch commented May 3, 2022

talksik commented Nov 8, 2022

skabbes commented Apr 4, 2022 •

edited

skabbes commented Apr 6, 2022 •

edited

skabbes commented Apr 30, 2022 •

edited