Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: nullability (and other metadata) for input arguments / parameters to a query #1525

Closed
skabbes opened this issue Apr 2, 2022 · 15 comments
Labels
enhancement New feature or request triage New issues that hasn't been reviewed

Comments

@skabbes
Copy link
Contributor

skabbes commented Apr 2, 2022

What do you want to change?

Kudos
Before I start, I wanted to give a huge kudos for sqlc - I love it. As a backend developer who knows SQL very well, I fight every single ORM I have used trying to do any SQL more complicated than CRUD. Thank you for creating this.

Background
I use sqlc alongside Postgres, as sort of a foreign function interface to SQL. I manage a PostgreSQL database with about 100 different tables and working in Typescript and Go to support every stage of solar operations for Bright. We do not currently use sqlc at Bright - but I do use it personally, but I am evaluating it Bright as well.

The types of problems I work on are not always CRUD-y (though sqlc works really well if they are). And the challenge I am facing is that SQL is a little less-expressive than the target languages in some cases, and therefore using sqlc as a FFI is a little clunky. A quick glance through the some open issues, shows that "nullability" is one of those deficiencies. sqlc does its best to always "do the right thing" and that has gotten it very far indeed. But I think a simple change can give developers full control, and allow for sqlc to keep getting smarter (better interred mappings / types).

Say for example, instead of finding a really good GeoJSON library in my target language, I wanted to rely on PostGIS.

- name: WKBToGeoJSON :one
select ST_AsGeoJSON(ST_GeomFromWKB(sqlc.arg('wkb_bytes'))::text as output;

I would love for this Go code to be generated:

func (q *Queries) WKBToGeoJSON(ctx context.Context, db DBTX, wkbBytes []byte) (string, error) {
    // ...
}

Or perhaps a more common use case, partial updates

-- name: EditThing :exec
update thing
set
  name = coalesce(sqlc.arg('name'), name),
  description = coalesce(sqlc.arg('description'), description),
  status = coalesce(sqlc.arg('status'), status)
where thing.id = sqlc.arg(id)

-- name, description and status are all nullable arguments, id is not.

Proposal
I think sqlc should expose a way (like overrides) in the query itself to control aspects of the generated code in the target languages.

Allow sqlc.arg(name) to accept more variadic arguments (similar to jsonb_build_object) to build up string key, and json value pairs of "configuration" for that arguments. I think sqlc itself should not really know too much about the semantics of these key/value pairs, and just pass them along to the code generators in the target language.

-- name: EditThing :exec
update thing
set
  name = coalesce(sqlc.arg('new_name', 'nullable', true, 'type', 'string'), name),
  description = coalesce(sqlc.arg('description', nullable, true), description),
  status = coalesce(sqlc.arg('status', 'nullable', true), status)
where thing.id = sqlc.arg(id)

This also allows the developer to control the API interface much better, even if the database is a little wacky. Imagine if there was a column name that was nullable for historical reasons, but any new insert to name it should be non null. sqlc should support that use case too.

I am not a super-polyglot, but if there were a C++ backend for sqlc, you probably wouldn't just want nullable but also maybe an option for the type of pointer that should be generated raw pointer? std::shared_ptr? std::unique_ptr? boost::? std::optional ?

-- name: GetThingForCpp :one
select
  id,
  first_name,
  last_name,
  sqlc.output(concat(first_name, ' ', last_name), 'nullable', false, 'ref_counted', true) as display_name
where ....

This is very similar to overrides, but on a query-by-query basis, and expressed directly in SQL instead of away from the code in a config file.

Implementation
I got pretty far along in the implementation, and I'd be happy to push it all the way through - but I hit a big snag at the SQL rewriting. Rewriting the AST is no big deal (and I had the code working for that). But the way in which sqlc rewrites the original source seems to be engine independent, and a little hacky. Performing the source rewrites with an AST "deparse" (like in pg_query_go) would solve this with almost no problem. Or if the AST has more source location info in it (not just a start location, but also an end location), that might work too. But this seemed like a bigger refactor, and I wanted to get feedback on it first.

I'd love to know your thoughts @kyleconroy and from the general sqlc community too. I'd be happy to help and make this a reality.

What database engines need to be changed?

PostgreSQL, MySQL

What programming language backends need to be changed?

No response

@skabbes
Copy link
Contributor Author

skabbes commented Apr 3, 2022

I just learned about named parameters in postgres thanks to #1446

I think that would be a significantly better API, but I think the rest can remain the same.

@go-mez
Copy link
Contributor

go-mez commented Apr 4, 2022

@skabbes
I think this would be a very helpful enhancement. And, have you thought of using the comments for configuring the args?
Something like this (just an idea):

-- name: EditThing :exec
-- args: [{nullable: true, type: string},{nullable: true},{}]
update thing
set
  name = coalesce(sqlc.arg('new_name'), name),
  status = coalesce(sqlc.arg('status'), status)
where thing.id = sqlc.arg(id)

@skabbes
Copy link
Contributor Author

skabbes commented Apr 4, 2022

ya, I think "arg json" could be nice. I don't know if I'd want to refer to them by their position like in your example (since that is something I had been trying to get away from with using sqlc in the first place).

But conceptually, that's a good option to consider - maybe something like this:

-- name: EditThing :exec
-- args.new_name: {nullable: true, type: string}
-- args.status: {nullable: true}
-- args.id could be omitted, since the default behavior is OK
update thing
set
  name = coalesce(sqlc.arg('new_name'), name),
  status = coalesce(sqlc.arg('status'), status)
where thing.id = sqlc.arg(id)

Alternatively I was thinking that maybe an easier change, that solves 95% of the issue people are having would be to skip the extra metadata, and simply focus on nullability only at first.

sqlc.arg('new_name')  -- new_name's type is inferred and nullability is inferred
sqlc.arg('new_name!') -- new_name's type is inferred, and is non-null
sqlc.arg('new_name?') -- new_name's type is inferred, and is nullable

@kyleconroy
Copy link
Collaborator

@skabbes Thanks for such a detailed proposal. There's a lot to unpack, so I'll tackle each part

I think sqlc itself should not really know too much about the semantics of these key/value pairs

One of the goals of sqlc is to be a compiler. Compilers enforce rules and prevent users from making mistakes, so I'm going to shy away from features like this. You can always cast the sqlc.arg() function to a type and sqlc will choose that type.

I am not a super-polyglot, but if there were a C++ backend for sqlc,

C++ is strictly out of scope for now. I'd love to support more languages in the future (#296, #373), but we should only consider the three languages sqlc supports today. It's difficult to speculate what future language support might look like.

But the way in which sqlc rewrites the original source seems to be engine independent, and a little hacky.

Indeed it is :(. We'll be moving to an AST-based rewriter at some point in the future, just not there yet.

And, have you thought of using the comments for configuring the args?

I've come to regret using comments to power sqlc. They work great at the start, but quickly make it difficult to support more structured data. In the future, we'll implement more features using macros (design very TBD) without having to touch comments.

Alternatively I was thinking that maybe an easier change, that solves 95% of the issue people are having would be to skip the extra metadata, and simply focus on nullability only at first.

Agreed! I think the biggest impact we can have is to implement #1155. Syntactic sugar can always come later, so I think a named nullable parameter is the way to go.

@skabbes
Copy link
Contributor Author

skabbes commented Apr 6, 2022

One of the goals of sqlc is to be a compiler [...]

Great appreciate you sticking to your vision.

You can always cast the sqlc.arg() function to a type and sqlc will choose that type.

Ya, I think I was worried about a theoretical case (particularly with like jsonb) where you might want a different concrete type depending on the query (for 1 database type (jsonb) to generate different target language types). I think the column overrides probably cover the 99% use case here.

We'll be moving to an AST-based rewriter [...]

I think implementing sqlc.arg(name, nullable => true) will itself be difficult without the AST serialization. I think something like sqlc.narg(name) (nullable arg) would be a lot simpler to inject into the codebase as it is now.

@kyleconroy do you have any thoughts between those 2?

And if the first one - do you have a branch started for AST serialization I can take a look at?

And if not... Is your vision for rewritten AST serialization to be based on the individual AST's of each target database (PostgreSQL, MySQL, SQLite?), or do you intend to transform them into some kind of sqlc intermediate representation?

@DavidLarsKetch
Copy link

In light of #1536, @skabbes did you consider exposing a new function like sqlc.narg(name)? Does not having AST serialization specifically prevent it?

I personally find a new function simpler to understand. It would be more discoverable for an end user and avoids needing to keep in mind additional syntax (sure, ?! is simple enough).

Lastly, thank you for the detailed write up and first pass! Really excited to see this move forward as I'd love to leverage the coalesce pattern to have flexible, partial UPDATE parameters, as others have shown.

@skabbes
Copy link
Contributor Author

skabbes commented Apr 28, 2022

I think at this point - just waiting to here back from @kyleconroy on what his preferred next step would be, in the meantime - we can have a good discussion here.

!? has the added benefit that it allows for 2 different options, inferred nullability, explicitly nullable, explicitly NOT nullable.

sqlc.arg kind of has to be the "inferred nullability" option for historical reasons.

sqlc.narg is great, but it doesn't allow for specifying NOT nullable. So I think there need to be a sibling to narg if that path is chosen:

  • sqlc.rarg required arg
  • sqlc.nnarg - non-null arg (probably too easy to typo from narg).

Some other options that have come to mind as I've been thinkingn about this are:

using casts to define nullability

select sqlc.arg('arg_name')::sqlc.null;
select sqlc.arg('arg_name')::sqlc.nonnull;

-- define arg_name to be nullable string / non-null string
select sqlc.arg('arg_name')::text::sqlc.null;
select sqlc.arg('arg_name')::text::sqlc.nonnull;

using modifier functions to define nullability

select sqlc.nonnull(sqlc.arg('arg_name'));
select sqlc.null(sqlc.arg('arg_name'));
select sqlc.narg('arg_name'); -- equivalent to select sqlc.null(sqlc.arg('arg_name'));

Both of the above have the added benefit that you should be able to use them for outputs as well. For example

create table authors (name text );

-- change authors.name from nullable string to non-null string
select name::sqlc.nonnull as name from authors;

-- change authors.name from nullable string to non-null string
select sqlc.nonnull(name) as name from authors;

@kyleconroy
Copy link
Collaborator

I've found that it's best to add new features slowly, especially when they're related to SQL syntax. While using !? inside the name is clever, it will cause issues for people that have those characters in their column names. PostgreSQL will let you put almost anything in a column name if it's in double quotes.

I'm not convinced of the need for a non-nullable argument type. Going forward, let's add support for sqlc.narg which has the same signature as sqlc.arg but always generated a nullable parameter.

@skabbes
Copy link
Contributor Author

skabbes commented Apr 30, 2022

Going forward, let's add support for sqlc.narg which has the same signature as sqlc.arg but always generated a nullable parameter.

Cool cool cool. This PR is 95% of the way there, I'll just need some time over the coming week(s) to finish it. Or anyone else can pick up where I left off if they'd like. No need to block on the below "=" comments, but I wanted to see if I could articulate the cases I had in mind better for user-defined-non-nullability.

====================================================================

I'm not convinced of the need for a non-nullable argument type.

Let me see if I can convince you with a few different scenarios. If one of these strikes you as "oh yea, I'd probably want that". Then I'd argue that non-nullability is a good addition too (and we should incorporate that into the design, so the sqlc.* api is consistent and cohesive).

Scenario 1
You are a SaaS product company, and you want to start sending daily digests to your users of activity on their account at 8AM. In order to implement that feature, you realize you need to track the user's timezone. You will populate this column lazily, since you need them to login to get that info from their Browser / mobile device.

-- first, add the nullable timezone column
ALTER TABLE "user" ADD COLUMN "timezone" TEXT;

-- next, write an an endpoint and sqlc query to update it.
-- name: UserUpdateTimezone :exec
UPDATE "user"
SET timezone = sqlc.arg('timezone')
WHERE id = sqlc.arg('id');

As an API designer, I'd want UserUpdateTimezone.timezone to be non-nullable, even if the database column is (temporarily) nullable. Of course any select queries on the user table would still need a nullable timezone.

Scenario 2
You are still a SaaS company, and in the past you have received requests to delete personally identifiable info. You did this by nulling out those columns, and adding a __deleted flag. Doing this instead of deleting the data outright maintained the referential integrity of other systems.

You wanted to maintain that name and email were not nullable for active users, so you added an additional check constraint to ensure that.

-- I didn't double check my SQL here, it is just to illustrate.
CREATE TABLE "user" (
  id primary key,
  name text, -- nullable
  email text, -- nullable
  __deleted boolean default false
)

ALTER TABLE "user" ADD CONSTRAINT nullable_only_when_deleted CHECK (
  __deleted = true OR (name is not null and email is not null)
);

Your "update profile" sqlc query, should only allow non-null name and email.

-- name: UserUpdateProfile :exec
UPDATE "user" SET
  name = sqlc.arg('name'),  -- I personally would want this to be non-null
  email = sqlc.arg('email') -- I personally would want this to be non-null too
WHERE id = sqlc.arg('id')
and __deleted = false;

This example also kind of shows why non-nullable output might be desirable too (I know that it is completely out of scope right now, but I think it helps inform a good design thinking about nullability in sqlc holistically).

-- name: ActiveUsers :many
select
  id,
  name, -- i want this to be non-null
  email -- I want this to be non-null
from "user"
where __deleted = false;

@kyleconroy
Copy link
Collaborator

Thanks for writing out a few different scenarios. I'm still not convinced due to how Go handles default values for structs.

-- name: UserUpdateTimezone :exec
UPDATE "user"
SET timezone = sqlc.arg('timezone')
WHERE id = sqlc.arg('id');

In this query, if you force the timezone and id arg to be non-null, but use struct-based params, the values will be the zero value of the field. I'd argue that you want these to default to NULL if not passed so that you don't add empty strings to your database. COALESCE can also be used to pick default values if none are provided.

@skabbes
Copy link
Contributor Author

skabbes commented Apr 30, 2022

So, are you advocating that every single struct param generated by sqlc be nullable? That way we can protect then end users from accidentally missing them in struct params? 🤔 I think the "Go way" here is that it is up to the developer to make the 0 value have meaning. "meaning" is an application specific thing, so I feel it should be the up to the developer, not sqlc.

I kind of see the question not as

  1. Should we allow user-defined non-nullable arguments?

But more like

  1. We have decided to allow developers to specify nullable parameters, but sqlc believes that specifying non-nullable parameters is the source of many errors - we have decided that sqlc is a better product by disallowing these.

So anyyyyyyway....
I think 2 is what I am hearing you say. But I probably that it needs more time and feedback to get confidence one way or another. I think that's cool and exactly what these discussions are for.

FWIW - Typescript has first-class support for "struct optionality" so maybe if sqlc gets a Typescript generator, this discussion will resurface.

@skabbes
Copy link
Contributor Author

skabbes commented Apr 30, 2022

Alright - "weeks' turned into hours. PR for sqlc.narg available here #1536

@skabbes
Copy link
Contributor Author

skabbes commented May 3, 2022

Closing this, after we all use narg for a while, we'll see if it is necessary to revisit any of these.

@skabbes skabbes closed this as completed May 3, 2022
@DavidLarsKetch
Copy link

Coming back around to extend my gratitude to @skabbes & @kyleconroy for landing this. Thanks! Not having this functionality was the last major tradeoff we faced for migrating to sqlc 🎉.

@talksik
Copy link

talksik commented Nov 8, 2022

same here, this is super useful, I don't have to change my schema, and am still able to let sql error out on inserts for NOT NULL columns

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request triage New issues that hasn't been reviewed
Projects
None yet
Development

No branches or pull requests

5 participants