Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CockroachDB inserting additional rowid column when using .primary() #5211

Closed
rijkvanzanten opened this issue Jun 6, 2022 · 17 comments · Fixed by #5212
Closed

CockroachDB inserting additional rowid column when using .primary() #5211

rijkvanzanten opened this issue Jun 6, 2022 · 17 comments · Fixed by #5212

Comments

@rijkvanzanten
Copy link
Collaborator

Environment

Knex version: 2.1.0
Database + version: CockroachDB v22.1 (v21 affected as well)
OS: macOS 12.3.1

Bug

Behavior

When using CockroachDB, creating a primary key of type UUID:

await database.schema.createTable("example_a", (table) => {
	table.uuid("id").notNullable().primary();
});

will cause a secondary "rowid" column to be autogenerated. This is most likely due to the fact that knex generates two SQL queries to create this table+id combo:

db.schema
	.createTable("example_a", (table) => {
		table.uuid("id").notNullable().primary();
	})
	.toSQL();

// [
//   {
//     sql: 'create table "example_a" ("id" uuid not null)',
//     bindings: []
//   },
//   {
//     sql: 'alter table "example_a" add constraint "example_a_pkey" primary key ("id")',
//     bindings: []
//   }
// ]

The first SQL query executed in Cockroach will generate the default "rowid" column, as cockroach requires a primary key to exist in the table. This means the issue (and hopefully the fix) is very similar to #4141 / #5017.

A temporary workaround is to use a specificType, for example:

await database.schema.createTable("example_b", (table) => {
	table.specificType("id", "UUID PRIMARY KEY");
});

as that generates SQL that correctly creates the primary key in the first query in CRDB:

database.schema
	.createTable("example_b", (table) => {
		table.specificType("id", "UUID PRIMARY KEY");
	})
	.toSQL();

// [
//   {
//     sql: 'create table "example_b" ("id" UUID PRIMARY KEY)',
//     bindings: []
//   }
// ]

Reproduction

https://github.com/rijkvanzanten/knex-crdb-primary

@hairmare
Copy link
Contributor

hairmare commented Jun 6, 2022

It looks like implementing a primaryKey option as in table.uuid('id', {primaryKey: true}); would be much easier than adding support for table.uuid('id').primary().

I have a basic PoC for PostgreSQL working locally, if you're ok with adding the option to uuid() I'll look into implementing it across the board. The option would work like the primaryKey option on increments and bigincrements but it would default to false and only do something on PostgreSQL style databases.

Let me know if this would be something you would accept and I'll prepare a PR.

@rijkvanzanten
Copy link
Collaborator Author

@hairmare Isn't any column technically able to be turned into a primary key? Even though it's not common, I should be able to do something like .timestamp('id').primary() as well right? 🤔

@kibertoad
Copy link
Collaborator

@hairmare Solution looks good!

@hairmare
Copy link
Contributor

hairmare commented Jun 6, 2022

@rijkvanzanten indeed, that should be possible, the underlying issue here is that CockroachDB will just create the rowid primary key if none was added in the initial create statement. My propoosal to add this to uuid is mainly due to the fact that uuid is a pretty common primary key type.

@rijkvanzanten
Copy link
Collaborator Author

Does/will every column type support the { primaryKey: true } flag in general?

(Mostly asking out of selfishness; I'm working on a thing where the end-user chooses the type and whether or not it should be the primary key, so any deviation in the signatures there causes a bit more headache in the usage on our end 🙂 Beyond the bug presented above, it's currently nice-n-easy to do a

const column = table[selectedType]();

if (userWantsThisToBeAPrimaryKey) {
  column.primary();
}

@kibertoad
Copy link
Collaborator

@rijkvanzanten I have no strong opinion on this; while technically it's a possibility, I'm yet to see non-number, non-string, non-uuid primary keys.

@hairmare
Copy link
Contributor

hairmare commented Jun 6, 2022

afaict currently only increments and bigincrements support { primaryKey: true } (and it's the default for them)

it looks like adding support for calling primary after the create statement has already been generated would add quite a bit of complexity to the schema builder.

this would result in needing to treat uuid more like increments/bigincrements in downstream, but given increments already is a special case i think this is going to be the easiest way to solve this bug.

I'm not sure if it's currently supported but having mixed primary keys (as in .primary(['id', 'date'])) is another exotic case that would add even more complexity. Luckily mixed primary keys are something i haven't come across in a really long time.

@rijkvanzanten
Copy link
Collaborator Author

I'm not sure if it's currently supported but having mixed primary keys (as in .primary(['id', 'date'])) is another exotic case that would add even more complexity. Luckily mixed primary keys are something i haven't come across in a really long time.

Interesting! I thought they would be fairly common in the usage of junction tables, where you use the mixed primary key as the identifier of the junction row:

┌───────────┐         ┌───────────────────┐
│articles   │         │articles_categories│       ┌──────────┐
│-----------│         │------------------ │       │categories│
│id         │◄───────►│article_id         │       │----------│
└───────────┘         │category_id        │◄─────►│id        │
                      └───────────────────┘       └──────────┘

where the primary key of articles_categories is (article_id, category_id)

@hairmare
Copy link
Contributor

hairmare commented Jun 6, 2022

@rijkvanzanten yeah, junction tables. I usually tend to create them with their own separate primary key mainly for supporting multiple databases. I've probably been cargo-culting that pattern from an oci to mysql migration decades ago tho.

@intech
Copy link
Contributor

intech commented Jun 6, 2022

To support composite keys, I thought to add to the table schema the addition of specifying the keys marked with the primary flag:

PRIMARY KEY (article_id, category_id)

or we can do the same in PostgreSQL dialect:

CONSTRAINT "primary" PRIMARY KEY (article_id ASC, category_id ASC)

@intech
Copy link
Contributor

intech commented Jun 6, 2022

Generating this schema would be the ideal solution:

CREATE TABLE articles_categories (
    article_id UUID DEFAULT gen_random_uuid(),
    category_id UUID,
    PRIMARY KEY (article_id ASC, category_id ASC)
);

For the article_id column, I deliberately specified the default value to avoid missing this option.

The simplest thing left is to find the time to implement this :)

@hairmare
Copy link
Contributor

hairmare commented Jun 7, 2022

I think #5212 is now ready. I didn't add tests/support for composite keys since i think the change is already enough on it's own and it's somewhat offtopic wrt the itch i'm scratching anyways.

@rijkvanzanten
Copy link
Collaborator Author

Sweet! Thanks @hairmare 🙂

So just to confirm, table.uuid('x', { primaryKey: true }); is NOT supposed to be the same as table.uuid('x').primary(), and only uuid() and increments()/bigIncrements() will support this alternative primaryKey flag?

The following accurate / intended?

// Create auto-increment primary key (single query)
table.increments('x');

// Create auto-increment primary key (two queries)
table.increments('x', { primaryKey: false }).primary();

// Create UUID primary key (single query)
table.uuid('x', { primaryKey: true });

// Create UUID primary key (two queries)
table.uuid('x').primary();

// Create any other primary key (two queries)
table.integer('x').primary();
table.string('x').primary();

@kibertoad Is the accepted solution here not support single-query primary keys on anything that isn't auto-increment/uuid in PG/CRDB, or is the long-term goal to make primary() work consistently across vendors? Right now, there's an interesting discrepancy between MySQL and Postgres/CRDB in which MySQL has been updated (#5017) to include the .primary() in the single create statement (therefore running the above example with all "single query"), while Postgres/CRDB works differently in that it has the "old" behavior of using multiple queries, but has an additional workaround for UUIDs specifically 🤔

@hairmare
Copy link
Contributor

hairmare commented Jun 8, 2022

@rijkvanzanten yes, i can/have to confirm that the behaviour won't be consistent wrt primary(). I do believe that .uuid('id;, {primaryKey: true}).primary() will work consistently given primary() will be a no-op if primaryKey=true.

I will be doing some testing wrt this over the next few weeks & if it turns out to be an issue that affects a lot of users/sql-dialects, then i'll gladly look into how we can make .uuid('id', {primaryKey: true}) work across multiple databases in a consistent fashion.

my current understanding is that my change here in knex will enable finding a non-bc-breaking solution for directus/directus#13749 that should be acceptable whilst not constraining us wrt to a future fix that supports just using primary() in the most obvious fashion.

Once #5212 has been tagged, I'l look into finding a way forward (which might possibly involve adding more dialects that support primaryKey=true). Let me know if that sounds unreasonable so we can find something that works in the long-term.

@hairmare
Copy link
Contributor

hairmare commented Jun 8, 2022

I should also mention i do plan on looking into #5017, it does look like it contains a interesting approach that i missed

@rijkvanzanten
Copy link
Collaborator Author

my current understanding is that my change here in knex will enable finding a non-bc-breaking solution for directus/directus#13749 that should be acceptable whilst not constraining us wrt to a future fix that supports just using primary() in the most obvious fashion.

I don't fully agree, but lets discuss that further on directus/directus#13749 rather than spamming this thread 🙂

@kibertoad
Copy link
Collaborator

Released in 2.2.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants