Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

postgresql >= 10: use identity columns for autoincrement #874

Merged
merged 1 commit into from Jul 8, 2019

Conversation

grmblfrz
Copy link

@grmblfrz grmblfrz commented Apr 4, 2019

Use identity columns on postgres versions > 9 (see https://blog.2ndquadrant.com/postgresql-10-identity-columns/). Thoughts?

@michael-robbins
Copy link

When can we get this merged in? Run into this usecase today, would be nice to see this merged!

@michael-robbins
Copy link

@nvoxland is there anything blocking this?

@nvoxland nvoxland merged commit 1990f64 into liquibase:master Jul 8, 2019
@nvoxland
Copy link
Contributor

nvoxland commented Jul 8, 2019

Merged in, thanks!

@alexkau
Copy link

alexkau commented Jul 22, 2019

This change means that we now have different database schemas depending on which version of Liquibase was used to run the upgrade.

As an example, let's say we're running Liquibase 3.6.3 on version 1 and 2 of an application, and 3.7.0 on version 3. In version 2, we create a table with an ID column set to auto increment.

On one server, we upgrade from version 1 to version 2, at which point the table is created with the ID column using a sequence. We then upgrade to version 3, and the schema is maintained.
On a second server, we upgrade directly from version 1 to version 3. In the migration, the table is created with an identity column instead.

Are there any recommended workarounds for this besides setting a mandatory stop-off point (e.g. "you must upgrade to version 2 before you can upgrade to version 3") or rewriting all of our migrations and ID columns that are affected by this? I feel like this may have made more sense as a different command, i.e. keeping autoincrement="true" as it is and adding something else to indicate "generate as an identity column", but since 3.7.0 has already been released that may not be feasible.

At the very minimum, I think this should be documented in the release notes, since having the exact same migration generate a significantly different schema is something that should really have attention called to it. This change isn't mentioned at all in either the release notes or the release highlights.

@nvoxland
Copy link
Contributor

I tend to think of autoincrement="true" as meaning "make this column an identity/auto-increment column in whatever way is best for this database type".

Previously postgresql's "identity" logic was slightly different than mssql/mysql/etc. because they had their SERIAL types being a syntatic sugar on top of creating a sequence and a default value, but now that 10+ supports a more sql-standard GENERATED BY IDENTITY clause that is the "better" way to implement autoIncrement="true".

However, I see your point of it generating slightly different schemas depending on the versions used. They should be logically identical because the columns are always "autoincrement", but on some the columns have a default value of "GENERATED BY IDENTITY" and some have default value of "next_val('the_sequence')".

But, am I correct in understanding how they are different? Or are there more problems? And, where and how do those differences affect you?

@alexkau
Copy link

alexkau commented Jul 25, 2019

That all sounds correct to me.

The exact issue we're running into is that in some cases we modify the generated sequences, and this modification fails since there's no longer a sequence.

I guess the question is: when using the XML syntax, is only the outward-facing behavior guaranteed and not the schema underneath? This change is just fine if all you care about is "having a column that auto increments", but if you decide you want to change your schema in the future (as a simple example, renaming a table and its associated ID sequences) then having an inconsistent schema can be a real headache when you hit changes like this.

@nvoxland
Copy link
Contributor

Yes, the assumption in the Liquibase design is when you are using XML you are looking to specify the semantic definition of how it should work without having to get into the implementation details of how it actually works. Those details can vary from database platform platform and even database version to version. It even lets you plug in completely different implementations of those statements like "doing a 'dropColumn' shouldn't simply drop the column but instead rename it to something random" that is still conceptually "drop the column". The end goal, is to make have liquibase do the "right" thing based on the semantic description of what you have.

However, like you say, the SQL implementation liquibase uses can have an impact. This issue where people may have been relying on the way sequences were auto-generated is one example, but there are also things like how liquibase tries to "fix"/standardize data types, default constraint names liquibase may or may not generate, etc. Even more fun is "there was a bug in the generated SQL that needs to be fixed, but that is going to break people that are expecting/working-around the 'wrong' behavior"

This current behavior/expectation has grown organically from beginning, but needs to improve and/or change. I've been trying to think for a while and don't have a great solution beyond needing to better document/explain when SQL vs. XML is the right choice and what to expect from both. But, as a developer who doesn't always read the docs, I know documentation isn't the best solution.

@nvoxland
Copy link
Contributor

For this particular issue, I'm leaning to leaving the change in since the new GENERATED BY DEFAULT is "better" SQL and I think that for the most part people don't interact with the sequences that are generated by default. If you do have to work with the sequences (like to set min/step values) it has probably always been safest to not rely on autoincrement=true but instead create the sequences and default values directly. Of course not everyone did that and they are hitting a problem with this change, but feeling like they are probably a minority compared to the people that like this change in.

For people who are being bit by this, I think there are a couple work arounds to use.

One would be to add sequenceExists preconditions to changeSets that have to care about the sequence names that may be changing so you can find the correct sequence to work on. That could work if there are not a lot of cases to care about.

Another option could be to create a local postgresql extension that just undoes this change. If you create subclasses of these changed classes with a higher priority, liquibase will use your classes instead and in those classes you can specify the old behavior. That will automatically keep things the way they were, but will require a bit of java programming.

The last option I can think of is adding tags to the changeSets you care about in order to replace the new GENERATED BY DEFAULT with SERIAL SQL instead.

Thoughts?

@alexkau
Copy link

alexkau commented Jul 26, 2019

This all seems sane to me, if there's an assumption that XML isn't guaranteed to generate a consistent schema for any given database. It's going to be a lot of extra work for us to fix the hole we've dug ourselves into with many dozens of tables using autogenerated sequences created by XML syntax and then later modified with raw SQL, so we're probably going to just switch away from XML syntax going forward, and then either manually fix every migration affected by this or do a mandatory stop-off point to ensure all of the existing XML migrations take place on 3.6.3. That said, it sounds like this is our problem and not yours, since we've been working off of an incorrect assumption.

The main thing I'd ask going forward is to give better documentation that the XML syntax makes no guarantee of schema consistency between versions. (I skimmed the docs for five minutes or so and wasn't able to find anything to that effect, and I'm interpreting your comments above as saying that it's either undocumented or not well-documented.) If this were well-documented, then even if I hadn't read the docs, you'd be able to deal with this issue far easier by pointing me to the docs and telling me to reread them. :)

One last note: Schema changes like this do mean that you're going to have to account for every version's syntax if you ever want to add XML syntax that affects anything touched by a schema change, so it's probably worth keeping that in mind if you're not already. With that in mind, it'd probably also be helpful to have a document keeping track of changes in the schemas generated by each version, since that'd (1) make it a lot harder for people misusing XML schemas to fall into the trap that we almost fell into, and (2) make life a lot easier for anyone developing extensions that depend on schemas that have been changed. (For example, if someone were developing an extension to add syntax for "change auto-increment start/increment by/etc", then it'd be very helpful for them to have a list of every change that had ever been made to what autoincrement="true" had generated.)

@nvoxland
Copy link
Contributor

Yes, that assumption is definitely not as well documented as it should be, and I'm sorry it pushed you into a lot of extra work.

And yes, the whole management of changes like this needs to be better managed, messaged, and accounted for. We're looking to be improving all that over the next few releases, but there is also a larger "is that assumption right?" discussion that is ongoing and this is a great example to include in there.

@grzegorz-aniol
Copy link

I think the condition in the code is opposite and it's wrong. It creates BIGINT instead of BIGSERIAL for PgSQL 11.

@seckinsq
Copy link

I believe this is not backward compatible. We had many environments that ran our changesets and also new environments that would run same changesets from scratch with the latest version of liquibase that we would use. Now if a new environment applies these changesets to a clean database then many tables will not have default value with nextval on sequence (there are many developers, some used autoIncrement and some just left as serial as they knew liquibase 3.5.3 was creating sequences).

At least there could be an option, maybe a property to use for compatiblity reasons. If I would be creating new project then I could adjust myself to use autoIncrement = true always. But for an existing project, I could still use liquibase's latest version with a property that still creates serials with sequence (behaves like identity) like liquibase.acceptSerialAsIdentity = true (default false)

We are editing all changesets that create table and add validCheckSum = ANY to those changesets and add autoIncrement="true" to the column. So even we updated the changeset it won't give an error on existing environments because of checksum and if it has to run on clean database it will create create it as identity with liquibase 3.7.0+

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants