
DBZ-1413 Support PostgreSQL Domain and Enum types #1079

Merged
merged 13 commits from the DBZ-1413 branch into debezium:master on Dec 10, 2019

Conversation

Naros
Member

@Naros Naros commented Oct 15, 2019

@Naros Naros requested a review from jpechane October 15, 2019 14:31
@Naros Naros force-pushed the DBZ-1413 branch 2 times, most recently from b4abf2a to 6972269 on November 5, 2019 23:06
@Naros
Member Author

Naros commented Nov 5, 2019

@jpechane If you have time, could you give this a skim and let me know if I am on the right track?

Contributor

@jpechane jpechane left a comment

@Naros Generally, this looks nice to me! I've left a few comments in the code.
I also like the extended test coverage. Just please make sure that all domain use cases are covered and that domain of domain also works.

What will happen when an unknown type that is not a domain type is encountered?
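
(For reference, "domain of domain" means a chain like the one sketched below; this is a minimal illustration using plain JDBC rather than the connector's test harness, and the connection URL and credentials are placeholders.)

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DomainOfDomainExample {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details; adjust for a real environment.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/postgres", "postgres", "postgres");
                 Statement stmt = conn.createStatement()) {
                // A domain over a base type, and a second domain over the first one.
                stmt.execute("CREATE DOMAIN numeric82 AS numeric(8, 2)");
                stmt.execute("CREATE DOMAIN numeric82_positive AS numeric82 CHECK (VALUE > 0)");
                // The connector must resolve numeric82_positive all the way down to numeric(8, 2).
                stmt.execute("CREATE TABLE orders (pk SERIAL PRIMARY KEY, amount numeric82_positive)");
                stmt.execute("INSERT INTO orders (amount) VALUES (19.99)");
            }
        }
    }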

@@ -36,12 +38,17 @@
private static final Logger LOGGER = LoggerFactory.getLogger(AbstractColumnValue.class);

@Override
- public LocalDate asLocalDate() {
+ public Object asLocalDate() {
Contributor

Why does this need to be changed?

Member Author

This came down to the fact that PgProtoColumnValue extended AbstractColumnValue and it needed to return a Duration rather than a LocalDate. In fact, all the return type changes I had to make to the abstract class and its interface were due to keeping the same return types as the prior code this replaced. Do you have a preferred solution for this?

Member

I still don't quite understand this. Why wasn't this needed before?

Member Author

When AbstractColumnValue was first implemented, it was only used by pgoutput and wal2json, both of which returned a LocalDate type. When migrating decoderbufs to use this class while implementing PgProtoColumnValue, I changed this to account for the fact that PgProtoReplicationMessage#getValue returned different types.

In hindsight I guess I could have left this as-is and instead implemented PgColumnValue#asLocalDate as follows:

    @Override
    public LocalDate asLocalDate() {
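        // int32 datum: interpret as epoch days; otherwise fall back to parsing the textual form below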
        if (value.hasDatumInt32()) {
            return LocalDate.ofEpochDay((long) value.getDatumInt32());
        }

        final String s = asString();
        return s != null ? DateTimeFormat.get().date(s) : null;
    }

Is there a preference here?

Member

Yes, that one seems definitely nicer. So perhaps most (all?) return type changes in AbstractColumnValue could be reverted then?

Member Author

For now I've reverted all but the ones related to asTime() and asLocalTime() given that I have yet to find a solution that works with PostgreSQL allowing 24:00:00 in TIME fields; see commit 4389c16.
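
(For context, a minimal illustration of the 24:00:00 wrinkle, not the connector's actual converter code: java.time.LocalTime cannot represent 24:00:00, so the converter has to return something wider, such as a Duration.)

    import java.time.Duration;
    import java.time.LocalTime;

    public class TimeValueSketch {
        static Object asTime(String value) {
            if (value.startsWith("24:00:00")) {
                // LocalTime.parse("24:00:00") would throw; PostgreSQL nevertheless allows it in TIME columns.
                return Duration.ofHours(24);
            }
            return Duration.between(LocalTime.MIN, LocalTime.parse(value));
        }

        public static void main(String[] args) {
            System.out.println(asTime("24:00:00")); // PT24H
            System.out.println(asTime("13:37:00")); // PT13H37M
        }
    }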

@Naros
Member Author

Naros commented Nov 11, 2019

While adding more test coverage for snapshot phase, I uncovered the following: pgjdbc/pgjdbc#1604

The cliff notes version here is that if a user defines a column's data type as:

CREATE DOMAIN varbit2 AS varbit(3);
CREATE TABLE test_table (pk serial, value varbit2, primary key(pk));

When we build the table's schema during snapshot, we get the column's dimensions from the JDBC driver's DatabaseMetaData#getColumns call, which currently reports the value column's length as 2147483647 rather than the anticipated 3.

Obviously this would lead to erroneous schema change events being emitted when a streaming change is detected for the table, as streaming properly resolves a column's dimensions based on the column type's modifier value.
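
(A minimal reproduction sketch of what was reported upstream, using plain JDBC; the connection details are placeholders, and COLUMN_SIZE is the standard JDBC metadata column for the reported length.)

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class DomainLengthRepro {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/postgres", "postgres", "postgres")) {
                ResultSet rs = conn.getMetaData().getColumns(null, "public", "test_table", "value");
                while (rs.next()) {
                    // Expected 3 (from varbit(3)); the unpatched driver reports 2147483647.
                    System.out.println(rs.getString("COLUMN_NAME") + " -> " + rs.getInt("COLUMN_SIZE"));
                }
            }
        }
    }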

@davecramer
Contributor

@Naros OK, so I see a fairly easy way to fix this in the driver. However... I'm curious: what else is needed? Are you planning on supporting UDTs?

@Naros
Member Author

Naros commented Nov 12, 2019

@davecramer A recent comment on the JIRA issue leads me to say yes.
Specifically mentioned on the issue is CREATE TYPE mytype as ENUM ( ... );

@davecramer
Contributor

ugh... ENUMs suck in pg.

@gunnarmorling
Member

@Naros, could you clarify whether this one would then also resolve the enum issue? If so, that'd be great. A user was just asking for that again on DBZ-920.

@davecramer
Contributor

@gunnarmorling as far as I know it resolves the enum issue, and I just pushed pgjdbc/pgjdbc#1611 for numeric. I haven't released a new version yet, though.

@Naros
Member Author

Naros commented Nov 25, 2019

@gunnarmorling Commit 36e1b06 should provide Enum support.

When we emit events, enum-based columns now use io.debezium.data.Enum schemas. If a user wants the underlying database type information as well, they can enable the column source type propagation configuration, column.propagate.source.type, for the column.

For example, CREATE TYPE test_type as ENUM ('V1', 'V2') will specify that the type is TEST_TYPE when column source type propagation is enabled for a column which uses test_type.
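
(Roughly what the emitted field schema looks like, sketched with Kafka Connect's SchemaBuilder; this is not Debezium's actual builder code, and the parameter names shown are included for illustration only.)

    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.SchemaBuilder;

    public class EnumSchemaSketch {
        public static void main(String[] args) {
            Schema enumField = SchemaBuilder.string()
                    .name("io.debezium.data.Enum")                           // logical type used for enum columns
                    .parameter("allowed", "V1,V2")                           // allowed enum values
                    .parameter("__debezium.source.column.type", "TEST_TYPE") // added when source type propagation is enabled
                    .optional()
                    .build();
            System.out.println(enumField.name() + " " + enumField.parameters());
        }
    }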

The tests will likely fail for this until we update the driver, but I did test this with a local build of pgjdbc using commit pgjdbc/pgjdbc@0507979.

@Naros Naros changed the title from "DBZ-1413 Support PostgreSQL domain types" to "DBZ-1413 Support PostgreSQL Domain and Enum types" on Nov 25, 2019
Member

@gunnarmorling gunnarmorling left a comment

@Naros, huge piece of work, thanks a lot! A few comments inline, mostly minor things. I'm still struggling a bit though to understand the larger picture of the change. Could you perhaps provide a few sentences that explain the overall design/strategy and the refactorings you did? That'd help a lot. Thanks!

@@ -47,6 +54,17 @@ public boolean isArrayType() {
return elementType != null;
}

/**
* @return true if this type is a base type
Member

What is a "base type"? Can you clarify in the JavaDoc?

Member Author

@Naros Naros Dec 4, 2019

In the recent commits I decided to expand on this: a PostgresType has both a base type, which is now documented, and also allows fetching a type's root type, the topmost type in the type hierarchy. This proved nice for abstracting away the iterative traversal in the places where we needed it.
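
(A hypothetical sketch of that idea; the names below are modeled on this discussion rather than the exact Debezium API.)

    final class TypeNode {
        private final String name;
        private final TypeNode parent; // null for a base type

        TypeNode(String name, TypeNode parent) {
            this.name = name;
            this.parent = parent;
        }

        boolean isBaseType() {
            return parent == null;
        }

        // Walk the hierarchy iteratively so callers never have to.
        TypeNode getRootType() {
            TypeNode current = this;
            while (!current.isBaseType()) {
                current = current.parent;
            }
            return current;
        }

        public static void main(String[] args) {
            TypeNode varbit = new TypeNode("varbit", null);
            TypeNode varbit3 = new TypeNode("varbit3", varbit);            // CREATE DOMAIN varbit3 AS varbit(3)
            TypeNode varbit3child = new TypeNode("varbit3child", varbit3); // domain of a domain
            System.out.println(varbit3child.getRootType().name);           // prints "varbit"
        }
    }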

@@ -907,4 +920,54 @@ protected Object convertString(Column column, Field fieldDefn, Object data) {
}
return super.convertString(column, fieldDefn, data);
}

/**
Member

Let's avoid such empty comments. But could you add one to the method in the base class, describing its purpose?

Member Author

By associating the right JDBC type and native type with a column at construction time, these Types.DISTINCT-specific handler methods could be removed as they're no longer needed.

// fact the resolved base type is bool, not some oid that resolves to an unhandled type.
//
// Perhaps there are better ways - TBD.
return column.edit().jdbcType(baseType.getJdbcId()).nativeType(baseType.getOid()).create();
Member

Hmm, this side-effect of the get... method is unexpected. Could this be moved to construction time?

Member Author

Quite possibly, I'll look and let you know my findings.

Member Author

Another option I explored was to instead make a small change to JdbcValueConverters where we move what the switch statements are based on into a method argument and then delegate to those methods as needed, e.g.

    @Override
    public SchemaBuilder schemaBuilder(Column column) {
        return resolveSchemaBuilderByJdbcType(column.jdbcType(), column);
    }

    protected SchemaBuilder resolveSchemaBuilderByJdbcType(int jdbcType, Column column) {
        switch (jdbcType) {
            ...
        }
    }

This allows PostgresValueConverter to call those new resolve methods directly; rather than creating a new Column instance solely for the sake of controlling the switch argument, we can instead change those methods like so:

    @Override
    protected SchemaBuilder distinctSchema(Column column) {
        final int rootTypeJdbcId = typeRegistry.get(column.nativeType()).getRootType().getJdbcId();
        return resolveSchemaBuilderByJdbcType(rootTypeJdbcId, column);
    }

I am still going to test your suggestion with Column construction to be sure there are no side effects, in the event you don't particularly care for all this method delegation here.

Member Author

I decided to go with your construction approach as it seemed to solve all the corner nuances without introducing any additional side effects.
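
(A simplified sketch of the construction-time approach, assuming debezium-core's Column/ColumnEditor API as used in the excerpt above; the ResolvedType interface is a hypothetical stand-in for what the type registry resolves.)

    import io.debezium.relational.Column;
    import io.debezium.relational.ColumnEditor;

    public class DomainColumnFactory {

        // Hypothetical stand-in for what the type registry resolves for a domain type.
        interface ResolvedType {
            ResolvedType getRootType();
            int getJdbcId();
            int getOid();
        }

        // Resolve the root type once, while the column is built, instead of
        // re-editing the column inside a getter and surprising callers with a side effect.
        public static Column createColumn(String name, ResolvedType domainType) {
            ResolvedType root = domainType.getRootType();
            ColumnEditor editor = Column.editor()
                    .name(name)
                    .jdbcType(root.getJdbcId())  // e.g. Types.BIT for a varbit-based domain
                    .nativeType(root.getOid());  // root type's OID, not the domain's
            return editor.create();
        }
    }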

@Naros
Member Author

Naros commented Dec 2, 2019

There are quite a number of changes here and it can definitely be difficult to see the bigger picture; let me know if you need any more details than the below:

When working with decoderbufs to start, I noticed that user-defined domain types are always sent as byte arrays. After talks with Jiri, we concluded we'd make changes in the connector to be able to parse those byte arrays when detected. We decided it made sense to align the connector's handler for decoderbufs with the value resolution we had already done for wal2json and pgoutput. This is what led to the changes to AbstractColumnValue and its subclasses.

The remaining changes revolve around supporting PostgresType as a part of a hierarchy of types, allowing us to traverse from some point in the hierarchy upward. This was important to be able to accurately resolve length/scale for columns for domain types as well as the appropriate schema builder and value converters.

@Naros
Member Author

Naros commented Dec 2, 2019

@gunnarmorling @jpechane While I was working on porting the streaming test that checks a slew of domain types to the snapshot integration test, I noticed a comment that we probably should discuss before merging this.

Let's assume we have a table with a column defined with a data type of BOX.

During snapshot, a record exists where it was inserted using the value (0,0),(1,1); the emitted value is a byte buffer with the contents (0.0,0.0),(1.0,1.0). I want to point out that the emitted value contains a .0 for each of the box's coordinates. The odd behavior comes into play during streaming, however: when using wal2json or pgoutput, the emitted value is a byte buffer with the same contents as the snapshot's; when using decoderbufs, it's actually a byte buffer with the contents (0,0),(1,1) (notice the missing .0).

I checked master and the same behavior is observed there. Should we open another issue to try and standardize this across the decoders? The test in RecordsStreamProducerIT shows this is an inconsistency for the box, circle, line, lseg, path, and polygon data types afaict. Thoughts?

@Naros
Member Author

Naros commented Dec 5, 2019

During snapshot, a record exists where it was inserted using the value (0,0),(1,1); the emitted value is a byte buffer with the contents (0.0,0.0),(1.0,1.0). I want to point out that the emitted value contains a .0 for each of the box's coordinates. The odd behavior comes into play during streaming, however: when using wal2json or pgoutput, the emitted value is a byte buffer with the same contents as the snapshot's; when using decoderbufs, it's actually a byte buffer with the contents (0,0),(1,1) (notice the missing .0).

As a follow-up, this was resolved by the changes added in commit 4389c16, where we aligned the decoderbufs decoder to return the PG types for boxes, circles, etc. instead of the explicit byte array.

@Naros Naros requested a review from jpechane December 5, 2019 00:30
@gunnarmorling
Member

align the connector's handler for decoderbufs with the value resolution we had already done for wal2json and pgoutput.

Ok, thanks. That makes sense 👍

As a follow-up, this was resolved by the changes added in commit

Ok, cool. I had already filed the follow-up; should have read to the end :)

@gunnarmorling
Member

@Naros, there are a few test failures which seem related. Can you take a look?

Another question: can or should we merge this one before the PG driver update is there? My preference would be to get this in quickly (to avoid potential merge conflicts), even if that means the behavior isn't actually user-visibly fixed until the driver update is pulled in subsequently. This might imply having to disable some tests temporarily. WDYT?

Member

@gunnarmorling gunnarmorling left a comment

@Naros LGTM, great work! Grew quite a bit bigger than we originally anticipated, thanks for pulling through with it. Last items I see:

  • Address test failures on CI
  • Log any follow up JIRA issues (if needed)
  • Decide on strategy in regards to awaiting the PG driver release

Once these are done, it's good to go from my perspective. Thanks again!

@Naros
Member Author

Naros commented Dec 5, 2019

@gunnarmorling the test failures are most definitely due to the driver incompatibility; this comes into play when we refresh the schema from the database metadata, because the column's length and scale are incorrectly reported in the metadata result set. These tests should pass once we can upgrade to a production-ready version of the 42.2.9 driver.

I think we only really have 3 options here, unfortunately:

  • Use 42.2.9-SNAPSHOT in CR1, hopefully being able to upgrade to 42.2.9 in our CR2/Final.
  • Wait on the 42.2.9 driver then incorporate it in the PR, rebasing/merging conflicts as needed.
  • Disable all tests pertinent to this behavior and incorporate it, with follow-up JIRAs to upgrade the driver and enable/verify the tests thereafter.

I'm fine with any of these. It would seem your preference is the latter so I can work toward that if that's what we'd rather do.

@gunnarmorling
Member

It would seem your preference is the latter, so I can work toward that if that's what we'd rather do.

Yes, exactly. We cannot publish anything depending on SNAPSHOT versions (Central will reject the deployment) and I'd prefer to get this merged, avoiding any dependency on the PG team's schedule for our own release. So let's disable the tests requiring it, go back to the stable driver version, make this one ready to merge and then log a follow-up JIRA for updating again and enabling the tests. Could you also clarify what will work then with this intermediary version: enums, domains partially, domains not at all?

@Naros
Member Author

Naros commented Dec 5, 2019

Could you also clarify what will work then with this intermediary version: enums, domains partially, domains not at all?

Support for enum data types is good to go even with the intermediate solution. The changes needed involved correctly identifying an enum type and propagating that through the pipeline. No changes were needed from the driver's perspective to get this to work.

Support for domain types is a slightly different beast. When we first snapshot a table and refresh the schema, we obtain the column's length/scale from the driver's metadata. This is where the driver's fix is important.

If a domain type inherits from a base type and defines a custom type modifier, then length/scale isn't read correctly. For example:

CREATE DOMAIN varbit2 AS varbit(3);
CREATE TABLE t (pk SERIAL, data varbit2, PRIMARY KEY(pk));
INSERT INTO t (data) VALUES (B'101');

This leads to the data column having a schema of Bits.builder(Integer.MAX_VALUE) rather than Bits.builder(3). Additionally, when column.propagate.source.type is enabled, the length will be emitted with the value Integer.MAX_VALUE instead of 3. Scale is also affected, although in this example it would be 0 in either case.

However, in the highly unlikely situation where a user extends a base type but does not explicitly set any type modifiers on the definition, e.g. CREATE DOMAIN varbit2 AS varbit, the emitted schema for the snapshot event would have the expected values, as they're the defaults.

The reason this isn't an issue for streaming is that we use the TypeRegistry to resolve length/scale when we build the columns from the logical replication stream.

That, however, does raise the question of whether we could override JdbcConnection#readTableColumn for the Postgres connector and use the TypeRegistry to resolve a type's length/scale rather than sourcing those two attributes from the driver metadata. That might actually work around the driver changes entirely. I'm going to give that a test drive, as it might be a workaround solution while we wait for the driver.
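
(A conceptual sketch of that idea only; the method shape below is simplified and is not the real JdbcConnection#readTableColumn signature, and TypeLookup is a hypothetical stand-in for the TypeRegistry.)

    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class DomainAwareColumnReader {

        // Hypothetical lookup abstraction standing in for Debezium's TypeRegistry.
        interface TypeLookup {
            boolean isDomainType(String typeName);
            int resolveLength(String typeName);
            int resolveScale(String typeName);
        }

        // Prefer registry-resolved length/scale over driver metadata for domain types.
        static int[] lengthAndScale(ResultSet columnMetadata, TypeLookup registry) throws SQLException {
            String typeName = columnMetadata.getString("TYPE_NAME"); // e.g. "varbit2" for the domain above
            if (registry.isDomainType(typeName)) {
                // The driver's metadata reports Integer.MAX_VALUE here for constrained domains,
                // so derive the real values from the type hierarchy instead.
                return new int[]{ registry.resolveLength(typeName), registry.resolveScale(typeName) };
            }
            return new int[]{ columnMetadata.getInt("COLUMN_SIZE"), columnMetadata.getInt("DECIMAL_DIGITS") };
        }
    }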

@Naros
Member Author

Naros commented Dec 5, 2019

That, however, does raise the question of whether we could override JdbcConnection#readTableColumn for the Postgres connector and use the TypeRegistry to resolve a type's length/scale rather than sourcing those two attributes from the driver metadata. That might actually work around the driver changes entirely. I'm going to give that a test drive, as it might be a workaround solution while we wait for the driver.

As a follow-up, this does indeed allow us to ship this PR without the driver changes.

One major advantage of this strategy is that we can also resolve the length/scale of types in a deeply nested hierarchy, thanks to how our TypeRegistry works. This is something the driver changes introduced by @davecramer won't be able to support, as they're capped at a depth of 1 rather than being unbounded, most likely for performance reasons. This means that the test case I added where we expected a failure will now pass.

I'm going to push this small change, and if we decide we'd rather not have these, I can always remove the commit and force push; but I think this is probably a good change to have.

WDYT @gunnarmorling ?

@gunnarmorling
Member

Excellent, thanks a lot, @Naros. Merging.

@gunnarmorling gunnarmorling merged commit c32b771 into debezium:master Dec 10, 2019