DBZ-1413 Support PostgreSQL Domain and Enum types #1079
Conversation
@jpechane If you have time, could you give this a skim and let me know if I am on the right track?
@Naros Generally, this looks nice to me! I've left a few comments in the code.
I also like the extended test coverage. Just please make sure that all domain use cases are covered and that domain of domain also works.
What will happen when an unknown type is encountered that is not a domain type?
@@ -36,12 +38,17 @@
    private static final Logger LOGGER = LoggerFactory.getLogger(AbstractColumnValue.class);

    @Override
-   public LocalDate asLocalDate() {
+   public Object asLocalDate() {
Why does this need to be changed?
This came down to the fact that PgProtoColumnValue extends AbstractColumnValue and needed to return a Duration rather than a LocalDate. In fact, all the return type changes I had to make to the abstract class and its interface were due to keeping the same return types as the prior code this replaced. Do you have a preferred solution for this?
I still don't quite understand this. Why wasn't this needed before?
When AbstractColumnValue was first implemented, it was only used by pgoutput and wal2json, both of which returned a LocalDate type. When migrating decoderbufs to use this class while implementing PgProtoColumnValue, I changed this to account for the fact that PgProtoReplicationMessage#getValue returned different types.
In hindsight I guess I could have left this as-is and instead implemented PgColumnValue#asLocalDate as follows:
@Override
public LocalDate asLocalDate() {
    if (value.hasDatumInt32()) {
        return LocalDate.ofEpochDay((long) value.getDatumInt32());
    }
    final String s = asString();
    return s != null ? DateTimeFormat.get().date(s) : null;
}
Is there a preference here?
Yes, that one seems definitely nicer. So perhaps most (all?) return type changes in AbstractColumnValue could be reverted then?
For now I've reverted all but the ones related to asTime() and asLocalTime(), given that I have yet to find a solution that works with PostgreSQL allowing 24:00:00 in TIME fields; see commit 4389c16.
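To make that edge case concrete, here is a minimal standalone sketch (not from the PR) showing why java.time's LocalTime cannot hold PostgreSQL's 24:00:00 while a Duration can:

import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeParseException;

public class EndOfDayTime {
    public static void main(String[] args) {
        // PostgreSQL accepts '24:00:00' in TIME columns, but LocalTime's
        // valid range ends at 23:59:59.999999999, so parsing fails.
        try {
            LocalTime.parse("24:00:00");
        }
        catch (DateTimeParseException e) {
            System.out.println("LocalTime rejects 24:00:00");
        }

        // Modeling the value as a duration since midnight sidesteps the limit,
        // which is why asTime()/asLocalTime() kept their changed return types.
        Duration endOfDay = Duration.ofHours(24);
        System.out.println("As Duration: " + endOfDay); // prints PT24H
    }
}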
While adding more test coverage for the snapshot phase, I uncovered the following: pgjdbc/pgjdbc#1604. The cliff-notes version is that if a user defines a column's data type as a domain type carrying its own type modifier, then when we build the table's schema during snapshot, the column's dimensions we obtain from the JDBC driver's metadata reflect the base type rather than the domain's modifier. Obviously this would lead to erroneous schema change events being emitted when a streaming change is detected for the table, as streaming properly resolves a column's dimensions based on the column type's modifier value.
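To illustrate where the incorrect dimensions surface, here is a hedged sketch (the domain, table, and connection details are hypothetical) of the snapshot-style metadata lookup; with a pre-fix driver, COLUMN_SIZE here reflects the base type rather than the domain's modifier:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DomainMetadataCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/postgres", "postgres", "secret");
             Statement stmt = conn.createStatement()) {
            // A domain that carries its own type modifier (length 2).
            stmt.execute("CREATE DOMAIN varchar2 AS varchar(2)");
            stmt.execute("CREATE TABLE t (c varchar2)");

            // Snapshot-time schema refresh reads column dimensions from the
            // driver's metadata; this is the value the driver fix corrects.
            DatabaseMetaData meta = conn.getMetaData();
            try (ResultSet rs = meta.getColumns(null, "public", "t", "c")) {
                while (rs.next()) {
                    System.out.println("COLUMN_SIZE = " + rs.getInt("COLUMN_SIZE"));
                }
            }
        }
    }
}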
@Naros OK, so I see a fairly easy way to fix this in the driver. However... I'm curious what else is needed? Are you planning on supporting UDTs?
@davecramer A recent comment on the JIRA issue leads me to say yes.
Ugh... ENUMs suck in pg.
@gunnarmorling As far as I know it resolves the enum issue, and I just pushed pgjdbc/pgjdbc#1611 for numeric. I haven't released a new version yet, though.
@gunnarmorling Commit 36e1b06 should provide enum support. When we emit events, enum-based columns are now identified and propagated correctly through the pipeline. The tests will likely fail for this until we update the driver, but I did test this with a local build of pgjdbc using commit pgjdbc/pgjdbc@0507979.
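For background on how enum and domain types can be told apart in the catalog (the connector's actual registry query may differ), pg_type.typtype is 'b' for a base type, 'd' for a domain, and 'e' for an enum; a hedged sketch:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class TypeCategoryLookup {
    // Returns 'b' (base), 'd' (domain), 'e' (enum), etc., or null if unknown.
    // Illustration only; the connector's real TypeRegistry query may differ.
    static String typeCategory(Connection conn, String typeName) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT typtype FROM pg_type WHERE typname = ?")) {
            ps.setString(1, typeName);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}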
@Naros, huge piece of work, thanks a lot! A few comments inline, mostly minor things. I'm still struggling a bit though to understand the larger picture of the change. Could you perhaps provide a few sentences that explain the overall design/strategy and the refactorings you did? That'd help a lot. Thanks!
@@ -47,6 +54,17 @@ public boolean isArrayType() {
        return elementType != null;
    }

    /**
     * @return true if this type is a base type
What is a "base type"? Can you clarify in the JavaDoc?
In the recent commits, I decided to expand on this with the notion that a PostgresType has both a base type, which is documented, and a root type, the top-most type in the type hierarchy, which can also be fetched. This proved nice for abstracting away the iteration in the places where we needed it as part of the type.
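A toy illustration (the class and field names are hypothetical, not the PR's actual PostgresType) of the base-type/root-type distinction and why getRootType() hides the iteration:

public class TypeNode {
    private final String name;
    private final TypeNode baseType; // null for a base (built-in) type

    TypeNode(String name, TypeNode baseType) {
        this.name = name;
        this.baseType = baseType;
    }

    boolean isBaseType() {
        return baseType == null;
    }

    // Walks the hierarchy so callers don't have to iterate themselves,
    // which matters for domain-of-domain definitions.
    TypeNode getRootType() {
        TypeNode type = this;
        while (!type.isBaseType()) {
            type = type.baseType;
        }
        return type;
    }

    public static void main(String[] args) {
        TypeNode text = new TypeNode("text", null);
        TypeNode domain = new TypeNode("mytext", text);
        TypeNode domainOfDomain = new TypeNode("mytext2", domain);
        System.out.println(domainOfDomain.getRootType().name); // prints: text
    }
}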
@@ -907,4 +920,54 @@ protected Object convertString(Column column, Field fieldDefn, Object data) {
        }
        return super.convertString(column, fieldDefn, data);
    }

    /**
Let's avoid such empty comments. But could you add one to the method in the base class, describing its purpose?
By moving the association of the right JDBC type and native type with a column to construction time, these Types.DISTINCT-specific handler methods could be removed, as they're no longer needed.
// fact the resolved base type is bool, not some oid that resolves to an unhandled type.
//
// Perhaps there are better ways - TBD.
return column.edit().jdbcType(baseType.getJdbcId()).nativeType(baseType.getOid()).create();
Hmm, this side-effect of the get... method is unexpected. Could this be moved to construction time?
Quite possibly, I'll look and let you know my findings.
Another option I explored was to instead make a small change to JdbcValueConverters where we move what the switch statements are based upon into a method argument and then delegate to those as needed, e.g.:
@Override
public SchemaBuilder schemaBuilder(Column column) {
    return resolveSchemaBuilderByJdbcType(column.jdbcType(), column);
}

protected SchemaBuilder resolveSchemaBuilderByJdbcType(int jdbcType, Column column) {
    switch (jdbcType) {
        ...
    }
}
This allows PostgresValueConverter to call those new resolve methods directly; rather than create a new Column instance solely for the sake of controlling the switch argument, we can instead change those methods like so:
@Override
protected SchemaBuilder distinctSchema(Column column) {
    final int rootTypeJdbcId = typeRegistry.get(column.nativeType()).getRootType().getJdbcId();
    return resolveSchemaBuilderByJdbcType(rootTypeJdbcId, column);
}
I am still going to test your suggestion with Column construction to be sure there are no side effects, in the event you don't particularly care for all this method delegation here.
I decided to go with your construction approach, as it seemed to handle all the corner cases without introducing any additional side effects.
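Since the chosen construction-time variant isn't shown above, here is a hedged sketch of what it might look like, reusing only the accessors that appear elsewhere in this thread (the surrounding method name is hypothetical):

// Hypothetical sketch: resolve a domain column's JDBC and native type to its
// root type once, when the Column is built, instead of editing it inside a
// getter as in the snippet discussed above.
private Column resolveColumnFromRootType(Column column, TypeRegistry typeRegistry) {
    final PostgresType type = typeRegistry.get(column.nativeType());
    if (!type.isBaseType()) {
        final PostgresType rootType = type.getRootType();
        return column.edit()
                .jdbcType(rootType.getJdbcId())
                .nativeType(rootType.getOid())
                .create();
    }
    return column;
}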
There are quite a number of changes here and it can definitely be difficult to see the bigger picture; let me know if you need any more details than the below. When working with decoderbufs to start, I noticed that user-defined domain types are always sent as byte arrays. After talks with Jiri, we concluded we'd make the changes in the connector to be able to parse those byte arrays when detected. We decided that it made sense to align the connector's handling for decoderbufs with the value resolution we had already done for wal2json and pgoutput; this is what led to the changes to AbstractColumnValue and the new PgProtoColumnValue. The remaining changes revolve around supporting domain and enum types throughout the pipeline.
@gunnarmorling @jpechane While I was working on porting the streaming test that checks a slew of domain types over to the snapshot integration test, I noticed a discrepancy that we probably should discuss before merging this. Assume we have a table with a column defined using one of these domain types: for a record inserted with a given value, the snapshot and the streaming decoders do not emit the same representation of that value. I checked master and the same behavior is observed there. Should we open another issue to try and standardize this across the decoders? The test in question demonstrates the difference.
As a follow-up, this was resolved by the changes added in commit 4389c16, where we aligned the decoderbufs decoder to return the PG types for boxes, circles, etc. instead of the explicit byte array.
Ok, thanks. That makes sense 👍
Ok, cool. Had already filed the follow-up; should have read to the end :)
@Naros, there are a few test failures which seem related. Can you take a look? Another question: can or should we merge this one before the PG driver update is there? My preference would be to get this in quickly (so as to avoid potential merge conflicts), even if that means the behavior isn't actually user-visibly fixed until the driver update is pulled in subsequently. This might imply having to disable some tests temporarily. WDYT?
@Naros LGTM, great work! Grew quite a bit bigger than we originally anticipated, thanks for pulling through with it. Last items I see:
- Address test failures on CI
- Log any follow up JIRA issues (if needed)
- Decide on strategy in regards to awaiting the PG driver release
Once these are done, it's good to go from my perspective. Thanks again!
@gunnarmorling The test failures are most definitely due to the driver incompatibility; this comes into play when we refresh the schema from the database metadata, because the column's length and scale are incorrectly reported in the metadata result-set. These tests should pass once we can upgrade to a production-ready version of the 42.2.9 driver. I think we only really have three options here, unfortunately: wait for the 42.2.9 driver release before merging, depend on a SNAPSHOT build of the driver in the interim, or revert to the stable driver and temporarily disable the affected tests.
I'm fine with any of these. It would seem your preference is the latter, so I can work toward that if that's what we'd rather do.
Yes, exactly. We cannot publish anything depending on SNAPSHOT versions (Central will reject the deployment), and I'd prefer to get this merged, avoiding any dependency on the PG team's schedule for our own release. So let's disable the tests requiring it, go back to the stable driver version, make this one ready to merge, and then log a follow-up JIRA for updating again and enabling the tests. Could you also clarify what will work with this intermediary version: enums, domains partially, domains not at all?
Support for enum data types is good to go even with an intermediate solution. The changes needed for these involved actually identifying an enum type correctly and propagating that through the pipeline; there were no changes needed from the driver's perspective to get this to work. Support for domain types is a slightly different beast. When we first snapshot a table and refresh the schema, we do this by obtaining the column's length/scale from the driver's metadata. This is where the driver's fix is important: if a domain type inherits from a base type and defines a custom type modifier, the length/scale isn't read correctly.
This leads to the erroneous schema change events described earlier. However, in the highly unlikely situation where a user extends a base type but does not explicitly set any type modifiers on the definition, there is no modifier to lose and the metadata comes back fine. The reason this isn't an issue for streaming is that we use the column type's modifier value carried with the replication message to resolve a column's dimensions. That however does raise the question of whether we could override how the connector reads column metadata and resolve the length/scale ourselves from the type modifier.
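As an aside on the modifier mechanics (an assumption-labeled sketch, not the connector's code): for varchar, PostgreSQL encodes the declared length in the type modifier as length + 4, with -1 meaning no modifier was given, so a length can be recovered like this:

public class TypmodHelper {
    // PostgreSQL convention for varchar/bpchar: atttypmod = declared length + 4
    // (VARHDRSZ); -1 means the type was declared without a modifier.
    static int varcharLengthFromTypmod(int typmod) {
        return typmod == -1 ? -1 : typmod - 4;
    }

    public static void main(String[] args) {
        System.out.println(varcharLengthFromTypmod(6));  // varchar(2) -> 2
        System.out.println(varcharLengthFromTypmod(-1)); // no modifier -> -1
    }
}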
As a follow-up, this does indeed allow us to ship this PR without the driver changes. One major advantage of this strategy is that we can also resolve the length/scale of types in a deeply nested hierarchy, given how our type registry walks a type's hierarchy. I'm going to push this small change; if we decide we'd rather not have it, I can always remove the commit and force push, but I think this is probably a good change to have. WDYT @gunnarmorling?
Excellent, thanks a lot, @Naros. Merging.
https://issues.jboss.org/browse/DBZ-1413
https://issues.jboss.org/browse/DBZ-920