storage/sources: provide context on upstream schema versioning #26975

Open
morsapaes opened this issue May 8, 2024 · 1 comment
Labels
A-STORAGE Topics related to the Storage layer C-feature Category: new feature or request

Comments

@morsapaes
Contributor

Feature request

The schema of a source is set on creation. As the upstream schema changes, Materialize either tolerates these changes or puts the source in an errored state, but doesn't explicitly flag that a newer schema version is available. To help identify (sub)sources that have newer schemas available, we should provide users with context on upstream schema versioning.

  • For Kafka sources using a schema registry, we can record the schema version used when the source was created. Ideally, we'd also poll the registry for the latest version on some cadence, so we can set a boolean flag to signal that a newer version is available.
  • For PostgreSQL and MySQL sources, we can set the same kind of flag when we receive a new Relation message in the replication stream.
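
To make the Kafka case concrete, here is a minimal sketch of the version check, written in Python for brevity rather than as a proposal for how Materialize would implement it. It assumes a Confluent-compatible schema registry, where `GET /subjects/{subject}/versions/latest` returns a JSON object with a `version` field. The names `SourceSchemaStatus`, `check_subject`, and `fetch_latest_version_http` are hypothetical and exist only for illustration.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class SourceSchemaStatus:
    """Hypothetical catalog row: the version recorded at source creation
    next to the latest version seen in the registry."""
    subject: str
    created_version: int
    latest_version: int

    @property
    def newer_version_available(self) -> bool:
        # The "bool" from the proposal: true once the registry has moved on.
        return self.latest_version > self.created_version


def fetch_latest_version_http(base_url: str, subject: str) -> int:
    """Ask a Confluent-compatible registry for the latest version of a subject."""
    quoted = urllib.parse.quote(subject, safe="")
    url = f"{base_url}/subjects/{quoted}/versions/latest"
    with urllib.request.urlopen(url) as resp:
        return int(json.load(resp)["version"])


def check_subject(
    subject: str,
    created_version: int,
    fetch_latest_version: Callable[[str], int],
) -> SourceSchemaStatus:
    """Compare the recorded version with whatever the fetcher reports.

    The fetcher is injected so the check can run on any cadence and be
    exercised without a live registry.
    """
    latest = fetch_latest_version(subject)
    return SourceSchemaStatus(subject, created_version, latest)
```

A periodic task would call `check_subject` with something like `lambda s: fetch_latest_version_http(registry_url, s)` and write the resulting flag to the catalog.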

Once this is available in the system catalog, we can use it to annotate (sub)sources in the Console, as a first step. cc @ggnall @parkerhendo

@morsapaes morsapaes added C-feature Category: new feature or request A-STORAGE Topics related to the Storage layer labels May 8, 2024
@bosconi bosconi assigned bosconi and guswynn and unassigned bosconi May 15, 2024
@morsapaes
Contributor Author

morsapaes commented May 15, 2024

Noting down a few thoughts after discussing with the team:

From @benesch:

I'm not entirely certain about the implementation here. It may be simpler/better/easier to periodically ping the upstream system to get the table's current schema and diff it against the schema in Materialize.
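
As an illustration of the diff-based approach, here is a minimal Python sketch that compares the schema recorded in Materialize with the one currently reported upstream. It assumes both sides can be reduced to a mapping from column name to type name (for PostgreSQL and MySQL this could come from `information_schema.columns`). The names `SchemaDiff` and `diff_schemas` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    """Hypothetical result of comparing a recorded schema with upstream."""
    added: list[str] = field(default_factory=list)    # columns only upstream
    dropped: list[str] = field(default_factory=list)  # columns only in the recorded schema
    retyped: list[str] = field(default_factory=list)  # same name, different type

    @property
    def changed(self) -> bool:
        return bool(self.added or self.dropped or self.retyped)


def diff_schemas(recorded: dict[str, str], upstream: dict[str, str]) -> SchemaDiff:
    """Diff two {column name: type name} mappings."""
    added = sorted(upstream.keys() - recorded.keys())
    dropped = sorted(recorded.keys() - upstream.keys())
    retyped = sorted(
        name
        for name in recorded.keys() & upstream.keys()
        if recorded[name] != upstream[name]
    )
    return SchemaDiff(added=added, dropped=dropped, retyped=retyped)
```

A periodic job could run `diff_schemas` per (sub)source and surface `changed` as the flag, with the three lists available as extra detail for the Console.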

From @guswynn:

For PostgreSQL and MySQL, we might want to wait until sources and subsources get untangled on the statistics side before implementing this. Gus will get an estimate for the Kafka bit, which is the most pressing user need for the short-term.
