Discussion: Provide a WAL in RisingWave for source and DML #16772

Open
kwannoel opened this issue May 15, 2024 · 2 comments

@kwannoel
Contributor

Currently, we rely as much as possible on the upstream system's properties to ensure exactly-once processing.
Take MySQL / PG for instance: we need to depend on their WAL.

If we provide our own WAL, we no longer depend on external systems for the exactly-once processing part. This reduces complexity and improves maintainability.

For DML, we can also provide exactly-once processing. Currently, after a DML statement runs, its changes may not be committed immediately. If a recovery happens before they are committed, we lose the records.
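
To make the DML case concrete, here is a minimal sketch of the idea in Python. Everything in it is hypothetical (`DmlWal`, `replay`, the JSON-lines file format, the `committed_seq` parameter); it is not RisingWave code, just an illustration of "persist before acknowledging, replay after recovery". Each statement gets a sequence number so that replay can skip whatever was already covered by the last checkpoint.

```python
import json
import os
import tempfile


class DmlWal:
    """Hypothetical append-only DML log (illustrative, not a RisingWave API)."""

    def __init__(self, path):
        self.path = path
        self.next_seq = 1
        self.file = open(path, "a", encoding="utf-8")

    def append(self, statement):
        """Durably record one DML statement and return its sequence number."""
        seq = self.next_seq
        self.file.write(json.dumps({"seq": seq, "stmt": statement}) + "\n")
        self.file.flush()
        os.fsync(self.file.fileno())  # only acknowledge after this returns
        self.next_seq += 1
        return seq

    def close(self):
        self.file.close()


def replay(path, committed_seq):
    """Return statements logged after the last committed sequence number."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return [r["stmt"] for r in records if r["seq"] > committed_seq]


# Usage: two statements are acknowledged, only the first reaches a checkpoint.
wal_path = os.path.join(tempfile.mkdtemp(), "dml.wal")
wal = DmlWal(wal_path)
wal.append("INSERT INTO t VALUES (1)")
wal.append("INSERT INTO t VALUES (2)")
wal.close()  # simulate a crash after seq 1 was checkpointed
print(replay(wal_path, committed_seq=1))
```

In this sketch the second `INSERT` survives the simulated crash because it was fsynced before being acknowledged, and the first one is not applied twice because replay starts after `committed_seq`.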

@github-actions github-actions bot added this to the release-1.10 milestone May 15, 2024
@fuyufjh
Contributor

fuyufjh commented May 16, 2024

I just realized that the log store does not seem to be a good choice. The log store, as part of Hummock, relies on the once-per-second checkpoints to commit (i.e. to make sure the data is persisted and won't be lost in any way). In this case, however, we want to commit DML changes as fast as possible, so a 1-second commit latency sounds too long.
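
As a rough back-of-the-envelope illustration of that latency (a sketch only: `wait_until_commit_ms` is a made-up helper, and it ignores the time the checkpoint itself takes to complete), a write has to wait for the next checkpoint boundary after it arrives:

```python
def wait_until_commit_ms(arrival_ms, checkpoint_interval_ms=1000):
    """Time a write waits for the next checkpoint boundary after it arrives."""
    return checkpoint_interval_ms - (arrival_ms % checkpoint_interval_ms)


# Writes arriving at every millisecond within one checkpoint interval.
waits = [wait_until_commit_ms(t) for t in range(0, 1000)]
print(max(waits), sum(waits) / len(waits))
```

Under these assumptions a write waits up to a full second and about half a second on average before it is durable, compared with a single append plus acknowledgement for a dedicated log.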


Then it comes back to a very early discussion - shall we set up a Kafka in front of RW to hold the DML requests? That is, when the frontend node accepts a DML statement, it writes the statement to Kafka and returns OK to the user as soon as the Kafka producer acknowledges the write.

BEFORE:  DML statements -> frontend node -> compute nodes
AFTER:   DML statements -> frontend node -> Kafka WAL -> compute nodes

Of course, Kafka itself is not required. Anything that provides the same capability would work.
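
A minimal sketch of the AFTER flow, using an in-memory stand-in instead of a real Kafka client so that it stays self-contained. All names here (`WalLog`, `Frontend`, `ComputeNode`, `execute_dml`, `poll`) are hypothetical and only illustrate the shape of the pipeline: the frontend acknowledges as soon as the log accepts the record, and compute nodes consume from the log at their own pace.

```python
class WalLog:
    """In-memory stand-in for a durable log such as a Kafka topic (hypothetical)."""

    def __init__(self):
        self.records = []

    def append(self, record):
        # A real log would return only after the write is persisted/replicated.
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset):
        return self.records[offset:]


class Frontend:
    def __init__(self, log):
        self.log = log

    def execute_dml(self, statement):
        self.log.append(statement)  # returns once the log acknowledges
        return "OK"


class ComputeNode:
    def __init__(self, log):
        self.log = log
        self.next_offset = 0
        self.applied = []

    def poll(self):
        """Apply every record not consumed yet, then advance the offset."""
        batch = self.log.read_from(self.next_offset)
        self.applied.extend(batch)
        self.next_offset += len(batch)
        return len(batch)


log = WalLog()
frontend = Frontend(log)
compute = ComputeNode(log)
print(frontend.execute_dml("INSERT INTO t VALUES (1)"))  # acked before compute sees it
compute.poll()
print(compute.applied)
```

The point of the design is that the acknowledgement no longer depends on the compute nodes or on a checkpoint, only on the log's own durability guarantee.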

@StrikeW
Contributor

StrikeW commented May 24, 2024

> Take mysql / PG for instance, we need to depend on their WAL log.
> If we provide our own WAL, we don't depend on external systems for the exactly once processing part. This reduces complexity and increases maintainability.

Suppose we have our own WAL. We are still a consumer of the upstream WAL, which means we still depend on the WAL in the external system. For example, when a recovery occurs, we still need to reset the upstream offset and resume consumption.
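
A toy sketch of that recovery path, with made-up names (`SourceExecutor`, `commit_checkpoint`, `recover`) and a plain Python list standing in for the external WAL/binlog. The offset is checkpointed together with the state, so recovery rewinds to it and re-reads from upstream, which only works if the upstream still retains the events from that offset onward.

```python
class SourceExecutor:
    """Toy source reader; `upstream` stands in for an external WAL/binlog."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.offset = 0           # next upstream position to read
        self.total = 0            # in-memory state derived from the events
        self.checkpoint = (0, 0)  # (offset, total) persisted together

    def consume(self, n):
        for event in self.upstream[self.offset:self.offset + n]:
            self.total += event
        self.offset = min(self.offset + n, len(self.upstream))

    def commit_checkpoint(self):
        self.checkpoint = (self.offset, self.total)

    def recover(self):
        # Discard uncommitted work and rewind to the checkpointed offset.
        self.offset, self.total = self.checkpoint


source = SourceExecutor(upstream=[1, 2, 3, 4, 5])
source.consume(2)
source.commit_checkpoint()  # offset 2 and its state are now durable
source.consume(2)           # uncommitted progress
source.recover()            # crash: back to offset 2
source.consume(3)           # re-read events 3, 4, 5 from upstream
print(source.offset, source.total)
```

After recovery every event is counted exactly once, but only because the upstream could be re-read from the checkpointed offset.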
