Discussion: Provide a WAL in RisingWave for source and DML #16772

kwannoel · 2024-05-15T16:56:51Z

Currently, we rely as much as possible on the upstream system's properties to ensure exactly-once processing.
Take MySQL / PG, for instance: we depend on their WAL.

If we provide our own WAL, we no longer depend on external systems for the exactly-once processing part. This reduces complexity and increases maintainability.

For DML, we can also provide exactly-once processing. Currently, after a DML statement runs, its changes may not be committed immediately. If a recovery happens before the commit, we lose those records.


fuyufjh · 2024-05-16T05:02:08Z

I just realized that the log store does not seem to be a good choice. The log store, as part of Hummock, relies on the once-per-second checkpoints to commit (i.e. to make sure the data is persisted and won't be lost in any way). In this case, however, we want to commit the DML changes as fast as possible, and a 1-second commit latency sounds too long.
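To make the latency concern concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a write only becomes durable at the next checkpoint boundary, with checkpoints exactly 1 second apart as described above. The function name, the boundary rule, and the sample arrival times are made up for illustration and are not RisingWave code.

```python
import math

CHECKPOINT_INTERVAL_S = 1.0  # from the comment above: one checkpoint per second


def commit_latency(arrival_s: float, interval_s: float = CHECKPOINT_INTERVAL_S) -> float:
    """Seconds a write waits until the next checkpoint makes it durable.

    Assumption: a write arriving exactly on a boundary misses that
    checkpoint and waits a full interval for the next one.
    """
    next_checkpoint = (math.floor(arrival_s / interval_s) + 1) * interval_s
    return next_checkpoint - arrival_s


# Hypothetical arrival times, in seconds since the last checkpoint at t = 0.
arrivals = [0.0, 0.25, 0.5, 0.75]
latencies = [commit_latency(t) for t in arrivals]
average = sum(latencies) / len(latencies)
```

Under this simplified model a DML statement waits anywhere from just above 0 up to a full second before it can be acknowledged as durable, which is the cost a dedicated WAL with per-write acknowledgement would avoid.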

Then it comes back to the very early discussion: shall we set up a Kafka in front of RW to hold the DML requests? That is, when the frontend node accepts a DML statement, it writes it into Kafka and returns OK to the user as soon as the Kafka producer acknowledges the write.

BEFORE:  DML statements -> frontend node -> compute nodes
AFTER:   DML statements -> frontend node -> Kafka WAL -> compute nodes

Of course, Kafka itself is not necessary. Anything that can provide the same ability would work.
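A minimal sketch of the AFTER flow, in Python, may help picture it. Everything here is hypothetical: `InMemoryWal` stands in for Kafka (or any log with similar guarantees), and `Frontend` / `ComputeNode` are toy classes, not RisingWave components. The point is only that a statement acknowledged by the log survives a recovery, because the compute node replays from its last checkpointed offset.

```python
class InMemoryWal:
    """Stand-in for a durable, append-only log (assumed to survive crashes)."""

    def __init__(self):
        self.records = []

    def append(self, statement):
        """Persist a statement; returning its offset plays the role of the ack."""
        self.records.append(statement)
        return len(self.records) - 1

    def read_from(self, offset):
        """Return (offset, statement) pairs starting at the given offset."""
        return list(enumerate(self.records))[offset:]


class Frontend:
    def __init__(self, wal):
        self.wal = wal

    def execute_dml(self, statement):
        self.wal.append(statement)  # reply only after the log has acknowledged
        return "OK"


class ComputeNode:
    def __init__(self, wal):
        self.wal = wal
        self.applied = []           # volatile: lost on a crash
        self.next_offset = 0        # volatile read cursor
        self.checkpoint_state = []  # durable: state at the last checkpoint
        self.checkpoint_offset = 0  # durable: cursor at the last checkpoint

    def poll(self):
        for offset, statement in self.wal.read_from(self.next_offset):
            self.applied.append(statement)
            self.next_offset = offset + 1

    def checkpoint(self):
        self.checkpoint_state = list(self.applied)
        self.checkpoint_offset = self.next_offset

    def recover(self):
        # Roll back to the last checkpoint; later records are replayed by poll().
        self.applied = list(self.checkpoint_state)
        self.next_offset = self.checkpoint_offset


wal = InMemoryWal()
frontend = Frontend(wal)
node = ComputeNode(wal)

reply_1 = frontend.execute_dml("INSERT INTO t VALUES (1)")
node.poll()
node.checkpoint()
reply_2 = frontend.execute_dml("INSERT INTO t VALUES (2)")
node.poll()      # applied in memory, but not yet checkpointed
node.recover()   # simulated crash: the second insert is gone from memory
state_after_crash = list(node.applied)
node.poll()      # replay from the log restores it, without duplicates
state_after_replay = list(node.applied)
```

Storing the read offset together with the checkpointed state is what keeps replay from applying a statement twice in this toy model.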

StrikeW · 2024-05-24T07:58:51Z

> Take mysql / PG for instance, we need to depend on their WAL log.
> If we provide our own WAL, we don't depend on external systems for the exactly once processing part. This reduces complexity and increases maintainability.

Suppose we have our own WAL. We are still a consumer of the upstream WAL, which means we still depend on the WAL in the external system. For example, when a recovery occurs, we still need to reset the upstream offset and resume consumption.
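To illustrate the point, here is a small hypothetical sketch in Python. `UpstreamLog` stands in for an external change log such as a MySQL binlog or PG WAL, and `SourceReader` copies events into an internal WAL. None of these names come from RisingWave. Even with the internal WAL, recovery has to work out which upstream offset to resume from.

```python
class UpstreamLog:
    """Stand-in for an external change log that we only consume."""

    def __init__(self, events):
        self._events = list(events)

    def read_from(self, offset):
        """Return (offset, event) pairs starting at the given offset."""
        return list(enumerate(self._events))[offset:]


class SourceReader:
    def __init__(self, upstream):
        self.upstream = upstream
        self.internal_wal = []         # assumed durable: (upstream_offset, event)
        self.next_upstream_offset = 0  # volatile cursor into the upstream log

    def ingest(self, max_events):
        batch = self.upstream.read_from(self.next_upstream_offset)[:max_events]
        for offset, event in batch:
            self.internal_wal.append((offset, event))
            self.next_upstream_offset = offset + 1

    def recover(self):
        # The in-memory cursor is gone; rebuild it from the last upstream
        # offset recorded in the internal WAL, then resume consumption.
        if self.internal_wal:
            self.next_upstream_offset = self.internal_wal[-1][0] + 1
        else:
            self.next_upstream_offset = 0


upstream = UpstreamLog(["e0", "e1", "e2", "e3"])
reader = SourceReader(upstream)

reader.ingest(max_events=2)
reader.next_upstream_offset = 0   # simulate losing the in-memory cursor
reader.recover()
resumed_at = reader.next_upstream_offset
reader.ingest(max_events=10)
ingested_offsets = [offset for offset, _ in reader.internal_wal]
```

In this sketch the internal WAL avoids re-reading events it already holds, but the resume position is still an upstream offset, so the dependency on the external log does not disappear.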

kwannoel added the type/feature label May 15, 2024

github-actions bot added this to the release-1.10 milestone May 15, 2024
