Titan support write without WAL #81

Open
yiwu-arbug opened this issue Sep 17, 2019 · 8 comments
@yiwu-arbug (Collaborator)

The current GC implementation assumes the WAL is always enabled for both user writes and GC writes. However, if the user disables the WAL, GC can lead to data inconsistency.

Example:

  1. There are two versions of a key, (k, v1) and (k, v2). (k, v1) has been flushed and persisted in an SST file and blob file b1. (k, v2) is in the memtable.
  2. A GC job kicks in and uses b1 as input. It skips rewriting (k, v1) to a new blob file, since there is a newer version of the key.
  3. After the GC job, b1 is deleted.
  4. The db restarts. Since there is no WAL, (k, v2) is dropped because it was only in the memtable, which is expected. However, (k, v1) is also missing because b1 was deleted, which is not expected.

We need to find a way to allow user writes without the WAL.
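The four steps above can be reproduced with a toy model of the memtable / SST / blob-file state. This is purely illustrative (the class and method names are hypothetical, not Titan's actual code); it only shows why dropping the memtable on restart leaves a dangling blob index:

```python
# Toy model of the inconsistency: without a WAL, the memtable is lost on
# restart, and GC has already deleted the blob file holding the old version.

class ToyDB:
    def __init__(self):
        self.memtable = {}      # key -> value; lost on restart (no WAL)
        self.sst = {}           # key -> blob file holding the value; persisted
        self.blob_files = {}    # blob file -> {key: value}; persisted values

    def put_flushed(self, key, value, blob_file):
        # (k, v1): already flushed; index in an SST, value in blob file b1.
        self.sst[key] = blob_file
        self.blob_files.setdefault(blob_file, {})[key] = value

    def put_memtable(self, key, value):
        # (k, v2): newer version, still only in the memtable.
        self.memtable[key] = value

    def gc(self, blob_file):
        # GC skips keys that have a newer version, then deletes the input file.
        for key in list(self.blob_files[blob_file]):
            if key in self.memtable:
                continue  # newer version exists -> skip rewriting this key
            # (a real GC would rewrite live keys to a new blob file here)
        del self.blob_files[blob_file]  # step 3: b1 is deleted

    def restart(self):
        # Step 4: no WAL, so everything in the memtable is dropped.
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        if key in self.sst:
            blobs = self.blob_files.get(self.sst[key])
            return blobs.get(key) if blobs else None  # dangling blob index
        return None

db = ToyDB()
db.put_flushed("k", "v1", "b1")  # step 1
db.put_memtable("k", "v2")
db.gc("b1")                      # steps 2-3
db.restart()                     # step 4
print(db.get("k"))               # None: v2 dropped (expected), v1 lost (the bug)
```

After restart the SST index for k still points at b1, but b1 is gone, so even the flushed version (k, v1) is unreadable.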

yiwu-arbug self-assigned this Sep 17, 2019
@DorianZheng (Contributor)

How about delaying adding old blob files to the obsolete files list until we can make sure all corresponding keys have been persisted in SST files? Maybe we can use the Titan table builder and an event listener to collect the necessary information.

@yiwu-arbug (Collaborator, author)

@DorianZheng Long time no see! Yeah, we should probably do what you suggest.

@Connor1996 (Member)

It seems the GC-rewritten index suffers from the same issue: the new blob index may still be in the memtable when the db restarts, so the old blob index is exposed even though the blob file has already been deleted.

How about just recording the sequence number after rewriting the index, and only marking the blob file as deleted once the largest sequence number of the flushed L0 files exceeds that recorded sequence number?
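This proposal could be sketched roughly as follows. This is a minimal toy model under stated assumptions (the class and callback names are hypothetical, not Titan's actual code): GC records an obsolete sequence number instead of deleting the blob file immediately, and a flush-completed callback releases files once a flush has persisted everything up to that point:

```python
# Sketch of the proposed fix: defer physical deletion of GC'ed blob files
# until a completed flush covers the sequence number recorded at GC time.
# (Hypothetical names; in Titan this would hook into RocksDB's flush events.)

class PendingBlobDeleter:
    def __init__(self):
        self.pending = []  # list of (obsolete_sequence, blob_file)

    def on_gc_finished(self, blob_file, rewrite_sequence):
        # Record the sequence number taken right after rewriting the blob
        # index, instead of deleting the file immediately.
        self.pending.append((rewrite_sequence, blob_file))

    def on_flush_completed(self, flushed_largest_seqno):
        # Everything with seqno <= flushed_largest_seqno is now durable in
        # SSTs, so blob files obsoleted at or below that point are safe to
        # delete even if the db restarts without a WAL.
        deletable = [f for s, f in self.pending if s <= flushed_largest_seqno]
        self.pending = [(s, f) for s, f in self.pending
                        if s > flushed_largest_seqno]
        return deletable  # caller physically deletes these files

deleter = PendingBlobDeleter()
deleter.on_gc_finished("b1", rewrite_sequence=100)
assert deleter.on_flush_completed(flushed_largest_seqno=90) == []     # too early
assert deleter.on_flush_completed(flushed_largest_seqno=120) == ["b1"]  # safe now
```

A crash before the covering flush leaves the blob file on disk, which is safe: the old index entries still resolve, and the file can be re-collected later.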

@DorianZheng (Contributor) · Mar 18, 2020

@yiwu-arbug Thanks for the kindness.
@Connor1996

> How about just recording the sequence number after rewriting the index, and only marking the blob file as deleted once the largest sequence number of the flushed L0 files exceeds that recorded sequence number?

I think it works. Maybe we can listen for the OnFlushCompleted event and retrieve the largest_seqno from FlushJobInfo. @yiwu-arbug What do you think?

@yiwu-arbug (Collaborator, author)

> I think it works. Maybe we can listen for the OnFlushCompleted event and retrieve the largest_seqno from FlushJobInfo. @yiwu-arbug What do you think?

Sounds good. And do take care of restart.

@DorianZheng (Contributor)

facebook/rocksdb#7069

@yiwu-arbug (Collaborator, author)

@DorianZheng Thanks. Though I think we can access RocksDB internals from Titan directly instead of adding a new RocksDB API.

@DorianZheng (Contributor)

@yiwu-arbug Good idea, will file a PR soon.
