Titan support write without WAL #81

Open
yiwu-arbug opened this issue Sep 17, 2019 · 8 comments
@yiwu-arbug (Collaborator)

The current GC implementation assumes the WAL is always enabled for both user writes and GC writes. However, if the user disables the WAL, GC can lead to data inconsistency.

Example:

  1. There are two versions of a key, (k, v1) and (k, v2). (k, v1) has been flushed and persisted in an SST file and blob file b1. (k, v2) is in the memtable.
  2. A GC job kicks in and uses b1 as input. It skips rewriting (k, v1) to a new blob file, since there is a newer version of the key.
  3. After the GC job, b1 is deleted.
  4. The db restarts. Since there is no WAL, (k, v2) is dropped because it was only in the memtable, which is expected. However, (k, v1) is also missing because b1 was deleted, which is not expected.

We need to find a way to allow user writes without the WAL.
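The four steps above can be reproduced with a toy model of the memtable / SST / blob-file state. This is purely illustrative (the class and method names are hypothetical, not Titan's actual code); it only shows why dropping the memtable on restart leaves a dangling blob index:

```python
# Toy model of the inconsistency: without a WAL, the memtable is lost on
# restart, and GC has already deleted the blob file holding the old version.

class ToyDB:
    def __init__(self):
        self.memtable = {}      # key -> value; lost on restart (no WAL)
        self.sst = {}           # key -> blob file holding the value; persisted
        self.blob_files = {}    # blob file -> {key: value}; persisted values

    def put_flushed(self, key, value, blob_file):
        # (k, v1): already flushed; index in an SST, value in blob file b1.
        self.sst[key] = blob_file
        self.blob_files.setdefault(blob_file, {})[key] = value

    def put_memtable(self, key, value):
        # (k, v2): newer version, still only in the memtable.
        self.memtable[key] = value

    def gc(self, blob_file):
        # GC skips keys that have a newer version, then deletes the input file.
        for key in list(self.blob_files[blob_file]):
            if key in self.memtable:
                continue  # newer version exists -> skip rewriting this key
            # (a real GC would rewrite live keys to a new blob file here)
        del self.blob_files[blob_file]  # step 3: b1 is deleted

    def restart(self):
        # Step 4: no WAL, so everything in the memtable is dropped.
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        if key in self.sst:
            blobs = self.blob_files.get(self.sst[key])
            return blobs.get(key) if blobs else None  # dangling blob index
        return None

db = ToyDB()
db.put_flushed("k", "v1", "b1")  # step 1
db.put_memtable("k", "v2")
db.gc("b1")                      # steps 2-3
db.restart()                     # step 4
print(db.get("k"))               # None: v2 dropped (expected), v1 lost (the bug)
```

After restart the SST index for k still points at b1, but b1 is gone, so even the flushed version (k, v1) is unreadable.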

yiwu-arbug self-assigned this Sep 17, 2019
@DorianZheng (Contributor)

How about delaying adding old blob files to the obsolete files list until we can make sure all corresponding keys have been persisted in SST files? Maybe we can use the Titan table builder and an event listener to collect the necessary information.

@yiwu-arbug (Collaborator, author)

@DorianZheng Long time no see! Yeah, we should probably do what you suggest.

@Connor1996 (Member)

It seems the GC-rewritten index suffers from the same issue: the new blob index may still be in the memtable when the db restarts, so the old blob index is exposed even though the blob file has already been deleted.

How about just recording the sequence number after rewriting the index, and only marking the blob file as deleted once the largest sequence number of the flushed L0 files exceeds that recorded sequence number?
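This proposal could be sketched roughly as follows. This is a minimal toy model under stated assumptions (the class and callback names are hypothetical, not Titan's actual code): GC records an obsolete sequence number instead of deleting the blob file immediately, and a flush-completed callback releases files once a flush has persisted everything up to that point:

```python
# Sketch of the proposed fix: defer physical deletion of GC'ed blob files
# until a completed flush covers the sequence number recorded at GC time.
# (Hypothetical names; in Titan this would hook into RocksDB's flush events.)

class PendingBlobDeleter:
    def __init__(self):
        self.pending = []  # list of (obsolete_sequence, blob_file)

    def on_gc_finished(self, blob_file, rewrite_sequence):
        # Record the sequence number taken right after rewriting the blob
        # index, instead of deleting the file immediately.
        self.pending.append((rewrite_sequence, blob_file))

    def on_flush_completed(self, flushed_largest_seqno):
        # Everything with seqno <= flushed_largest_seqno is now durable in
        # SSTs, so blob files obsoleted at or below that point are safe to
        # delete even if the db restarts without a WAL.
        deletable = [f for s, f in self.pending if s <= flushed_largest_seqno]
        self.pending = [(s, f) for s, f in self.pending
                        if s > flushed_largest_seqno]
        return deletable  # caller physically deletes these files

deleter = PendingBlobDeleter()
deleter.on_gc_finished("b1", rewrite_sequence=100)
assert deleter.on_flush_completed(flushed_largest_seqno=90) == []     # too early
assert deleter.on_flush_completed(flushed_largest_seqno=120) == ["b1"]  # safe now
```

A crash before the covering flush leaves the blob file on disk, which is safe: the old index entries still resolve, and the file can be re-collected later.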

@DorianZheng (Contributor) · Mar 18, 2020

@yiwu-arbug Thanks for the kindness.
@Connor1996

> How about just recording the sequence number after rewriting the index, and only marking the blob file as deleted once the largest sequence number of the flushed L0 files exceeds that recorded sequence number?

I think it works. Maybe we can listen for the OnFlushCompleted event and retrieve the largest_seqno from FlushJobInfo. @yiwu-arbug What do you think?

@yiwu-arbug (Collaborator, author)

> I think it works. Maybe we can listen for the OnFlushCompleted event and retrieve the largest_seqno from FlushJobInfo. @yiwu-arbug What do you think?

Sounds good. And do take care of restart.

@DorianZheng (Contributor)

facebook/rocksdb#7069

@yiwu-arbug (Collaborator, author)

@DorianZheng Thanks. Though I think we can access RocksDB internals from Titan directly instead of adding a new RocksDB API.

@DorianZheng (Contributor)

@yiwu-arbug Good idea, will file a PR soon.
