OnFlushCompleted should update file discardable size #86

Open
yiwu-arbug opened this issue Sep 23, 2019 · 4 comments


@yiwu-arbug
Collaborator

Currently we only update a blob file's discardable size in OnCompactionCompleted. This assumes that blob file data is never discarded during flush, which is not true. Consider this case:

  1. There's a key "foo" in blob file b1.
  2. GC rewrites b1 into b2 and inserts "foo" into the memtable.
  3. Before the memtable is flushed, the user deletes "foo".
  4. After the flush, "foo" is absent from the flush output because it was deleted. The fact that "foo" was discarded is not reflected in b2's discardable size.
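
The four steps above can be replayed in a toy model to make the gap visible. This is only a sketch: `BlobFileMeta`, `Entry`, `Memtable`, `FlushLiveBytes` and `RunScenario` are hypothetical names invented for illustration, not Titan's real classes, and the 100-byte record size is arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy model only; these types are illustrative and are not Titan's real classes.
struct BlobFileMeta {
  uint64_t file_size;
  uint64_t discardable_size;
};

// A memtable entry: either a blob index pointing into a blob file, or a tombstone.
struct Entry {
  bool deleted;
  uint64_t file_number;
  uint64_t size;
};

// The map keeps only the newest entry per key, which is all this model needs.
using Memtable = std::map<std::string, Entry>;

// Sums, per blob file, the bytes of blob records still referenced after a flush.
std::map<uint64_t, uint64_t> FlushLiveBytes(const Memtable& mem) {
  std::map<uint64_t, uint64_t> live;
  for (const auto& kv : mem) {
    if (!kv.second.deleted) live[kv.second.file_number] += kv.second.size;
  }
  return live;
}

// Replays steps 1-4 and returns b2's discardable size afterwards.
uint64_t RunScenario() {
  std::map<uint64_t, BlobFileMeta> files;
  Memtable mem;

  files[1] = BlobFileMeta{100, 0};    // Step 1: "foo" (100 bytes) lives in b1.

  files[2] = BlobFileMeta{100, 0};    // Step 2: GC rewrites b1 into b2 ...
  files.erase(1);
  mem["foo"] = Entry{false, 2, 100};  // ... and writes the new index to the memtable.

  mem["foo"] = Entry{true, 0, 0};     // Step 3: the user deletes "foo" before the flush.

  std::map<uint64_t, uint64_t> live = FlushLiveBytes(mem);  // Step 4: flush.
  assert(live.count(2) == 0);         // No record in b2 is referenced any more.

  // In this model, as in the issue, nothing on the flush path touches
  // discardable_size, so b2 still looks fully live.
  return files[2].discardable_size;
}
```

In this model `RunScenario()` returns 0 even though all 100 bytes of b2 are dead, which is the accounting gap the issue describes.
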
@yiwu-arbug
Collaborator Author

I don't know how to fix this one, since OnFlushCompleted doesn't know the input data size for each blob file.

@JiayuZzz
Contributor

JiayuZzz commented Oct 15, 2019

@yiwu-arbug
I think we can use the output data size to compute the discardable_size of a flushed blob file:

  1. If discardable_size == 0, then discardable_size = file_size - output_size
  2. If discardable_size > 0, then discardable_size = discardable_size - output_size (for blob files whose indexes are not all flushed in one run)

The problem is that if all keys of a blob file are deleted in the memtable, we will still lose its discardable_size.
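
The two-case rule above could be sketched roughly as follows. `BlobFileMeta` and `ApplyFlushOutput` are hypothetical names, and the sketch assumes `file_size` and `output_size` are measured in the same units (record bytes only, ignoring any file header or footer).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-file bookkeeping; not Titan's real metadata class.
struct BlobFileMeta {
  uint64_t file_size;
  uint64_t discardable_size;
};

// Applies the proposed rule for one flush whose output still references
// output_size bytes of records in this blob file.
void ApplyFlushOutput(BlobFileMeta* file, uint64_t output_size) {
  if (file->discardable_size == 0) {
    // First flush seen for this file: everything not in the output is
    // provisionally treated as discardable.
    assert(output_size <= file->file_size);
    file->discardable_size = file->file_size - output_size;
  } else {
    // A later flush carries more of this file's indexes: those bytes turn
    // out to be live, so give them back.
    assert(output_size <= file->discardable_size);
    file->discardable_size -= output_size;
  }
}
```

For example, a 100-byte file whose indexes arrive in two flushes of 30 and 50 bytes goes to 70 and then 20 discardable bytes. A file whose keys were all deleted never appears in any flush output, so this function is never called for it, which is the limitation noted above.
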

@yiwu-arbug
Collaborator Author

@wujy-cs sounds reasonable. So basically, a) mark the latest memtable ID (say, M) after GC finishes, b) wait until M is flushed, then subtract the sum of the output sizes of all memtables from the file size?
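
One way to picture steps a) and b) is the sketch below. It assumes memtable IDs increase monotonically and that memtables are flushed in ID order; `PendingGcFile` and `OnMemtableFlushed` are hypothetical names and not part of Titan or RocksDB.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical state kept for a blob file produced by GC.
struct PendingGcFile {
  uint64_t file_size;
  uint64_t last_memtable_id;  // "M": latest memtable ID when GC finished.
  uint64_t flushed_output;    // Sum of output sizes seen so far for this file.
};

// Called once per flushed memtable with the bytes of this file that the flush
// output still references. Returns true, and sets *discardable_size, once
// memtable M has been flushed; returns false while earlier memtables are
// still being flushed.
bool OnMemtableFlushed(PendingGcFile* file, uint64_t memtable_id,
                       uint64_t output_size, uint64_t* discardable_size) {
  // Only memtables up to M can contain indexes written by this GC run.
  if (memtable_id <= file->last_memtable_id) {
    file->flushed_output += output_size;
  }
  if (memtable_id < file->last_memtable_id) {
    return false;  // Keep waiting for M.
  }
  assert(file->flushed_output <= file->file_size);
  *discardable_size = file->file_size - file->flushed_output;
  return true;
}
```

In this sketch the final size is computed when M is flushed rather than when the file shows up in a flush output, so a file with zero output bytes ends up fully discardable. Whether the real listener can observe memtable IDs this way is not established in this thread.
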

@JiayuZzz
Contributor

@yiwu-arbug
Yes. And we can do it in OnFlushCompleted without distinguishing GC-rewritten blob files from flushed blob files, since the output_size of a flushed blob file always equals its file_size.
