
Restrict buffered value size for blob zstd dictionary compression #197

Open
Connor1996 opened this issue Dec 2, 2020 · 5 comments
Labels
status/help-wanted Status: Help wanted. Contributions are very welcome!

Comments

@Connor1996
Member

Connor1996 commented Dec 2, 2020

When blob zstd dictionary compression is enabled, all values are buffered in memory and replayed only after the compression dictionary is finalized. Since the blob file size is 256MB by default, multiple concurrent flushes and compactions can make the memory footprint considerable.
On instances with little RAM, this can easily cause OOM. It would be better to add a config that limits the total buffered size across concurrent jobs.
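A minimal C++ sketch of the proposed cap, assuming a single process-wide budget shared by all flush and compaction jobs. `BufferedBytesBudget`, `TryReserve`, and `Release` are hypothetical names, not existing Titan or RocksDB APIs: each blob file builder would reserve bytes before buffering a value and release them after replaying its buffer.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical process-wide budget for values buffered while blob files
// wait for their zstd dictionary. Not a Titan API; names are illustrative.
class BufferedBytesBudget {
 public:
  explicit BufferedBytesBudget(std::uint64_t limit) : limit_(limit) {}

  // Try to reserve `bytes`; returns false if that would exceed the limit.
  bool TryReserve(std::uint64_t bytes) {
    std::uint64_t cur = used_.load(std::memory_order_relaxed);
    while (true) {
      // Written as a subtraction so that cur + bytes cannot overflow.
      if (bytes > limit_ || cur > limit_ - bytes) return false;
      if (used_.compare_exchange_weak(cur, cur + bytes,
                                      std::memory_order_acq_rel,
                                      std::memory_order_relaxed)) {
        return true;
      }
      // On failure compare_exchange_weak reloaded `cur`; retry with it.
    }
  }

  // Give bytes back once a builder has replayed (or dropped) its buffer.
  void Release(std::uint64_t bytes) {
    std::uint64_t prev = used_.fetch_sub(bytes, std::memory_order_acq_rel);
    assert(prev >= bytes && "released more than was reserved");
    (void)prev;
  }

  std::uint64_t used() const { return used_.load(std::memory_order_acquire); }

 private:
  const std::uint64_t limit_;
  std::atomic<std::uint64_t> used_{0};
};
```

One possible policy when `TryReserve` fails is to finalize the dictionary early from the samples collected so far and compress the remaining values directly, trading some compression ratio for bounded memory.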

@Connor1996
Member Author

/cc @yiwu-arbug

@Connor1996 Connor1996 added the status/discussion Status: Under discussion or need discussion label Dec 2, 2020
@yiwu-arbug
Collaborator

Yes, we saw a similar issue when enabling dictionary compression in vanilla RocksDB. We ended up limiting the sample size to 8MB per SST. cc @hunterlxt
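To put rough numbers on the concern: each concurrent job can hold up to min(per-file cap, file size) bytes before its dictionary is ready. `WorstCaseBufferedBytes` is a hypothetical helper, and the job count of 4 is an assumption for illustration; 256MB is the default blob file size and 8MB the per-SST sample limit mentioned above.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;
constexpr std::uint64_t kNoCap = std::numeric_limits<std::uint64_t>::max();

// Hypothetical helper: upper bound on bytes held in dictionary-training
// buffers if every concurrent flush/compaction job buffers a full file (or
// up to the per-file cap). Ignores allocator and bookkeeping overhead.
// Requires C++14 (constexpr std::min).
constexpr std::uint64_t WorstCaseBufferedBytes(std::uint64_t concurrent_jobs,
                                               std::uint64_t file_size,
                                               std::uint64_t per_file_cap) {
  return concurrent_jobs * std::min(file_size, per_file_cap);
}

// Assumed: 4 concurrent jobs writing default-sized (256MB) blob files.
static_assert(WorstCaseBufferedBytes(4, 256 * kMiB, kNoCap) == 1024 * kMiB,
              "no cap: about 1 GiB buffered");
static_assert(WorstCaseBufferedBytes(4, 256 * kMiB, 8 * kMiB) == 32 * kMiB,
              "8MB cap: 32 MiB buffered");
```

With a per-file cap the bound scales with the cap rather than the file size, which is why a small cap helps so much on low-RAM instances.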

@yiwu-arbug yiwu-arbug added status/help-wanted Status: Help wanted. Contributions are very welcome! and removed status/discussion Status: Under discussion or need discussion labels Dec 2, 2020
@ZhenhanGong
Contributor

We can use zstd_max_train_bytes to limit the max buffered size. @Connor1996

@hunterlxt
Member

> We can use zstd_max_train_bytes to limit the max buffered size. @Connor1996

zstd_max_train_bytes only affects the bottommost level. Also, reducing zstd_max_train_bytes can produce a smaller dictionary, which reduces the compression ratio.

@ZhenhanGong
Contributor

> > We can use zstd_max_train_bytes to limit the max buffered size. @Connor1996
>
> zstd_max_train_bytes only affects the bottommost level. Also, reducing zstd_max_train_bytes can produce a smaller dictionary, which reduces the compression ratio.

This "zstd_max_train_bytes" belongs to Titan's config; RocksDB has a config with the same name.
Titan's config is used for value compression, while RocksDB's is used for key compression.
I checked the code and confirmed that this "zstd_max_train_bytes" can be used to limit the sampling buffer size.
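A minimal sketch of the buffer-then-replay flow with a train-bytes cap, assuming (per the code reading above) that the limit bounds how much a single builder buffers before finalizing its dictionary. `BufferedBlobBuilder` and its methods are hypothetical, not Titan's blob file builder; dictionary training and compression are stubbed out rather than calling zstd.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical builder: buffers values until `max_train_bytes` have been
// collected, then finalizes a dictionary and replays the buffer in order.
// Training and compression are stubs, not zstd calls.
class BufferedBlobBuilder {
 public:
  explicit BufferedBlobBuilder(std::size_t max_train_bytes)
      : max_train_bytes_(max_train_bytes) {}

  void Add(std::string value) {
    if (state_ == State::kUnbuffered) {
      // Dictionary already exists: compress and write immediately.
      EmitCompressed(value);
      return;
    }
    buffered_bytes_ += value.size();
    peak_buffered_bytes_ = std::max(peak_buffered_bytes_, buffered_bytes_);
    buffer_.push_back(std::move(value));
    // The cap, not the blob file size, decides when buffering stops.
    if (buffered_bytes_ >= max_train_bytes_) FinalizeAndReplay();
  }

  // Small files that never reach the cap train on whatever was buffered.
  void Finish() {
    if (state_ == State::kBuffered) FinalizeAndReplay();
  }

  bool dict_ready() const { return state_ == State::kUnbuffered; }
  std::size_t buffered_bytes() const { return buffered_bytes_; }
  std::size_t peak_buffered_bytes() const { return peak_buffered_bytes_; }
  const std::vector<std::string>& output() const { return output_; }

 private:
  enum class State { kBuffered, kUnbuffered };

  void FinalizeAndReplay() {
    assert(state_ == State::kBuffered);
    // Stub: a real builder would train a zstd dictionary on buffer_ here.
    state_ = State::kUnbuffered;
    for (const std::string& v : buffer_) EmitCompressed(v);
    buffer_.clear();
    buffer_.shrink_to_fit();  // ask the vector to release its capacity
    buffered_bytes_ = 0;
  }

  // Stub: stands in for "compress with the dictionary and append to file".
  void EmitCompressed(const std::string& v) { output_.push_back(v); }

  const std::size_t max_train_bytes_;
  State state_ = State::kBuffered;
  std::size_t buffered_bytes_ = 0;
  std::size_t peak_buffered_bytes_ = 0;
  std::vector<std::string> buffer_;
  std::vector<std::string> output_;
};
```

The buffer can overshoot the cap by at most one value, and the trade-off raised earlier still applies: a smaller cap bounds memory but gives the dictionary trainer fewer samples.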
