Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't leak dropped column families #509

Merged
merged 10 commits into from Jul 22, 2021

Conversation

ryoqun
Copy link
Contributor

@ryoqun ryoqun commented Apr 19, 2021

problem

Regardless SingleThreaded or MultiThreaded, rust-rocksdb crate doesn't free files even after drop_cf() as long as we're opening DB instance. I added test_no_leaked_column_family to demonstrate that.

For SingleThreaded, it has been existed since the begging. So this isn't regression of #506...

For MultiThreaded, we (Solana) originally introduced the MultiThreaded for the ultimate goal of freeing dropped cf resources in the middle of in-process cf mutation under multi-threaded context. For that end, we managed to introduce the very MultiThreaded mode thanks for persistent review and eventual merge and release of 0.16.0. But due to this existing bug, we still can't realize the ultimate goal.

The root cause of the leak is that rocksdb_column_family_handle_destroy isn't called until we Drop::drop() the whole DB.

solution

Properly call rocksdb_column_family_handle_destroy at earlier (but proper) timings.

For SingleThreaded, it's relatively simple: just define Drop for ColumnFamily. There is no api incompatibility.

For MultiThreaded, the story isn't that simple because of its relaxed lifetime relation between DB and CFs. So, I had to wrap BoundColumnFamily with Arc. (I added small assertion for this concern in test_no_leaked_column_family as well) So, I had to introduce api incompatibility to fix this sadly.

Although there is unavoidable api breakage, I approached the problem under the constraint of providing some kind of source-level compatibility code for downstream consumers for the sake of general simpler code maintenace and generic programming. Specifically, leveraging the compiler's auto ref resolution thing, I realized that property using &&ColumnFamily for the single threaded. So, all tests compiles adding another extra & at call side for the both MultiThreaded and SingleThraded.

Also note that all of existing code should compile as before for default crate feature setting (no multi-threaded-cf). I just added & across the board to make the test suite compile for the both crate configuration.

As for performance, there is no overhead in SingleThreaded. and there is small Arc one for MultiThreaded.

(for the record)

Below is draft pr description:

first of all, thanks for releasing 0.16.0!!!!

this pr is still in draft phase. don't waste your time before I properly document things...

this is very unfortunate but, and after our further testing, we found we need another bugfix: solana-labs/solana#16589

src/db.rs Show resolved Hide resolved
tests/test_db.rs Outdated
db.compact_range_cf(cf, None::<&[u8]>, None::<&[u8]>);

assert!(db.get_cf(cf, b"key1").unwrap().is_none());
let outlived_db_or_cf = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are just free-riding on unrelated test..

I'll write a proper one.

@ryoqun
Copy link
Contributor Author

ryoqun commented May 28, 2021

long time no see. I'm going to revisit this shortly.

@aleksuss aleksuss mentioned this pull request Jun 23, 2021
@ryoqun
Copy link
Contributor Author

ryoqun commented Jul 4, 2021

Hi, I worked on this finally. it took some time write a proper test to detect leak... I'll clean up this wip branch tomorrow: https://github.com/rust-rocksdb/rust-rocksdb/compare/master...ryoqun:fix-dropped-cf-leak-wip?expand=1

@ryoqun ryoqun marked this pull request as ready for review July 5, 2021 14:36
@ryoqun ryoqun changed the title [wip] Don't leak dropped column families Don't leak dropped column families Jul 5, 2021
@ryoqun
Copy link
Contributor Author

ryoqun commented Jul 5, 2021

@aleksuss hi, everythihg is done! :) could you review this?

@aleksuss
Copy link
Member

aleksuss commented Jul 5, 2021

@ryoqun yes, sure. Thanks for your contribution

@@ -102,7 +102,7 @@ pub use crate::{
ColumnFamilyRef, DEFAULT_COLUMN_FAMILY_NAME,
},
compaction_filter::Decision as CompactionDecision,
db::{DBWithThreadMode, LiveFile, MultiThreaded, SingleThreaded, DB},
db::{DBWithThreadMode, LiveFile, MultiThreaded, SingleThreaded, ThreadMode, DB},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one line is enough for #532

@ryoqun
Copy link
Contributor Author

ryoqun commented Jul 6, 2021

fyi, this is how you can write generic-over-ThreadMode or {multi,single}-threaded-cfcrate feature agnostic code: https://github.com/solana-labs/solana/compare/master...ryoqun:rocksdb-multi-threaded?expand=1


impl UnboundColumnFamily {
pub(crate) fn bound_column_family<'a>(
self: std::sync::Arc<Self>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't add use std::sync::Arc ? Especially you use it more than once.

Copy link
Contributor Author

@ryoqun ryoqun Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

glad to see a code hygiene like this from you. :)

I was just lazy...: 201d718

src/db.rs Outdated
use std::sync::RwLock;
use std::time::Duration;

// Marker trait to specify single or multi threaded column family alternations for DB
// Also, this is minimum common API sharable between SingleThreaded and
// MultiThreaded. Others differ in self mutability and return type.
pub trait ThreadMode {
fn new(cf_map: BTreeMap<String, ColumnFamily>) -> Self;
fn new(cf_map: BTreeMap<String, *mut crate::ffi::rocksdb_column_family_handle_t>) -> Self;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use crate::ffi::rocksdb_column_family_handle_t ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to git grep, it looks like those C types tend to be prefixed with ffi::. So, I followed the convension instead of use just for rocksdb_column_family_handle_t (201d718):

$ git grep 'rocksdb_.*_t[^_a-z0-9]' src/
src/backup.rs:    inner: *mut ffi::rocksdb_backup_engine_t,
src/backup.rs:    inner: *mut ffi::rocksdb_options_t,
src/backup.rs:    inner: *mut ffi::rocksdb_restore_options_t,
src/backup.rs:        let be: *mut ffi::rocksdb_backup_engine_t;
src/checkpoint.rs:    inner: *mut ffi::rocksdb_checkpoint_t,
src/checkpoint.rs:        let checkpoint: *mut ffi::rocksdb_checkpoint_t;
src/column_family.rs:    pub(crate) inner: *mut ffi::rocksdb_column_family_handle_t,
src/column_family.rs:    pub(crate) inner: *mut ffi::rocksdb_column_family_handle_t,
src/column_family.rs:    pub(crate) inner: *mut ffi::rocksdb_column_family_handle_t,
src/column_family.rs:fn destroy_handle(handle: *mut ffi::rocksdb_column_family_handle_t) {
src/column_family.rs:    fn inner(&self) -> *mut ffi::rocksdb_column_family_handle_t;
src/column_family.rs:    fn inner(&self) -> *mut ffi::rocksdb_column_family_handle_t {
src/column_family.rs:    fn inner(&self) -> *mut ffi::rocksdb_column_family_handle_t {
src/column_family.rs:    fn inner(&self) -> *mut ffi::rocksdb_column_family_handle_t {
src/compaction_filter_factory.rs:    unsafe fn from_raw(ptr: *mut ffi::rocksdb_compactionfiltercontext_t) -> Self {
src/compaction_filter_factory.rs:    context: *mut ffi::rocksdb_compactionfiltercontext_t,
src/db.rs:    fn new(cf_map: BTreeMap<String, *mut ffi::rocksdb_column_family_handle_t>) -> Self;
src/db.rs:    fn new(cfs: BTreeMap<String, *mut ffi::rocksdb_column_family_handle_t>) -> Self {
src/db.rs:    fn new(cfs: BTreeMap<String, *mut ffi::rocksdb_column_family_handle_t>) -> Self {
src/db.rs:        cfopts: &[*const ffi::rocksdb_options_t],
src/db.rs:        cfhandles: &mut Vec<*mut ffi::rocksdb_column_family_handle_t>,
src/db.rs:    ) -> Result<*mut ffi::rocksdb_column_family_handle_t, Error> {
src/db.rs:            // rocksdb_wal_readoptions_t does not appear to have any functions
src/db.rs:            let opts: *const ffi::rocksdb_wal_readoptions_t = ptr::null();
src/db.rs:        cf_inner: *mut ffi::rocksdb_column_family_handle_t,
src/db_iterator.rs:    inner: *mut ffi::rocksdb_iterator_t,
src/db_iterator.rs:        cf_handle: *mut ffi::rocksdb_column_family_handle_t,
src/db_iterator.rs:        cf_handle: *mut ffi::rocksdb_column_family_handle_t,
src/db_iterator.rs:    pub(crate) inner: *mut ffi::rocksdb_wal_iterator_t,
src/db_options.rs:fn new_cache(capacity: size_t) -> *mut ffi::rocksdb_cache_t {
src/db_options.rs:    pub(crate) inner: *mut ffi::rocksdb_cache_t,
src/db_options.rs:    inner: *mut ffi::rocksdb_env_t,
src/db_options.rs:    pub(crate) inner: *mut ffi::rocksdb_options_t,
src/db_options.rs:    pub(crate) inner: *mut ffi::rocksdb_writeoptions_t,
src/db_options.rs:    pub(crate) inner: *mut ffi::rocksdb_flushoptions_t,
src/db_options.rs:    pub(crate) inner: *mut ffi::rocksdb_block_based_table_options_t,
src/db_options.rs:    pub(crate) inner: *mut ffi::rocksdb_readoptions_t,
src/db_options.rs:    pub(crate) inner: *mut ffi::rocksdb_cuckoo_table_options_t,
src/db_options.rs:    pub(crate) inner: *mut ffi::rocksdb_ingestexternalfileoptions_t,
src/db_options.rs:        unsafe { ffi::rocksdb_block_based_options_set_data_block_index_type(self.inner, index_t) }
src/db_options.rs:            .map(|path| path.inner as *const ffi::rocksdb_dbpath_t)
src/db_options.rs:            ffi::rocksdb_readoptions_set_readahead_size(self.inner, v as size_t);
src/db_options.rs:    pub(crate) inner: *mut ffi::rocksdb_fifo_compaction_options_t,
src/db_options.rs:    pub(crate) inner: *mut ffi::rocksdb_universal_compaction_options_t,
src/db_options.rs:    pub(crate) inner: *mut ffi::rocksdb_compactoptions_t,
src/db_options.rs:    pub(crate) inner: *mut ffi::rocksdb_dbpath_t,
src/db_pinnable_slice.rs:    ptr: *mut ffi::rocksdb_pinnableslice_t,
src/db_pinnable_slice.rs:    pub(crate) unsafe fn from_c(ptr: *mut ffi::rocksdb_pinnableslice_t) -> DBPinnableSlice<'a> {
src/perf.rs:    pub(crate) inner: *mut ffi::rocksdb_perfcontext_t,
src/perf.rs:    inner: *mut ffi::rocksdb_memory_usage_t,
src/perf.rs:    inner: *mut ffi::rocksdb_memory_consumers_t,
src/slice_transform.rs:    pub inner: *mut ffi::rocksdb_slicetransform_t,
src/snapshot.rs:    pub(crate) inner: *const ffi::rocksdb_snapshot_t,
src/sst_file_writer.rs:    pub(crate) inner: *mut ffi::rocksdb_sstfilewriter_t,
src/sst_file_writer.rs:    inner: *mut ffi::rocksdb_envoptions_t,
src/sst_file_writer.rs:    fn create_raw(opts: &Options, env_opts: &EnvOptions) -> *mut ffi::rocksdb_sstfilewriter_t {
src/write_batch.rs:    pub(crate) inner: *mut ffi::rocksdb_writebatch_t,

}

// if we're not leaking, the dir bytes should be well under 10M bytes in total
let dir_bytes = fs_extra::dir::get_size(&n).unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd get rid of extra dependency to decrease compile time. I found this function on stackoverflow:

fn dir_size(path: impl Into<PathBuf>) -> io::Result<u64> {
    fn dir_size(mut dir: fs::ReadDir) -> io::Result<u64> {
        dir.try_fold(0, |acc, file| {
            let file = file?;
            let size = match file.metadata()? {
                data if data.is_dir() => dir_size(fs::read_dir(file.path())?)?,
                data => data.len(),
            };
            Ok(acc + size)
        })
    }

    dir_size(fs::read_dir(path.into())?)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, done: 201d718

@@ -3,6 +3,7 @@ use std::path::{Path, PathBuf};
use rocksdb::{Options, DB};

/// Temporary database path which calls DB::Destroy when DBPath is dropped.
#[derive(Debug)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really needed ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope. remnant from my dirty dbg!()-ing... 201d718

@ryoqun
Copy link
Contributor Author

ryoqun commented Jul 9, 2021

@aleksuss thanks for review! I've cleaned this pr up accordingly.

src/db.rs Outdated
use std::sync::RwLock;
use std::time::Duration;

// Marker trait to specify single or multi threaded column family alternations for DB
// Also, this is minimum common API sharable between SingleThreaded and
// MultiThreaded. Others differ in self mutability and return type.
pub trait ThreadMode {
fn new(cf_map: BTreeMap<String, ColumnFamily>) -> Self;
fn new(cf_map: BTreeMap<String, *mut ffi::rocksdb_column_family_handle_t>) -> Self;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really good idea to use ffi types in public api ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certainly, it's not good idea... but we have to do that... kind of. I added some reasoning here: 0d5ff02

I googled a bit but found that it's quite bothersome to hide actual trait methods from public traits...

In short, these methods aren't intended to be consumed, nevertheless both pre- and post- this pr.

Copy link
Member

@aleksuss aleksuss Jul 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the trait ThreadMode be public ??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it. I've seen a description a bit later.

impl<'a> AsColumnFamilyRef for &'a ColumnFamily {
fn inner(&self) -> *mut ffi::rocksdb_column_family_handle_t {
self.inner
}
}

impl<'a> AsColumnFamilyRef for BoundColumnFamily<'a> {
impl<'a> AsColumnFamilyRef for Arc<BoundColumnFamily<'a>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
impl<'a> AsColumnFamilyRef for Arc<BoundColumnFamily<'a>> {
impl<'a> AsColumnFamilyRef for ColumnFamilyRef<'a> {

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super good eyes you have to spot these unspoken subtleties... :)

I added some missing docs: eb509fb

src/db.rs Outdated
}
}

impl ThreadMode for MultiThreaded {
fn new(cfs: BTreeMap<String, ColumnFamily>) -> Self {
fn new(cfs: BTreeMap<String, *mut ffi::rocksdb_column_family_handle_t>) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we create ffi::rocksdb_column_family_handle_t inside the method new and then a signature of the method could be changed to fn new(cfs: BTreeSet<String>) -> Self ? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically, yes. but I double its merit...

As I went long explaining here (0d5ff02), this trait method is called deep inside DB construction at (open_cf_descriptors_internal). So, we have to move large part of code inside ThreadMode.. Then ThreadMode will be significantly fatter than now, although it's not intended to be used for users still as was before. Then this only gives us the ability of getting rid of those ffi:: types in strictly public APIs.

@ryoqun
Copy link
Contributor Author

ryoqun commented Jul 13, 2021

@aleksuss Sorry for delay! I'm feeling this pr is getting polished up from your reviews. Thanks for continued review. :)

@ryoqun
Copy link
Contributor Author

ryoqun commented Jul 15, 2021

@aleksuss aleksuss requested review from @stanislav-tkach and @aleksuss 10 days ago

🐶 (Hachikō) <(I'm waiting for further review comments if any from you)

Copy link
Member

@aleksuss aleksuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for long intervals between reviews. Too busy lately.

src/db.rs Show resolved Hide resolved
src/db.rs Show resolved Hide resolved
src/db.rs Show resolved Hide resolved
src/db.rs Show resolved Hide resolved
@ryoqun
Copy link
Contributor Author

ryoqun commented Jul 21, 2021

Sorry for long intervals between reviews. Too busy lately.

@aleksuss hi! I've replied to your comments. please have a look when convenient. thanks!

@aleksuss aleksuss merged commit abf121f into rust-rocksdb:master Jul 22, 2021
@ryoqun
Copy link
Contributor Author

ryoqun commented Jul 22, 2021

@aleksuss really thanks very much for merging and prepareing 0.17.0. I noticed it already #539 . :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants