Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove need for &mut self in create_cf and drop_cf #496

Closed
wants to merge 13 commits into from

Conversation

carllin
Copy link

@carllin carllin commented Feb 12, 2021

Problem: Creating and Dropping column families currently requires a mutable reference. In a system where we have shared access to the database object across multiple threads, this means we need exclusive access to the DB

Proposal: Wrap the cfs column in an RwLock, so an exclusive lock is not needed on the entire DB object

@aleksuss
Copy link
Member

#298

@carllin
Copy link
Author

carllin commented Feb 12, 2021

@aleksuss thanks for the comment!

From my understanding of reading: #279 (comment), the reason this previous change was reverted was because The cf_handle function is often called very frequently and it's implementation requires acquiring a read-lock to an underlying map., and thus to avoid this, a decision was made to make create_cf and drop_cf and any other function that mutates the internal map require a mutable reference to DB and make the caller responsible for acquiring any locks in order to mutate the DB.

However this seems to imply the caller will have to eat the cost of at least one call to some locking mechanism on the DB in order to get the mutable reference anyways.

If instead we wrapped the cfs in an RwLock as we have done here, the caller will still eat just one call to the lock, then the caller can cache the handles after they've fetched them for further use. This way it seems:

  1. Every subsequent operation that needs a column family can use the cached handle, addressing the concern outlined in the issue above.
  2. Modifying the column families doesn't require blocking all other database operations.

@carllin
Copy link
Author

carllin commented Feb 20, 2021

@aleksuss pinging for update, thanks

@ryoqun
Copy link
Contributor

ryoqun commented Feb 25, 2021

@aleksuss @stanislav-tkach Hi, I'm colleague with @carllin. You reacted very promptly to my pr in the past (wow, already ~1.5 years ago): #339 We greatly appreciated that. :)

As a quick context, this pr is blocking us like the previous pr, so we'd like this to move forward in some way. The problem we're experiencing is spontaneous but whole processing stalls of our product due to the underlying shared rocksdb (ref: solana-labs/solana#14586), happening on production systems.. Hope you can understand our sentiment of this bug... ;) To implement the possible solution to the stall, we want to modify column families via a background thread.

Ideally, we don't want to take a unfortunate fork/[patch] path. Thanks for your considerations (and working on this great project). :)

@aleksuss
Copy link
Member

To be honest, I'm not ready to merge this PR, at least right now. I added changes like this in PR: #197 and then I reverted it as you can see. I think if you need shared mutable access to a db you should provide synchronization by yourself. E.g. I use rocksdb in my project with shared access and I use SharedLock for sync access. And one more thing. Why should I pay for sync if I use rocksdb in a single thread?

@ryoqun
Copy link
Contributor

ryoqun commented Feb 25, 2021

@aleksuss Thanks for quick reply and valuable advise! At least, we can progress in some way. :) We'll consider whether the approach can work for our case or not. :)

@carllin
Copy link
Author

carllin commented Feb 26, 2021

@aleksuss @stanislav-tkach Thanks for the careful consideration!

Unfortunately for our use case, I think wrapping the entire database object in a lock is sub-optimal, because:

  1. Most of the time, we don't need the write lock, and we have heavy write operations, so grabbing the read lock over and over seems like needless overhead
  2. Starvation/contention concerns. If we have heavy insertion cycles, repeatedly grabbing reads lock on the entire DB for each of those insertions might block the write access when we want to ocassionally drop the cf's. However, the underlying rocks implementation has nice read/write access that is non-blocking and keeps the old cf alive until all the references are dropped.

I noted your concern about single-threaded lock access perf concerns, and have proposed a change in my latest commit that I think will fit both single and multi-threaded scenarios.

The proposed change here would introduce a configurable ColumnFamilyHandles enum to the Db for both multi-threaded and single-threaded scenarios (Note I have yet to plumb this through the constructors). The benefits are:

  1. The existing create_cf and drop_cf API's are unchanged, so downstream users will not be affected
  2. This single threaded minimizes uncalled perf overhead to just 1 branch-prediction friendly switch (enum variant check).
  3. Mutli-threaded use case can optionally avoid mut access by the additional create_cf_multithreaded(), drop_cf_multithreaded() functions

Would really appreciate your thoughts, and thanks for taking the time!

Copy link
Member

@aleksuss aleksuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carllin The proposed approach in commit c51484c looks much much better. Thanks. I really like it and I think I will use it in my project 👍🏻
But couple nitpiks:

  1. Could you please add comment-docs to public methods?
  2. I think if you add tests to cover multithreaded case with cfs then clippy will be happy 😄

@@ -43,6 +43,7 @@ impl ColumnFamilyDescriptor {

/// An opaque type used to represent a column family. Returned from some functions, and used
/// in others
#[derive(Clone)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really think it is safe ? 🤔

Copy link
Author

@carllin carllin Mar 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so, here's my analysis:

We are cloning this object

pub struct ColumnFamily {
    pub(crate) inner: *mut ffi::rocksdb_column_family_handle_t,
}

where inner is this pointer https://github.com/facebook/rocksdb/blob/master/db/c.cc#L973-L974 rocksdb_column_family_handle_t* handle = new rocksdb_column_family_handle_t;, which aliases an object defined here:

class ColumnFamilyHandle {
 public:
  virtual ~ColumnFamilyHandle() {}
  // Returns the name of the column family associated with the current handle.
  virtual const std::string& GetName() const = 0;
  // Returns the ID of the column family associated with the current handle.
  virtual uint32_t GetID() const = 0;
  // Fills "*desc" with the up-to-date descriptor of the column family
  // associated with this handle. Since it fills "*desc" with the up-to-date
  // information, this call might internally lock and release DB mutex to
  // access the up-to-date CF options.  In addition, all the pointer-typed
  // options cannot be referenced any longer than the original options exist.
  //
  // Note that this function is not supported in RocksDBLite.
  virtual Status GetDescriptor(ColumnFamilyDescriptor* desc) = 0;
  // Returns the comparator of the column family associated with the
  // current handle.
  virtual const Comparator* GetComparator() const = 0;
};

where the destructor ~ColumnFamilyHandle() is what decrements the refcount to the column family and performs cleanup of old files: https://github.com/facebook/rocksdb/blob/master/db/column_family.cc#L68-L84

The destructor however is invoked through ffi::rocksdb_column_family_handle_destroy https://github.com/facebook/rocksdb/blob/master/db/c.cc#L997-L1000, which is not what's called in the wrapper's drop_cf: https://github.com/rust-rocksdb/rust-rocksdb/blob/master/src/db.rs#L687, which is only called when the DB object is dropped:

fn drop(&mut self) {
        unsafe {
            self.cfs.get(|cf_map| {
                for cf in cf_map.values() {
                    ffi::rocksdb_column_family_handle_destroy(cf.inner);
                }
                ffi::rocksdb_close(self.inner);
            })
        }
    }

This means cloning the handle here should be a harmless operation that does not incur any cleanup or additional data races.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. But my concern here is that the Clone will allow you to write something like this:

#[test]
fn test_unsafe_clone_column_family() {
    let danger_cf;
    let db_path = DBPath::new("_rust_rocksdb_create_clone_column_family1");
    {
        let mut opts = Options::default();
        opts.create_if_missing(true);
        let mut db = DB::open(&opts, &db_path).expect("db open");
        db.create_cf("cf1", &opts).unwrap();
        let cf = db.cf_handle("cf1").unwrap();
        danger_cf = cf.clone();
        db.put_cf(&cf, b"1", b"2").unwrap();
    }
    let db_path = DBPath::new("_rust_rocksdb_create_clone_column_family2");
    {
        let mut opts = Options::default();
        opts.create_if_missing(true);
        let mut db = DB::open(&opts, &db_path).expect("db open");
        db.create_cf("cf1", &opts).unwrap();
        db.put_cf(&danger_cf, b"1", b"2").unwrap(); // BOOOOM !!! (signal 11, invalid memory reference)
    }
}

It's a common case in C++, but Rust is positioned as a safe language and to allow such cases it's a bad practice. IMHO.
cc @stanislav-tkach

Copy link
Author

@carllin carllin Mar 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aleksuss Taking your concern into account, in my latest commit: 2cf5f6d I've removed the Clone on the object and instead implemented a separate cf_handle_multi_threaded() path that returns an Arc<ColumnFamily>. This means only users of the multithreaded config will risk running into the inconsistency you described above, while by default, existing users will be unaffected.

src/db.rs Outdated Show resolved Hide resolved
tests/test_property.rs Outdated Show resolved Hide resolved
@carllin
Copy link
Author

carllin commented Feb 26, 2021

@aleksuss sorry for the dirty PR, it was a proof-of-concept and not entirely cleaned up as you have duly noted 😄

I wanted to get sign-off on this direction before investing more in polishing up the code/tests. Now that I have it, I will get this cleaned up to be more robust. Thanks for reviewing!

@carllin carllin force-pushed the DropCf branch 2 times, most recently from 130c61d to 8694813 Compare March 2, 2021 06:13
@carllin
Copy link
Author

carllin commented Mar 2, 2021

@aleksuss I have added a test_column_family_multi_threaded(), added the comment-docs, and addressed remaining code concerns, feel free to review at your convenience, thanks!

src/db.rs Outdated Show resolved Hide resolved
src/db.rs Outdated Show resolved Hide resolved
@@ -43,6 +43,7 @@ impl ColumnFamilyDescriptor {

/// An opaque type used to represent a column family. Returned from some functions, and used
/// in others
#[derive(Clone)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. But my concern here is that the Clone will allow you to write something like this:

#[test]
fn test_unsafe_clone_column_family() {
    let danger_cf;
    let db_path = DBPath::new("_rust_rocksdb_create_clone_column_family1");
    {
        let mut opts = Options::default();
        opts.create_if_missing(true);
        let mut db = DB::open(&opts, &db_path).expect("db open");
        db.create_cf("cf1", &opts).unwrap();
        let cf = db.cf_handle("cf1").unwrap();
        danger_cf = cf.clone();
        db.put_cf(&cf, b"1", b"2").unwrap();
    }
    let db_path = DBPath::new("_rust_rocksdb_create_clone_column_family2");
    {
        let mut opts = Options::default();
        opts.create_if_missing(true);
        let mut db = DB::open(&opts, &db_path).expect("db open");
        db.create_cf("cf1", &opts).unwrap();
        db.put_cf(&danger_cf, b"1", b"2").unwrap(); // BOOOOM !!! (signal 11, invalid memory reference)
    }
}

It's a common case in C++, but Rust is positioned as a safe language and to allow such cases it's a bad practice. IMHO.
cc @stanislav-tkach

src/db.rs Outdated Show resolved Hide resolved
tests/test_column_family.rs Outdated Show resolved Hide resolved
src/db.rs Outdated Show resolved Hide resolved
src/db.rs Outdated Show resolved Hide resolved
src/db.rs Outdated Show resolved Hide resolved
src/db.rs Outdated Show resolved Hide resolved
let inner = self.create_inner_cf_handle(name.as_ref(), opts)?;
match &mut self.cfs {
ColumnFamilyHandles::MultiThread(_) => {
panic!("Not available on MultiThread implementation")
Copy link
Member

@aleksuss aleksuss Mar 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my pickiness, but I have a feeling that it's not good to panic here. What about to bound an invoke of this with trait ??? I'll try to provide the idea with code soon.

Copy link
Contributor

@ryoqun ryoqun Mar 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aleksuss Hi, again. I want this to move forward :) (cc: @carllin)

What about to bound an invoke of this with trait ??? I'll try to provide the idea with code soon.

Maybe, were you thinking about like this?: ryoqun@1be386d

Sorry for very rough code. But this eliminates bunch of various concerns arisen from this pr's review:

  • ColumnFamily is bounded in the case of MultiThread with zero runtime overhead (I even stopped using Arc)
  • ... and no derive(Clone) on ColumnFamily
  • will have no panic!s (if everything is correctly rewritten)
  • keeps api compatibility and no multithread methods.
  • no enum runtime variant branching (although this is cheap)
  • downside: a bit of internal code churn (I'll promise I'll work hard to fix them all)

If this direction is really to go from maintainer's perspective, I'll finish this very quickly. :)

Copy link
Member

@aleksuss aleksuss Mar 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, were you thinking about like this?: ryoqun/rust-rocksdb@1be386d

Yes. That is exactly what I meant 👍🏻
Unfortunately, I have no time now to provide my vision on it. But your approach is very very near to my vision. Let's try to implement it 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aleksuss as promised, I created the cleaned-up pr: #506 Please have a look at your convenient time. :)

@aleksuss
Copy link
Member

aleksuss commented Apr 6, 2021

@carllin @ryoqun could I close this PR in favour of #506 ?

@ryoqun
Copy link
Contributor

ryoqun commented Apr 6, 2021

@carllin @ryoqun could I close this PR in favour of #506 ?

Sure thing! let's close this. :)

@aleksuss aleksuss closed this Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants