Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider making Value optional #109

Open
ordian opened this issue Jan 24, 2019 · 12 comments
Open

consider making Value optional #109

ordian opened this issue Jan 24, 2019 · 12 comments
Labels
enhancement New feature or request

Comments

@ordian
Copy link
Contributor

ordian commented Jan 24, 2019

Currently Rkv provides 4 types of Stores: Single, Multi, Integer and MultiInteger. All of them require the value to be of type Value, which imposes certain overhead, since the values must be encoded and decoded (and copied). This can be undesirable if the user only uses Blob type for values.

In general, it feels like this type (Value) and its encoding logic (or compression) should be specific for each user and I don't quite understand why it is here. I could use lmdb-rkv, but it would be nice not to deal with UNC paths on Windows and restrictions like one environment per process per path. Consider adding methods do deal with &[u8] values instead of Value.

@ordian
Copy link
Contributor Author

ordian commented Jan 24, 2019

cc #4

Avoiding copies where necessary. (Rather than decoding the value, return &[u8].)

@mykmelez
Copy link
Contributor

The Value type is designed to prevent an accidental "cast" from one type to another when storing and then retrieving a value.

I understand how Value might feel extraneous for a use case that only ever deals with Blob values. And I'm open to ideas for mitigating the overhead. However, I'm also wary of adding more complexity to the interface, and a parallel set of methods that take &[u8] values seems like it would double the cognitive overhead of the *Store types.

@ordian
Copy link
Contributor Author

ordian commented Feb 4, 2019

@mykmelez why not just this (a breaking change)?

pub fn put<K: AsRef<[u8]>, V: AsRef<[u8]>>(self, txn: &mut RwTransaction, k: K, v: V) -> Result<(), StoreError>;
pub fn get<T: Transaction, K: AsRef<[u8]>>(self, txn: &T, k: K) -> Result<Option<&[u8]>, StoreError> 

@ordian
Copy link
Contributor Author

ordian commented Feb 4, 2019

or if you want to keep the Value type

pub fn put<'a, K: AsRef<[u8]>, V: Cow<'a, [u8]>>(self, txn: &mut RwTransaction, k: K, v: V) -> Result<(), StoreError>;

EDIT: V: AsRef<[u8]> should work for Value as well.

And for a value the calls would be

// serialization to vec in `to_bytes` can't fail, right?
store.put(&mut writer, key, value.to_bytes()?);
store.get(&reader, key).and_then(Value::parse);

// Expose a helper function to transform the result.
impl Value {
    /// Tries to parse the value from bytes.
    pub fn parse(value: Option<&[u8]>) -> Result<Option<Value>, StoreError> {
        Ok(match value {
            Some(bytes) => Value::from_tagged_slice(bytes).map(Some).map_err(StoreError::DataError)?,
            None => None,
        })
    }
}

and for bytes it would simply be

store.put(&mut writer, key, bytes);
store.get(&reader, key);

@mykmelez
Copy link
Contributor

mykmelez commented Feb 4, 2019

store.put(&mut writer, key, value.to_bytes()?);
store.get(&reader, key).and_then(Value::parse);

This feels harder than it should be to read and write typed values.

I wonder if it would be sufficient to expose a new Store type (RawStore?) that takes and returns only &[u8] values. I'm unsure how to generalize that across the types of keys (AsRef<[u8]>, PrimitiveInt) and single vs. multi-stores, however. I don't want to end up with four new store types!

@devinrsmith
Copy link

First of all - rkv looks like a great project. I like the level of documentation.

I was going to open an issue that was almost exactly the same as this one. For my use case, I'd always be using Blobs - and the necessity of always wrapping and unwrapping against that type is a bit against the rust mantra of "zero cost abstractions".

My guess is that most use cases will always choose a single value type, and in the current implementation would always end up wrapping and unwrapping against that. I think it would be good to abstract a little bit, and pull the value type up to the db layer - allow the user to choose any generic value type they want (even Value), as long as it implemented the required traits.

@devinrsmith
Copy link

I did a quick survey of some other rust LMDB libraries -

lmdb - uses trait AsRef<[u8]> for values
lmdb-zero:

pub trait AsLmdbBytes {
    fn as_lmdb_bytes(&self) -> &[u8];
}

lmdb-rs-m:

pub trait ToMdbValue {
    fn to_mdb_value<'a>(&'a self) -> MdbValue<'a>;
}

@ordian
Copy link
Contributor Author

ordian commented Feb 17, 2019

I wonder if it would be sufficient to expose a new Store type (RawStore?) that takes and returns only &[u8] values. I'm unsure how to generalize that across the types of keys (AsRef<[u8]>, PrimitiveInt) and single vs. multi-stores, however. I don't want to end up with four new store types!

@mykmelez What do you think of making store types generic over a value type?
Something like this:

pub struct SingleStore<V: Value> {
    /// ...,
    phantom: std::marker::PhantomData<V>,
}

impl<V: Value> for SingleStore<V> {
    pub fn get<T: Readable, K: AsRef<[u8]>>(
        self,
        reader: &T,
        k: K
    ) -> Result<Option<V>, StoreError> {...}
    pub fn put<K: AsRef<[u8]>>(
        self,
        writer: &mut Writer,
        k: K,
        v: V
    ) -> Result<(), StoreError> {...}
}


// FIXME: inherit from std::convert::TryFrom<Error=Error, [u8]> when it's stabilized
extern crate try_from;
use try_from::TryFrom;

pub trait Value<'a>: TryFrom<&'a [u8], Err = Error> + AsRef<[u8]> {}   

impl Value for rkv::value::Value {
    /// ...
}

Note, that trait Value inherits AsRef<[u8]>, I think value.to_bytes() should always succeed (for the struct Value, it allocates a vector a writes bytes to it, which should never fail).

@mykmelez
Copy link
Contributor

pub struct SingleStore<V: Value> {
    /// ...,
    phantom: std::marker::PhantomData<V>,
}

This would require structs that own a SingleStore to specify the lifetime of its Value type:

pub struct OwnsSingleStore<'s> {
    store: SingleStore<Value<'s>>,
}

Which limits usage in Firefox, where rkv is being exposed to JavaScript code via an FFI that cannot wrap generic types.

I wonder if it would be possible to make store types enums, with a variant that accepts the value::Value type and another that generalizes across types implementing the Value trait (here called ValueTrait to distinguish it from the value::Value type and the SingleStore::Value variant ):

#[derive(Copy, Clone)]
pub enum SingleStore<V: ValueTrait> {
    Value { db: Database, },
    ValueTrait {
        db: Database,
        phantom: PhantomData<V>,
    },
}

Then, assuming one can implement ValueTrait for some type that doesn't require specifying its lifetime, perhaps even the unit type, it should be possible for a struct to own a SingleStore<()> of the Value variant.

That would still be ergonomically suboptimal, as consumers would have to specify the type of ValueTrait even if they use the SingleStore::Value variant, which ignores it. And I suppose we could simplify via a newtype like pub struct ValueSimpleStore(SingleStore<()>). But if each store type has such a newtype, then we're back to doubling the number of store types, and we might as well implement a separate store type that generalizes over ValueTrait.

I wonder if the approach taken by the lmdb crate would be preferable: provide access to the wrapped type, so consumers who aren't satisfied by the higher-level type can access the lower-level type directly. For the lmdb crate, that means you can get an ffi::MDB_dbi from an lmdb::Database. For rkv, it would mean you can get an lmdb::Database from an rkv::SingleStore.

So you could create a SingleStore, taking advantage of rkv's higher-level affordances, and then retrieve its Database to use &[u8] values directly.

@ordian
Copy link
Contributor Author

ordian commented Feb 22, 2019

I wonder if it would be possible to make store types enums, with a variant that accepts the value::Value type and another that generalizes across types implementing the Value trait

I'm not sure it's possible to specify ValueTrait w/o a lifetime specifier. For &'a [u8] values, 'a should be bound by the reader lifetime. Alternatively, one could make get and put methods generic over Value (instead of store types), but that would require a user to specify a value type for each method call, which is not nice.

For rkv, it would mean you can get an lmdb::Database from an rkv::SingleStore.
So you could create a SingleStore, taking advantage of rkv's higher-level affordances, and then retrieve its Database to use &[u8] values directly.

This seems like a good approach to me.

@rrichardson
Copy link
Contributor

Is there anything stopping us from simply adding a put_raw and get_raw (or put_bytes get_bytes whatever) to each of the Store APIs?

Failing that, I like the RawStore or ByteStore idea. I mean, Rkv is about type-safe serialized object storage. It makes sense to support optimizations for a &[u8] key/value, but I don't think that the data model should change for it.

@rnewman
Copy link
Contributor

rnewman commented Jan 7, 2020

One could conceivably implement SingleStore in terms of a RawStore, which adds to its appeal.

I intended rkv to expose most of LMDB's capabilities with way, way fewer sharp edges, and introducing a rudimentary type system of tagged values was part of that. But I had also intended to facilitate storing an arbitrary set of bytes.

I haven't thought it through in detail, but conceivably the Blob bincode calls to deserialize and serialize can be omitted, which should preserve the benefits of tagging with very little overhead — after all, they're just serializing/deserializing [u8] to [u8].

@rnewman rnewman added the enhancement New feature or request label Jan 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants