
Understanding deserialization performance of SmallVec vs Vec #106

Open
xrl opened this issue Jul 25, 2018 · 4 comments

xrl commented Jul 25, 2018

I was profiling my serde deserialization library and went down the rabbit hole of benchmarking different key-value container storage types. I got down to Vec<(&'a [u8], &'a [u8])> as the fastest, and I wanted to see if my code would be faster with those string slices stored on the stack. So I tried out SmallVec, and it was actually about 20% slower than Vec. Should this be expected? I wasn't after the memory savings, but the deserialization slowdown doesn't make sense to me.

I extracted my benchmark to run inside of rust-smallvec:


#[macro_use]
extern crate smallvec;
extern crate test;
extern crate bincode;
extern crate serde;
#[macro_use] extern crate serde_derive;

use self::test::Bencher;
use smallvec::SmallVec;

#[bench]
fn smallvec_i32_benchmark(b: &mut Bencher) {
    use self::bincode::{serialize, deserialize};
    let data = {
        let sv: SmallVec<[i32; 5]> = smallvec![1, 2, 3, 4, 5];
        serialize(&sv).unwrap()
    };

    b.iter(|| {
        let sv: SmallVec<[i32; 5]> = deserialize(&data[..]).unwrap();
        sv // return the value so the optimizer can't discard the deserialization
    });
}

#[bench]
fn vec_i32_benchmark(b: &mut Bencher) {
    use self::bincode::{serialize, deserialize};
    let data = {
        let v: Vec<i32> = vec![1, 2, 3, 4, 5];
        serialize(&v).unwrap()
    };

    b.iter(|| {
        let v: Vec<i32> = deserialize(&data[..]).unwrap();
        v // return the value so the optimizer can't discard the deserialization
    });
}

#[bench]
fn smallvec_tuple_benchmark(b: &mut Bencher) {
    use self::bincode::{serialize, deserialize};
    let data = {
        let sv: SmallVec<[(&str, &str); 5]> = smallvec![
            ("hey", "now"), ("you're", "an"), ("all", "star"),
            ("get", "your"), ("game", "one")
        ];
        serialize(&sv).unwrap()
    };

    b.iter(|| {
        let sv: SmallVec<[(&str, &str); 5]> = deserialize(&data[..]).unwrap();
        sv // return the value so the optimizer can't discard the deserialization
    });
}

#[bench]
fn vec_tuple_benchmark(b: &mut Bencher) {
    use self::bincode::{serialize, deserialize};
    let data = {
        let v: Vec<(&str, &str)> = vec![
            ("hey", "now"), ("you're", "an"), ("all", "star"),
            ("get", "your"), ("game", "one")
        ];
        serialize(&v).unwrap()
    };

    b.iter(|| {
        let v: Vec<(&str, &str)> = deserialize(&data[..]).unwrap();
        v // return the value so the optimizer can't discard the deserialization
    });
}

On my laptop this produces:

running 4 tests
test smallvec_i32_benchmark   ... bench:          84 ns/iter (+/- 10)
test smallvec_tuple_benchmark ... bench:         222 ns/iter (+/- 23)
test vec_i32_benchmark        ... bench:          45 ns/iter (+/- 4)
test vec_tuple_benchmark      ... bench:         186 ns/iter (+/- 20)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out

mbrubeck commented Aug 6, 2018

Thanks for the benchmark! May I add it to the smallvec repo?

SmallVec's deserialize implementation is almost identical to Vec's: it creates an empty vector with capacity based on the sequence's size_hint (or zero if no hint is available), then pushes to it in a loop until the sequence is exhausted. The SmallVec version is presumably slower because its push has extra branches (inline vs. heap storage) that aren't optimized out.
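For reference, the pattern described above can be sketched roughly like this. This is a simplified stand-in, not serde's actual API: a plain iterator replaces serde's SeqAccess, and the function name is made up for illustration.

```rust
// Simplified sketch of the deserialize pattern described above: reserve
// capacity from the size hint (zero if absent), then push in a loop until
// the sequence is exhausted. `collect_seq` is an illustrative name only.
fn collect_seq<T, I: Iterator<Item = T>>(seq: I, size_hint: Option<usize>) -> Vec<T> {
    let mut values = Vec::with_capacity(size_hint.unwrap_or(0));
    for v in seq {
        // For SmallVec, this push also branches on inline-vs-heap storage,
        // which is the extra cost that may not be optimized out.
        values.push(v);
    }
    values
}

fn main() {
    let out = collect_seq(1..=5, Some(5));
    assert_eq!(out, vec![1, 2, 3, 4, 5]);
    // With no size hint, we start at capacity zero and grow as we push.
    let empty = collect_seq(std::iter::empty::<i32>(), None);
    assert!(empty.is_empty());
}
```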

In the case where size_hint is available, we can instead use unsafe code that writes to a raw pointer in a loop (similar to SmallVec::extend) to eliminate most of the performance difference.
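A rough sketch of that idea, using Vec for simplicity (the function name and the iterator stand-in for SeqAccess are made up for illustration):

```rust
use std::ptr;

// Sketch: when the size hint is used as capacity, write elements through a
// raw pointer and bump the length as we go, so each write skips push's
// capacity/spill check. Keeping `set_len` in step with the writes means a
// panic mid-loop or input that ends early (a malformed/short sequence)
// never exposes uninitialized elements.
fn fill_with_hint<T, I: Iterator<Item = T>>(mut seq: I, hint: usize) -> Vec<T> {
    let mut values: Vec<T> = Vec::with_capacity(hint);
    let mut len = 0;
    while len < hint {
        match seq.next() {
            Some(v) => unsafe {
                ptr::write(values.as_mut_ptr().add(len), v);
                len += 1;
                values.set_len(len); // stay consistent even on short input
            },
            None => break, // input ended before the hint said it would
        }
    }
    values
}

fn main() {
    let full = fill_with_hint(1..=5, 5);
    assert_eq!(full, vec![1, 2, 3, 4, 5]);
    // Short input: the hint promised 5 elements but only 3 arrive.
    let short = fill_with_hint(1..=3, 5);
    assert_eq!(short, vec![1, 2, 3]);
}
```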


xrl commented Aug 7, 2018

Thanks for the detailed description, that is very interesting. And yes, please, feel free to add this benchmark to the repo.


mbrubeck commented Aug 7, 2018

My unsafe code is here:

master...mbrubeck:serde

Unfortunately this is no longer as fast as Vec after I fixed a bug where it could panic on malformed input. It's only a 5–10% improvement (still about 25–50% slower than Vec), and I'm not sure that's worth adding more unsafe code.


xrl commented Aug 7, 2018

Should we just chalk this up to "rustc/LLVM need to work a little harder"?
