
Understanding deserialization performance of SmallVec vs Vec #106

Open
xrl opened this issue Jul 25, 2018 · 4 comments

xrl commented Jul 25, 2018

I was profiling my serde deserialization library and went down the rabbit hole of benchmarking different key-value container storage types. I got down to Vec<(&'a [u8], &'a [u8])> as the fastest, and I wanted to see if my code would be faster with those string slices stored on the stack. So I tried out SmallVec, and it was actually about 20% slower than Vec. Should this be expected? I wasn't after the memory savings, but the deserialization slowdown doesn't make sense to me.

I extracted my benchmark to run inside of rust-smallvec:


#[macro_use]
extern crate smallvec;
extern crate test;
extern crate bincode;
extern crate serde;
#[macro_use] extern crate serde_derive;

use self::test::Bencher;
use smallvec::SmallVec;

#[bench]
fn smallvec_i32_benchmark(b: &mut Bencher) {
    use self::bincode::{serialize, deserialize};
    let data = {
        let sv: SmallVec<[i32; 5]> = smallvec![1, 2, 3, 4, 5];
        serialize(&sv).unwrap()
    };

    b.iter(|| {
        let sv: SmallVec<[i32; 5]> = deserialize(&data[..]).unwrap();
        sv // return the value so the optimizer can't discard the deserialization
    });
}

#[bench]
fn vec_i32_benchmark(b: &mut Bencher) {
    use self::bincode::{serialize, deserialize};
    let data = {
        let v: Vec<i32> = vec![1, 2, 3, 4, 5];
        serialize(&v).unwrap()
    };

    b.iter(|| {
        let v: Vec<i32> = deserialize(&data[..]).unwrap();
        v // return the value so the optimizer can't discard the deserialization
    });
}

#[bench]
fn smallvec_tuple_benchmark(b: &mut Bencher) {
    use self::bincode::{serialize, deserialize};
    let data = {
        let sv: SmallVec<[(&str, &str); 5]> = smallvec![
            ("hey", "now"), ("you're", "an"), ("all", "star"),
            ("get", "your"), ("game", "one")
        ];
        serialize(&sv).unwrap()
    };

    b.iter(|| {
        let sv: SmallVec<[(&str, &str); 5]> = deserialize(&data[..]).unwrap();
        sv // return the value so the optimizer can't discard the deserialization
    });
}

#[bench]
fn vec_tuple_benchmark(b: &mut Bencher) {
    use self::bincode::{serialize, deserialize};
    let data = {
        let v: Vec<(&str, &str)> = vec![
            ("hey", "now"), ("you're", "an"), ("all", "star"),
            ("get", "your"), ("game", "one")
        ];
        serialize(&v).unwrap()
    };

    b.iter(|| {
        let v: Vec<(&str, &str)> = deserialize(&data[..]).unwrap();
        v // return the value so the optimizer can't discard the deserialization
    });
}

On my laptop this produces:

running 4 tests
test smallvec_i32_benchmark   ... bench:          84 ns/iter (+/- 10)
test smallvec_tuple_benchmark ... bench:         222 ns/iter (+/- 23)
test vec_i32_benchmark        ... bench:          45 ns/iter (+/- 4)
test vec_tuple_benchmark      ... bench:         186 ns/iter (+/- 20)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out

mbrubeck commented Aug 6, 2018

Thanks for the benchmark! May I add it to the smallvec repo?

SmallVec's deserialize implementation is almost identical to Vec's: it creates an empty vector with capacity based on the sequence's size_hint (or zero if no hint is available), then pushes to it in a loop until the sequence is exhausted. The SmallVec version is presumably slower because its push has extra branches (inline vs. heap storage) that aren't optimized out.
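For reference, the pattern described above can be sketched roughly like this. This is a simplified stand-in, not serde's actual API: a plain iterator replaces serde's SeqAccess, and the function name is made up for illustration.

```rust
// Simplified sketch of the deserialize pattern described above: reserve
// capacity from the size hint (zero if absent), then push in a loop until
// the sequence is exhausted. `collect_seq` is an illustrative name only.
fn collect_seq<T, I: Iterator<Item = T>>(seq: I, size_hint: Option<usize>) -> Vec<T> {
    let mut values = Vec::with_capacity(size_hint.unwrap_or(0));
    for v in seq {
        // For SmallVec, this push also branches on inline-vs-heap storage,
        // which is the extra cost that may not be optimized out.
        values.push(v);
    }
    values
}

fn main() {
    let out = collect_seq(1..=5, Some(5));
    assert_eq!(out, vec![1, 2, 3, 4, 5]);
    // With no size hint, we start at capacity zero and grow as we push.
    let empty = collect_seq(std::iter::empty::<i32>(), None);
    assert!(empty.is_empty());
}
```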

In the case where size_hint is available, we can instead use unsafe code that writes to a raw pointer in a loop (similar to SmallVec::extend) to eliminate most of the performance difference.
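A rough sketch of that idea, using Vec for simplicity (the function name and the iterator stand-in for SeqAccess are made up for illustration):

```rust
use std::ptr;

// Sketch: when the size hint is used as capacity, write elements through a
// raw pointer and bump the length as we go, so each write skips push's
// capacity/spill check. Keeping `set_len` in step with the writes means a
// panic mid-loop or input that ends early (a malformed/short sequence)
// never exposes uninitialized elements.
fn fill_with_hint<T, I: Iterator<Item = T>>(mut seq: I, hint: usize) -> Vec<T> {
    let mut values: Vec<T> = Vec::with_capacity(hint);
    let mut len = 0;
    while len < hint {
        match seq.next() {
            Some(v) => unsafe {
                ptr::write(values.as_mut_ptr().add(len), v);
                len += 1;
                values.set_len(len); // stay consistent even on short input
            },
            None => break, // input ended before the hint said it would
        }
    }
    values
}

fn main() {
    let full = fill_with_hint(1..=5, 5);
    assert_eq!(full, vec![1, 2, 3, 4, 5]);
    // Short input: the hint promised 5 elements but only 3 arrive.
    let short = fill_with_hint(1..=3, 5);
    assert_eq!(short, vec![1, 2, 3]);
}
```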


xrl commented Aug 7, 2018

Thanks for the detailed description, that is very interesting. And yes, please, feel free to add this benchmark to the repo.


mbrubeck commented Aug 7, 2018

My unsafe code is here:

master...mbrubeck:serde

Unfortunately this is no longer as fast as Vec after I fixed a bug where it could panic on malformed input. It's only a 5–10% improvement (still about 25–50% slower than Vec), and I'm not sure that's worth adding more unsafe code.


xrl commented Aug 7, 2018

Should we just chalk this up to "rustc/LLVM need to work a little harder"?
