
io::Write via buf::Writer on BytesMut significantly slower than on Vec<u8> #531

Open
pablosichert opened this issue Feb 11, 2022 · 3 comments



pablosichert commented Feb 11, 2022

Heyo there,

when writing via serde_json::to_writer into a BytesMut vs. using serde_json::to_vec, I saw a slowdown of roughly 40% depending on input size, while I expected them to perform equally fast.

The following benchmarks

#[bench]
fn bytes_mut_io_write(b: &mut Bencher) {
    use std::io::Write;
    let mut buffer = BytesMut::with_capacity(128);
    let bytes = b"foo bar baz quux lorem ipsum dolor et";

    b.bytes = bytes.len() as u64;
    b.iter(|| {
        (&mut buffer).writer().write(bytes).unwrap();
        test::black_box(&buffer);
        unsafe {
            buffer.set_len(0);
        }
    })
}

#[bench]
fn vec_io_write(b: &mut Bencher) {
    use std::io::Write;
    let mut buffer = Vec::with_capacity(128);
    let bytes = b"foo bar baz quux lorem ipsum dolor et";

    b.bytes = bytes.len() as u64;
    b.iter(|| {
        buffer.write(bytes).unwrap();
        test::black_box(&buffer);
        unsafe {
            buffer.set_len(0);
        }
    })
}

yielded

bytes $ cargo +nightly bench --bench bytes_mut vec_io_write
    Finished bench [optimized] target(s) in 0.00s
     Running unittests (target/release/deps/bytes_mut-d70aabe3863f1aa3)

running 1 test
test vec_io_write             ... bench:           2 ns/iter (+/- 0) = 18500 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 19 filtered out; finished in 5.73s
bytes $ cargo +nightly bench --bench bytes_mut bytes_mut_io_write
    Finished bench [optimized] target(s) in 0.00s
     Running unittests (target/release/deps/bytes_mut-d70aabe3863f1aa3)

running 1 test
test bytes_mut_io_write       ... bench:           7 ns/iter (+/- 0) = 5285 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 19 filtered out; finished in 0.17s

Some more integrated benchmarks:

pub fn to_bytes<T>(value: &T) -> serde_json::Result<BytesMut>
where
    T: ?Sized + Serialize,
{
    let mut bytes = BytesMut::with_capacity(128);
    serde_json::to_writer((&mut bytes).writer(), value)?;
    Ok(bytes)
}
to_bytes(&log)

279.92 ns


pub fn to_bytes<T>(value: &T) -> serde_json::Result<BytesMut>
where
    T: ?Sized + Serialize,
{
    let bytes = serde_json::to_vec(value)?;
    let bytes = BytesMut::from(bytes.as_slice());
    Ok(bytes)
}
to_bytes(&log)

226.44 ns


serde_json::to_vec(&log)

195.28 ns

Given that even writing to Vec<u8> and copying the result over to a new BytesMut was faster than using the Writer, I think this could use some attention.

I'm wondering if this is a conceptual difference in BytesMut or something that could be improved via #425/#478?


Stargateur commented Feb 12, 2022

I could be wrong, but at 2 versus 7 ns per iteration you are not really measuring anything; the difference is too close to noise, and anything under 10 ns is very, very small.

Your two to_bytes snippets use very different features; I would expect that using a Vec as the writer with serde gives a similar result. serde_json must simply be faster working with a Vec than with a writer.

pablosichert (Author) replied:

> Your two to_bytes snippets use very different features; I would expect that using a Vec as the writer with serde gives a similar result. serde_json must simply be faster working with a Vec than with a writer.

serde_json::to_vec uses serde_json::to_writer internally: https://github.com/serde-rs/json/blob/5fe9bdd3562bf29d02d1ab798bbcff069173306b/src/ser.rs#L2191-L2198.

pablosichert (Author) replied:

> I could be wrong, but at 2 versus 7 ns per iteration you are not really measuring anything; the difference is too close to noise, and anything under 10 ns is very, very small.

This is definitely a small amount of time, but also consider that only a very small workload is carried out: a single 37-byte write. If you look at the throughput numbers, you can see that they are still far from saturating main memory bandwidth. Also, while 2 and 7 ns are close in absolute terms, they are definitely not in relative ones, which matters if you do a large number of small writes.

I carried out the same experiment with a larger payload of 1024 bytes and also included a "nothing" benchmark:

#[bench]
fn bytes_mut_io_write(b: &mut Bencher) {
    use std::io::Write;
    let mut buffer = BytesMut::with_capacity(1024);
    let bytes = [1u8; 1024];

    b.bytes = bytes.len() as u64;
    b.iter(|| {
        (&mut buffer).writer().write(&bytes).unwrap();
        test::black_box(&buffer);
        unsafe {
            buffer.set_len(0);
        }
    })
}

#[bench]
fn vec_io_write(b: &mut Bencher) {
    use std::io::Write;
    let mut buffer = Vec::with_capacity(1024);
    let bytes = [1u8; 1024];

    b.bytes = bytes.len() as u64;
    b.iter(|| {
        buffer.write(&bytes).unwrap();
        test::black_box(&buffer);
        unsafe {
            buffer.set_len(0);
        }
    })
}

#[bench]
fn array_write(b: &mut Bencher) {
    let mut buffer = [0u8; 1024];
    let bytes = [1u8; 1024];

    b.bytes = bytes.len() as u64;
    b.iter(|| {
        unsafe { std::ptr::copy_nonoverlapping(bytes.as_ptr(), buffer.as_mut_ptr(), bytes.len()) };
        test::black_box(&buffer);
    })
}

#[bench]
fn nothing(b: &mut Bencher) {
    let buffer = [0u8; 1024];
    let bytes = [1u8; 1024];

    b.bytes = bytes.len() as u64;
    b.iter(|| {
        test::black_box(&buffer);
    })
}
bytes $ cargo +nightly bench --bench bytes_mut bytes_mut_io_write
   Compiling bytes v1.1.0 (/Volumes/git/com.github/tokio-rs/bytes)
    Finished bench [optimized] target(s) in 0.62s
     Running unittests (target/release/deps/bytes_mut-d70aabe3863f1aa3)

running 1 test
test bytes_mut_io_write       ... bench:          17 ns/iter (+/- 0) = 60235 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 21 filtered out; finished in 0.96s

bytes $ cargo +nightly bench --bench bytes_mut vec_io_write
    Finished bench [optimized] target(s) in 0.00s
     Running unittests (target/release/deps/bytes_mut-d70aabe3863f1aa3)

running 1 test
test vec_io_write             ... bench:          16 ns/iter (+/- 0) = 64000 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 21 filtered out; finished in 0.36s

bytes $ cargo +nightly bench --bench bytes_mut array_write
    Finished bench [optimized] target(s) in 0.01s
     Running unittests (target/release/deps/bytes_mut-d70aabe3863f1aa3)

running 1 test
test array_write              ... bench:          12 ns/iter (+/- 0) = 85333 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 21 filtered out; finished in 0.33s

bytes $ cargo +nightly bench --bench bytes_mut nothing
    Finished bench [optimized] target(s) in 0.00s
     Running unittests (target/release/deps/bytes_mut-d70aabe3863f1aa3)

running 1 test
test nothing                  ... bench:           0 ns/iter (+/- 0) = 1024000 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 21 filtered out; finished in 0.66s

The relative difference between Vec and BytesMut gets smaller as the payload increases, which indicates a relatively high fixed cost per write, disproportionately hurting workloads that do a large number of small writes.
