
io::Write via buf::Writer on BytesMut significantly slower than on Vec<u8> #531

Open
pablosichert opened this issue Feb 11, 2022 · 3 comments



pablosichert commented Feb 11, 2022

Heyo there,

when writing via serde_json::to_writer into a BytesMut vs. using serde_json::to_vec, I saw a slowdown of roughly 40% depending on input size, while I expected them to perform equally fast.

The following benchmarks

#[bench]
fn bytes_mut_io_write(b: &mut Bencher) {
    use std::io::Write;
    let mut buffer = BytesMut::with_capacity(128);
    let bytes = b"foo bar baz quux lorem ipsum dolor et";

    b.bytes = bytes.len() as u64;
    b.iter(|| {
        (&mut buffer).writer().write(bytes).unwrap();
        test::black_box(&buffer);
        unsafe {
            buffer.set_len(0);
        }
    })
}

#[bench]
fn vec_io_write(b: &mut Bencher) {
    use std::io::Write;
    let mut buffer = Vec::with_capacity(128);
    let bytes = b"foo bar baz quux lorem ipsum dolor et";

    b.bytes = bytes.len() as u64;
    b.iter(|| {
        buffer.write(bytes).unwrap();
        test::black_box(&buffer);
        unsafe {
            buffer.set_len(0);
        }
    })
}

yielded

bytes $ cargo +nightly bench --bench bytes_mut vec_io_write
    Finished bench [optimized] target(s) in 0.00s
     Running unittests (target/release/deps/bytes_mut-d70aabe3863f1aa3)

running 1 test
test vec_io_write             ... bench:           2 ns/iter (+/- 0) = 18500 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 19 filtered out; finished in 5.73s
bytes $ cargo +nightly bench --bench bytes_mut bytes_mut_io_write
    Finished bench [optimized] target(s) in 0.00s
     Running unittests (target/release/deps/bytes_mut-d70aabe3863f1aa3)

running 1 test
test bytes_mut_io_write       ... bench:           7 ns/iter (+/- 0) = 5285 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 19 filtered out; finished in 0.17s

Some more integrated benchmarks:

pub fn to_bytes<T>(value: &T) -> serde_json::Result<BytesMut>
where
    T: ?Sized + Serialize,
{
    let mut bytes = BytesMut::with_capacity(128);
    serde_json::to_writer((&mut bytes).writer(), value)?;
    Ok(bytes)
}
to_bytes(&log)

279.92 ns


pub fn to_bytes<T>(value: &T) -> serde_json::Result<BytesMut>
where
    T: ?Sized + Serialize,
{
    let bytes = serde_json::to_vec(value)?;
    let bytes = BytesMut::from(bytes.as_slice());
    Ok(bytes)
}
to_bytes(&log)

226.44 ns


serde_json::to_vec(&log)

195.28 ns

Given that even writing to Vec<u8> and copying the result over to a new BytesMut was faster than using the Writer, I think this could use some attention.

I'm wondering if this is a conceptual difference in BytesMut or something that could be improved via #425/#478?


Stargateur commented Feb 12, 2022

I could be wrong, but at 2 versus 7 ns per iteration you are not really measuring anything; the difference is too close to noise, and anything under 10 ns is very, very small.

Your two to_bytes snippets use very different features; I would expect that using a Vec as the writer with serde gives a similar result. serde_json must simply be faster working with a Vec than with a writer.

pablosichert (Author) replied:

> Your two to_bytes snippets use very different features; I would expect that using a Vec as the writer with serde gives a similar result. serde_json must simply be faster working with a Vec than with a writer.

serde_json::to_vec uses serde_json::to_writer internally: https://github.com/serde-rs/json/blob/5fe9bdd3562bf29d02d1ab798bbcff069173306b/src/ser.rs#L2191-L2198.

pablosichert (Author) replied:

> I could be wrong, but at 2 versus 7 ns per iteration you are not really measuring anything; the difference is too close to noise, and anything under 10 ns is very, very small.

This is definitely a small amount of time, but also consider that only a very small workload is carried out: a single 37-byte write. If you look at the throughput numbers, you can see that they are still far from saturating main memory bandwidth. Also, while 2 and 7 ns are close in absolute terms, they are definitely not in relative ones, which matters if you do a large number of small writes.

I carried out the same experiment with a larger payload of 1024 bytes and also included a "nothing" benchmark:

#[bench]
fn bytes_mut_io_write(b: &mut Bencher) {
    use std::io::Write;
    let mut buffer = BytesMut::with_capacity(1024);
    let bytes = [1u8; 1024];

    b.bytes = bytes.len() as u64;
    b.iter(|| {
        (&mut buffer).writer().write(&bytes).unwrap();
        test::black_box(&buffer);
        unsafe {
            buffer.set_len(0);
        }
    })
}

#[bench]
fn vec_io_write(b: &mut Bencher) {
    use std::io::Write;
    let mut buffer = Vec::with_capacity(1024);
    let bytes = [1u8; 1024];

    b.bytes = bytes.len() as u64;
    b.iter(|| {
        buffer.write(&bytes).unwrap();
        test::black_box(&buffer);
        unsafe {
            buffer.set_len(0);
        }
    })
}

#[bench]
fn array_write(b: &mut Bencher) {
    let mut buffer = [0u8; 1024];
    let bytes = [1u8; 1024];

    b.bytes = bytes.len() as u64;
    b.iter(|| {
        unsafe { std::ptr::copy_nonoverlapping(bytes.as_ptr(), buffer.as_mut_ptr(), bytes.len()) };
        test::black_box(&buffer);
    })
}

#[bench]
fn nothing(b: &mut Bencher) {
    let buffer = [0u8; 1024];
    let bytes = [1u8; 1024];

    b.bytes = bytes.len() as u64;
    b.iter(|| {
        test::black_box(&buffer);
    })
}
bytes $ cargo +nightly bench --bench bytes_mut bytes_mut_io_write
   Compiling bytes v1.1.0 (/Volumes/git/com.github/tokio-rs/bytes)
    Finished bench [optimized] target(s) in 0.62s
     Running unittests (target/release/deps/bytes_mut-d70aabe3863f1aa3)

running 1 test
test bytes_mut_io_write       ... bench:          17 ns/iter (+/- 0) = 60235 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 21 filtered out; finished in 0.96s

bytes $ cargo +nightly bench --bench bytes_mut vec_io_write
    Finished bench [optimized] target(s) in 0.00s
     Running unittests (target/release/deps/bytes_mut-d70aabe3863f1aa3)

running 1 test
test vec_io_write             ... bench:          16 ns/iter (+/- 0) = 64000 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 21 filtered out; finished in 0.36s

bytes $ cargo +nightly bench --bench bytes_mut array_write
    Finished bench [optimized] target(s) in 0.01s
     Running unittests (target/release/deps/bytes_mut-d70aabe3863f1aa3)

running 1 test
test array_write              ... bench:          12 ns/iter (+/- 0) = 85333 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 21 filtered out; finished in 0.33s

bytes $ cargo +nightly bench --bench bytes_mut nothing
    Finished bench [optimized] target(s) in 0.00s
     Running unittests (target/release/deps/bytes_mut-d70aabe3863f1aa3)

running 1 test
test nothing                  ... bench:           0 ns/iter (+/- 0) = 1024000 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 21 filtered out; finished in 0.66s

The relative difference between Vec and BytesMut gets smaller as the payload increases, which indicates a relatively high fixed cost per write, disproportionately hurting workloads that do a large number of small writes.
