BytesMut's BufMut impl is much slower than Vec's #81

carllerche · 2017-03-19T04:44:42Z

Currently, it is ~2x slower. A cursory investigation reveals that there are way too many is_inline checks. Removing them brings the impls back on par.
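The sketch below uses only std to show where the overhead described above comes from. `InlineOrHeap` is a hypothetical two-representation buffer, not the real `BytesMut` internals. Every `put_u8` has to `match` on which representation is active, which plays the role of an `is_inline` check, whereas `Vec::push` writes straight into its heap buffer. It only shows where the branch sits and does not reproduce the measured ~2x gap.

```rust
// Hypothetical small-buffer-optimized byte buffer, for illustration only.
// It is NOT the real BytesMut layout; it only mirrors the idea of an
// "inline" representation that every write has to check for.
const INLINE_CAP: usize = 16;

enum InlineOrHeap {
    Inline { buf: [u8; INLINE_CAP], len: usize },
    Heap(Vec<u8>),
}

impl InlineOrHeap {
    fn new() -> Self {
        InlineOrHeap::Inline { buf: [0; INLINE_CAP], len: 0 }
    }

    fn is_inline(&self) -> bool {
        matches!(self, InlineOrHeap::Inline { .. })
    }

    // Every single-byte write pays for the representation check.
    fn put_u8(&mut self, b: u8) {
        match self {
            InlineOrHeap::Inline { buf, len } if *len < INLINE_CAP => {
                buf[*len] = b;
                *len += 1;
            }
            InlineOrHeap::Inline { buf, len } => {
                // Inline storage is full: promote to a heap Vec.
                let mut v = Vec::with_capacity(INLINE_CAP * 2);
                v.extend_from_slice(&buf[..*len]);
                v.push(b);
                *self = InlineOrHeap::Heap(v);
            }
            InlineOrHeap::Heap(v) => v.push(b),
        }
    }

    fn as_slice(&self) -> &[u8] {
        match self {
            InlineOrHeap::Inline { buf, len } => &buf[..*len],
            InlineOrHeap::Heap(v) => v.as_slice(),
        }
    }
}

fn main() {
    let mut small = InlineOrHeap::new();
    let mut vec: Vec<u8> = Vec::new();
    for i in 0..100u8 {
        small.put_u8(i); // matches on the representation on every call
        vec.push(i); // no representation check
    }
    assert!(!small.is_inline()); // 100 bytes no longer fit inline
    assert_eq!(small.as_slice(), vec.as_slice());
}
```

Removing such checks from the hot path, for example by checking the representation once per bulk write instead of once per byte, is the kind of change the issue proposes.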


mzabaluev · 2019-06-17T18:28:29Z

Could branching between four different possible representations be a potential handicap as well?
I remember when small-size optimization for Vec/String was discussed: people who tried it and measured the performance impact on rustc concluded that the added branching everywhere was just not worth it with the allocator in use at the time (jemalloc was the default back then). I realise that atomic ordering guarantees may be more costly, so a non-shared representation variant may still be worth it. Not being much involved in this project's history, I can only assume that the other representations have shown performance improvements in realistic use cases.

I think a more promising direction for optimization would be to cut down on mutating APIs in order to optimize the primary use case for bytes, which is basically covered by the API of ~~Buf/~~BufMut, .split_to(n).freeze(), and BytesMut::reserve. That use case is repeatedly pulling data incrementally into a BytesMut within a single task and handing off complete chunks as Bytes for potentially shared processing that does not tend to modify the data in place. I believe instilling some usage discipline can significantly reduce overhead on the hot paths.
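The workflow described above (fill a buffer in one task, detach complete chunks as immutable shared handles) can be modelled with std types. This is a minimal sketch under stated assumptions. `Accumulator` and `take_frame` are hypothetical names standing in for `BytesMut` and `split_to(n).freeze()`, and `Arc<[u8]>` stands in for `Bytes`. Unlike the bytes crate, `take_frame` copies the frame into a new allocation, so it models only the shape of the usage, not its cost.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the "fill a BytesMut, then split_to(n).freeze()"
// workflow, using only std. take_frame copies the frame into a new Arc
// allocation, which the real bytes crate avoids.
struct Accumulator {
    buf: Vec<u8>,
}

impl Accumulator {
    fn new(capacity: usize) -> Self {
        Accumulator { buf: Vec::with_capacity(capacity) }
    }

    // Analogous to BytesMut::reserve followed by writes through BufMut.
    fn extend(&mut self, data: &[u8]) {
        self.buf.reserve(data.len());
        self.buf.extend_from_slice(data);
    }

    // Analogous to split_to(n).freeze(): detach the first n bytes as an
    // immutable, cheaply clonable handle. Returns None if fewer than n
    // bytes have arrived so far.
    fn take_frame(&mut self, n: usize) -> Option<Arc<[u8]>> {
        if self.buf.len() < n {
            return None;
        }
        let frame: Arc<[u8]> = Arc::from(&self.buf[..n]);
        self.buf.drain(..n);
        Some(frame)
    }
}

fn main() {
    let mut acc = Accumulator::new(64);
    // Data arrives in pieces that do not line up with frame boundaries.
    acc.extend(b"hel");
    assert!(acc.take_frame(5).is_none());
    acc.extend(b"loworld");
    let first = acc.take_frame(5).unwrap();
    let second = acc.take_frame(5).unwrap();
    // Cloning the handle only bumps the reference count.
    let shared = Arc::clone(&first);
    assert_eq!(&shared[..], b"hello");
    assert_eq!(&second[..], b"world");
}
```

In this pattern only the single writer ever mutates the buffer, while readers hold immutable handles. That is the usage discipline the comment suggests optimizing for.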

If the extension API on Bytes and the segmentation of BytesMut are removed, the internal representation of BytesMut could simply own a Shared instance referenced by immutable Bytes views (those in the ARC form) and never need to touch the reference count until it is dropped. Bytes would not need the capacity member, so the struct would be 3/4 of its current size. It would still be possible to convert Bytes into BytesMut without copying when the Bytes' reference to the buffer is unique.
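Here is a std-only sketch of the size and ownership claims above, under stated assumptions. `ViewWithCap` and `ViewWithoutCap` are hypothetical layouts (a data pointer, a length, an optional capacity, and a pointer to shared state), not the actual `Bytes` definition. The size check therefore only shows that removing one word-sized field out of four gives the 3/4 figure. `FrozenView` is a hypothetical immutable view over an `Arc<Vec<u8>>`. Its `try_into_mut` uses `Arc::try_unwrap` to reclaim the buffer only when the view holds the sole reference.

```rust
use std::mem::size_of;
use std::sync::Arc;

// Hypothetical layouts for illustration; field names are assumptions,
// not the actual Bytes internals.
#[allow(dead_code)]
struct ViewWithCap {
    ptr: *const u8,
    len: usize,
    cap: usize,
    shared: *const (),
}

#[allow(dead_code)]
struct ViewWithoutCap {
    ptr: *const u8,
    len: usize,
    shared: *const (),
}

// Hypothetical immutable view into a shared buffer.
#[derive(Debug)]
struct FrozenView {
    shared: Arc<Vec<u8>>,
    start: usize,
    end: usize,
}

impl FrozenView {
    fn as_slice(&self) -> &[u8] {
        &self.shared[self.start..self.end]
    }

    // If this view holds the only reference, reclaim the buffer for
    // mutation without a new allocation; otherwise hand the view back.
    fn try_into_mut(self) -> Result<Vec<u8>, FrozenView> {
        let FrozenView { shared, start, end } = self;
        match Arc::try_unwrap(shared) {
            Ok(mut v) => {
                v.truncate(end);
                v.drain(..start); // shifts bytes within the same allocation
                Ok(v)
            }
            Err(shared) => Err(FrozenView { shared, start, end }),
        }
    }
}

fn main() {
    // Dropping the capacity word takes a four-word view down to three words.
    assert_eq!(size_of::<ViewWithCap>(), 4 * size_of::<usize>());
    assert_eq!(size_of::<ViewWithoutCap>(), 3 * size_of::<usize>());

    let buf = Arc::new(b"header:payload".to_vec());
    let view = FrozenView { shared: Arc::clone(&buf), start: 7, end: 14 };
    assert_eq!(view.as_slice(), b"payload");
    // Two references exist, so the view cannot be reclaimed.
    let view = view.try_into_mut().unwrap_err();
    drop(buf);
    // Now the view is unique and gets the allocation back.
    let reclaimed = view.try_into_mut().unwrap();
    assert_eq!(reclaimed, b"payload");
}
```

Here the reclaimed `Vec` reuses the original allocation, but `drain` still moves the viewed bytes to the front. An implementation built on raw pointers could adjust the start pointer instead of moving data.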

I can lay out these reformist ideas in more detail in an RFC-like issue if the development team is interested in this direction.

carllerche · 2019-06-17T21:37:02Z

@mzabaluev Sure, more details would be needed to consider the change:

- List of APIs added / removed.
- Changes to internals.
- Performance characteristic differences for common usage patterns of Bytes / BytesMut.

mzabaluev · 2019-06-23T17:58:00Z

I've done my homework in #268.

carllerche mentioned this issue Mar 19, 2017: Reduce is_inline calls in Bytes's BufMut impl #82 (closed)

seanmonstar mentioned this issue Oct 16, 2019: Add benchmarks for BytesMut vs Vec #303 (merged)

seanmonstar closed this as completed in #303 Oct 23, 2019
