BytesMut's BufMut impl is much slower than Vec's #81

Closed
carllerche opened this issue Mar 19, 2017 · 3 comments · Fixed by #303

Comments

@carllerche
Member

Currently, BytesMut's BufMut impl is ~2x slower than Vec's. A cursory investigation reveals far too many is_inline checks; removing them brings the two impls back on par.
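To make the per-call cost concrete, here is a hypothetical toy model. It is not the crate's actual internals, and the names ToyBuf, Repr and INLINE_CAP are made up for illustration. The buffer has an inline small-buffer representation that spills to the heap. Every put_u8 call first branches on the representation, which plays the role of an is_inline check. This branch comes on top of the capacity check that Vec::push already performs.

```rust
// Hypothetical toy model, NOT the bytes crate's real layout: a buffer that
// starts with inline storage and spills to a heap Vec once it is full.
const INLINE_CAP: usize = 15;

enum Repr {
    Inline { len: usize, data: [u8; INLINE_CAP] },
    Heap(Vec<u8>),
}

pub struct ToyBuf {
    repr: Repr,
}

impl ToyBuf {
    pub fn new() -> Self {
        ToyBuf { repr: Repr::Inline { len: 0, data: [0; INLINE_CAP] } }
    }

    // Every write branches on the representation first (the analogue of an
    // `is_inline` check); the Heap arm then still pays Vec::push's own
    // capacity check, so this path cannot be cheaper than a plain Vec.
    pub fn put_u8(&mut self, b: u8) {
        let spilled = match &mut self.repr {
            Repr::Heap(v) => {
                v.push(b);
                return;
            }
            Repr::Inline { len, data } if *len < INLINE_CAP => {
                data[*len] = b;
                *len += 1;
                return;
            }
            // Inline storage is full: copy it into a heap Vec.
            Repr::Inline { data, .. } => {
                let mut v = Vec::with_capacity(INLINE_CAP * 2);
                v.extend_from_slice(&data[..]);
                v.push(b);
                v
            }
        };
        self.repr = Repr::Heap(spilled);
    }

    pub fn as_slice(&self) -> &[u8] {
        match &self.repr {
            Repr::Inline { len, data } => &data[..*len],
            Repr::Heap(v) => v.as_slice(),
        }
    }
}
```

The contents end up identical to a Vec's. The only difference is the extra branch taken on every write, and in a tight BufMut loop that branch is what shows up in the profile.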

@mzabaluev
Contributor

mzabaluev commented Jun 17, 2019

Could branching among the four possible representations be a handicap as well?
I remember when a small-size optimization for Vec/String was discussed. People who tried it and measured the performance impact on rustc concluded that the added branching everywhere was just not worth it with the allocator in use at the time (jemalloc was the default back then). I realise that the atomic ordering guarantees may be more costly, so a non-shared representation variant may still be worth having. Not being much involved in this project's history, I can only assume that the other representations have shown performance improvements in realistic use cases.

I think a more promising direction for optimization would be to cut down on the mutating APIs and optimize for the primary use case of bytes. That use case is essentially covered by the Buf/BufMut APIs, .split_to(n).freeze(), and BytesMut::reserve. It consists of repeatedly pulling data incrementally into a BytesMut within a single task, then handing off complete chunks as Bytes for potentially shared processing that does not tend to modify the data in place. I believe instilling some usage discipline can significantly reduce overhead on the hot paths.
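A std-only sketch of that access pattern follows. Vec<u8> stands in for BytesMut, and Arc<[u8]> stands in for Bytes. The type FrameReader, its methods feed and next_frame, and the single-byte length-prefix framing are hypothetical choices made for illustration. One task appends incoming data and splits off each complete frame as an immutable, cheaply clonable chunk. Unlike the real BytesMut::split_to(n).freeze(), this model copies each frame's bytes and shifts the remainder down.

```rust
use std::sync::Arc;

// Hypothetical model of the "accumulate, then hand off frozen chunks" pattern.
// Frames are assumed to be prefixed by a single length byte.
pub struct FrameReader {
    buf: Vec<u8>, // stands in for BytesMut
}

impl FrameReader {
    pub fn new() -> Self {
        FrameReader { buf: Vec::with_capacity(64) }
    }

    // Append newly received bytes (the BytesMut::reserve + BufMut::put_slice step).
    pub fn feed(&mut self, data: &[u8]) {
        self.buf.reserve(data.len());
        self.buf.extend_from_slice(data);
    }

    // Split off the next complete frame, if any, and "freeze" it into an
    // immutable shared chunk. This model copies the frame; the real
    // split_to(n).freeze() avoids copying the frame's bytes.
    pub fn next_frame(&mut self) -> Option<Arc<[u8]>> {
        let len = *self.buf.first()? as usize;
        if self.buf.len() < 1 + len {
            return None; // frame not complete yet
        }
        let frame: Arc<[u8]> = Arc::from(&self.buf[1..1 + len]);
        self.buf.drain(..1 + len);
        Some(frame)
    }
}
```

Each returned chunk can be cloned and sent to other tasks for read-only processing, while the reader keeps mutating only its own buffer.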

If the extension API on Bytes and the segmentation of BytesMut are removed, BytesMut could simply own a Shared instance referenced by the immutable Bytes views (those in the ARC form). It would then never need to touch the reference count until it is dropped. Bytes would no longer need the capacity field, so the struct would shrink to 3/4 of its current size. It would still be possible to convert a Bytes into a BytesMut without copying when that Bytes holds the only reference to the buffer.
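The proposed view layout can be sketched with std types only, assuming the Shared buffer is modeled as Arc<Vec<u8>>; the real crate uses its own Shared type. The hypothetical BytesView carries only a shared pointer, an offset and a length. That is three words, with no capacity field. The hypothetical try_into_mut reclaims the buffer for mutation without a new allocation, but only when the view holds the sole reference, checked via Arc::try_unwrap. For a view with a nonzero offset, this sketch shifts the bytes down in place. A real implementation would more likely keep the offset instead.

```rust
use std::sync::Arc;

// Hypothetical immutable view: shared buffer + offset + length, no capacity.
#[derive(Clone, Debug)]
pub struct BytesView {
    shared: Arc<Vec<u8>>, // one word: pointer to the shared allocation
    off: usize,
    len: usize,
}

impl BytesView {
    pub fn from_vec(v: Vec<u8>) -> Self {
        let len = v.len();
        BytesView { shared: Arc::new(v), off: 0, len }
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.shared[self.off..self.off + self.len]
    }

    // Sub-view: bumps the reference count, never copies data.
    pub fn slice(&self, start: usize, end: usize) -> BytesView {
        assert!(start <= end && end <= self.len);
        BytesView { shared: Arc::clone(&self.shared), off: self.off + start, len: end - start }
    }

    // Reclaim the allocation for mutation if this is the only reference;
    // otherwise hand the view back unchanged.
    pub fn try_into_mut(self) -> Result<Vec<u8>, BytesView> {
        let (off, len) = (self.off, self.len);
        match Arc::try_unwrap(self.shared) {
            Ok(mut v) => {
                v.truncate(off + len);
                v.drain(..off); // shifts within the same allocation
                Ok(v)
            }
            Err(shared) => Err(BytesView { shared, off, len }),
        }
    }
}
```

Because the view never needs a capacity field, only the owner of the allocation has to know how much room is left.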

I can lay out these reformist ideas in more detail in an RFC-like issue if the development team is interested in this direction.

@carllerche
Member Author

@mzabaluev Sure, more details would be needed to consider the change:

  • List of APIs added / removed.
  • Changes to internals.
  • Performance characteristic differences for common usage patterns of Bytes / BytesMut.

@mzabaluev
Contributor

I've done my homework in #268.

2 participants