Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize for protos containing large blobs #31

Closed
rajatgoel opened this issue Sep 11, 2017 · 10 comments
Closed

optimize for protos containing large blobs #31

rajatgoel opened this issue Sep 11, 2017 · 10 comments

Comments

@rajatgoel
Copy link

One small optimization prost can do for proto message containing large blobs is - specialize decoding protobuf from reference counted buffers and give a reference to a slice of the underlying buffer as bytes instead of copying them. Similarly, when encoding the message simply chain these big blobs instead of copying them to output buffer. I think the C++ does this optimization for all bytes fields that specify ctype=cord.

@danburkert
Copy link
Collaborator

There's no support for this yet, but it should be possible to add. The 'plan' is to:

  • Add an option to prost_build::Config to output string and bytes fields as bytes::Bytes instead of String and Vec<u8>.
  • Add specialization into the decoding stack so that when decoding into a bytes::Bytes field from a bytes::Bytes source, it does a shallow copy (bytes::Bytes is ref-counted).

This solution unfortunately isn't too general - it only works with types known to prost through explicit specializations, but I think bytes::Bytes will cover the majority of cases. Perhaps if there's a rope data structure that gains prominence down the road that could be added too.

@nerdrew
Copy link

nerdrew commented Oct 12, 2018

Has anyone started on this? What does "specialization in the decoding stack" look like?

@vorner
Copy link
Contributor

vorner commented Nov 13, 2018

Looking at it, using Bytes instead of String feels a little wrong, doesn't it? I mean, something that checks the unicode and derefs into &str would be nice.

Or is the plan to provide some wrapper?

@quininer
Copy link
Member

@vorner I think String<Bytes> is what you want.

@nrc
Copy link
Contributor

nrc commented May 24, 2019

Using Bytes for bytes would be useful for small bytes fields too - at the moment we spend a huge amount of time allocating small Vecs for such fields. Being able to reference the underlying buffer, or even allocate a large slab of memory once would be a big performance win.

@mzabaluev
Copy link
Contributor

As a container for string buffers backed by Bytes, I have published the strchunk crate.

@vorner
Copy link
Contributor

vorner commented Nov 25, 2019

What is the advantage of using that over that String<Bytes> thing above?

@mzabaluev
Copy link
Contributor

mzabaluev commented Nov 25, 2019

@vorner I did not look at it in detail, but it seems, as a more generic container, it can't generally make use of some optimizations that are available for the specific Bytes-backed container. On the other hand, the TakeRange impls could be implemented for String<Bytes> as well, since I have already implemented them for Bytes.

@mzabaluev
Copy link
Contributor

optimizations that are available for the specific Bytes-backed container.

Not directly applicable in prost, but the implementation of FromIterator<char>/Extend<char> is one example.

This can be bridged as well now that the impl of BufMut for BytesMut is consistently growable (tokio-rs/bytes#316); except that would have to be String<BytesMut> so there'd need to be a conversion to String<Bytes>, or am I lost in these types now?

@danburkert
Copy link
Collaborator

danburkert commented Nov 15, 2020

Hi all, on master this should be implemented on the deserialization side as of #387. The heavy lifting landed in #341, which added support for specifying that a .proto bytes field should result in a bytes::Bytes Rust field. When deserializing such a bytes field and the buffer being deserialized (the B type param) is of type bytes::Bytes, then it should be zero-copy.

This is essentially what I laid out back in 2017, except instead of using a language specialization feature it uses a nifty specialized method built in to the bytes crate.

As such, I'm going to close out this issue. If there are concerns or follow up requests please feel free to file a new issue. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants