optimize for protos containing large blobs #31

rajatgoel · 2017-09-11T23:50:05Z

One small optimization prost can do for proto messages containing large blobs is to specialize decoding from reference-counted buffers and hand out a reference to a slice of the underlying buffer as bytes instead of copying them. Similarly, when encoding the message, simply chain these big blobs instead of copying them to the output buffer. I think the C++ implementation does this optimization for all bytes fields that specify ctype=CORD.
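To make the encoding half of this concrete, here is a minimal std-only sketch of "chaining" a large blob instead of copying it. The `Chunk` enum and the `encode_bytes_field` and `write_chunks` helpers are hypothetical names made up for illustration and are not part of prost. Only the varint and key layout (field number shifted left by 3, wire type 2 for length-delimited fields) come from the protobuf wire format.

```rust
use std::io::Write;

/// Hypothetical encoder output: small headers are owned,
/// large blobs are borrowed from the message rather than copied.
enum Chunk<'a> {
    Owned(Vec<u8>),
    Borrowed(&'a [u8]),
}

impl<'a> Chunk<'a> {
    fn as_slice(&self) -> &[u8] {
        match self {
            Chunk::Owned(v) => v,
            Chunk::Borrowed(s) => s,
        }
    }
}

/// Encode a protobuf varint (7 bits per byte, least significant group first).
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Encode a length-delimited field (wire type 2) as two chunks:
/// an owned key + length header, followed by the blob by reference.
fn encode_bytes_field<'a>(field_number: u32, blob: &'a [u8]) -> Vec<Chunk<'a>> {
    let mut header = Vec::new();
    encode_varint((u64::from(field_number) << 3) | 2, &mut header);
    encode_varint(blob.len() as u64, &mut header);
    vec![Chunk::Owned(header), Chunk::Borrowed(blob)]
}

/// Write the chunks in order; the blob goes straight from its original
/// storage to the writer without passing through an intermediate buffer.
fn write_chunks<W: Write>(chunks: &[Chunk<'_>], w: &mut W) -> std::io::Result<()> {
    for chunk in chunks {
        w.write_all(chunk.as_slice())?;
    }
    Ok(())
}
```

With a socket or file as the writer, the only bytes the encoder itself allocates are the few header bytes per field; a vectored write over the same chunk list would be a natural next step.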

danburkert · 2017-09-12T01:17:46Z

There's no support for this yet, but it should be possible to add. The 'plan' is to:

1. Add an option to prost_build::Config to output string and bytes fields as bytes::Bytes instead of String and Vec<u8>.
2. Add specialization into the decoding stack so that when decoding into a bytes::Bytes field from a bytes::Bytes source, it does a shallow copy (bytes::Bytes is ref-counted).

This solution unfortunately isn't too general - it only works with types known to prost through explicit specializations, but I think bytes::Bytes will cover the majority of cases. Perhaps if there's a rope data structure that gains prominence down the road that could be added too.
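For readers wondering what a "shallow copy" during decoding could look like, here is a rough std-only sketch. `SharedBytes` is a hypothetical stand-in for a ref-counted buffer like bytes::Bytes, built on `Arc<[u8]>`, and `decode_bytes_field` is an illustrative helper, not prost's decoder. The assumption is that the source buffer is already ref-counted, so a bytes field can be returned as a view into it.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for a ref-counted byte buffer such as bytes::Bytes.
#[derive(Clone, Debug)]
struct SharedBytes {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedBytes {
    fn new(data: Vec<u8>) -> Self {
        let range = 0..data.len();
        SharedBytes { buf: Arc::from(data), range }
    }

    fn as_slice(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }

    /// Split off the first `len` bytes as a new view.
    /// Only the reference count is bumped; no bytes are copied.
    fn split_to(&mut self, len: usize) -> SharedBytes {
        assert!(len <= self.range.len());
        let mid = self.range.start + len;
        let head = SharedBytes { buf: Arc::clone(&self.buf), range: self.range.start..mid };
        self.range.start = mid;
        head
    }
}

/// Decode a varint from the front of `src`, advancing past it.
fn decode_varint(src: &mut SharedBytes) -> Option<u64> {
    let mut value = 0u64;
    let mut consumed = None;
    for (i, &byte) in src.as_slice().iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            consumed = Some(i + 1);
            break;
        }
    }
    src.range.start += consumed?;
    Some(value)
}

/// Decode one length-delimited field (wire type 2) and return its
/// field number plus a zero-copy view of the payload.
fn decode_bytes_field(src: &mut SharedBytes) -> Option<(u32, SharedBytes)> {
    let key = decode_varint(src)?;
    if key & 0x7 != 2 {
        return None;
    }
    let len = decode_varint(src)? as usize;
    if len > src.range.len() {
        return None;
    }
    Some(((key >> 3) as u32, src.split_to(len)))
}
```

The trade-off is that the decoded field keeps the whole source buffer alive for as long as the field lives, which is usually what you want for large blobs but worth keeping in mind for long-lived messages.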

nerdrew · 2018-10-12T15:41:56Z

Has anyone started on this? What does "specialization in the decoding stack" look like?

vorner · 2018-11-13T08:45:08Z

Looking at it, using Bytes instead of String feels a little wrong, doesn't it? I mean, something that checks the unicode and derefs into &str would be nice.

Or is the plan to provide some wrapper?
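As an illustration of the kind of wrapper being asked about here, the following is a minimal sketch assuming a shared `Arc<[u8]>` buffer. `SharedStr` is a hypothetical type invented for this example; it is not prost's API and not the API of any existing crate. It validates UTF-8 once at construction and then derefs into &str.

```rust
use std::ops::Deref;
use std::str::Utf8Error;
use std::sync::Arc;

/// Hypothetical wrapper: a view into a shared buffer that has been
/// checked to be valid UTF-8 exactly once, at construction time.
#[derive(Clone, Debug)]
struct SharedStr {
    buf: Arc<[u8]>,
    start: usize,
    end: usize,
}

impl SharedStr {
    /// Validate `buf[start..end]` as UTF-8 and wrap it without copying.
    /// Panics if the range is out of bounds for `buf`.
    fn from_utf8(buf: Arc<[u8]>, start: usize, end: usize) -> Result<Self, Utf8Error> {
        std::str::from_utf8(&buf[start..end])?;
        Ok(SharedStr { buf, start, end })
    }
}

impl Deref for SharedStr {
    type Target = str;

    fn deref(&self) -> &str {
        // SAFETY: the range was validated in `from_utf8`, and the bytes
        // behind the `Arc` are never mutated afterwards.
        unsafe { std::str::from_utf8_unchecked(&self.buf[self.start..self.end]) }
    }
}
```

Because of the `Deref` impl, all the usual &str methods (`len`, `starts_with`, and so on) work directly on the wrapper.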

quininer · 2019-05-19T06:54:20Z

@vorner I think String<Bytes> is what you want.

nrc · 2019-05-24T22:09:03Z

Using Bytes for bytes would be useful for small bytes fields too - at the moment we spend a huge amount of time allocating small Vecs for such fields. Being able to reference the underlying buffer, or even allocate a large slab of memory once would be a big performance win.

mzabaluev · 2019-11-25T09:46:32Z

As a container for string buffers backed by Bytes, I have published the strchunk crate.

vorner · 2019-11-25T09:53:37Z

What is the advantage of using that over that String<Bytes> thing above?

mzabaluev · 2019-11-25T10:55:01Z

@vorner I did not look at it in detail, but it seems, as a more generic container, it can't generally make use of some optimizations that are available for the specific Bytes-backed container. On the other hand, the TakeRange impls could be implemented for String<Bytes> as well, since I have already implemented them for Bytes.

mzabaluev · 2019-11-25T11:27:38Z

> optimizations that are available for the specific Bytes-backed container.

Not directly applicable in prost, but the implementation of FromIterator<char>/Extend<char> is one example.

This can be bridged as well now that the impl of BufMut for BytesMut is consistently growable (tokio-rs/bytes#316); except that would have to be String<BytesMut> so there'd need to be a conversion to String<Bytes>, or am I lost in these types now?

danburkert · 2020-11-15T19:52:48Z

Hi all, on master this should be implemented on the deserialization side as of #387. The heavy lifting landed in #341, which added support for specifying that a .proto bytes field should result in a bytes::Bytes Rust field. When deserializing such a bytes field and the buffer being deserialized (the B type param) is of type bytes::Bytes, then it should be zero-copy.

This is essentially what I laid out back in 2017, except instead of using a language specialization feature it uses a nifty specialized method built in to the bytes crate.
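For anyone curious how a "specialized method" can stand in for language-level specialization, here is a simplified std-only sketch of the pattern. The `Source` trait, its `take_shared` method, and the `Shared` type are hypothetical names for illustration only. In the bytes crate I believe the corresponding hook is `Buf::copy_to_bytes`, which `Bytes` overrides to avoid copying, but treat that mapping as an assumption rather than a description of prost's internals.

```rust
use std::sync::Arc;

/// Hypothetical ref-counted view, standing in for bytes::Bytes.
#[derive(Clone, Debug)]
struct Shared {
    buf: Arc<[u8]>,
    start: usize,
    end: usize,
}

impl Shared {
    fn as_slice(&self) -> &[u8] {
        &self.buf[self.start..self.end]
    }
}

/// Hypothetical, simplified source trait.
trait Source {
    fn remaining(&self) -> usize;
    fn chunk(&self) -> &[u8];
    fn advance(&mut self, n: usize);

    /// Default behaviour: copy `len` bytes into a fresh allocation.
    fn take_shared(&mut self, len: usize) -> Shared {
        assert!(len <= self.remaining());
        let copy: Arc<[u8]> = Arc::from(&self.chunk()[..len]);
        self.advance(len);
        Shared { buf: copy, start: 0, end: len }
    }
}

/// Plain slices fall back to the copying default.
impl Source for &[u8] {
    fn remaining(&self) -> usize {
        self.len()
    }
    fn chunk(&self) -> &[u8] {
        self
    }
    fn advance(&mut self, n: usize) {
        let s = *self;
        *self = &s[n..];
    }
}

impl Source for Shared {
    fn remaining(&self) -> usize {
        self.end - self.start
    }
    fn chunk(&self) -> &[u8] {
        self.as_slice()
    }
    fn advance(&mut self, n: usize) {
        assert!(n <= self.remaining());
        self.start += n;
    }

    /// Override: hand out a view of the same allocation, no copy.
    fn take_shared(&mut self, len: usize) -> Shared {
        assert!(len <= self.remaining());
        let mid = self.start + len;
        let head = Shared { buf: Arc::clone(&self.buf), start: self.start, end: mid };
        self.start = mid;
        head
    }
}

/// Generic decoding code stays oblivious to which source it was given.
fn decode_payload<S: Source>(src: &mut S, len: usize) -> Shared {
    src.take_shared(len)
}
```

The generic `decode_payload` is written once; whether it copies or shares is decided by the source type's own impl, which is why no compiler specialization feature is needed.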

As such, I'm going to close out this issue. If there are concerns or follow up requests please feel free to file a new issue. Thanks!

danburkert added enhancement help wanted labels Sep 12, 2017

danburkert mentioned this issue Sep 16, 2017

Using borrowed values #35

Closed

nerdrew mentioned this issue Oct 12, 2018

Implement extract for a type that implements From<Vec<u8>>? carllerche/tower-web#117

Open

vorner mentioned this issue Nov 11, 2018

Zero-copy mode #134

Closed

vorner mentioned this issue Nov 13, 2018

WIP: Specialization of decoding for Bytes #135

Closed

nrc mentioned this issue May 30, 2019

Experiment with using bytes::Bytes to back bytes and string fields #190

Closed

danburkert mentioned this issue Jun 7, 2019

Avoid cloning for parameter in RPC call. tower-rs/tower-grpc#184

Open

danburkert closed this as completed Nov 15, 2020

vorner mentioned this issue Nov 15, 2020

Support for String<Bytes> or similar #392

Open

ikopylov mentioned this issue Sep 5, 2022

506 avoid data cloning qoollo/bob#525

Merged
