Optional support for bytes::Bytes type backing bytes field. #337

rolftimmermans · 2020-05-25T20:46:43Z

This is an attempt to support bytes::Bytes as a backing type for bytes fields. This is opt-in; the default remains Vec<u8>. The build-time configuration is similar to the BTreeMap opt-in.

~~For bytes fields with this feature enabled, no memory copying should take place when decoding from buffers backed by Bytes. Encoding will still copy data regardless of the backing type.~~

This does not address zero-copying, which would require specialisation or additional support in the Buf trait.
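To illustrate why a decoder that is generic over its input cannot avoid the copy without help from the buffer trait, here is a minimal standard-library sketch. `SharedBytes`, `Source`, `take_shared`, and `decode_field` are hypothetical names invented for this illustration; they are not the API of the bytes crate or of prost.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for a reference-counted byte buffer.
#[derive(Clone, Debug)]
pub struct SharedBytes {
    data: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedBytes {
    /// Allocates a new buffer and copies `src` into it.
    pub fn copy_from(src: &[u8]) -> Self {
        SharedBytes { data: Arc::from(src), range: 0..src.len() }
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.data[self.range.clone()]
    }

    /// True when both handles point into the same allocation.
    pub fn shares_allocation_with(&self, other: &SharedBytes) -> bool {
        Arc::ptr_eq(&self.data, &other.data)
    }
}

/// Hypothetical input trait (an assumption of this sketch).
pub trait Source {
    fn chunk(&self) -> &[u8];
    fn advance(&mut self, n: usize);

    /// Default path: the only thing a generic source can do is copy.
    fn take_shared(&mut self, n: usize) -> SharedBytes {
        let out = SharedBytes::copy_from(&self.chunk()[..n]);
        self.advance(n);
        out
    }
}

impl Source for &[u8] {
    fn chunk(&self) -> &[u8] {
        self
    }
    fn advance(&mut self, n: usize) {
        *self = &self[n..];
    }
}

impl Source for SharedBytes {
    fn chunk(&self) -> &[u8] {
        self.as_slice()
    }
    fn advance(&mut self, n: usize) {
        assert!(n <= self.range.len());
        self.range.start += n;
    }
    /// Override: hand out a second handle to the same allocation, no copy.
    fn take_shared(&mut self, n: usize) -> SharedBytes {
        assert!(n <= self.range.len());
        let start = self.range.start;
        self.range.start += n;
        SharedBytes { data: Arc::clone(&self.data), range: start..start + n }
    }
}

/// A decoder generic over `S` gets the zero-copy path only because the
/// trait itself exposes a hook that the shared type can override.
pub fn decode_field<S>(src: &mut S, len: usize) -> SharedBytes
where
    S: Source,
{
    src.take_shared(len)
}
```

Without the overridable `take_shared` hook, `decode_field` would need specialisation to treat the shared buffer differently from a plain slice.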

rolftimmermans · 2020-05-25T21:12:49Z

I will address the CI errors tomorrow, but please feel free to comment on the overall approach in the meantime.

src/encoding.rs

mzabaluev · 2020-05-27T03:23:39Z

And here as well, I humbly offer StrChunk as the alternative backing type for strings to accompany Bytes.

quininer · 2020-05-27T06:49:42Z

We can also consider using bstr, which means we only need to provide Bytes type for string.

mzabaluev · 2020-05-27T07:11:29Z

> We can also consider using bstr, which means we only need to provide Bytes type for string.

I think part of the purpose of this change is to provide generated message structs with scalar fields backed by Bytes. To be convenient for the consumer, the string fields should be accessible as UTF-8 buffers/slices without extra steps. As far as I understand, bstr only provides Vec wrappers and string conversion methods grafted onto a [u8].
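The idea of a string field that is backed by a shared byte buffer yet usable as a `str` without extra steps can be sketched as follows. `Utf8Bytes` and `from_shared` are hypothetical names for this illustration, and `Arc<[u8]>` stands in for a `Bytes`-like buffer; this is not the API of StrChunk, bstr, or prost.

```rust
use std::ops::Deref;
use std::str::Utf8Error;
use std::sync::Arc;

/// Hypothetical thin wrapper: a shared byte buffer known to hold valid UTF-8.
#[derive(Clone, Debug)]
pub struct Utf8Bytes(Arc<[u8]>);

impl Utf8Bytes {
    /// Validates once at construction and keeps the buffer without copying it.
    pub fn from_shared(bytes: Arc<[u8]>) -> Result<Self, Utf8Error> {
        std::str::from_utf8(&bytes)?;
        Ok(Utf8Bytes(bytes))
    }
}

impl Deref for Utf8Bytes {
    type Target = str;

    fn deref(&self) -> &str {
        // SAFETY: the bytes were validated in `from_shared`, and the wrapper
        // exposes no way to mutate them afterwards.
        unsafe { std::str::from_utf8_unchecked(&self.0) }
    }
}
```

Because of the `Deref` impl, all `&str` methods are available directly on the wrapper, which is the "without extra steps" property described above.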

rolftimmermans · 2020-05-27T13:10:29Z

> And here as well, I humbly offer StrChunk as the alternative backing type for strings to accompany Bytes.

Given that the bytes crate already is a dependency of prost it's a lot more obvious to offer it as an option. But I guess alternative string types could be offered behind a feature flag? To be completely honest, though, I'm not keen on making changes to the backing type of string fields part of this PR.

mzabaluev · 2020-05-27T15:35:05Z

> > And here as well, I humbly offer StrChunk as the alternative backing type for strings to accompany Bytes.
>
> Given that the bytes crate already is a dependency of prost it's a lot more obvious to offer it as an option. But I guess alternative string types could be offered behind a feature flag?

Yes. However, StrChunk is a very thin wrapper over Bytes, and the crate does not have any other dependencies besides another tiny library providing convenience traits.

> To be completely honest, though, I'm not keen on making changes to the backing type of string fields part of this PR.

Sure, the corresponding change for strings can wait for another feature PR. Without it, however, support for zero-copy scalars will be incomplete.

src/encoding.rs

danburkert · 2020-05-31T19:49:52Z

+    }
+}
+
+pub trait BytesAdapter: Default + Sized + 'static {


This trait is my biggest hangup about this implementation; I'd like to avoid introducing a new pseudo-public trait if at all possible. It's one more interface that prost has to support going forward, and it's not particularly elegant if someone wants to substitute in a different concrete Buf impl in the future. Did you look at whether this could be replaced with a combination of non-prost traits, perhaps Clone + Default + Buf?

I'm going to poke at this a bit, I think it should be possible.

Well this at least compiles/passes tests:

    pub fn merge<V, B>(
        wire_type: WireType,
        value: &mut V,
        buf: &mut B,
        _ctx: DecodeContext,
    ) -> Result<(), DecodeError>
    where
        V: Clone + Default + BufMut,
        B: Buf,
    {
        check_wire_type(WireType::LengthDelimited, wire_type)?;
        let len = decode_varint(buf)?;
        if len > buf.remaining() as u64 {
            return Err(DecodeError::new("buffer underflow"));
        }
        let len = len as usize;

        // Clear the existing value. This follows from the following rule in the encoding guide[1]:
        //
        // > Normally, an encoded message would never have more than one instance of a non-repeated
        // > field. However, parsers are expected to handle the case in which they do. For numeric
        // > types and strings, if the same field appears multiple times, the parser accepts the
        // > last value it sees.
        //
        // [1]: https://developers.google.com/protocol-buffers/docs/encoding#optional
        value.clone_from(&V::default());
        value.put(buf.take(len));
        Ok(())
    }
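The steps in the snippet above (read a varint length, reject an underflowing buffer, then replace rather than append the old value) can be tried out with only the standard library. This is a simplified sketch, not prost's implementation: the `DecodeError` here is a local stand-in, `merge_bytes` is a hypothetical name, and the wire-type check and decode context are omitted.

```rust
/// Local stand-in error type for this sketch (prost has its own `DecodeError`).
#[derive(Debug, PartialEq)]
pub struct DecodeError(pub &'static str);

/// Decodes a base-128 varint from the front of `buf` and advances the slice.
pub fn decode_varint(buf: &mut &[u8]) -> Result<u64, DecodeError> {
    let data = *buf;
    let mut value = 0u64;
    // A u64 varint occupies at most 10 bytes.
    for (i, &byte) in data.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            *buf = &data[i + 1..];
            return Ok(value);
        }
    }
    Err(DecodeError("invalid varint"))
}

/// Merges one length-delimited bytes field into `value`.
pub fn merge_bytes(value: &mut Vec<u8>, buf: &mut &[u8]) -> Result<(), DecodeError> {
    let len = decode_varint(buf)?;
    if len > buf.len() as u64 {
        return Err(DecodeError("buffer underflow"));
    }
    let len = len as usize;

    let data = *buf;
    let (payload, rest) = data.split_at(len);

    // Last value wins: clear the previous contents instead of appending.
    value.clear();
    value.extend_from_slice(payload);
    *buf = rest;
    Ok(())
}
```

Calling `merge_bytes` twice on the same `value` leaves only the second payload in it, which matches the "parser accepts the last value it sees" rule quoted in the comment.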

will keep playing, the most important thing is to make sure this doesn't regress the performance of the existing vec impl.

After looking at this more, the encoding APIs get tricky to impl because Vec<u8> does not impl Buf. I think it may be possible to have two separate implementations, one for Vec<u8> and one that is generic for anything which impls Buf.

rolftimmermans · 2020-06-01

Indeed this is not possible because Vec<u8> is not a Buf. I have now changed the trait to be private (with a workaround stolen from https://docs.rs/tokio/0.2.21/tokio/net/trait.ToSocketAddrs.html). Does that address your concern?

Otherwise I guess it would be necessary to create a new function, say, encode_buf(...) and a lot of special-casing in prost-derive for Bytes-backed bytes fields... That seems like a much less elegant option.
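One common way to keep such an adapter trait private while still using it in public bounds is the sealed-trait pattern. The sketch below is an assumption about the general shape of that workaround, not the code in this PR: `BytesField`, `SharedBytes`, `replace_with`, `merge_field`, and `field_len` are hypothetical, and `Arc<[u8]>` stands in for a `Bytes`-like buffer.

```rust
use std::sync::Arc;

mod sealed {
    /// Declared `pub`, but unreachable from other crates because the
    /// enclosing module is private.
    pub trait BytesAdapter: Default {
        fn len(&self) -> usize;
        fn replace_with(&mut self, data: &[u8]);
    }
}

/// Public trait usable in bounds. Downstream crates cannot implement it
/// because they cannot name its supertrait.
pub trait BytesField: sealed::BytesAdapter {}

impl sealed::BytesAdapter for Vec<u8> {
    fn len(&self) -> usize {
        Vec::len(self)
    }
    fn replace_with(&mut self, data: &[u8]) {
        self.clear();
        self.extend_from_slice(data);
    }
}
impl BytesField for Vec<u8> {}

/// Hypothetical shared buffer standing in for a `Bytes`-like type.
#[derive(Clone, Debug)]
pub struct SharedBytes(Arc<[u8]>);

impl Default for SharedBytes {
    fn default() -> Self {
        SharedBytes(Arc::from(Vec::<u8>::new()))
    }
}

impl sealed::BytesAdapter for SharedBytes {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn replace_with(&mut self, data: &[u8]) {
        self.0 = Arc::from(data);
    }
}
impl BytesField for SharedBytes {}

/// One generic code path serves both backing types.
pub fn merge_field<B>(value: &mut B, data: &[u8])
where
    B: BytesField,
{
    sealed::BytesAdapter::replace_with(value, data);
}

pub fn field_len<B>(value: &B) -> usize
where
    B: BytesField,
{
    sealed::BytesAdapter::len(value)
}
```

The set of supported backing types stays closed, so the trait's methods can change later without breaking outside implementors.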

src/encoding.rs

danburkert · 2020-05-31T20:12:26Z

oh also please stick to generic params and where clauses instead of impl trait in encoding.rs to maintain consistency.

rolftimmermans · 2020-06-01T08:02:18Z

> oh also please stick to generic params and where clauses instead of impl trait in encoding.rs to maintain consistency.

I have done this in all places except for length_delimited!(impl BytesAdapter);. Changing this to a generic parameter with a trait bound doesn't let me reuse the existing implementation with the length_delimited!() macro. I can of course duplicate it but not sure that is worth the style change. What do you think?
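For reference, the two styles under discussion look like this side by side. The `BytesAdapter` trait here is a simplified, hypothetical version with a single method, and the function names are made up for the sketch.

```rust
/// Simplified, hypothetical adapter trait for this sketch.
pub trait BytesAdapter {
    fn len(&self) -> usize;
}

impl BytesAdapter for Vec<u8> {
    fn len(&self) -> usize {
        Vec::len(self)
    }
}

/// Argument-position `impl Trait`: concise, but the type parameter is anonymous.
pub fn payload_len_impl(value: &impl BytesAdapter) -> usize {
    value.len()
}

/// Named generic parameter with a `where` clause: the style requested for
/// encoding.rs. The parameter can be named explicitly at the call site.
pub fn payload_len_generic<B>(value: &B) -> usize
where
    B: BytesAdapter,
{
    value.len()
}
```

Both functions compile to the same monomorphized code; the practical difference is that only the second accepts an explicit type argument such as `payload_len_generic::<Vec<u8>>(&v)`.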

danburkert · 2020-06-14T22:49:43Z

hmm, not sure what I did there but I intended to add a couple of commits to the PR branch, not sure why it got closed.

danburkert · 2020-06-14T22:55:11Z

OK well I really screwed that up, and now I can't fix cause the PR's been closed. Will open a new PR with the proper commits. Apologies @rolftimmermans !

danburkert · 2020-06-14T22:56:45Z

Reopened as #341

rolftimmermans mentioned this pull request May 26, 2020

Experiment with using bytes::Bytes to back bytes and string fields #190

Closed

rolftimmermans changed the title ~~Optional support for Bytes type backing bytes field.~~ Optional support for bytes::Bytes type backing bytes field. May 26, 2020

mzabaluev reviewed May 26, 2020

danburkert reviewed May 31, 2020

rolftimmermans requested a review from danburkert June 9, 2020 07:24

danburkert closed this Jun 14, 2020

danburkert force-pushed the master branch from cb0344c to 8025627 Compare June 14, 2020 22:48

danburkert mentioned this pull request Jun 14, 2020

Optional support for bytes::Bytes type backing bytes field #341

Merged
