Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional support for bytes::Bytes type backing bytes field. #337

Closed
wants to merge 0 commits into from

Conversation

rolftimmermans
Copy link
Contributor

@rolftimmermans rolftimmermans commented May 25, 2020

This is an attempt to support bytes::Bytes as a backing type for bytes fields. This is opt-in, the default remains Vec<u8>. The configuration at build time is similar to BTreeMap opt-in.

For bytes fields with this feature enabled, no memory copying should take place when decoding from buffers backed by Bytes. Encoding will still copy data regardless of the backing type.

This does not address zero-copying, which would require specialisation or additional support in the Buf trait.

@rolftimmermans
Copy link
Contributor Author

I will address the CI errors tomorrow; but please feel free to comment on the overall approach in the mean time.

@rolftimmermans rolftimmermans changed the title Optional support for Bytes type backing bytes field. Optional support for bytes::Bytes type backing bytes field. May 26, 2020
src/encoding.rs Outdated Show resolved Hide resolved
@mzabaluev
Copy link
Contributor

And here as well, I humbly offer StrChunk as the alternative backing type for strings to accompany Bytes.

@quininer
Copy link
Member

We can also consider using bstr, which means we only need to provide Bytes type for string.

@mzabaluev
Copy link
Contributor

We can also consider using bstr, which means we only need to provide Bytes type for string.

I think part of the purpose of this change is to provide generated message structs with scalar fields backed by Bytes. To be convenient to the consumer, the string fields should be accessible as UTF-8 buffers/slices without extra steps. As far as I understand, bstr only provides Vec wrappers and string conversion methods grafted onto an [u8].

@rolftimmermans
Copy link
Contributor Author

And here as well, I humbly offer StrChunk as the alternative backing type for strings to accompany Bytes.

Given that the bytes crate already is a dependency of prost it's a lot more obvious to offer it as an option. But I guess alternative string types could be offered behind a feature flag? To be completely honest, though, I'm not keen on making changes to the backing type of string fields part of this PR.

@mzabaluev
Copy link
Contributor

And here as well, I humbly offer StrChunk as the alternative backing type for strings to accompany Bytes.

Given that the bytes crate already is a dependency of prost it's a lot more obvious to offer it as an option. But I guess alternative string types could be offered behind a feature flag?

Yes. However, StrChunk is a very thin wrapper over Bytes, and the crate does not have any other dependencies besides another tiny library providing convenience traits.

To be completely honest, though, I'm not keen on making changes to the backing type of string fields part of this PR.

Sure, the corresponding change for strings can wait for another feature PR. Without it, however, support for zero-copy scalars will be incomplete.

src/encoding.rs Outdated Show resolved Hide resolved
src/encoding.rs Outdated
}
}

pub trait BytesAdapter: Default + Sized + 'static {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This trait is my biggest hangup about this implementation; I'd like to avoid introducing a new pseudo-public trait if at all possible. It's one more interface that prost has to support going forward, and it's not particularly elegant if someone wants to substitute in a different concrete Buf impl in the future. Did you look at whether this could be replaced with a combination of non-prost traits, perhaps Clone + Default + Buf?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to poke at this a bit, I think it should be possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this at least compiles/passes tests:

    pub fn merge<V, B>(
        wire_type: WireType,
        value: &mut V,
        buf: &mut B,
        _ctx: DecodeContext,
    ) -> Result<(), DecodeError>
    where
        V: Clone + Default + BufMut,
        B: Buf,
    {
        check_wire_type(WireType::LengthDelimited, wire_type)?;
        let len = decode_varint(buf)?;
        if len > buf.remaining() as u64 {
            return Err(DecodeError::new("buffer underflow"));
        }
        let len = len as usize;

        // Clear the existing value. This follows from the following rule in the encoding guide[1]:
        //
        // > Normally, an encoded message would never have more than one instance of a non-repeated
        // > field. However, parsers are expected to handle the case in which they do. For numeric
        // > types and strings, if the same field appears multiple times, the parser accepts the
        // > last value it sees.
        //
        // [1]: https://developers.google.com/protocol-buffers/docs/encoding#optional
        value.clone_from(&V::default());
        value.put(buf.take(len));
        Ok(())
    }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will keep playing, the most important thing is to make sure this doesn't regress the performance of the existing vec impl.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at this more, the encoding APIs get tricky to impl because Vec<u8> does not impl Buf. I think it may be possible to have two separate implementations, one for Vec<u8> and one that is generic for anything which impls Buf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed this is not possible because Vec<u8> is not a Buf. I have now changed the trait to be private (with a workaround stolen from https://docs.rs/tokio/0.2.21/tokio/net/trait.ToSocketAddrs.html). Does that address your concern?

Otherwise I guess it would be necessary to create a new function, say, encode_buf(...) and a lot of special-casing in prost-derive for Bytes-backed bytes fields... That seems like a much less elegant option.

src/encoding.rs Outdated Show resolved Hide resolved
@danburkert
Copy link
Collaborator

oh also please stick to generic params and where clauses instead of impl trait in encoding.rs to maintain consistency.

@rolftimmermans
Copy link
Contributor Author

oh also please stick to generic params and where clauses instead of impl trait in encoding.rs to maintain consistency.

I have done this in all places except for length_delimited!(impl BytesAdapter);. Changing this to a generic parameter with a trait bound doesn't let me reuse the existing implementation with the length_delimited!() macro. I can of course duplicate it but not sure that is worth the style change. What do you think?

@danburkert
Copy link
Collaborator

hmm, not sure what I did there but I intended to add a couple of commits to the PR branch, not sure why it got closed.

@danburkert
Copy link
Collaborator

OK well I really screwed that up, and now I can't fix cause the PR's been closed. Will open a new PR with the proper commits. Apologies @rolftimmermans !

@danburkert
Copy link
Collaborator

Reopened as #341

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants