Support serialization as `[u8; 16]` rather than as `&[u8]` #329

smarnach · 2018-09-29T20:49:52Z

Is your feature request related to a problem? Please describe.
Currently, serialization and deserialization are performed as a slice rather than a fixed-sized array. When using the bincode codec, this means that the raw data is prefixed with a redundant length tag (16usize), which is unsuitable for loading binary datastructures containing UUIDs. It is also unexpected that a UUID is serialized in a different way than [u8; 16], since it essentially is an [u8; 16].

Describe the solution you'd like
I would like to be able to serialize a UUID as if it was a [u8; 16] in non-human-readable codecs. This could be protected by a new feature flag to avoid breaking compatibility with old serialized data.

Is it blocking?
Not really, since we stopped using this crate because of this issue. While it would be possible to implement a different serialization for the UUID type, it was easier for us to use a plain [u8;16] instead.

Describe alternatives you've considered
I looked into writing my own serialization and deserialization functions. It would be nice if this was supported out of the box, though.

Additional context
N/A

Other
N/A

The text was updated successfully, but these errors were encountered:

Dylan-DPC-zz · 2018-09-29T21:06:32Z

Hi @smarnach

Thanks for reporting this and sorry you had to stop using the crate for this.

Are you open to writing a PR? If not, its okay, not a big deal.

smarnach · 2018-10-01T13:10:45Z

@Dylan-DPC No worries! I don't have time to prepare a PR during the week, but I may have time in the weekend, so I will try to do it then. No guarantees, though. :)

Dylan-DPC-zz · 2018-10-01T13:14:29Z

Cool :) No we have labelled it "hacktoberfest" so there is a chance someone might pick it up before you end up doing it. .Will keep this issue updated if that happens 👍

Redrield · 2018-10-01T21:39:51Z

I'd be interested in taking a stab at this, would the code I'm changing be under serde_support?

smarnach · 2018-10-01T22:49:11Z

@Redrield I'm just the reporter of this issue, and I'm not particularly familiar with the code base, but I had a quick look anyway.

The current serialization code is in src/serde_support.rs. The new code could either be added to the same file, or in a completely new module, depending on how big the overlap between the two implementations turns out to be. I believe we should keep the old serialization code in place for backwards compatibility, and implement a new feature flag, e.g. serde-new, that activates the new behaviour rather than the old one.

For human-readable codecs, no change is required. For non-human-readable codecs, the new implementation should use serialize_tuple() rather than serialize_bytes(), similar to the code to serialize fixed-length arrays in the serde library. The deserialization code needs the corresponding changes.

@Dylan-DPC Could you please comment on whether backwards compatibility is required, given that this goes into a new version? Having both versions in the codebase could be confusing in the long run, in particular since the difference is somewhat subtle. On the other hand, breaking compatibility would make it harder for people to upgrade.

Dylan-DPC-zz · 2018-10-01T22:53:38Z

@Redrield

Hi thanks for showing interest. Yeah @smarnach has got everything covered in his last comment. Had discussed with my co-maintainer earlier and we prefer if the feature-gate is named dense_serde.

Backwards compatibility is preferred in this case as we don't want to introduce a backward incompatible minor version and don't have much to go for a major release. Also some users might be wanting the old functionality so we'll keep both and the user can opt for either based on a feature gate.

* This commit adds a feature `dense_serde` which enables serialization and deserialization of a Uuid as a [u16; 8]

kinggoesgaming · 2018-10-02T00:34:35Z

As @Dylan-DPC said we need backward compat... as such if the feature is not enabled the current behaviour should exist otherwise dense_serde should use the array serde-ing

KodrAus · 2018-10-02T12:41:54Z

@kinggoesgaming Hmm, I'm not sure a feature flag is the best way to go here, because you can't really guarantee what flags a dependency will have toggled when you pull it in (thanks to it maybe being a dependency of one of your dependencies already) it means we effectively couldn't guarantee Uuid will be serialized in any particular way. That's a bit of a problem.

You can get a copy of the [u8; 16] from the Uuid yourself to serialize.

Dylan-DPC-zz · 2018-10-02T12:52:48Z

@KodrAus you can make a feature flag depend on a dependency feature flag which afaik should toggle it for the dependency.

KodrAus · 2018-10-02T13:01:03Z

@Dylan-DPC, the issue is any dependency could toggle that flag on for you, and the introduction of any new dependency could bring in uuid and toggle it on for you. We can't ever guarantee that the uuid you add as a dependency without the dense_serde feature will not use the dense_serde format.

I think we could explore what potential breakage there is in a patch that uses more space-efficient array serialization, but supports both fixed-size arrays and slices in deserialization. That would be backwards compatible, but not forwards compatible.

Dylan-DPC-zz · 2018-10-02T13:12:55Z

@KodrAus that's true but what I was saying is that someone who wants that deserialisation will have the feature opted in else he can use what's already available.

I can't think of a better approach if you have any, let us know. I feel we could try this approach and see how it goes and then take a call on it in a later version?

KodrAus · 2018-10-02T13:25:40Z

I see what you mean, and it's not very likely but the problem is for folks that don't have the feature enabled. Any new dependency they add, or maybe anytime they run cargo update, they could end up with a transitive dependency on uuid that has the dense_serde feature, and any previously serialized data before that feature was enabled will be broken. It's only a potential risk, but it's not impossible.

I wouldn't recommend landing this with a feature flag. They're only suited to features that add functionality that wasn't available before in a backwards compatible way. If we want to support this at all, and if we want to support this at all in the current minor version then I think we would introduce the same breakage over the lifetime of that minor version by just making the change to the serializer and ensuring the deserializer is resilient, with no feature flags. That would be a tidier solution I think.

Dylan-DPC-zz · 2018-10-02T13:33:42Z

Yes fair but wouldn't that be a breaking change as well since we are changing what we are serialising?

KodrAus · 2018-10-02T13:41:46Z

That depends on whether we consider the format itself in scope for breakage, or just the fact that any serialized Uuid can be deserialized back into a Uuid using serde. I'm ok if we want to be strict or lenient here.

I guess my point is really that a feature flag doesn't actually prevent this same kind of breakage in this case, because you're not solely in control of whether or not it's toggled. And if the feature flag isn't protecting us then we may as well not have it. If we consider the output of the serde format itself part of our public API then we'll have to leave this on the backlog until we're ready to break again.

Dylan-DPC-zz · 2018-10-02T13:47:22Z

Agreed. The reason for the feature flag was to ensure backwards compatibility but seeing that most likely people are going to use it on Uuid type instead of the slice directly, i'm fine with doing this directly without the feature flag if the output is the same.

Redrield · 2018-10-02T13:51:44Z

Could implementation detail discussion go in the PR that I have opened so I can check there to see what changes from what I have implemented may need to be done?

smarnach · 2018-10-02T14:34:47Z

It is not possible to make the deserializer support both formats for the bincode codec, since it can't distinguish between the two formats.

Here's a few more details about the use case we encountered. The superblock of an ext2 partitiion contains a struct with close to 50 fields, and two of the are UUIDs:

#[repr(C)]
pub struct Superblock {
    pub s_inodes_count: u32,
    pub s_blocks_count: u32,
    [...]
    pub s_uuid: uuid::Uuid,
    [...]
    pub s_journal_uuid: uuid::Uuid,
    [...]
}

The superblock is stored on disk in packed binary format, one field after the other, with all integers in little endian format. On a little endian machine, we can load it from disk using a bitwise copy. If we want a platform-independent soution, though, we need to use the serde bincode codec, which only works if we replace UUID by [u8; 16] in the above struct with the current implementation of serialization.

KodrAus · 2018-10-03T08:06:28Z

@smarnach Ah, so bincode doesn't support the deserialize_any method? That's actually ok now, because we don't need to touch the default serde implementations to support this in a backwards compatible way so don't need to try support both formats simultaneously.

Slightly off topic, if you need to guarantee the byte layout of Superblock then I feel like you've made the right call using [u8; 16] directly instead of Uuid. We don't guarantee the internal layout of Uuid is actually [u8; 16], it could change at any time. I mean, there's not really anything else it could be, but it's effectively a black box.

smarnach · 2018-10-03T16:27:54Z

@KodrAus Thanks for your help. Yes, using a fixed-length array is fine for our use case, but I still think it's a good thing to have a version of the serialization that behaves the same as a fixed-size array. The solution turned out quite nice.

Marwes · 2019-02-27T13:55:43Z

Would it make sense to make the [u8; 16] serialization be the default/only way in 0.8? It seems like a stretch that a binary/non-human readable format would want the larger &[u8] format.

KodrAus · 2019-03-01T07:30:13Z

@Marwes, that seems like a fair suggestion. I think we'd have to explore how this might break any currently persisted data though.

uuid::Uuid serializes as a vector, when it really should be serializing as a fixed-size array. This change creates a local wrapper class that passes through every method except serialize and deserialize. uuid-rs/uuid#329

KodrAus · 2021-10-28T11:10:11Z

Revisiting this now. I think we should fix this but need to come up with a plan for any previously serialized data.

The problem is a library that derives Serialize with fields containing UUIDs doesn’t have any control or knowledge over how it gets serialized and persisted itself. Its serialization becomes part of its public contract, and with non-self-describing formats it’s pretty much fixed from the moment it’s shipped. So if we change the default behaviour and give you a fallback the only valid choice that library can make is to opt-in to the fallback.

So I think our only path here is to add a serde::dense module you can use with #[serde(with)] and get yourself a dense representation in your own code. If you do nothing but take UUID updates then you’ll keep the same format.

I’m erring in the side of caution here just because of how widely used the library is and how long its format has been around for. I could probably be convinced otherwise 🙂

KodrAus · 2021-10-28T11:13:16Z

Ah and the original advice in this thread that we don’t guarantee [u8; 16] is also no longer valid. We do absolutely now guarantee that a Uuid is exactly a [u8; 16].

smarnach · 2021-10-28T11:34:00Z

So I think our only path here is to add a serde::dense module you can use with #[serde(with)] and get yourself a dense representation in your own code. If you do nothing but take UUID updates then you’ll keep the same format.

I think that solution was already implemented in #331.

Marwes · 2021-10-28T11:40:21Z

As long as the breaking change is done in a semver incompatible release it is up to users to ensure that they don't break their format (and if you have a binary format I'd really hope you are careful about breaking changes in that format!). Being forced into a suboptimal format forever because users can't be trusted doesn't strike me as a better idea.

KodrAus · 2021-10-28T12:34:15Z

@Marwes I think the trouble is you don’t necessarily know your type with a private field Uuid is going to be part of a binary format so our breaking change here isn’t necessarily going to be considered breaking by everyone else. You wouldn’t normally consider changes to private dependencies to be part of your public API unless you’re careful to check all uses in types that also implement Serialize. Which is a good idea, but I don’t expect that to be what actually happens.

It would be a shame to be stuck with the redundant length forever though. Maybe it’s ok if we are clear in our release notes that the format can change, so if you’re using Uuid in a Serialize impl then consider upgrading it a breaking change for you too.

Marwes · 2021-10-28T13:24:11Z

Anything that implements Serialize/Deserialize on a private type, but then leaks the serialized data doesn't actually have a private type. I suppose that may be easier to miss, but 🤷

KodrAus · 2021-10-28T13:28:43Z

🤷 indeed. I’m just a bit worried about it being especially easier to miss in libraries that are just updating things without the context we’ve got in this thread that makes it seem really obvious.

The next breaking release I was hoping to do is 1.0 though, so maybe that’s a bump people are more likely to consider carefully than 0.9.

KodrAus · 2021-10-28T13:31:27Z

I think I’m convinced then we should use the dense format by default in 1.0 and call out front-and-centre in our release notes that UUIDs might be treated differently by formats like bincode.

KodrAus · 2021-10-28T13:34:11Z

Specifically the case I was thinking about is something like:

// On inspection Uuid looks private
pub struct MyData {
    field_a: i32,
    field_b: bool,
    field_n: Uuid,
}

// lots of other code here

#[cfg(some_feature)]
impl Serialize for MyData {
    // Actually Uuid is not private
}

KodrAus · 2021-10-30T21:46:54Z

The main branch will now serialize using the compact representation.

triage-new-issues bot added the triage label Sep 29, 2018

Dylan-DPC-zz added the Type:Enhancement label Sep 29, 2018

triage-new-issues bot removed the triage label Sep 29, 2018

Dylan-DPC-zz added this to To do in 0.7.x via automation Sep 29, 2018

kinggoesgaming added the hackoberfest label Sep 29, 2018

Redrield added a commit to Redrield/uuid that referenced this issue Oct 2, 2018

Fix uuid-rs#329

5155759

* This commit adds a feature `dense_serde` which enables serialization and deserialization of a Uuid as a [u16; 8]

Redrield mentioned this issue Oct 2, 2018

Serialises UUID as [u8:16] #331

Merged

Redrield added a commit to Redrield/uuid that referenced this issue Oct 2, 2018

Fix uuid-rs#329

7d1741c

* This commit adds a feature `dense_serde` which enables serialization and deserialization of a Uuid as a [u16; 8]

kinggoesgaming moved this from To do to In progress in 0.7.x Oct 2, 2018

kinggoesgaming added this to the 0.7.2 milestone Oct 2, 2018

kinggoesgaming added this to Needs triage in Issue Triage via automation Oct 2, 2018

kinggoesgaming moved this from Needs triage to High priority in Issue Triage Oct 2, 2018

bors bot closed this as completed in #331 Oct 3, 2018

Issue Triage automation moved this from High priority to Closed Oct 3, 2018

0.7.x automation moved this from In progress to Done Oct 3, 2018

KodrAus reopened this Mar 1, 2019

Issue Triage automation moved this from Closed to Needs triage Mar 1, 2019

0.7.x automation moved this from Done to In progress Mar 1, 2019

kinggoesgaming modified the milestones: 0.7.2, 0.8.0 Mar 1, 2019

kinggoesgaming added this to To do in 0.8.x via automation Mar 1, 2019

kinggoesgaming removed this from In progress in 0.7.x Mar 1, 2019

kinggoesgaming removed the Hacktoberfest label May 9, 2019

DianaNites mentioned this issue Mar 5, 2020

API Nits/general "I had trouble with this" #461

Closed

KodrAus removed the Type:Enhancement label Oct 28, 2021

KodrAus mentioned this issue Oct 29, 2021

Make ctors that accept slices infallible using arrays and other breaking changes #536

Merged

KodrAus closed this as completed Oct 30, 2021

KodrAus mentioned this issue Nov 2, 2021

Rethink the serde [u8; 16] binary representation #557

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support serialization as `[u8; 16]` rather than as `&[u8]` #329

Support serialization as `[u8; 16]` rather than as `&[u8]` #329

smarnach commented Sep 29, 2018 •

edited

Dylan-DPC-zz commented Sep 29, 2018

smarnach commented Oct 1, 2018

Dylan-DPC-zz commented Oct 1, 2018

Redrield commented Oct 1, 2018

smarnach commented Oct 1, 2018

Dylan-DPC-zz commented Oct 1, 2018

kinggoesgaming commented Oct 2, 2018

KodrAus commented Oct 2, 2018

Dylan-DPC-zz commented Oct 2, 2018

KodrAus commented Oct 2, 2018 •

edited

Dylan-DPC-zz commented Oct 2, 2018

KodrAus commented Oct 2, 2018 •

edited

Dylan-DPC-zz commented Oct 2, 2018

KodrAus commented Oct 2, 2018

Dylan-DPC-zz commented Oct 2, 2018

Redrield commented Oct 2, 2018

smarnach commented Oct 2, 2018 •

edited

KodrAus commented Oct 3, 2018

smarnach commented Oct 3, 2018

Marwes commented Feb 27, 2019

KodrAus commented Mar 1, 2019

KodrAus commented Oct 28, 2021

KodrAus commented Oct 28, 2021

smarnach commented Oct 28, 2021

Marwes commented Oct 28, 2021

KodrAus commented Oct 28, 2021

Marwes commented Oct 28, 2021

KodrAus commented Oct 28, 2021

KodrAus commented Oct 28, 2021

KodrAus commented Oct 28, 2021

KodrAus commented Oct 30, 2021

Support serialization as [u8; 16] rather than as &[u8] #329

Support serialization as [u8; 16] rather than as &[u8] #329

Comments

smarnach commented Sep 29, 2018 • edited

Dylan-DPC-zz commented Sep 29, 2018

smarnach commented Oct 1, 2018

Dylan-DPC-zz commented Oct 1, 2018

Redrield commented Oct 1, 2018

smarnach commented Oct 1, 2018

Dylan-DPC-zz commented Oct 1, 2018

kinggoesgaming commented Oct 2, 2018

KodrAus commented Oct 2, 2018

Dylan-DPC-zz commented Oct 2, 2018

KodrAus commented Oct 2, 2018 • edited

Dylan-DPC-zz commented Oct 2, 2018

KodrAus commented Oct 2, 2018 • edited

Dylan-DPC-zz commented Oct 2, 2018

KodrAus commented Oct 2, 2018

Dylan-DPC-zz commented Oct 2, 2018

Redrield commented Oct 2, 2018

smarnach commented Oct 2, 2018 • edited

KodrAus commented Oct 3, 2018

smarnach commented Oct 3, 2018

Marwes commented Feb 27, 2019

KodrAus commented Mar 1, 2019

KodrAus commented Oct 28, 2021

KodrAus commented Oct 28, 2021

smarnach commented Oct 28, 2021

Marwes commented Oct 28, 2021

KodrAus commented Oct 28, 2021

Marwes commented Oct 28, 2021

KodrAus commented Oct 28, 2021

KodrAus commented Oct 28, 2021

KodrAus commented Oct 28, 2021

KodrAus commented Oct 30, 2021

Support serialization as `[u8; 16]` rather than as `&[u8]` #329

Support serialization as `[u8; 16]` rather than as `&[u8]` #329

smarnach commented Sep 29, 2018 •

edited

KodrAus commented Oct 2, 2018 •

edited

KodrAus commented Oct 2, 2018 •

edited

smarnach commented Oct 2, 2018 •

edited