Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support serialization as [u8; 16] rather than as &[u8] #329

Closed
smarnach opened this issue Sep 29, 2018 · 31 comments · Fixed by #331
Closed

Support serialization as [u8; 16] rather than as &[u8] #329

smarnach opened this issue Sep 29, 2018 · 31 comments · Fixed by #331

Comments

@smarnach
Copy link

smarnach commented Sep 29, 2018

Is your feature request related to a problem? Please describe.
Currently, serialization and deserialization are performed as a slice rather than a fixed-sized array. When using the bincode codec, this means that the raw data is prefixed with a redundant length tag (16usize), which is unsuitable for loading binary datastructures containing UUIDs. It is also unexpected that a UUID is serialized in a different way than [u8; 16], since it essentially is an [u8; 16].

Describe the solution you'd like
I would like to be able to serialize a UUID as if it was a [u8; 16] in non-human-readable codecs. This could be protected by a new feature flag to avoid breaking compatibility with old serialized data.

Is it blocking?
Not really, since we stopped using this crate because of this issue. While it would be possible to implement a different serialization for the UUID type, it was easier for us to use a plain [u8;16] instead.

Describe alternatives you've considered
I looked into writing my own serialization and deserialization functions. It would be nice if this was supported out of the box, though.

Additional context
N/A

Other
N/A

@Dylan-DPC-zz
Copy link
Member

Hi @smarnach

Thanks for reporting this and sorry you had to stop using the crate for this.

Are you open to writing a PR? If not, its okay, not a big deal.

@Dylan-DPC-zz Dylan-DPC-zz added this to To do in 0.7.x via automation Sep 29, 2018
@smarnach
Copy link
Author

smarnach commented Oct 1, 2018

@Dylan-DPC No worries! I don't have time to prepare a PR during the week, but I may have time in the weekend, so I will try to do it then. No guarantees, though. :)

@Dylan-DPC-zz
Copy link
Member

Cool :) No we have labelled it "hacktoberfest" so there is a chance someone might pick it up before you end up doing it. .Will keep this issue updated if that happens 👍

@Redrield
Copy link
Contributor

Redrield commented Oct 1, 2018

I'd be interested in taking a stab at this, would the code I'm changing be under serde_support?

@smarnach
Copy link
Author

smarnach commented Oct 1, 2018

@Redrield I'm just the reporter of this issue, and I'm not particularly familiar with the code base, but I had a quick look anyway.

The current serialization code is in src/serde_support.rs. The new code could either be added to the same file, or in a completely new module, depending on how big the overlap between the two implementations turns out to be. I believe we should keep the old serialization code in place for backwards compatibility, and implement a new feature flag, e.g. serde-new, that activates the new behaviour rather than the old one.

For human-readable codecs, no change is required. For non-human-readable codecs, the new implementation should use serialize_tuple() rather than serialize_bytes(), similar to the code to serialize fixed-length arrays in the serde library. The deserialization code needs the corresponding changes.

@Dylan-DPC Could you please comment on whether backwards compatibility is required, given that this goes into a new version? Having both versions in the codebase could be confusing in the long run, in particular since the difference is somewhat subtle. On the other hand, breaking compatibility would make it harder for people to upgrade.

@Dylan-DPC-zz
Copy link
Member

@Redrield

Hi thanks for showing interest. Yeah @smarnach has got everything covered in his last comment. Had discussed with my co-maintainer earlier and we prefer if the feature-gate is named dense_serde.

Backwards compatibility is preferred in this case as we don't want to introduce a backward incompatible minor version and don't have much to go for a major release. Also some users might be wanting the old functionality so we'll keep both and the user can opt for either based on a feature gate.

Redrield added a commit to Redrield/uuid that referenced this issue Oct 2, 2018
* This commit adds a feature `dense_serde` which enables serialization
and deserialization of a Uuid as a [u16; 8]
Redrield added a commit to Redrield/uuid that referenced this issue Oct 2, 2018
* This commit adds a feature `dense_serde` which enables serialization
and deserialization of a Uuid as a [u16; 8]
@kinggoesgaming
Copy link
Member

As @Dylan-DPC said we need backward compat... as such if the feature is not enabled the current behaviour should exist otherwise dense_serde should use the array serde-ing

@kinggoesgaming kinggoesgaming moved this from To do to In progress in 0.7.x Oct 2, 2018
@kinggoesgaming kinggoesgaming added this to the 0.7.2 milestone Oct 2, 2018
@KodrAus
Copy link
Member

KodrAus commented Oct 2, 2018

@kinggoesgaming Hmm, I'm not sure a feature flag is the best way to go here, because you can't really guarantee what flags a dependency will have toggled when you pull it in (thanks to it maybe being a dependency of one of your dependencies already) it means we effectively couldn't guarantee Uuid will be serialized in any particular way. That's a bit of a problem.

You can get a copy of the [u8; 16] from the Uuid yourself to serialize.

@Dylan-DPC-zz
Copy link
Member

@KodrAus you can make a feature flag depend on a dependency feature flag which afaik should toggle it for the dependency.

@KodrAus
Copy link
Member

KodrAus commented Oct 2, 2018

@Dylan-DPC, the issue is any dependency could toggle that flag on for you, and the introduction of any new dependency could bring in uuid and toggle it on for you. We can't ever guarantee that the uuid you add as a dependency without the dense_serde feature will not use the dense_serde format.

I think we could explore what potential breakage there is in a patch that uses more space-efficient array serialization, but supports both fixed-size arrays and slices in deserialization. That would be backwards compatible, but not forwards compatible.

@Dylan-DPC-zz
Copy link
Member

@KodrAus that's true but what I was saying is that someone who wants that deserialisation will have the feature opted in else he can use what's already available.

I can't think of a better approach if you have any, let us know. I feel we could try this approach and see how it goes and then take a call on it in a later version?

@KodrAus
Copy link
Member

KodrAus commented Oct 2, 2018

I see what you mean, and it's not very likely but the problem is for folks that don't have the feature enabled. Any new dependency they add, or maybe anytime they run cargo update, they could end up with a transitive dependency on uuid that has the dense_serde feature, and any previously serialized data before that feature was enabled will be broken. It's only a potential risk, but it's not impossible.

I wouldn't recommend landing this with a feature flag. They're only suited to features that add functionality that wasn't available before in a backwards compatible way. If we want to support this at all, and if we want to support this at all in the current minor version then I think we would introduce the same breakage over the lifetime of that minor version by just making the change to the serializer and ensuring the deserializer is resilient, with no feature flags. That would be a tidier solution I think.

@Dylan-DPC-zz
Copy link
Member

Yes fair but wouldn't that be a breaking change as well since we are changing what we are serialising?

@KodrAus
Copy link
Member

KodrAus commented Oct 2, 2018

That depends on whether we consider the format itself in scope for breakage, or just the fact that any serialized Uuid can be deserialized back into a Uuid using serde. I'm ok if we want to be strict or lenient here.

I guess my point is really that a feature flag doesn't actually prevent this same kind of breakage in this case, because you're not solely in control of whether or not it's toggled. And if the feature flag isn't protecting us then we may as well not have it. If we consider the output of the serde format itself part of our public API then we'll have to leave this on the backlog until we're ready to break again.

@Dylan-DPC-zz
Copy link
Member

Agreed. The reason for the feature flag was to ensure backwards compatibility but seeing that most likely people are going to use it on Uuid type instead of the slice directly, i'm fine with doing this directly without the feature flag if the output is the same.

@Redrield
Copy link
Contributor

Redrield commented Oct 2, 2018

Could implementation detail discussion go in the PR that I have opened so I can check there to see what changes from what I have implemented may need to be done?

@smarnach
Copy link
Author

smarnach commented Oct 2, 2018

It is not possible to make the deserializer support both formats for the bincode codec, since it can't distinguish between the two formats.

Here's a few more details about the use case we encountered. The superblock of an ext2 partitiion contains a struct with close to 50 fields, and two of the are UUIDs:

#[repr(C)]
pub struct Superblock {
    pub s_inodes_count: u32,
    pub s_blocks_count: u32,
    [...]
    pub s_uuid: uuid::Uuid,
    [...]
    pub s_journal_uuid: uuid::Uuid,
    [...]
}

The superblock is stored on disk in packed binary format, one field after the other, with all integers in little endian format. On a little endian machine, we can load it from disk using a bitwise copy. If we want a platform-independent soution, though, we need to use the serde bincode codec, which only works if we replace UUID by [u8; 16] in the above struct with the current implementation of serialization.

@kinggoesgaming kinggoesgaming added this to Needs triage in Issue Triage via automation Oct 2, 2018
@kinggoesgaming kinggoesgaming moved this from Needs triage to High priority in Issue Triage Oct 2, 2018
@KodrAus
Copy link
Member

KodrAus commented Oct 3, 2018

@smarnach Ah, so bincode doesn't support the deserialize_any method? That's actually ok now, because we don't need to touch the default serde implementations to support this in a backwards compatible way so don't need to try support both formats simultaneously.

Slightly off topic, if you need to guarantee the byte layout of Superblock then I feel like you've made the right call using [u8; 16] directly instead of Uuid. We don't guarantee the internal layout of Uuid is actually [u8; 16], it could change at any time. I mean, there's not really anything else it could be, but it's effectively a black box.

@bors bors bot closed this as completed in #331 Oct 3, 2018
Issue Triage automation moved this from High priority to Closed Oct 3, 2018
0.7.x automation moved this from In progress to Done Oct 3, 2018
@smarnach
Copy link
Author

smarnach commented Oct 3, 2018

@KodrAus Thanks for your help. Yes, using a fixed-length array is fine for our use case, but I still think it's a good thing to have a version of the serialization that behaves the same as a fixed-size array. The solution turned out quite nice.

@Marwes
Copy link
Contributor

Marwes commented Feb 27, 2019

Would it make sense to make the [u8; 16] serialization be the default/only way in 0.8? It seems like a stretch that a binary/non-human readable format would want the larger &[u8] format.

@KodrAus
Copy link
Member

KodrAus commented Mar 1, 2019

@Marwes, that seems like a fair suggestion. I think we'd have to explore how this might break any currently persisted data though.

@KodrAus KodrAus reopened this Mar 1, 2019
Issue Triage automation moved this from Closed to Needs triage Mar 1, 2019
0.7.x automation moved this from Done to In progress Mar 1, 2019
@kinggoesgaming kinggoesgaming modified the milestones: 0.7.2, 0.8.0 Mar 1, 2019
@kinggoesgaming kinggoesgaming added this to To do in 0.8.x via automation Mar 1, 2019
@kinggoesgaming kinggoesgaming removed this from In progress in 0.7.x Mar 1, 2019
asomers added a commit to bfffs/bfffs that referenced this issue Sep 20, 2019
uuid::Uuid serializes as a vector, when it really should be serializing as a
fixed-size array.  This change creates a local wrapper class that passes
through every method except serialize and deserialize.

uuid-rs/uuid#329
@KodrAus
Copy link
Member

KodrAus commented Oct 28, 2021

Revisiting this now. I think we should fix this but need to come up with a plan for any previously serialized data.

The problem is a library that derives Serialize with fields containing UUIDs doesn’t have any control or knowledge over how it gets serialized and persisted itself. Its serialization becomes part of its public contract, and with non-self-describing formats it’s pretty much fixed from the moment it’s shipped. So if we change the default behaviour and give you a fallback the only valid choice that library can make is to opt-in to the fallback.

So I think our only path here is to add a serde::dense module you can use with #[serde(with)] and get yourself a dense representation in your own code. If you do nothing but take UUID updates then you’ll keep the same format.

I’m erring in the side of caution here just because of how widely used the library is and how long its format has been around for. I could probably be convinced otherwise 🙂

@KodrAus
Copy link
Member

KodrAus commented Oct 28, 2021

Ah and the original advice in this thread that we don’t guarantee [u8; 16] is also no longer valid. We do absolutely now guarantee that a Uuid is exactly a [u8; 16].

@smarnach
Copy link
Author

So I think our only path here is to add a serde::dense module you can use with #[serde(with)] and get yourself a dense representation in your own code. If you do nothing but take UUID updates then you’ll keep the same format.

I think that solution was already implemented in #331.

@Marwes
Copy link
Contributor

Marwes commented Oct 28, 2021

As long as the breaking change is done in a semver incompatible release it is up to users to ensure that they don't break their format (and if you have a binary format I'd really hope you are careful about breaking changes in that format!). Being forced into a suboptimal format forever because users can't be trusted doesn't strike me as a better idea.

@KodrAus
Copy link
Member

KodrAus commented Oct 28, 2021

@Marwes I think the trouble is you don’t necessarily know your type with a private field Uuid is going to be part of a binary format so our breaking change here isn’t necessarily going to be considered breaking by everyone else. You wouldn’t normally consider changes to private dependencies to be part of your public API unless you’re careful to check all uses in types that also implement Serialize. Which is a good idea, but I don’t expect that to be what actually happens.

It would be a shame to be stuck with the redundant length forever though. Maybe it’s ok if we are clear in our release notes that the format can change, so if you’re using Uuid in a Serialize impl then consider upgrading it a breaking change for you too.

@Marwes
Copy link
Contributor

Marwes commented Oct 28, 2021

Anything that implements Serialize/Deserialize on a private type, but then leaks the serialized data doesn't actually have a private type. I suppose that may be easier to miss, but 🤷

@KodrAus
Copy link
Member

KodrAus commented Oct 28, 2021

🤷 indeed. I’m just a bit worried about it being especially easier to miss in libraries that are just updating things without the context we’ve got in this thread that makes it seem really obvious.

The next breaking release I was hoping to do is 1.0 though, so maybe that’s a bump people are more likely to consider carefully than 0.9.

@KodrAus
Copy link
Member

KodrAus commented Oct 28, 2021

I think I’m convinced then we should use the dense format by default in 1.0 and call out front-and-centre in our release notes that UUIDs might be treated differently by formats like bincode.

@KodrAus
Copy link
Member

KodrAus commented Oct 28, 2021

Specifically the case I was thinking about is something like:

// On inspection Uuid looks private
pub struct MyData {
    field_a: i32,
    field_b: bool,
    field_n: Uuid,
}

// lots of other code here

#[cfg(some_feature)]
impl Serialize for MyData {
    // Actually Uuid is not private
}

@KodrAus
Copy link
Member

KodrAus commented Oct 30, 2021

The main branch will now serialize using the compact representation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
0.8.x
  
To do
Issue Triage
  
Needs triage
Development

Successfully merging a pull request may close this issue.

6 participants