Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support deserializing types that borrow data in from_str #505

Closed
wants to merge 5 commits into from
Closed

feat: support deserializing types that borrow data in from_str #505

wants to merge 5 commits into from

Conversation

AlexTMjugador
Copy link

Some structs users may define, such as the following one, do not implement serde's DeserializeOwned trait because they might borrow data from the deserializer, and thus they can't be deserialized for any lifetime:

#[derive(Deserialize)]
struct Message<'a> {
    #[serde(borrow)]
    title: Cow<'a, str>,
    #[serde(borrow)]
    text: Cow<'a, str>
}

Even though, as far as I know, the TOML deserializer implemented in these crates doesn't take advantage of zero-copy deserialization features, the fact that the convenience from_str function has a DeserializeOwned bound prevents it from working with such structs. This is a regression from previous versions of the crate.

For comparison, serde_json's equivalent function has a Deserialize<'a> bound instead, where 'a is the lifetime of the input string, that supports these structs.

Replacing toml::from_str with the more verbose expression T::deserialize(toml::de::Deserializer::new(s)) works in these cases, so I think there is no compelling reason to make this function more restrictive than it needs to be.

This change should be semver-compatible according to my best interpretation of the API evolution RFC.

Comment on lines +41 to +43
pub fn from_str<'a, T>(s: &'a str) -> Result<T, Error>
where
T: serde::de::DeserializeOwned,
T: serde::de::Deserialize<'a>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should be semver-compatible according to my best interpretation of the API evolution RFC.

This changes the trait bounds from a more general one to a more specific one which is a breaking change

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! 😄 Isn't it the reverse, though? DeserializeOwned is syntax sugar for the higher-ranked trait bound for<'de> Deserialize<'de>, so I'd argue that DeserializeOwned is more specific. The HRTB's RFC seems to support this, stating that:

[...] if I have a trait reference like for<'a> FnMut(&'a int), that would be usable wherever a trait reference with a concrete lifetime, like FnMut(&'static int), is expected.

@@ -38,9 +38,9 @@
/// assert_eq!(config.owner.name, "Lisa");
/// ```
#[cfg(feature = "parse")]
pub fn from_str<T>(s: &'_ str) -> Result<T, Error>
pub fn from_str<'a, T>(s: &'a str) -> Result<T, Error>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test for borrowed data to ensure this works.

If it does work, then more investigation is needed because it shouldn't according to the serde docs. &str gets parsed into owned types and then we deserialize that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just added the test in 44563c2, thanks!

I think that the tested code works because Strings, being an owned type, have a implicit 'static lifetime bound (they can live indefinitely long until dropped). Cows support storing owned data, so a valid lifetime for a Cow that either stores owned data or static strings is Cow<'static, str>. Due to lifetime coercion, a 'static lifetime can be used whenever a shorter lifetime is expected, such as 'de in this case: the lifetime of the serialized data source.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that sufficient if the user explicitly declared the field as borrow?

What about if the user isn't using Cow?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that sufficient if the user explicitly declared the field as borrow?

Yes, it is. I've pushed a commit that adds that attribute to the field to show that it works too.

What about if the user isn't using Cow?

When using &'a str in the new borrowed_data test, a TomlError is returned with message invalid type: string "bar", expected a borrowed string, which makes sense, because the deserializer does not borrow strings, and it can't move the owned string it deserializes to a reference in the struct.

However, this is not a new error condition: it can already be reproduced by using T::deserialize(toml::de::Deserializer::new(s)).

Morally, I see some value in accepting these kind of structs from users: TOML may just be one of several formats they want to (de)serialize it from, and other formats properly support zero-copy deserialization. Newcomers to serde are very unlikely to mess around with zero-copy deserialization anyway.

Edit: I think it may be useful to point out that serde's author approved these "magical" Cow properties on several obscure issues on its repo: serde-rs/serde-rs.github.io#46, serde-rs/serde-rs.github.io#57. Logically, if Cows are meant to support maybe owned, maybe borrowed data, their lifetime must not be 'static, but somehow derived from the deserializer lifetime, so structs that contain them and want to take advantage of this cannot implement DeserializeOwned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, this isn't a morality discussion.

Second, we are shifting a compile time error into a runtime error for the easy path while, as you pointed out, a workaround is available for those that want to borrow. In API design, we should be emphasizing the safe path with the easy path while ideally still supporting the unsafe route. The current solutions fits that to a T. I don't know why the serde docs make the recommendation they do but in this case, I see it being useful.

I'm going to go ahead and close this and ask any further discussion to happen in the appropriate issue (#490).

@@ -38,9 +38,9 @@
/// assert_eq!(config.owner.name, "Lisa");
/// ```
#[cfg(feature = "parse")]
pub fn from_str<T>(s: &'_ str) -> Result<T, Error>
pub fn from_str<'a, T>(s: &'a str) -> Result<T, Error>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should be reflected in toml_edit as well

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@epage
Copy link
Member

epage commented Jan 31, 2023

From #505 (comment)

Second, we are shifting a compile time error into a runtime error for the easy path while, as you pointed out, a workaround is available for those that want to borrow. In API design, we should be emphasizing the safe path with the easy path while ideally still supporting the unsafe route. The current solutions fits that to a T. I don't know why the serde docs make the recommendation they do but in this case, I see it being useful.

I'm going to go ahead and close this and ask any further discussion to happen in the appropriate issue (#490).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants