Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add trimming option for the de::from_* functions #561

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Pastequee
Copy link

Reference #520 (comment).
So I did the feature in ONE way possible because there are many different ways this can be implemented.
I choose to modify the current functions for this to work but this can change easily and I will explain the other alternatives I've thought about at the end. What I really wanted to discuss here is if the logic of extending the StartTrimmer to a generic start and end Trimmer is fine? And if giving to all the different Reader is also good? I choose a rather generic and extensible solution. Just want to know if you don't see it made like that or if this is rather fine.

For the different way of expressing the config parameter, I've chosen to add a parameter to from_str but its breaks the current API so it might not be the best solution. I thought of 2 other ways of doing this.
The first one: add another function from_str_option that will do the exact same as the current from_str but will take the extra parameter, and the old from_str will just call from_str_option with a default Trimmer.
The second option: Transform from_str to a macro to be able to have the Trimmer parameter optional, and if not given construct a default one.

When I talk about from_str it is the high-level one not the inner function of Deserializer

@Pastequee
Copy link
Author

There is a lot of changes because I changed the from_str prototype, I needed to change all the related tests to make this work.

@Mingun
Copy link
Collaborator

Mingun commented Feb 24, 2023

I prefer to not change the creation functions and instead provide the get / set functions for config parameters (or the whole config struct).

I think, that is is better to not change the StartTrimmer struct and instead just do not call trimming if the trim option is disabled. The options itself can lied in the Deserializer / XmlReader struct or in the new public Config (or probably, DeConfig?) struct.

@Pastequee
Copy link
Author

I understand the point, but I've tried to do it like that, which is not satisfying. There is no way to do a global config variable that can be changed and used by XmlReader. Rust is complaining unless I use unsafe code which is not a good option in my opinion.
The other option would be to have a static global variable that is a lazy loaded Arc, but still.

And the main problem with this version is that each Deserializer will have the same config, at one given time. If working with a multi-threaded parser, each one should be able to have its own config without interfering with others.
Let me know what you think, I'll keep looking for options.

@Pastequee
Copy link
Author

Ok well, I used what you did in #572 and what you said here #285 (comment), I think it's a great idea to enable/disable trimming per Struct or per field to have more control.
A nice thing would be adding another custom proc macro on serde like this.

#[serde(no_trim)]
struct Foo {
  #[serde(no_trim)]
  elem: String,
}

And the default value of this is to trim content to keep compatibility.
I don't have any idea of how to really do this, if you have doc or example of a similar implementation, it'd be welcome

@Mingun
Copy link
Collaborator

Mingun commented Mar 8, 2023

Additional attribute would be great to have. Unfortunately, serde does not provide a way to attach attributes, so we need to encode everything in the name, like we do that for the attribute indicator (@). The overall idea is to encode trimming mode in the name of field. The special dedicated attribute also should be possible to implement, because attribute macros and would transform AST before the derive run, if they are placed before derive:

#[xml] // runs first and could rewrite #[xml] helpers to #[serde(rename = "-field")] for example
#[derive(Deserialize)]
struct Xml {
  #[xml(no_trim)] // helper attribute for #[xml] on struct
  field: String,
}

But this is advanced usage which, I think, you shouldn't care here about. This change is better to make in the separate PR.

While working on #574 I've found this: https://www.w3.org/TR/xmlschema11-2/#built-in-primitive-datatypes

This means, that Deserializer could itself determine if trimming is needed. Practically speaking, when deserialize with deserialize_borrowed_str / deserialize_str / deserialize_string we shouldn't trim, in all other cases should.

We could also add a flag "trim / not trim" to the Deserializer (which, actually, probably should be stored in ContentDeserializer), but I recommend first check if the behavior specified specified in the XML specification suits you. If yes, then implement it -- we shouldn't trim during reading, but only store a range with boundaries of not-trimmed content; apply trim when deserialize primitives except strings.

@Mingun Mingun marked this pull request as draft April 8, 2023 16:50
@Mingun Mingun added enhancement serde Issues related to mapping from Rust types to XML labels Oct 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement serde Issues related to mapping from Rust types to XML
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Disable trim_text in Deserializer from_reader
2 participants