Disable trim_text in Deserializer from_reader #285

Open
woodworker opened this issue Apr 17, 2021 · 14 comments · May be fixed by #561
Labels
enhancement help wanted serde Issues related to mapping from Rust types to XML

Comments

@woodworker

Is there an easy way to set trim_text to false in Deserializer::from_str when I use quick_xml::de::from_str?

quick-xml/src/de/mod.rs

Lines 160 to 167 in a4be484

pub fn from_reader(reader: R) -> Self {
    let mut reader = Reader::from_reader(reader);
    reader
        .expand_empty_elements(true)
        .check_end_names(true)
        .trim_text(true);
    Self::new(reader)
}

@ImJeremyHe

Is there a way to decide whether to set trim_text by checking for xml:space="preserve"?
For example:

<t xml:space="preserve">Text </t>

The trailing space here should not be trimmed.
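
A minimal sketch of the idea, using only the standard library. The helpers `preserves_space` and `element_text` are hypothetical names, not part of quick-xml, and the attribute scan is deliberately naive: it only looks at the element's own start tag and ignores the fact that xml:space is inherited by child elements.

```rust
/// Hypothetical helper: does this raw start tag carry `xml:space="preserve"`?
/// Naive scan: accepts optional whitespace around `=` and either quote style.
fn preserves_space(start_tag: &str) -> bool {
    let rest = match start_tag.find("xml:space") {
        Some(idx) => start_tag[idx + "xml:space".len()..].trim_start(),
        None => return false,
    };
    let value = match rest.strip_prefix('=') {
        Some(v) => v.trim_start(),
        None => return false,
    };
    value.starts_with("\"preserve\"") || value.starts_with("'preserve'")
}

/// Hypothetical helper: trim the text unless the element asks to preserve it.
fn element_text<'a>(start_tag: &str, raw_text: &'a str) -> &'a str {
    if preserves_space(start_tag) {
        raw_text
    } else {
        raw_text.trim()
    }
}
```

With this sketch, the text of `<t xml:space="preserve">Text </t>` keeps its trailing space, while the text of a plain `<t>Text </t>` is trimmed.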

@tafia
Owner

tafia commented May 12, 2021

There is very little customization available in the serde deserializer so far. I don't think there is any major blocker; someone just needs to write it.

@Mingun added the enhancement, help wanted, and serde (Issues related to mapping from Rust types to XML) labels on May 21, 2022
@Mingun
Collaborator

Mingun commented May 21, 2022

In the coming release Deserializer::new will be public and you will be able to create a deserializer from a Reader (but do not turn off expand_empty_elements! For now Deserializer is not prepared for that).

Processing of xml:space still awaits its own PR.

@naumazeredo

Quoting @Mingun: "In the coming release Deserializer::new will be public"

Was this implemented already? Deserializer::new is public in the latest version, but it seems useless, since XmlRead can't be implemented outside of quick_xml and SliceReader and IoReader can't be instantiated either. There's no way to use Deserializer::new, unless I'm missing something here.

@Mingun
Collaborator

Mingun commented Jul 28, 2022

Yes, this is an oversight. So currently this is still not possible, even in master. We need to think about a better API. I would also like to provide an API to create a deserializer for a part of the XML, so you can mix usual Reader usage with Deserializer usage, for example to support streaming deserialization.

@Mingun
Collaborator

Mingun commented Jul 28, 2022

As for the original use case, I do not think that simply disabling trim_text would be usable: it seems that this setting would break deserialization of pretty-printed XML altogether.

@naumazeredo

Yeah, that's exactly what happened. I didn't get why that option exists, even though it is internal.
I've sadly moved back to serde_xml_rs, since it offers an option not to trim, and I'm not willing to spend more time debugging XML deserialization right now. I'll try quick_xml again in the future if it becomes more versatile.

@dralley
Collaborator

dralley commented Jul 29, 2022

The trimming of spaces within elements probably ought to be separated from the trimming of spaces between elements. It should be possible (and probably the default) to ignore the latter without affecting the text contents of elements themselves.

Having an option for trimming spaces around text contents is nice, of course, but not at all necessary (users could easily do this themselves) and, as this issue points out, it is more difficult to do "correctly" than originally envisioned. Maybe we should eliminate this feature and just keep the "ignore spaces between XML elements" functionality?
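
To illustrate the separation, here is a rough sketch with two independent switches. Everything in it is an assumption made for illustration: the `Event` enum, `TrimOptions`, and `apply` are hypothetical and do not mirror quick-xml types. It also treats every whitespace-only text event as indentation between elements, which is a simplification; a real implementation would need more context to tell `<t> </t>` apart from pretty-printing.

```rust
/// Hypothetical, simplified event model (not the quick-xml `Event` type).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(String),
    End(String),
    Text(String),
}

/// Two independent switches, as suggested in the comment above.
#[derive(Debug, Clone, Copy)]
struct TrimOptions {
    /// Drop whitespace-only text (assumed to be indentation between elements).
    skip_inter_element_whitespace: bool,
    /// Trim leading and trailing whitespace of text that has real content.
    trim_text_content: bool,
}

fn apply(events: Vec<Event>, opts: TrimOptions) -> Vec<Event> {
    events
        .into_iter()
        .filter_map(|event| match event {
            Event::Text(text) => {
                if text.trim().is_empty() {
                    // Whitespace-only: keep or drop based on the first switch only.
                    if opts.skip_inter_element_whitespace {
                        None
                    } else {
                        Some(Event::Text(text))
                    }
                } else if opts.trim_text_content {
                    Some(Event::Text(text.trim().to_string()))
                } else {
                    // Content is passed through untouched.
                    Some(Event::Text(text))
                }
            }
            other => Some(other),
        })
        .collect()
}
```

With `skip_inter_element_whitespace: true` and `trim_text_content: false`, pretty-printed input loses its indentation events while the text inside `<b> hi </b>` stays exactly `" hi "`.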

@Mingun
Collaborator

Mingun commented Jul 29, 2022

Quoting @dralley: "The trimming of spaces within elements probably ought to be separated from the trimming of spaces between elements."

Yes, I think we should move in that direction. A couple of thoughts:

  • We need to take into account the problem from #383 (Deserializing tag with attribute values into Map). I think it can be solved by introducing a method to read all content as a string regardless of the XML markup inside:
    impl Reader {
      /// For XML
      ///
      /// <outer>  <inner/>   </outer>
      ///
      /// - can be called after BytesStart("outer")
      /// - returns "  <inner/>   "
      /// - consumes BytesEnd("outer")
      fn read_as_text(&mut self, end: QName) -> Result<Cow<str>> { ... }
    }
    // or maybe better (except for a long name :( )
    impl BytesStart {
      fn read_to_end_as_text(&self, reader: &mut Reader) -> Result<Cow<str>> { ... }
    }
  • We need lookahead to decide whether whitespace is significant or not (that is, to determine the shape of the next tag: opening, closing, self-closing, comment, PI, or CDATA). #414 (Is there any way to read an event and not consume it?) is related. Should we, for example, allow XML comments in whitespace-significant parts (<outer> <!----> </outer>)? If yes, that would require potentially unbounded lookahead.
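
The behaviour described for read_as_text in the first bullet can be sketched on a plain string slice. This is only an illustration under stated assumptions: the free function below is hypothetical, works on `&str` rather than on a Reader, returns an Option instead of the Result proposed above, and does not understand comments, CDATA, or processing instructions.

```rust
/// Hypothetical sketch: `input` starts right after the opening `<end ...>` tag.
/// Returns the raw content up to the matching `</end>` and the remaining input,
/// or `None` if no matching close tag is found.
fn read_as_text<'a>(input: &'a str, end: &str) -> Option<(&'a str, &'a str)> {
    let open = format!("<{}", end);
    let close = format!("</{}>", end);
    let mut depth = 1usize;
    let mut pos = 0usize;
    while pos < input.len() {
        let rest = &input[pos..];
        if rest.starts_with(&close) {
            depth -= 1;
            if depth == 0 {
                // The close tag is consumed but not part of the returned text.
                return Some((&input[..pos], &rest[close.len()..]));
            }
            pos += close.len();
        } else if rest.starts_with(&open) {
            // `<outer2>` must not be counted as a nested `<outer>`.
            let after = rest[open.len()..].chars().next();
            let same_name =
                matches!(after, Some(c) if c == '>' || c == '/' || c.is_whitespace());
            let tag_end = rest.find('>')?;
            let self_closing = rest[..tag_end].ends_with('/');
            if same_name && !self_closing {
                depth += 1;
            }
            pos += tag_end + 1;
        } else {
            // Advance by one whole character to stay on a UTF-8 boundary.
            pos += rest.chars().next().map_or(1, char::len_utf8);
        }
    }
    None
}
```

For the input `  <inner/>   </outer>` with `end` set to `outer`, this returns the text `  <inner/>   ` with all whitespace and markup intact, matching the doc comment in the proposal.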

@Pastequee

Hello, a little update here: since Deserializer::new is still not public, we can't have a from_str or any other variant that does not trim the content. I can work on a quick PR for that, but since you have already discussed this, do you have a preferred solution? I've thought about just adding a from_reader implementation that takes a Reader (not something that implements IoReader), so that you can change the reader's settings without having access to the Deserializer. This function would simply force expand_empty_elements on, since you said it is required for the moment.

@Mingun
Collaborator

Mingun commented Dec 29, 2022

Just disabling trim_text will not work correctly for pretty-printed XML, so I doubt that such a limited implementation would be broadly useful. Actually, the trim_text* options should not exist at that level of parsing; it is simply the wrong place to trim. I'm working on a proper trim implementation in #520 and plan to land it in 0.28, which will probably be released in 2-3 months. After that, this problem will probably be gone (but maybe not).

Well, I think that we could add a Deserializer::trim_text(trim: bool) method as a temporary solution, with a proper warning that it will not work for XML with pretty-printed parts.

@Mingun
Collaborator

Mingun commented Mar 4, 2023

I've just created #572. Once it is merged, we could change the content of the Text type it introduces. We should change its definition to:

struct Text<'a> {
    /// Untrimmed text after concatenating content of all
    /// [`Text`] and [`CData`] events
    text: Cow<'a, str>,
    /// A range into `text` which contains data after trimming
    content: Range<usize>,
}

Such a change would open the door to per-field control over trimming.
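
A rough sketch of how such a type could expose both views of the same text. The struct fields follow the definition above; the constructor and the `trimmed` / `untrimmed` accessors are hypothetical names added for illustration and are not confirmed quick-xml API.

```rust
use std::borrow::Cow;
use std::ops::Range;

struct Text<'a> {
    /// Untrimmed text.
    text: Cow<'a, str>,
    /// A range into `text` which contains data after trimming.
    content: Range<usize>,
}

impl<'a> Text<'a> {
    /// Hypothetical constructor: computes the trimmed range once, up front.
    fn new(text: Cow<'a, str>) -> Self {
        let start = text.len() - text.trim_start().len();
        let end = text.trim_end().len();
        // Whitespace-only input would give start > end, so use an empty range.
        let content = if start >= end { 0..0 } else { start..end };
        Text { text, content }
    }

    /// The text exactly as it appeared in the document.
    fn untrimmed(&self) -> &str {
        &self.text
    }

    /// The text without leading and trailing whitespace, with no reallocation.
    fn trimmed(&self) -> &str {
        &self.text[self.content.clone()]
    }
}
```

A field that needs exact whitespace could then read `untrimmed()`, while every other field keeps using `trimmed()`, and both borrow from the same buffer.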

@nsunderland1

Are you still looking at fixing this? If not, what remains to be done? This is a breaking issue for my team, and we may be interested in contributing in order to help fix it.

@Mingun
Collaborator

Mingun commented Nov 8, 2023

I have not put any effort into this issue since my last comment. Because #572 was merged, we can move forward in the way outlined in that comment. We could also add a way to globally disable trimming, but I think such a setting would have limited usefulness. If you wish, feel free to explore those opportunities.


8 participants