Understanding empty tag behavior #741

Open
phdavis1027 opened this issue Apr 26, 2024 · 3 comments

@phdavis1027

First of all, I want to thank everyone involved in this project for the excellent work they've done. It's absurdly fast and fits great in my project.

I have a question about expected behavior for empty tags. I have some XML that looks like this:

...
<value></value>
<value></value>
...
<value></value>
...

That is being parsed by this code:

match (state, reader.read_event()?) {
    (State::ResultsInnerValueInner, Event::Text(e)) => {
        column.push(e.unescape_with(irods_unescapes)?.to_string());
        State::ResultsInnerValue
    }
}

When I later print this value out, it has the value "\n". Is this expected behavior? I think I've seen it a couple other times. I would have guessed that the output would be the empty &str.

@Mingun
Collaborator

Mingun commented Apr 26, 2024

I cannot say what the reason for this is without the full code, but I believe you are getting the text between </value> and the next <value>. You should check that your state management is correct.

It would also be good to use dbg!(state, reader.read_event()?) to see exactly what you are matching on.
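
To illustrate where such a "\n" can come from, here is a minimal sketch in plain Rust. It does not use quick-xml: `Token` and `tokenize` are hypothetical stand-ins for a pull parser, and the toy tokenizer handles only `<name>`, `</name>` and raw text. It only shows that whitespace between a closing tag and the next opening tag is reported as text, while an empty element produces no text at all.

```rust
/// Simplified stand-in for a pull parser's events (hypothetical, NOT quick-xml's `Event`).
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Start(&'a str),
    End(&'a str),
    Text(&'a str),
}

/// Toy tokenizer: understands only `<name>`, `</name>` and raw text.
fn tokenize(xml: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut rest = xml;
    while !rest.is_empty() {
        if let Some(after_lt) = rest.strip_prefix('<') {
            // A tag: take everything up to the next '>'.
            let end = after_lt.find('>').unwrap_or(after_lt.len());
            let name = &after_lt[..end];
            tokens.push(match name.strip_prefix('/') {
                Some(n) => Token::End(n),
                None => Token::Start(name),
            });
            rest = after_lt.get(end + 1..).unwrap_or("");
        } else {
            // Text: everything up to the next '<', whitespace included.
            let end = rest.find('<').unwrap_or(rest.len());
            tokens.push(Token::Text(&rest[..end]));
            rest = &rest[end..];
        }
    }
    tokens
}
```

Calling `tokenize("<value></value>\n<value></value>")` yields a single `Text("\n")` token, and it sits between the first `End` and the second `Start`, not inside either element.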

@phdavis1027
Author

Oh interesting. I suppose I assumed that Text events only occurred in the context of something like <tag>...</tag>, but debugging does seem to show that they're appearing in </tag><tag> contexts and I've just gotten lucky so far. Thanks for the lead.

@Mingun
Collaborator

Mingun commented Apr 26, 2024

Also, consuming only Event::Text is error-prone. In XML, all text events should be concatenated together with any CDATA content, and any comments between them should be dropped. The code that takes all the nuances into account is quite large, and unfortunately quick-xml has no good out-of-the-box API for this (note self.drain_text(...)):

quick-xml/src/de/mod.rs

Lines 2222 to 2243 in e8ae020

fn next(&mut self) -> Result<DeEvent<'i>, DeError> {
    loop {
        return match self.next_impl()? {
            PayloadEvent::Start(e) => Ok(DeEvent::Start(e)),
            PayloadEvent::End(e) => Ok(DeEvent::End(e)),
            PayloadEvent::Text(mut e) => {
                if self.need_trim_end() && e.inplace_trim_end() {
                    continue;
                }
                self.drain_text(e.unescape_with(|entity| self.entity_resolver.resolve(entity))?)
            }
            PayloadEvent::CData(e) => self.drain_text(e.decode()?),
            PayloadEvent::DocType(e) => {
                self.entity_resolver
                    .capture(e)
                    .map_err(|err| DeError::Custom(format!("cannot parse DTD: {}", err)))?;
                continue;
            }
            PayloadEvent::Eof => Ok(DeEvent::Eof),
        };
    }
}
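
For a hand-written state machine, the same idea can be sketched without the deserializer. The following is a simplified model in plain Rust, not quick-xml code: the `Ev` enum and `collect_values` are hypothetical names, and the events are assumed to be already decoded and unescaped. It concatenates text and CDATA found inside a `<value>` element, skips comments, and ignores any text outside the element.

```rust
/// Simplified stand-in for parser events (hypothetical, NOT quick-xml's `Event`).
/// Payloads are assumed to be already decoded and unescaped.
enum Ev {
    Start(String),
    End(String),
    Text(String),
    CData(String),
    Comment(String),
}

/// Collects the full text content of every `<value>` element.
fn collect_values(events: &[Ev]) -> Vec<String> {
    let mut column = Vec::new();
    // `Some(buffer)` only while we are between <value> and </value>.
    let mut current: Option<String> = None;
    for ev in events {
        match ev {
            Ev::Start(name) if name == "value" => current = Some(String::new()),
            Ev::End(name) if name == "value" => {
                // An empty element still yields an (empty) entry.
                if let Some(text) = current.take() {
                    column.push(text);
                }
            }
            // Text and CDATA are concatenated, but only inside <value>.
            Ev::Text(t) | Ev::CData(t) => {
                if let Some(buf) = current.as_mut() {
                    buf.push_str(t);
                }
            }
            // Comments and unrelated tags contribute nothing.
            Ev::Comment(_) | Ev::Start(_) | Ev::End(_) => {}
        }
    }
    column
}
```

Because the buffer is pushed on the closing tag rather than on each text event, `<value></value>` produces an empty string, and whitespace between elements never reaches the column.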
