Skip to content

Commit

Permalink
seq: Allow overlapping between list items and other items
Browse files Browse the repository at this point in the history
Example of such XML:

  <item/>
  <another-item/>
  <item/>
  <item/>

Here we need to skip `<another-item/>` in order to collect all `<item/>`s.
So ability to skip events and replay them later was added

This fixes all remaining tests
  • Loading branch information
Mingun committed May 21, 2022
1 parent 0266831 commit 2e12fc9
Show file tree
Hide file tree
Showing 7 changed files with 794 additions and 106 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/rust.yml
Expand Up @@ -22,6 +22,8 @@ jobs:
run: cargo test --features encoding,serialize
- name: Run tests (escape-html+serialize)
run: cargo test --features escape-html,serialize
- name: Run tests (all features)
run: cargo test --all-features
- name: Check fmt
run: cargo fmt -- --check

40 changes: 40 additions & 0 deletions Cargo.toml
Expand Up @@ -42,6 +42,46 @@ default = []
## [standard compliant]: https://www.w3.org/TR/xml11/#charencoding
encoding = ["encoding_rs"]

## This feature enables support of deserializing of lists which tags are overlapped
## with tags that does not correspond to the list.
##
## When this feature is enabled, that XML:
## ```xml
## <any-name>
## <item/>
## <another-item/>
## <item/>
## <item/>
## </any-name>
## ```
## could be deserialized to a struct:
## ```ignore
## #[derive(Deserialize)]
## #[serde(rename_all = "kebab-case")]
## struct AnyName {
## item: Vec<()>,
## another_item: (),
## }
## ```
##
## When feature is not enabled (default), only first element will be assotiated
## with a field, and deserializer will report an error when it encounter a second
## `<item/>`.
##
## Note, that enabling this feature can lead to high and even unlimited memory
## consumption, because deserializer should check all events up to the end of a
## container tag (`</any-name>` in that example) to figure out that there are no
## more items for a field. If `</any-name>` or EOF even not encountered, the
## parsing will never end which can lead to DoS.
##
## Having several lists and overlapped elements for them in XML could also lead
## to quadratic parsing time, because deserialzier have to check list of events
## as many times as count of sequence fields present.
##
## This feature works only with `serialize` feature and has no effect if `serialize`
## is not enabled.
overlapped-lists = []

## Enables support for [`serde`] serialization and deserialization
serialize = ["serde"]

Expand Down
5 changes: 5 additions & 0 deletions Changelog.md
Expand Up @@ -10,6 +10,11 @@

## Unreleased

### New Features

- [#12]: Allow overlapping between elements of sequence and other elements
(using new feature `overlapped-lists`)

### Bug Fixes

- [#9]: Deserialization erroneously was successful in some cases where error is expected.
Expand Down
36 changes: 24 additions & 12 deletions src/de/map.rs
Expand Up @@ -105,6 +105,9 @@ enum ValueSource {
/// [list of known fields]: MapAccess::fields
Content,
/// Next value should be deserialized from an element with a dedicated name.
/// If deserialized type is a sequence, then that sequence will collect all
/// elements with the same name until it will be filled. If not all elements
/// would be consumed, the rest will be ignored.
///
/// That state is set when call to [`peek()`] returns a [`Start`] event, which
/// [`name()`] represents a field name. That name will be deserialized as a key.
Expand Down Expand Up @@ -585,20 +588,29 @@ where
T: DeserializeSeed<'de>,
{
let decoder = self.map.de.reader.decoder();
match self.map.de.peek()? {
// Stop iteration when list elements ends
DeEvent::Start(e) if !self.filter.is_suitable(&e, decoder)? => Ok(None),
loop {
break match self.map.de.peek()? {
// If we see a tag that we not interested, skip it
#[cfg(feature = "overlapped-lists")]
DeEvent::Start(e) if !self.filter.is_suitable(&e, decoder)? => {
self.map.de.skip()?;
continue;
}
// Stop iteration when list elements ends
#[cfg(not(feature = "overlapped-lists"))]
DeEvent::Start(e) if !self.filter.is_suitable(&e, decoder)? => Ok(None),

// Stop iteration after reaching a closing tag
DeEvent::End(e) if e.name() == self.map.start.name() => Ok(None),
// This is a unmatched closing tag, so the XML is invalid
DeEvent::End(e) => Err(DeError::UnexpectedEnd(e.name().to_owned())),
// We cannot get `Eof` legally, because we always inside of the
// opened tag `self.map.start`
DeEvent::Eof => Err(DeError::UnexpectedEof),
// Stop iteration after reaching a closing tag
DeEvent::End(e) if e.name() == self.map.start.name() => Ok(None),
// This is a unmatched closing tag, so the XML is invalid
DeEvent::End(e) => Err(DeError::UnexpectedEnd(e.name().to_owned())),
// We cannot get `Eof` legally, because we always inside of the
// opened tag `self.map.start`
DeEvent::Eof => Err(DeError::UnexpectedEof),

// Start(tag), Text, CData
_ => seed.deserialize(&mut *self.map.de).map(Some),
// Start(tag), Text, CData
_ => seed.deserialize(&mut *self.map.de).map(Some),
};
}
}
}

0 comments on commit 2e12fc9

Please sign in to comment.