Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix errors in sequence deserialization #387

Merged
merged 6 commits into from Jun 5, 2022
Merged

Fix errors in sequence deserialization #387

merged 6 commits into from Jun 5, 2022

Conversation

Mingun
Copy link
Collaborator

@Mingun Mingun commented May 21, 2022

Replaces Mingun/fast-xml#12

Fixes #342

This PR adds a bunch of tests to ensure, that deserialization of sequences is correct in all situations.

  • Structs with $value field can now deserialize their other fields from elements, not only from attributes, as it was before:

    <any-tag>
      <before/>
      <item/>
      <item/>
      <item/>
      <after/>
    </any-tag>

    can be deserialized into

    struct AnyName {
      before: (),
      item: Vec<()>,
      after: (),
    }
  • Added a new cargo feature overlapped-lists which allows also interleave sequence
    tags with non-sequence tags in the XML. When use that feature the following XML
    can be deserialized to the struct shown above:

    <any-tag>
      <item/>
      <item/>
      <before/>
      <item/>
      <after/>
      <item/>
    </any-tag>

Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
@dralley
Copy link
Collaborator

dralley commented May 24, 2022

This is a long PR so it might take me a few days to get through it. Going through the documentation first since it gives me a better picture of what to expect from the code.

src/de/mod.rs Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
src/errors.rs Outdated Show resolved Hide resolved
src/errors.rs Outdated Show resolved Hide resolved
src/de/mod.rs Show resolved Hide resolved
src/de/mod.rs Outdated Show resolved Hide resolved
src/de/mod.rs Outdated Show resolved Hide resolved
src/de/mod.rs Outdated Show resolved Hide resolved
src/de/mod.rs Outdated Show resolved Hide resolved
src/de/mod.rs Outdated Show resolved Hide resolved
failures (36):
    seq::fixed_name::fixed_size::field_after_list::overlapped
    seq::fixed_name::fixed_size::field_before_list::overlapped
    seq::fixed_name::fixed_size::two_lists::overlapped
    seq::fixed_name::fixed_size::unknown_items::overlapped
    seq::fixed_name::variable_size::field_after_list::overlapped
    seq::fixed_name::variable_size::field_before_list::overlapped
    seq::fixed_name::variable_size::two_lists::overlapped
    seq::fixed_name::variable_size::unknown_items::overlapped
    seq::variable_name::fixed_size::field_after_list::after
    seq::variable_name::fixed_size::field_after_list::before
    seq::variable_name::fixed_size::field_after_list::overlapped
    seq::variable_name::fixed_size::field_before_list::after
    seq::variable_name::fixed_size::field_before_list::before
    seq::variable_name::fixed_size::field_before_list::overlapped
    seq::variable_name::fixed_size::two_lists::choice_and_fixed::fixed_after
    seq::variable_name::fixed_size::two_lists::choice_and_fixed::fixed_before
    seq::variable_name::fixed_size::two_lists::choice_and_fixed::overlapped_fixed_after
    seq::variable_name::fixed_size::two_lists::choice_and_fixed::overlapped_fixed_before
    seq::variable_name::fixed_size::two_lists::fixed_and_choice::fixed_after
    seq::variable_name::fixed_size::two_lists::fixed_and_choice::fixed_before
    seq::variable_name::fixed_size::two_lists::fixed_and_choice::overlapped_fixed_after
    seq::variable_name::fixed_size::two_lists::fixed_and_choice::overlapped_fixed_before
    seq::variable_name::variable_size::field_after_list::after
    seq::variable_name::variable_size::field_after_list::before
    seq::variable_name::variable_size::field_after_list::overlapped
    seq::variable_name::variable_size::field_before_list::after
    seq::variable_name::variable_size::field_before_list::before
    seq::variable_name::variable_size::field_before_list::overlapped
    seq::variable_name::variable_size::two_lists::choice_and_fixed::fixed_after
    seq::variable_name::variable_size::two_lists::choice_and_fixed::fixed_before
    seq::variable_name::variable_size::two_lists::choice_and_fixed::overlapped_fixed_after
    seq::variable_name::variable_size::two_lists::choice_and_fixed::overlapped_fixed_before
    seq::variable_name::variable_size::two_lists::fixed_and_choice::fixed_after
    seq::variable_name::variable_size::two_lists::fixed_and_choice::fixed_before
    seq::variable_name::variable_size::two_lists::fixed_and_choice::overlapped_fixed_after
    seq::variable_name::variable_size::two_lists::fixed_and_choice::overlapped_fixed_before
@Mingun
Copy link
Collaborator Author

Mingun commented May 30, 2022

@dralley , I've address all your suggestions except one (I have a question about it). Could you finish your review this week? I have a tons of plans and unfortunately they all are a strictly coupled with each other.

(Use Compare button against forse-pushed line to see the difference from the previous version)

@dralley
Copy link
Collaborator

dralley commented May 31, 2022

@Mingun Yeah, I'll take another look tonight

src/de/seq.rs Outdated
/// get a string representation of a tag.
///
/// Returns `true`, if `start` is not in the `fields` list and `false` otherwise.
pub fn is_unknown(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be named more descriptively e.g. is_unknown_{field|tag|element}?

Copy link
Collaborator

@dralley dralley Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edit: I guess "unknown" here is referring to lookahead and what element will come next rather than what elements should be accepted by the serializer?


"unknown" also sounds a little strange in this context since we theoretically do known the names of the enum variants?

re:

enum Choice { one, two, three }
struct Root {
    #[serde(rename = "$value")]
    item: [Choice; 3],
    node: (),
}

and this struct should be able to deserialized from

<root>
    <one/>
    <two/>
    <three/>
    <node/>
</root>

Maybe tag_maps_to_field? (That name sounds a bit wrong as well, so if you prefer the former names I'm fine with that)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose is_unknown simple in contrary to is_known (i.e. if tag is listed in fields it is known for a map), but because each usage of is_known() would be !is_known() I inverted it. The method does not have any specific cognitive load, so just not_in should be enough.

src/de/map.rs Outdated Show resolved Hide resolved
src/de/seq.rs Outdated Show resolved Hide resolved
src/de/map.rs Outdated Show resolved Hide resolved
src/de/seq.rs Outdated Show resolved Hide resolved
src/de/seq.rs Outdated
/// so we take an any element. But because in XML sequences a flattened into a
/// maps, then we could take an elements that have their own dedicated fields
/// in a struct. To prevent this we use an `Exclude` filter, that filters out
/// any known names of a struct fields.
Copy link
Collaborator

@dralley dralley Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// In the second case we don't know what tag name should be expected as a
/// sequence element, so we accept any element. Since the sequence are flattened
/// into maps, we collect elements which have dedicated fields in a struct by using an 
/// `Exclude` filter that filters out elements with names matching field names from the struct.

This language is kind of subtle so let me know if I butchered the meaning. Again though I guess I'm not sure I understand why we can't know the names of the enum fields - not familiar with Serde's API.

Copy link
Collaborator Author

@Mingun Mingun Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then

/// In the second case we don't know what tag name should be expected as a
/// sequence element, so we accept any element. Since the sequence are flattened
/// into maps, we skip elements which have dedicated fields in a struct by using an
/// Exclude filter that filters out elements with names matching field names from the struct.

This describes that fact, that struct

struct AnyName {
  field: (),
  list: Vec<()>,
}

in XML would be represented as

<any-tag>
  <field/>
  <list/>
  <list/>
<any-tag>

The JSON equivalent would be

{
  "field": null,
  "list": null,
  "list": null
}

instead of correct

{
  "field": null,
  "list": [null, null]
}

Because collection element could be an enum and enums in XML could be represented as

<variant-name .../>

and because the type of each struct field is opaque to us at the AnyName deserialization level, we should consider all tags as a potential list items, if one of our struct fields will be a sequence (for example, a Vec<T>), with one exception: if we are know with guarantee that a tag with that name is mapped to another field. We list all names of that tags in the Exclude variant and this list is a fields list of a struct.

Copy link
Collaborator

@dralley dralley Jun 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I always thought the most natural XML representation was akin to the JSON one, e.g.

<any-tag>
  <field/>
  <list>
    <item/>
    <item/>
    <item/>
  </list>
<any-tag>

There would then be no ambiguity about which tags map to which fields. But given we have to process whatever the user provided, and their schema might not be structured this way, I see your point.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would be nice for this to be the case, but I often see such XML. If you think about it, this is natural, given how easy it is to describe such a structure in XSD:

<xs:complexType>
  <xs:sequence>
    <xs:element name="field"/>
    <xs:element name="list" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

src/de/seq.rs Outdated Show resolved Hide resolved
Changelog.md Outdated Show resolved Hide resolved
Changelog.md Outdated Show resolved Hide resolved
src/de/map.rs Outdated Show resolved Hide resolved
Example of a struct, that should be a valid definition:

    enum Choice { one, two, three }
    struct Root {
        #[serde(rename = "$value")]
        item: [Choice; 3],
        node: (),
    }

and this struct should be able to deserialized from

    <root>
        <one/>
        <two/>
        <three/>
        <node/>
    </root>

The following tests were fixed (10):
    seq::variable_name::fixed_size::field_after_list::after
    seq::variable_name::fixed_size::field_after_list::before
    seq::variable_name::fixed_size::field_before_list::after
    seq::variable_name::fixed_size::field_before_list::before
    seq::variable_name::fixed_size::two_lists::choice_and_fixed::fixed_after
    seq::variable_name::fixed_size::two_lists::choice_and_fixed::fixed_before
    seq::variable_name::fixed_size::two_lists::fixed_and_choice::fixed_after
    seq::variable_name::fixed_size::two_lists::fixed_and_choice::fixed_before
    seq::variable_name::variable_size::field_after_list::before
    seq::variable_name::variable_size::field_before_list::before

failures (26):
    seq::fixed_name::fixed_size::field_after_list::overlapped
    seq::fixed_name::fixed_size::field_before_list::overlapped
    seq::fixed_name::fixed_size::two_lists::overlapped
    seq::fixed_name::fixed_size::unknown_items::overlapped
    seq::fixed_name::variable_size::field_after_list::overlapped
    seq::fixed_name::variable_size::field_before_list::overlapped
    seq::fixed_name::variable_size::two_lists::overlapped
    seq::fixed_name::variable_size::unknown_items::overlapped
    seq::variable_name::fixed_size::field_after_list::overlapped
    seq::variable_name::fixed_size::field_before_list::overlapped
    seq::variable_name::fixed_size::two_lists::choice_and_fixed::overlapped_fixed_after
    seq::variable_name::fixed_size::two_lists::choice_and_fixed::overlapped_fixed_before
    seq::variable_name::fixed_size::two_lists::fixed_and_choice::overlapped_fixed_after
    seq::variable_name::fixed_size::two_lists::fixed_and_choice::overlapped_fixed_before
    seq::variable_name::variable_size::field_after_list::after
    seq::variable_name::variable_size::field_after_list::overlapped
    seq::variable_name::variable_size::field_before_list::after
    seq::variable_name::variable_size::field_before_list::overlapped
    seq::variable_name::variable_size::two_lists::choice_and_fixed::fixed_after
    seq::variable_name::variable_size::two_lists::choice_and_fixed::fixed_before
    seq::variable_name::variable_size::two_lists::choice_and_fixed::overlapped_fixed_after
    seq::variable_name::variable_size::two_lists::choice_and_fixed::overlapped_fixed_before
    seq::variable_name::variable_size::two_lists::fixed_and_choice::fixed_after
    seq::variable_name::variable_size::two_lists::fixed_and_choice::fixed_before
    seq::variable_name::variable_size::two_lists::fixed_and_choice::overlapped_fixed_after
    seq::variable_name::variable_size::two_lists::fixed_and_choice::overlapped_fixed_before

Co-authored-by: Daniel Alley <dalley@redhat.com>
@codecov-commenter
Copy link

codecov-commenter commented Jun 2, 2022

Codecov Report

Merging #387 (59a5c76) into master (015079b) will increase coverage by 1.93%.
The diff coverage is 87.64%.

@@            Coverage Diff             @@
##           master     #387      +/-   ##
==========================================
+ Coverage   58.80%   60.73%   +1.93%     
==========================================
  Files          19       19              
  Lines        9537     9843     +306     
==========================================
+ Hits         5608     5978     +370     
+ Misses       3929     3865      -64     
Flag Coverage Δ
unittests 60.73% <87.64%> (+1.93%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/errors.rs 12.90% <0.00%> (-0.15%) ⬇️
src/de/map.rs 88.74% <79.10%> (-7.97%) ⬇️
src/de/mod.rs 75.62% <88.61%> (+13.99%) ⬆️
src/de/seq.rs 92.59% <93.18%> (+3.11%) ⬆️
src/reader.rs 88.13% <100.00%> (+0.08%) ⬆️
src/lib.rs 25.96% <0.00%> (+0.16%) ⬆️
src/events/mod.rs 73.87% <0.00%> (+0.75%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 015079b...59a5c76. Read the comment docs.

src/de/map.rs Outdated
/// Determines, should [`Deserializer::next_text_impl()`] expand the second
/// level of tags or not.
///
/// If this field is `true`, we processes the following XML shape:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// If this field is `true`, we processes the following XML shape:
/// If this field is `true`, we process the following XML shape:

- name: Run tests (all features)
env:
LLVM_PROFILE_FILE: coverage/all-features-%p-%m.profraw
run: cargo test --all-features
Copy link
Collaborator

@dralley dralley Jun 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this should cover testing both with and without overlapped-lists which is a concern because of all the alternate code 👍

Copy link
Collaborator Author

@Mingun Mingun Jun 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is why I've added this.

@@ -42,6 +42,52 @@ default = []
## [standard compliant]: https://www.w3.org/TR/xml11/#charencoding
encoding = ["encoding_rs"]

## This feature enables support for deserializing lists where tags are overlapped
## with tags that do not correspond to the list.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious about the use case, is this basically for situations where you're parsing messy handwritten XML / HTML? Generally I would think machine-generated XML wouldn't have this problem.

It might be worth providing that as an example of a situation where you might want to enable it if that is the case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I cannot remember that would I worked with such XMLs, but all parsers, that I know, very tolerant to overlapped lists (even in xs:sequence, although XSD requires exact order of tags. I never seen using xs:all that allows arbitrary order). So I thought there should be an option to allow this behavior

Copy link

@k-bx k-bx Oct 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dralley I'm trying to parse KML file (from Google Earth Pro) which can have elements of types Folder, Document and Point interleaved in a single list. I'd like to ideally get them in the same order, so I tried to create an enum covering all three cases, but it didn't work. So this at least gives me an option to parse it as three lists without the preservation of order in-between the items.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k-bx, if you case still didn't work, could you describe it in more detail? That is what that definitely should work

Copy link

@k-bx k-bx Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mingun it works good enough, but not perfectly. The desired way to handle things is:

input: <TagOne></TagOne><TagTwo></TagTwo><TagOne></TagOne>

parsed as:

struct TagOne {};
struct TagTwo {};
enum Tag {
  TagOne(TagOne),
  TagTwo(TagTwo)
}

-- result should be
vec![Tag::TagOne(TagOne {}), Tag::TagTwo(TagTwo {}), Tag::TagOne(TagOne {})]

What you get is a perfect preservation of the order in which you get your mixed tags.

I didn't see a way to do something like this via the current lib ^

Do I need to create a separate issue to make something like this possible?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is an error, I filled #500 for it

Mingun and others added 4 commits June 5, 2022 21:06
The following tests were fixed (6):
    seq::variable_name::variable_size::field_after_list::after
    seq::variable_name::variable_size::field_before_list::after
    seq::variable_name::variable_size::two_lists::choice_and_fixed::fixed_after
    seq::variable_name::variable_size::two_lists::choice_and_fixed::fixed_before
    seq::variable_name::variable_size::two_lists::fixed_and_choice::fixed_after
    seq::variable_name::variable_size::two_lists::fixed_and_choice::fixed_before

failures (20):
    seq::fixed_name::fixed_size::field_after_list::overlapped
    seq::fixed_name::fixed_size::field_before_list::overlapped
    seq::fixed_name::fixed_size::two_lists::overlapped
    seq::fixed_name::fixed_size::unknown_items::overlapped
    seq::fixed_name::variable_size::field_after_list::overlapped
    seq::fixed_name::variable_size::field_before_list::overlapped
    seq::fixed_name::variable_size::two_lists::overlapped
    seq::fixed_name::variable_size::unknown_items::overlapped
    seq::variable_name::fixed_size::field_after_list::overlapped
    seq::variable_name::fixed_size::field_before_list::overlapped
    seq::variable_name::fixed_size::two_lists::choice_and_fixed::overlapped_fixed_after
    seq::variable_name::fixed_size::two_lists::choice_and_fixed::overlapped_fixed_before
    seq::variable_name::fixed_size::two_lists::fixed_and_choice::overlapped_fixed_after
    seq::variable_name::fixed_size::two_lists::fixed_and_choice::overlapped_fixed_before
    seq::variable_name::variable_size::field_after_list::overlapped
    seq::variable_name::variable_size::field_before_list::overlapped
    seq::variable_name::variable_size::two_lists::choice_and_fixed::overlapped_fixed_after
    seq::variable_name::variable_size::two_lists::choice_and_fixed::overlapped_fixed_before
    seq::variable_name::variable_size::two_lists::fixed_and_choice::overlapped_fixed_after
    seq::variable_name::variable_size::two_lists::fixed_and_choice::overlapped_fixed_before

Co-authored-by: Daniel Alley <dalley@redhat.com>
…outside it

This fixes a data race with nested maps in sequence fields.
Also, top-level sequences always have `has_value_field == false`,
so no need to check it

Co-authored-by: Daniel Alley <dalley@redhat.com>
Example of such XML:

  <item/>
  <another-item/>
  <item/>
  <item/>

Here we need to skip `<another-item/>` in order to collect all `<item/>`s.
So ability to skip events and replay them later was added

This fixes all remaining tests

Co-authored-by: Daniel Alley <dalley@redhat.com>
…onsumption

Co-authored-by: Daniel Alley <dalley@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug enhancement serde Issues related to mapping from Rust types to XML
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deserialization fails when choice and element are in complex element
4 participants