Fix errors in sequence deserialization #387
@@ -42,6 +42,52 @@ default = []
## [standard compliant]: https://www.w3.org/TR/xml11/#charencoding
encoding = ["encoding_rs"]

## This feature enables support for deserializing lists whose items are overlapped
## (interleaved) with tags that do not belong to the list.
Just curious about the use case: is this basically for situations where you're parsing messy handwritten XML / HTML? Generally I would think machine-generated XML wouldn't have this problem. If that is the case, it might be worth mentioning it as an example of a situation where you might want to enable this feature.

Actually, I can't remember ever working with such XML, but all the parsers I know are very tolerant of overlapped lists (even in

@dralley I'm trying to parse a KML file (from Google Earth Pro) which can have elements of types Folder, Document and Point interleaved in a single list. Ideally I'd like to get them in the same order, so I tried to create an enum covering all three cases, but it didn't work. So this at least gives me an option to parse it as three lists, without preserving the order between the items.

@k-bx, if your case still doesn't work, could you describe it in more detail? That definitely should work.

@Mingun it works well enough, but not perfectly. The desired way to handle things is: input: parsed as:
What you get is perfect preservation of the order in which the mixed tags appear. I didn't see a way to do something like this with the current lib ^ Do I need to create a separate issue to make something like this possible?

Yes, this is an error, I filed #500 for it.

##
## When this feature is enabled, the XML:
## ```xml
## <any-name>
##   <item/>
##   <another-item/>
##   <item/>
##   <item/>
## </any-name>
## ```
## can be deserialized into a struct:
## ```ignore
## use serde::Deserialize;
##
## #[derive(Deserialize)]
## #[serde(rename_all = "kebab-case")]
## struct AnyName {
##     item: Vec<()>,
##     another_item: (),
## }
## ```
##
## When this feature is not enabled (default), only the first element will be
## associated with the field, and the deserialized type will report an error
## (duplicated field) when the deserializer encounters a second `<item/>`.
##
## Note that enabling this feature can lead to high and even unlimited memory
## consumption, because the deserializer must check all events up to the end of a
## container tag (`</any-name>` in this example) to figure out that there are no
## more items for a field. If neither `</any-name>` nor EOF is ever encountered,
## parsing never ends, which can lead to a denial-of-service (DoS) scenario.
##
## Having several lists with overlapped elements in XML can also lead to
## quadratic parsing time, because the deserializer must scan the buffered
## events once for each sequence field present in the schema.
##
## To reduce negative consequences, always [limit] the maximum number of events
## that [`Deserializer`] will buffer.
##
## This feature works only together with the `serialize` feature and has no
## effect if `serialize` is not enabled.
##
## [limit]: crate::de::Deserializer::event_buffer_size
## [`Deserializer`]: crate::de::Deserializer
overlapped-lists = []
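The lookahead described in the feature docs can be modeled without any dependencies. The sketch below is a simplified, std-only illustration and not quick-xml's actual implementation: `Event`, `DeError`, `AnyNameModel`, and `deserialize` are hypothetical names, the `overlapped` flag stands in for the `overlapped-lists` feature, and `limit` plays a role similar to the buffer cap configured via `Deserializer::event_buffer_size`. It shows why a second run of `<item/>` is a duplicated field without the feature, how foreign elements are buffered and replayed with it, and how a cap turns unbounded buffering into an error.

```rust
use std::collections::VecDeque;

/// Simplified events for the children of `<any-name>` (hypothetical, not quick-xml's `Event`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    /// A self-closing child element such as `<item/>`.
    Empty(&'static str),
    /// The container's closing tag, `</any-name>`.
    End,
}

/// Errors this sketch can report (hypothetical).
#[derive(Debug, PartialEq)]
enum DeError {
    /// The same field was seen twice, as serde-derived types report.
    DuplicateField(&'static str),
    /// More skipped events were buffered than `limit` allows.
    TooManyEvents,
    /// Input ended before the closing tag.
    UnexpectedEof,
}

/// Stand-in for `struct AnyName { item: Vec<()>, another_item: () }`:
/// how many `<item/>` elements were collected and whether `<another-item/>` was seen.
#[derive(Debug, PartialEq)]
struct AnyNameModel {
    items: usize,
    another_item: bool,
}

/// `overlapped` mimics the `overlapped-lists` feature; `limit` caps how many
/// foreign events may be buffered while collecting one list.
fn deserialize(events: &[Event], overlapped: bool, limit: usize) -> Result<AnyNameModel, DeError> {
    let mut input: VecDeque<Event> = events.iter().copied().collect();
    let mut items: Option<usize> = None;
    let mut another_item = false;

    loop {
        match input.pop_front() {
            None => return Err(DeError::UnexpectedEof),
            Some(Event::End) => break,
            Some(Event::Empty("item")) => {
                if items.is_some() {
                    // The list was already filled, so this `<item/>` is a repeated field.
                    return Err(DeError::DuplicateField("item"));
                }
                let mut count = 1;
                let mut skipped = Vec::new();
                // Collect the list, looking ahead one event at a time.
                loop {
                    match input.front().copied() {
                        Some(Event::Empty("item")) => {
                            input.pop_front();
                            count += 1;
                        }
                        Some(other @ Event::Empty(_)) if overlapped => {
                            // Buffer the foreign element so other fields can still see it.
                            input.pop_front();
                            skipped.push(other);
                            if skipped.len() > limit {
                                return Err(DeError::TooManyEvents);
                            }
                        }
                        // Closing tag, end of input, or (without the feature) a foreign element.
                        _ => break,
                    }
                }
                // Replay buffered events in their original order.
                for ev in skipped.into_iter().rev() {
                    input.push_front(ev);
                }
                items = Some(count);
            }
            Some(Event::Empty("another-item")) => {
                if another_item {
                    return Err(DeError::DuplicateField("another-item"));
                }
                another_item = true;
            }
            // Unknown elements are ignored in this sketch.
            Some(Event::Empty(_)) => {}
        }
    }
    Ok(AnyNameModel { items: items.unwrap_or(0), another_item })
}
```

For the XML from the feature docs, modeled as `[Empty("item"), Empty("another-item"), Empty("item"), Empty("item"), End]`, `deserialize(&xml, true, 10)` collects three items and sets `another_item`, while `deserialize(&xml, false, 10)` fails with `DuplicateField("item")`. Because this sketch works on a finite slice, a missing `End` surfaces as `UnexpectedEof`; a streaming reader that never reaches EOF would keep reading instead, which is the DoS concern described above.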

## Enables support for [`serde`] serialization and deserialization
serialize = ["serde"]

Looks like this should cover testing both with and without `overlapped-lists`, which is a concern because of all the alternate code 👍
Yes, that is why I've added this.