diff --git a/Changelog.md b/Changelog.md
index c74a7b46..e1ce1afa 100644
--- a/Changelog.md
+++ b/Changelog.md
@@ -68,6 +68,7 @@ Refer to [documentation] for details.
 
 - [#521]: MSRV bumped to 1.52.
 - [#473]: `serde` feature that used to make some types serializable, renamed to `serde-types`
+- [#528]: Added documentation for XML to `serde` mapping
 
 [#473]: https://github.com/tafia/quick-xml/issues/473
 [#490]: https://github.com/tafia/quick-xml/pull/490
@@ -76,6 +77,7 @@
 [#517]: https://github.com/tafia/quick-xml/issues/517
 [#521]: https://github.com/tafia/quick-xml/pull/521
 [#523]: https://github.com/tafia/quick-xml/pull/523
+[#528]: https://github.com/tafia/quick-xml/pull/528
 
 [XML name]: https://www.w3.org/TR/xml11/#NT-Name
 [documentation]: https://docs.rs/quick-xml/0.27.0/quick_xml/de/index.html#difference-between-text-and-value-special-names
diff --git a/src/de/mod.rs b/src/de/mod.rs
index 8fc70d84..efaf49f8 100644
--- a/src/de/mod.rs
+++ b/src/de/mod.rs
@@ -1,6 +1,1356 @@
-//! Serde `Deserializer` module
+//! Serde `Deserializer` module.
 //!
-//! # Difference between `$text` and `$value` special names
+//! Due to the complexity of the XML standard and the fact that serde was developed
+//! with JSON in mind, not all serde concepts apply smoothly to XML. As a result,
+//! some XML concepts cannot be expressed in terms of serde derives
+//! and may require manual deserialization.
+//!
+//! The most notable restriction concerns distinguishing between _elements_
+//! and _attributes_, as no other format used by serde has such a concept.
+//!
+//! Because of that, the mapping is performed on a best-effort basis.
+//!
+//!
+//!
+//! Table of Contents
+//! =================
+//! - [Mapping XML to Rust types](#mapping-xml-to-rust-types)
+//!   - [Optional attributes and elements](#optional-attributes-and-elements)
+//!   - [Choices (`xs:choice` XML Schema type)](#choices-xschoice-xml-schema-type)
+//!   - [Sequences (`xs:all` and `xs:sequence` XML Schema types)](#sequences-xsall-and-xssequence-xml-schema-types)
+//! - [Composition Rules](#composition-rules)
+//! - [Difference between `$text` and `$value` special names](#difference-between-text-and-value-special-names)
+//!   - [`$text`](#text)
+//!   - [`$value`](#value)
+//!     - [Primitives and sequences of primitives](#primitives-and-sequences-of-primitives)
+//!     - [Structs and sequences of structs](#structs-and-sequences-of-structs)
+//!     - [Enums and sequences of enums](#enums-and-sequences-of-enums)
+//!
+//!
+//!
+//! Mapping XML to Rust types
+//! =========================
+//!
+//! Type names are never considered when deserializing, so you can name your
+//! types as you wish. Other general rules:
+//! - a `struct` field name can be represented in XML only as an attribute name
+//!   or an element name;
+//! - an `enum` variant name can be represented in XML only as an attribute name
+//!   or an element name;
+//! - the unit struct, unit type `()` and unit enum variant can be deserialized
+//!   from any valid XML content:
+//!   - attribute and element names;
+//!   - attribute and element values;
+//!   - text or CDATA content (including mixed text and CDATA content).
+//!
+//!
To parse all these XML's... | ...use that Rust type(s) |
---|---|
+//! Content of attributes and text / CDATA content of elements (including mixed
+//! text and CDATA content):
+//!
+//! ```xml
+//! <... ...="content" />
+//! ```
+//! ```xml
+//! <...>content</...>
+//! ```
+//! ```xml
+//! <...><![CDATA[content]]></...>
+//! ```
+//! ```xml
+//! <...>text<![CDATA[cdata]]>text</...>
+//! ```
+//!
+//!
+//! Merging of the text / CDATA content is tracked in the issue [#474] and
+//! will be available in the next release.
+//!
+//! |
+//!
+//!
+//! You can use any type that can be deserialized from an `&str`, for example:
+//! - [`String`] and [`&str`]
+//! - [`Cow
+//!
+//! NOTE: deserialization to non-owned types (i.e. types that borrow from the input),
+//! such as `&str`, is possible only if you parse a document in the UTF-8
+//! encoding, the content does not contain entity references such as `&amp;`
+//! or character references such as `&#xD;`, and the text content is represented
+//! by one piece of [text] or [CDATA] element.
+//!
+//!
+//!
+//! [text]: Event::Text
+//! [CDATA]: Event::CData
+//! |
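The borrowing restriction in the note above can be illustrated with a small std-only sketch. The helper `unescape_amp` is hypothetical and is not part of quick-xml's API; it resolves only the `&amp;` entity reference and assumes UTF-8 input. It returns a borrowed slice when the content needs no changes and has to allocate as soon as an entity reference must be resolved, which is why a plain `&str` field cannot hold such content.

```rust
use std::borrow::Cow;

/// Hypothetical helper (NOT quick-xml's API): resolves only the `&amp;`
/// entity reference. The input is borrowed whenever nothing has to change.
fn unescape_amp(input: &str) -> Cow<'_, str> {
    if input.contains("&amp;") {
        // The resolved text differs from the input, so it must be owned.
        Cow::Owned(input.replace("&amp;", "&"))
    } else {
        // Nothing to resolve: the result can point into the input.
        Cow::Borrowed(input)
    }
}

fn main() {
    // No entity references: borrowing from the input is possible.
    let plain = unescape_amp("content");
    assert!(matches!(plain, Cow::Borrowed("content")));

    // An entity reference forces an allocation.
    let escaped = unescape_amp("a &amp; b");
    assert!(matches!(escaped, Cow::Owned(_)));
    assert_eq!(escaped, "a & b");
}
```

A field typed as a `Cow` of `str` can represent both outcomes, while a `&str` field can only represent the borrowed one.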
+//!
+//!
+//! Content of attributes and text / CDATA content of elements (including mixed
+//! text and CDATA content), which represents a space-delimited list, as
+//! specified in the XML Schema specification for [`xs:list`] `simpleType`:
+//!
+//! ```xml
+//! <... ...="element1 element2 ..." />
+//! ```
+//! ```xml
+//! <...>
+//!   element1
+//!   element2
+//!   ...
+//! </...>
+//! ```
+//! ```xml
+//! <...>
+//! ```
+//!
+//!
+//! Merging of the text / CDATA content is tracked in the issue [#474] and
+//! will be available in the next release.
+//!
+//!
+//! [`xs:list`]: https://www.w3.org/TR/xmlschema11-2/#list-datatypes
+//! |
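The space-delimited list format described above can be sketched with the standard library alone. The function `parse_list` is a hypothetical illustration, not quick-xml's implementation, and the item type `u32` is an arbitrary choice. It uses `split_whitespace`, which splits on any Unicode whitespace and is therefore slightly broader than the XML definition of white space.

```rust
use std::num::ParseIntError;

/// Hypothetical helper (NOT quick-xml's implementation): splits an
/// `xs:list`-style value on whitespace and parses every item as `u32`.
fn parse_list(content: &str) -> Result<Vec<u32>, ParseIntError> {
    content
        .split_whitespace()
        .map(|item| item.parse::<u32>())
        .collect()
}

fn main() {
    // Attribute-style content: items separated by single spaces.
    assert_eq!(parse_list("1 2 3"), Ok(vec![1, 2, 3]));

    // Element-style content: items separated by newlines and indentation.
    assert_eq!(parse_list("\n  1\n  2\n"), Ok(vec![1, 2]));

    // A comma is not a delimiter here, so "1,2" is a single invalid item.
    assert!(parse_list("1,2").is_err());
}
```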
+//!
+//!
+//! Use any type that is deserialized using a [`deserialize_seq()`] call, for example:
+//!
+//! ```
+//! // FIXME: #474, merging mixed text / CDATA
+//! // content does not work yet
+//! type List = Vec
+//!
+//! NOTE: according to the XML Schema restrictions, you cannot escape those
+//! white-space characters, so list elements will _never_ contain them.
+//! In practice you will usually use `xs:list`s for lists of numbers or enumerated
+//! values which look like identifiers in many languages, for example, `item`,
+//! `some_item` or `some-item`, so that shouldn't be a problem.
+//!
+//! NOTE: according to the XML Schema specification, list elements can be
+//! delimited only by spaces. Other delimiters (for example, commas) are not
+//! allowed.
+//!
+//!
+//!
+//! [`deserialize_seq()`]: de::Deserializer::deserialize_seq
+//! |
+//!
+//! A typical XML with attributes. The root tag name does not matter:
+//!
+//! ```xml
+//! |
+//!
+//!
+//! A structure where each XML attribute is mapped to a field with a name
+//! starting with `@`. Because Rust identifiers do not permit the `@` character,
+//! you should use the `#[serde(rename = "@...")]` attribute to rename it.
+//! The name of the struct itself does not matter:
+//!
+//! ```
+//! # use serde::Deserialize;
+//! # type T = ();
+//! # type U = ();
+//! // Get both attributes
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! struct AnyName {
+//!     #[serde(rename = "@one")]
+//!     one: T,
+//!
+//!     #[serde(rename = "@two")]
+//!     two: U,
+//! }
+//! # quick_xml::de::from_str::
+//!
+//! NOTE: XML allows you to have an attribute and an element with the same name
+//! inside the same element. quick-xml deals with that by prepending a `@` prefix
+//! to the name of attributes.
+//!
+//! |
+//!
+//! A typical XML with child elements. The root tag name does not matter:
+//!
+//! ```xml
+//! |
+//!
+//! A structure where each XML child element is mapped to a field.
+//! Each element name becomes the name of a field. The name of the struct itself
+//! does not matter:
+//!
+//! ```
+//! # use serde::Deserialize;
+//! # type T = ();
+//! # type U = ();
+//! // Get both elements
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! struct AnyName {
+//!     one: T,
+//!     two: U,
+//! }
+//! # quick_xml::de::from_str::
+//!
+//! NOTE: XML allows you to have an attribute and an element with the same name
+//! inside the same element. quick-xml deals with that by prepending a `@` prefix
+//! to the name of attributes.
+//!
+//! |
+//!
+//! An XML with an attribute and a child element that have the same name:
+//!
+//! ```xml
+//! |
+//!
+//!
+//! You MUST specify `#[serde(rename = "@field")]` on a field that will be used
+//! for an attribute:
+//!
+//! ```
+//! # use pretty_assertions::assert_eq;
+//! # use serde::Deserialize;
+//! # type T = ();
+//! # type U = ();
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! struct AnyName {
+//!     #[serde(rename = "@field")]
+//!     attribute: T,
+//!     field: U,
+//! }
+//! # assert_eq!(
+//! # AnyName { attribute: (), field: () },
+//! # quick_xml::de::from_str(r#"
+//! # |
+//!
+//!
+//! ## Optional attributes and elements
+//!
+//! | |
To parse all these XML's... | ...use that Rust type(s) |
+//! An optional XML attribute that you want to capture.
+//! The root tag name does not matter:
+//!
+//! ```xml
+//! |
+//!
+//!
+//! A structure with an optional field, renamed according to the requirements
+//! for attributes:
+//!
+//! ```
+//! # use pretty_assertions::assert_eq;
+//! # use serde::Deserialize;
+//! # type T = ();
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! struct AnyName {
+//! #[serde(rename = "@optional")]
+//! optional: Option |
+//!
+//! An optional XML element that you want to capture.
+//! The root tag name does not matter:
+//!
+//! ```xml
+//! |
+//!
+//!
+//! A structure with an optional field:
+//!
+//! ```
+//! # use pretty_assertions::assert_eq;
+//! # use serde::Deserialize;
+//! # type T = ();
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! struct AnyName {
+//! optional: Option
+//!
+//! Currently some edge cases exist; they are described in issue [#497].
+//!
+//! |
+//!
+//!
+//! ## Choices (`xs:choice` XML Schema type)
+//!
+//! | |
To parse all these XML's... | ...use that Rust type(s) |
+//! An XML with different root tag names:
+//!
+//! ```xml
+//! |
+//!
+//!
+//! An enum where each variant has the name of a possible root tag. The name of
+//! the enum itself does not matter.
+//!
+//! All these structs can be used to deserialize from any XML on the
+//! left side depending on the amount of information that you want to get:
+//!
+//! ```
+//! # use pretty_assertions::assert_eq;
+//! # use serde::Deserialize;
+//! # type T = ();
+//! # type U = ();
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! #[serde(rename_all = "snake_case")]
+//! enum AnyName {
+//!     One { #[serde(rename = "@field1")] field1: T },
+//!     Two { field2: U },
+//! }
+//! # assert_eq!(AnyName::One { field1: () }, quick_xml::de::from_str(r#"
+//!
+//! NOTE: You should have variants for all possible tag names in your enum
+//! or have an `#[serde(other)]` variant.
+//!
+//!
+//! |
+//!
+//!
+//! ` |
+//!
+//!
+//! A structure with a field whose type is an `enum`.
+//!
+//! Names of the enum, struct, and struct field with `Choice` type do not matter:
+//!
+//! ```
+//! # use pretty_assertions::assert_eq;
+//! # use serde::Deserialize;
+//! # type T = ();
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! #[serde(rename_all = "snake_case")]
+//! enum Choice {
+//!     One,
+//!     Two,
+//! }
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! struct AnyName {
+//!     #[serde(rename = "@field")]
+//!     field: T,
+//!
+//!     #[serde(rename = "$value")]
+//!     any_name: Choice,
+//! }
+//! # assert_eq!(
+//! # AnyName { field: (), any_name: Choice::One },
+//! # quick_xml::de::from_str(r#" |
+//!
+//!
+//! ` |
+//!
+//!
+//! A structure with a field whose type is an `enum`.
+//!
+//! Names of the enum, struct, and struct field with `Choice` type do not matter:
+//!
+//! ```
+//! # use pretty_assertions::assert_eq;
+//! # use serde::Deserialize;
+//! # type T = ();
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! #[serde(rename_all = "snake_case")]
+//! enum Choice {
+//!     One,
+//!     Two,
+//! }
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! struct AnyName {
+//!     field: T,
+//!
+//!     #[serde(rename = "$value")]
+//!     any_name: Choice,
+//! }
+//! # assert_eq!(
+//! # AnyName { field: (), any_name: Choice::One },
+//! # quick_xml::de::from_str(r#"
+//!
+//! NOTE: if your `Choice` enum would contain an `#[serde(other)]`
+//! variant, element `
+//!
+//! |
+//!
+//!
+//! ` |
+//!
+//!
+//! A structure with a field of an intermediate type with one field of `enum` type.
+//! Strictly speaking, this example is not necessary, because you can construct it yourself
+//! using the composition rules that were described above. However, the XML construction
+//! described here is very common, so it is shown explicitly.
+//!
+//! Names of the enum and struct do not matter:
+//!
+//! ```
+//! # use pretty_assertions::assert_eq;
+//! # use serde::Deserialize;
+//! # type T = ();
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! #[serde(rename_all = "snake_case")]
+//! enum Choice {
+//!     One,
+//!     Two,
+//! }
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! struct Holder {
+//!     #[serde(rename = "$value")]
+//!     any_name: Choice,
+//! }
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! struct AnyName {
+//!     #[serde(rename = "@field")]
+//!     field: T,
+//!
+//!     choice: Holder,
+//! }
+//! # assert_eq!(
+//! # AnyName { field: (), choice: Holder { any_name: Choice::One } },
+//! # quick_xml::de::from_str(r#" |
+//!
+//!
+//! ` |
+//!
+//!
+//! A structure with a field of an intermediate type with one field of `enum` type.
+//! Strictly speaking, this example is not necessary, because you can construct it yourself
+//! using the composition rules that were described above. However, the XML construction
+//! described here is very common, so it is shown explicitly.
+//!
+//! Names of the enum and struct do not matter:
+//!
+//! ```
+//! # use pretty_assertions::assert_eq;
+//! # use serde::Deserialize;
+//! # type T = ();
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! #[serde(rename_all = "snake_case")]
+//! enum Choice {
+//!     One,
+//!     Two,
+//! }
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! struct Holder {
+//!     #[serde(rename = "$value")]
+//!     any_name: Choice,
+//! }
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! struct AnyName {
+//!     field: T,
+//!
+//!     choice: Holder,
+//! }
+//! # assert_eq!(
+//! # AnyName { field: (), choice: Holder { any_name: Choice::One } },
+//! # quick_xml::de::from_str(r#" |
+//!
+//!
+//! ## Sequences (`xs:all` and `xs:sequence` XML Schema types)
+//!
+//! | |
To parse all these XML's... | ...use that Rust type(s) |
+//! A sequence inside of a tag without a dedicated name:
+//!
+//! ```xml
+//! |
+//!
+//!
+//! A structure with a field which has a sequence type, for example, [`Vec`].
+//! Because XML syntax does not distinguish between empty sequences and missing
+//! elements, we should indicate that on the Rust side, because serde will require
+//! that the field `item` exists. You can do that in two possible ways:
+//!
+//! Use the `#[serde(default)]` attribute for a [field] or the entire [struct]:
+//! ```
+//! # use pretty_assertions::assert_eq;
+//! # use serde::Deserialize;
+//! # type Item = ();
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! struct AnyName {
+//! #[serde(default)]
+//! item: Vec
+//!
+//! Currently not working. The bug is tracked in [#510].
+//!
+//!
+//! [field]: https://serde.rs/field-attrs.html#default
+//! [struct]: https://serde.rs/container-attrs.html#default
+//! |
+//!
+//! A sequence with a strict order, possibly with mixed content
+//! (text / CDATA and tags):
+//!
+//! ```xml
+//!
+//!
+//! NOTE: this is just an example for showing mapping. XML does not allow
+//! multiple root tags -- you should wrap the sequence into a tag.
+//!
+//! |
+//!
+//!
+//! All elements are mapped to a heterogeneous sequential type: a tuple or a named tuple.
+//! Each element of the tuple should be deserializable from the nested
+//! element content (`...`), except the enum types which would be deserialized
+//! from the full element (`
+//!
+//! NOTE: consecutive text and CDATA nodes are merged into one text node,
+//! so you cannot have two adjacent string types in your sequence.
+//!
+//!
+//!
+//! Merging of the text / CDATA content is tracked in the issue [#474] and
+//! will be available in the next release.
+//!
+//! |
+//!
+//! A sequence with a non-strict order, possibly with mixed content
+//! (text / CDATA and tags).
+//!
+//! ```xml
+//!
+//!
+//! NOTE: this is just an example for showing mapping. XML does not allow
+//! multiple root tags -- you should wrap the sequence into a tag.
+//!
+//! |
+//!
+//! A homogeneous sequence of elements with a fixed or dynamic size:
+//!
+//! ```ignore
+//! // FIXME: #474
+//! # use pretty_assertions::assert_eq;
+//! # use serde::Deserialize;
+//! # #[derive(Debug, PartialEq)]
+//! #[derive(Deserialize)]
+//! #[serde(rename_all = "snake_case")]
+//! enum Choice {
+//!     One,
+//!     Two,
+//!     #[serde(other)]
+//!     Other,
+//! }
+//! type AnyName = [Choice; 4];
+//! # assert_eq!(
+//! # [Choice::One, Choice::Other, Choice::Two, Choice::One],
+//! # quick_xml::de::from_str::
+//!
+//! NOTE: consecutive text and CDATA nodes are merged into one text node,
+//! so you cannot have two adjacent string types in your sequence.
+//!
+//!
+//!
+//! Merging of the text / CDATA content is tracked in the issue [#474] and
+//! will be available in the next release.
+//!
+//! |
+//!
+//! A sequence with a strict order, possibly with mixed content
+//! (text and tags), inside another element:
+//!
+//! ```xml
+//! |
+//!
+//!
+//! A structure where all child elements are mapped to one field which has
+//! a heterogeneous sequential type: a tuple or a named tuple. Each element of the
+//! tuple should be able to be deserialized from the full element (`
+//!
+//! NOTE: consecutive text and CDATA nodes are merged into one text node,
+//! so you cannot have two adjacent string types in your sequence.
+//!
+//!
+//!
+//! Merging of the text / CDATA content is tracked in the issue [#474] and
+//! will be available in the next release.
+//!
+//! |
+//!
+//! A sequence with a non-strict order, possibly with mixed content
+//! (text / CDATA and tags), inside another element:
+//!
+//! ```xml
+//! |
+//!
+//!
+//! A structure where all child elements are mapped to one field which has
+//! a homogeneous sequential type: an array-like container. A container type `T`
+//! should be deserializable from the nested element content (`...`),
+//! except if it is an enum type, which would be deserialized from the full
+//! element (`
+//!
+//! NOTE: consecutive text and CDATA nodes are merged into one text node,
+//! so you cannot have two adjacent string types in your sequence.
+//!
+//!
+//!
+//! Merging of the text / CDATA content is tracked in the issue [#474] and
+//! will be available in the next release.
+//!
+//! |
+//!