Fix the errors, clarify some things in the documentation
Mingun authored and dralley committed Aug 26, 2022
1 parent 9b0caff commit 5bd62cc
Showing 11 changed files with 122 additions and 71 deletions.
15 changes: 11 additions & 4 deletions Cargo.toml
@@ -59,11 +59,17 @@ async-tokio = ["tokio"]
## [standard compliant]: https://www.w3.org/TR/xml11/#charencoding
encoding = ["encoding_rs"]

## Enables support for recognizing all [HTML 5 entities](https://dev.w3.org/html5/html-author/charref)
## Enables support for recognizing all [HTML 5 entities] in the [`unescape`] and
## [`unescape_with`] functions. The full list of entities can also be found at
## <https://html.spec.whatwg.org/entities.json>.
##
## [HTML 5 entities]: https://dev.w3.org/html5/html-author/charref
## [`unescape`]: crate::escape::unescape
## [`unescape_with`]: crate::escape::unescape_with
escape-html = []

## This feature enables support for deserializing lists where tags are overlapped
## with tags that do not correspond to the list.
## This feature is for the serde deserializer. It enables support for deserializing
## lists where tags are overlapped with tags that do not correspond to the list.
##
## When this feature is enabled, the XML:
## ```xml
@@ -75,7 +81,8 @@ escape-html = []
## </any-name>
## ```
## could be deserialized to a struct:
## ```ignore
## ```no_run
## # use serde::Deserialize;
## #[derive(Deserialize)]
## #[serde(rename_all = "kebab-case")]
## struct AnyName {
13 changes: 11 additions & 2 deletions src/errors.rs
@@ -168,15 +168,24 @@ pub mod serialize {
/// Please open an issue at <https://github.com/tafia/quick-xml>, provide
/// your Rust code and XML input.
UnexpectedEnd(Vec<u8>),
/// Unexpected end of file
/// The [`Reader`] produced [`Event::Eof`] when it was not expected,
/// for example, after producing [`Event::Start`] but before the corresponding
/// [`Event::End`].
///
/// [`Reader`]: crate::reader::Reader
/// [`Event::Eof`]: crate::events::Event::Eof
/// [`Event::Start`]: crate::events::Event::Start
/// [`Event::End`]: crate::events::Event::End
UnexpectedEof,
/// This error indicates that [`deserialize_struct`] was called, but there
/// is no XML element in the input. That means that you are trying to deserialize
/// a struct from something that is not an XML element.
///
/// [`deserialize_struct`]: serde::de::Deserializer::deserialize_struct
ExpectedStart,
/// Unsupported operation
/// An attempt to deserialize to a type that is not supported by the XML
/// store at the current position, for example, an attempt to deserialize a `struct`
/// from an attribute or an attempt to deserialize binary data.
Unsupported(&'static str),
/// Too many events were skipped while deserializing a sequence, event limit
/// exceeded. The limit was provided as an argument
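
The `UnexpectedEof` description above (an `Eof` arriving after a `Start` but before its `End`) can be illustrated with a small standalone sketch. The `Event` and `DeError` types and the `check_balanced` function below are hypothetical, simplified stand-ins written for this illustration; they are not the crate's own types or its actual detection logic.

```rust
/// Hypothetical, simplified stand-in for the crate's event type.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start(String),
    End(String),
    Text(String),
    Eof,
}

/// Hypothetical, simplified error type used only by this sketch.
#[derive(Debug, PartialEq)]
pub enum DeError {
    /// `Eof` arrived while at least one element was still open.
    UnexpectedEof,
    /// An `End` arrived although no element was open.
    UnexpectedEnd(String),
}

/// Walks the events and checks that every `Start` is closed before `Eof`.
/// Running out of events with an element still open is treated like `Eof`.
pub fn check_balanced(events: &[Event]) -> Result<(), DeError> {
    // Names of the elements that are currently open, innermost last.
    let mut open: Vec<&str> = Vec::new();
    for event in events {
        match event {
            Event::Start(name) => open.push(name.as_str()),
            Event::End(name) => {
                if open.pop().is_none() {
                    return Err(DeError::UnexpectedEnd(name.clone()));
                }
            }
            Event::Text(_) => {}
            Event::Eof => break,
        }
    }
    if open.is_empty() {
        Ok(())
    } else {
        Err(DeError::UnexpectedEof)
    }
}
```

With this sketch, the sequence `Start("a")`, `Eof` yields `DeError::UnexpectedEof`, while `Start("a")`, `End("a")`, `Eof` is accepted.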
11 changes: 10 additions & 1 deletion src/escapei.rs
@@ -132,13 +132,21 @@ fn _escape<F: Fn(u8) -> bool>(raw: &str, escape_chars: F) -> Cow<str> {
}

/// Unescape an `&str` and replaces all xml escaped characters (`&...;`) into
/// their corresponding value
/// their corresponding value.
///
/// If the `escape-html` feature is enabled, all [HTML5 escapes] are recognized as well.
///
/// [HTML5 escapes]: https://dev.w3.org/html5/html-author/charref
pub fn unescape(raw: &str) -> Result<Cow<str>, EscapeError> {
unescape_with(raw, |_| None)
}

/// Unescape an `&str` and replaces all xml escaped characters (`&...;`) into
/// their corresponding value, using a resolver function for custom entities.
///
/// If the `escape-html` feature is enabled, all [HTML5 escapes] are recognized as well.
///
/// [HTML5 escapes]: https://dev.w3.org/html5/html-author/charref
pub fn unescape_with<'input, 'entity, F>(
raw: &'input str,
resolve_entity: F,
@@ -211,6 +219,7 @@ const fn named_entity(name: &str) -> Option<&str> {
const fn named_entity(name: &str) -> Option<&str> {
// imported from https://dev.w3.org/html5/html-author/charref
// match over strings are not allowed in const functions
// TODO: automate keeping this list up to date using https://html.spec.whatwg.org/entities.json
let s = match name.as_bytes() {
b"Tab" => "\u{09}",
b"NewLine" => "\u{0A}",
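
The idea documented for `unescape_with` (replace `&...;` references and fall back to a resolver function for custom entities) can be sketched without the crate. Everything below is an assumption made for illustration: the name `unescape_sketch`, the `UnescapeError` type, and a resolver that returns an owned `String` are not the crate's API, and numeric character references are left out to keep the sketch short.

```rust
use std::borrow::Cow;

/// Hypothetical error type for this sketch; the crate has its own `EscapeError`.
#[derive(Debug, PartialEq)]
pub enum UnescapeError {
    /// `&` was found without a terminating `;`.
    Unterminated,
    /// The name is neither predefined nor known to the resolver.
    Unrecognized(String),
}

/// Replaces `&name;` references in `raw`. Names other than the five
/// predefined XML entities are passed to `resolve`.
pub fn unescape_sketch<'a, F>(raw: &'a str, mut resolve: F) -> Result<Cow<'a, str>, UnescapeError>
where
    F: FnMut(&str) -> Option<String>,
{
    // Fast path: nothing to replace, so the input is borrowed unchanged.
    if !raw.contains('&') {
        return Ok(Cow::Borrowed(raw));
    }
    let mut out = String::with_capacity(raw.len());
    let mut rest = raw;
    while let Some(amp) = rest.find('&') {
        // Copy the text before the reference verbatim.
        out.push_str(&rest[..amp]);
        let after = &rest[amp + 1..];
        let semi = after.find(';').ok_or(UnescapeError::Unterminated)?;
        match &after[..semi] {
            "lt" => out.push('<'),
            "gt" => out.push('>'),
            "amp" => out.push('&'),
            "apos" => out.push('\''),
            "quot" => out.push('"'),
            // Anything else is a custom entity: ask the resolver.
            other => match resolve(other) {
                Some(value) => out.push_str(&value),
                None => return Err(UnescapeError::Unrecognized(other.to_string())),
            },
        }
        rest = &after[semi + 1..];
    }
    out.push_str(rest);
    Ok(Cow::Owned(out))
}
```

Passing `|_| None` as the resolver gives the behaviour of a plain unescape that knows only the predefined entities, which mirrors how `unescape` delegates to `unescape_with` in the hunk above.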
12 changes: 9 additions & 3 deletions src/events/mod.rs
@@ -16,8 +16,8 @@
//! See [`Event`] for a list of all possible events.
//!
//! # Reading
//! When reading a XML stream, the events are emitted by
//! [`Reader::read_event_into`]. You must listen
//! When reading a XML stream, the events are emitted by [`Reader::read_event`]
//! and [`Reader::read_event_into`]. You must listen
//! for the different types of events you are interested in.
//!
//! See [`Reader`] for further information.
@@ -29,6 +29,7 @@
//!
//! See [`Writer`] for further information.
//!
//! [`Reader::read_event`]: crate::reader::Reader::read_event
//! [`Reader::read_event_into`]: crate::reader::Reader::read_event_into
//! [`Reader`]: crate::reader::Reader
//! [`Writer`]: crate::writer::Writer
@@ -500,7 +501,12 @@ impl<'a> BytesDecl<'a> {
.transpose()
}

/// Gets the decoder struct
/// Gets the actual encoding using the [_get an encoding_](https://encoding.spec.whatwg.org/#concept-encoding-get)
/// algorithm.
///
/// If the encoding is not known, or the `encoding` key was not found, returns `None`.
/// In case of a duplicated `encoding` key, the encoding corresponding to the first
/// one is returned.
#[cfg(feature = "encoding")]
pub fn encoder(&self) -> Option<&'static Encoding> {
self.encoding()
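
The two rules in the new `encoder` documentation (the first `encoding` key wins, and the label is mapped through the "get an encoding" algorithm) can be sketched in plain Rust. Both functions below are hypothetical helpers written for this illustration, and the label table is a deliberately tiny subset of the WHATWG one; the crate itself relies on `encoding_rs` for the real lookup.

```rust
/// Returns the value of the first `encoding` pseudo-attribute in the content
/// of an XML declaration (the text between `<?xml` and `?>`).
pub fn first_encoding_label(decl: &str) -> Option<&str> {
    let mut rest = decl;
    loop {
        rest = rest.trim_start();
        // No further `key=value` pair: the key was not found.
        let eq = rest.find('=')?;
        let key = rest[..eq].trim_end();
        let after_eq = rest[eq + 1..].trim_start();
        // The value must be wrapped in single or double quotes.
        let quote = after_eq.chars().next().filter(|c| *c == '"' || *c == '\'')?;
        let value_and_rest = &after_eq[1..];
        let close = value_and_rest.find(quote)?;
        if key == "encoding" {
            // Returning here makes the first occurrence win over duplicates.
            return Some(&value_and_rest[..close]);
        }
        rest = &value_and_rest[close + 1..];
    }
}

/// Maps a label to an encoding name: ASCII whitespace is stripped and the
/// comparison is case-insensitive. Only a few labels are listed in this sketch.
pub fn canonical_name(label: &str) -> Option<&'static str> {
    let label = label
        .trim_matches(|c: char| c.is_ascii_whitespace())
        .to_ascii_lowercase();
    match label.as_str() {
        "utf-8" | "utf8" | "unicode-1-1-utf-8" => Some("UTF-8"),
        "iso-8859-1" | "latin1" | "ascii" | "us-ascii" | "windows-1252" => Some("windows-1252"),
        // Unknown label: corresponds to the documented `None` case.
        _ => None,
    }
}
```

Chaining the two, as in `first_encoding_label(decl).and_then(canonical_name)`, reproduces the documented outcomes: `None` for a missing key or an unknown label, and the first value when the key is duplicated.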
14 changes: 8 additions & 6 deletions src/lib.rs
@@ -7,18 +7,19 @@
//! A streaming API based on the [StAX] model. This is suited for larger XML documents which
//! cannot be completely read into memory at once.
//!
//! The user has to explicitly _ask_ for the next XML event, similar
//! to a database cursor.
//! The user has to explicitly _ask_ for the next XML event, similar to a database cursor.
//! This is achieved by the following two structs:
//!
//! - [`Reader`]: A low level XML pull-reader where buffer allocation/clearing is left to user.
//! - [`Writer`]: A XML writer. Can be nested with readers if you want to transform XMLs.
//!
//! Especially for nested XML elements, the user must keep track _where_ (how deep) in the XML document
//! the current event is located. This is needed as the
//! Especially for nested XML elements, the user must keep track of _where_ (how deep)
//! in the XML document the current event is located.
//!
//! Furthermore, quick-xml also contains optional [Serde] support to directly serialize and deserialize from
//! structs, without having to deal with the XML events.
//! quick-xml contains optional support of asynchronous reading using [tokio].
//!
//! Furthermore, quick-xml also contains optional [Serde] support to directly
//! serialize and deserialize from structs, without having to deal with the XML events.
//!
//! # Examples
//!
@@ -30,6 +31,7 @@
//! `quick-xml` supports the following features:
//!
//! [StAX]: https://en.wikipedia.org/wiki/StAX
//! [tokio]: https://tokio.rs/
//! [Serde]: https://serde.rs/
#![cfg_attr(
feature = "document-features",
2 changes: 1 addition & 1 deletion src/name.rs
@@ -212,7 +212,7 @@ impl<'a> AsRef<[u8]> for Prefix<'a> {
/// [XML Schema specification](https://www.w3.org/TR/xml-names/#ns-decl)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum PrefixDeclaration<'a> {
/// XML attribute binds a default namespace. Corresponds to `xmlns` in in `xmlns="..."`
/// XML attribute binds a default namespace. Corresponds to `xmlns` in `xmlns="..."`
Default,
/// XML attribute binds a specified prefix to a namespace. Corresponds to a
/// `prefix` in `xmlns:prefix="..."`, which is stored as payload of this variant.
16 changes: 8 additions & 8 deletions src/reader/buffered_reader.rs
@@ -216,8 +216,7 @@ impl<'b, R: BufRead> XmlSource<'b, &'b mut Vec<u8>> for R {

////////////////////////////////////////////////////////////////////////////////////////////////////

/// This is an implementation of [`Reader`] for reading from a [`BufRead`] as
/// underlying byte stream.
/// This is an implementation for reading from a [`BufRead`] as underlying byte stream.
impl<R: BufRead> Reader<R> {
/// Reads the next `Event`.
///
@@ -243,24 +242,24 @@ impl<R: BufRead> Reader<R> {
/// let xml = r#"<tag1 att1 = "test">
/// <tag2><!--Test comment-->Test</tag2>
/// <tag2>Test 2</tag2>
/// </tag1>"#;
/// </tag1>"#;
/// let mut reader = Reader::from_str(xml);
/// reader.trim_text(true);
/// let mut count = 0;
/// let mut buf = Vec::new();
/// let mut txt = Vec::new();
/// loop {
/// match reader.read_event_into(&mut buf) {
/// Ok(Event::Start(ref e)) => count += 1,
/// Ok(Event::Start(_)) => count += 1,
/// Ok(Event::Text(e)) => txt.push(e.unescape().unwrap().into_owned()),
/// Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
/// Ok(Event::Eof) => break,
/// _ => (),
/// }
/// buf.clear();
/// }
/// println!("Found {} start events", count);
/// println!("Text events: {:?}", txt);
/// assert_eq!(count, 3);
/// assert_eq!(txt, vec!["Test".to_string(), "Test 2".to_string()]);
/// ```
#[inline]
pub fn read_event_into<'b>(&mut self, buf: &'b mut Vec<u8>) -> Result<Event<'b>> {
@@ -275,7 +274,8 @@ impl<R: BufRead> Reader<R> {
/// a closing tag or an empty slice, if [`expand_empty_elements`] is set and
/// this method was called after reading expanded [`Start`] event.
///
/// Manages nested cases where parent and child elements have the same name.
/// Manages nested cases where parent and child elements have _literally_ the
/// same name.
///
/// If the corresponding [`End`] event is not found, the [`Error::UnexpectedEof`]
/// will be returned. In particular, that error will be returned if you call
@@ -299,7 +299,7 @@ impl<R: BufRead> Reader<R> {
///
/// # Namespaces
///
/// While the [`Reader`] does not support namespace resolution, namespaces
/// While the `Reader` does not support namespace resolution, namespaces
/// do not change the algorithm for comparing names. Although the names
/// `a:name` and `b:name`, where both prefixes `a` and `b` resolve to the
/// same namespace, are semantically equivalent, `</b:name>` cannot close
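
The matching rule described in this hunk (nested elements with literally the same name are counted, and names are compared as written, prefix included) can be sketched as a depth counter over a simplified event stream. The `Event` enum and `skip_to_end` function are hypothetical stand-ins for this illustration and do not reproduce the crate's implementation or signatures.

```rust
/// Hypothetical, simplified stand-in for the crate's event type.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start(String),
    End(String),
    Text(String),
    Eof,
}

/// Consumes events until the `End` that closes an already-read `Start(name)`.
/// Returns how many events were consumed, or `None` if `Eof` (or the end of
/// the input) is reached first.
pub fn skip_to_end<I>(events: &mut I, name: &str) -> Option<usize>
where
    I: Iterator<Item = Event>,
{
    // Number of nested elements with the same literal name that are still open.
    let mut depth = 0usize;
    let mut consumed = 0usize;
    for event in events {
        consumed += 1;
        match event {
            // Names are compared as written: `a:name` and `b:name` differ
            // even if both prefixes are bound to the same namespace.
            Event::Start(n) if n.as_str() == name => depth += 1,
            Event::End(n) if n.as_str() == name => {
                if depth == 0 {
                    return Some(consumed);
                }
                depth -= 1;
            }
            Event::Eof => return None,
            _ => {}
        }
    }
    None
}
```

In this sketch, an `End("b:name")` never closes a `Start("a:name")`, so a stream that contains only the former ends in `None`, which corresponds to the `UnexpectedEof` case described above.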
20 changes: 14 additions & 6 deletions src/reader/mod.rs
@@ -20,27 +20,32 @@ macro_rules! configure_methods {
/// default), those tags are represented by an [`Empty`] event instead.
///
/// Note, that setting this to `true` will lead to additional allocations that are
/// needed to store tag name for an [`End`] event. There is no additional
/// allocation, however, if [`Self::check_end_names()`] is also set.
/// needed to store tag name for an [`End`] event. However, if [`check_end_names`]
/// is also set, only one additional allocation will be performed that supports
/// both these options.
///
/// (`false` by default)
///
/// [`Empty`]: Event::Empty
/// [`Start`]: Event::Start
/// [`End`]: Event::End
/// [`check_end_names`]: Self::check_end_names
pub fn expand_empty_elements(&mut self, val: bool) -> &mut Self {
self $(.$holder)? .parser.expand_empty_elements = val;
self
}

/// Changes whether whitespace before and after character data should be removed.
///
/// When set to `true`, all [`Text`] events are trimmed. If they are empty, no event will be
/// pushed.
/// When set to `true`, all [`Text`] events are trimmed.
/// If after that the event is empty it will not be pushed.
///
/// Changing this option automatically changes the [`trim_text_end`] option.
///
/// (`false` by default)
///
/// [`Text`]: Event::Text
/// [`trim_text_end`]: Self::trim_text_end
pub fn trim_text(&mut self, val: bool) -> &mut Self {
self $(.$holder)? .parser.trim_text_start = val;
self $(.$holder)? .parser.trim_text_end = val;
@@ -50,6 +55,7 @@ macro_rules! configure_methods {
/// Changes whether whitespace after character data should be removed.
///
/// When set to `true`, trailing whitespace is trimmed in [`Text`] events.
/// If after that the event is empty it will not be pushed.
///
/// (`false` by default)
///
@@ -99,13 +105,15 @@ macro_rules! configure_methods {
/// contain the data of the mismatched end tag.
///
/// Note, that setting this to `true` will lead to additional allocations that are
/// needed to store tag name for an [`End`] event. There is no additional
/// allocation, however, if [`Self::expand_empty_elements()`] is also set.
/// needed to store tag name for an [`End`] event. However, if [`expand_empty_elements`]
/// is also set, only one additional allocation will be performed that supports
/// both these options.
///
/// (`true` by default)
///
/// [spec]: https://www.w3.org/TR/xml11/#dt-etag
/// [`End`]: Event::End
/// [`expand_empty_elements`]: Self::expand_empty_elements
pub fn check_end_names(&mut self, val: bool) -> &mut Self {
self $(.$holder)? .parser.check_end_names = val;
self
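
The effect of the `trim_text` and `expand_empty_elements` options documented in this file can be sketched as a filter over a simplified event list. The `Event` enum, the `Options` struct, and `apply_options` are hypothetical and exist only for this illustration; in particular, the sketch trims with `str::trim`, which is an assumption and may not match the crate's own definition of whitespace.

```rust
/// Hypothetical, simplified stand-in for the crate's event type.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start(String),
    End(String),
    Empty(String),
    Text(String),
}

/// Hypothetical option set that mirrors two of the documented switches.
/// Both are `false` by default, as in the documentation above.
#[derive(Debug, Clone, Copy, Default)]
pub struct Options {
    pub trim_text: bool,
    pub expand_empty_elements: bool,
}

/// Applies the options to already-parsed events.
pub fn apply_options(raw: Vec<Event>, opts: Options) -> Vec<Event> {
    let mut out = Vec::with_capacity(raw.len());
    for event in raw {
        match event {
            Event::Text(text) if opts.trim_text => {
                let trimmed = text.trim();
                // A text event that is empty after trimming is not pushed.
                if !trimmed.is_empty() {
                    out.push(Event::Text(trimmed.to_string()));
                }
            }
            Event::Empty(name) if opts.expand_empty_elements => {
                // The name has to be stored a second time for the `End`
                // event, which is the extra allocation the docs mention.
                out.push(Event::Start(name.clone()));
                out.push(Event::End(name));
            }
            other => out.push(other),
        }
    }
    out
}
```

With both options enabled, a whitespace-only `Text` disappears and `Empty("br")` becomes `Start("br")` followed by `End("br")`; with `Options::default()` the events pass through unchanged.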