Fix attribute parsing #4

Mingun · 2022-05-03T16:20:27Z

This PR adds tests for attribute parsing and fixes bugs found. It introduces new AttrError type that now returned by Attributes::next() and contains all possible errors when attributes is parsed.

Also it introduces a new type Attr which is by design is borrowed-only. In version 0.24 I plan to change Attributes iterator and return that new type.

@dralley, it would be great to run performance tests and see the results and maybe make optimizations like that you've done in #3

dralley · 2022-05-03T16:46:54Z

Sure, I'll try to review in the next few days (when time allows)

dralley · 2022-05-03T22:27:59Z

I assume this is an exhaustive list of (tests demonstrating) the various issues with the previous parsing algorithm? Are there any non-covered cases that you're aware of (ignoring the attribute value normalization work that I'll eventually get back to)

    events::attributes::html::duplicated::with_check::double_quoted
    events::attributes::html::duplicated::with_check::key_only
    events::attributes::html::duplicated::with_check::single_quoted
    events::attributes::html::duplicated::with_check::unquoted
    events::attributes::html::single::missed_value
    events::attributes::html::sparsed::double_quoted
    events::attributes::html::sparsed::key_contains_invalid
    events::attributes::html::sparsed::key_only
    events::attributes::html::sparsed::key_start_invalid
    events::attributes::html::sparsed::missed_value
    events::attributes::html::sparsed::single_quoted
    events::attributes::html::sparsed::unquoted
    events::attributes::xml::duplicated::with_check::double_quoted
    events::attributes::xml::duplicated::with_check::key_only
    events::attributes::xml::duplicated::with_check::single_quoted
    events::attributes::xml::duplicated::with_check::unquoted
    events::attributes::xml::duplicated::without_check::key_only
    events::attributes::xml::duplicated::without_check::unquoted
    events::attributes::xml::first::key_only
    events::attributes::xml::first::missed_value
    events::attributes::xml::first::unquoted
    events::attributes::xml::single::key_only
    events::attributes::xml::single::missed_value
    events::attributes::xml::sparsed::missed_value

Mingun · 2022-05-04T04:46:47Z

I assume this is an exhaustive list of (tests demonstrating) the various issues with the previous parsing algorithm?

Yes, it is. If you checkout this commit you will see what exactly errors present. Particularly, in some cases key contains spaces, in many cases after returning Err next next() call returns None and in some returns next attribute.

The attributes parser itself is very rough and do not follow specification. It doesn't check for illegal symbols in a key and unquoted value, entities does not expanded (but that is fine at that level -- you anyway needs an entity manager to get a list of defined entities). For now I left that as "by design" -- correctness of each attribute should be checked after getting it. But I have plans to try to replace manually written parser with one generated by grammar using rust-peg with more robust parsing.

If you want to work on attribute normalization, some my minds: first, you need to introduce an entity manager. That manager should collect entities defined in a document and allow user to define external entities. Collecting entities requires, in turn, DTD parser. There already exist crates for that, I did not look deeply. May be we can use some of them.

Then normalization would require reference to the entity manager. No need to store it inside an attribute, it is better to create a dedicated type for a well-formed attribute that would created by normalization. In fact, that PR already did that differentiation: Attr by design contains borrowed unprocessed data to the input, and created only by the parser, but Attribute could be created by the user and have own data.

Introducing entity manager and doing attribute normalization could be done in different PRs, because entity manager should be used everywhere

Mingun

In general, new algorithm around 5 us (~10%) slower than previous, which, I believe, because of recovery checks. Not a significance difference

Mingun · 2022-05-05T17:14:07Z

src/events/attributes.rs

-            None => return attr!(self.position..len),
+    pub fn next(&mut self, slice: &[u8]) -> Option<AttrResult> {
+        let mut iter = match self.recover(slice) {
+            Some(offset) => (offset..).zip(slice[offset..].iter()),


Using this instead of

slice.iter().enumerate().skip(offset)

really improves performance around by 10 us (~15%), according to Criterion, so applied it

Mingun · 2022-05-05T17:16:19Z

src/events/attributes.rs

+            }
+        };
+
+        match iter.find(|(_, &b)| b == quote) {


I also tried to replace this by

match memchr::memchr(quote, &slice[start_value..]) { ... }

but that does not give a significance difference

src/events/attributes.rs

dralley · 2022-05-05T18:35:46Z

src/events/attributes.rs

+    /// That error can be raised only if iterator in XML mode.
+    UnquotedValue(usize),
+    /// Attribute value not finished with a matching quote, position relative to
+    /// start of owning tag and a quote is provided. That position is always a last


I understand this to mean that "the position will be after a quote character" but to say that the position is relative to the "start of the owning tag and a quote" is a little bit confusing. It sounds like it could mean either of

A) the position will be on a quote character, which isn't true
B) the position is relative to both, which isn't possible

I think "and a quote" could be removed and it would still make sense (given the examples).

Feel free to rephrase it and I'll add it to the other your suggestions.

src/events/attributes.rs

dralley · 2022-05-05T18:46:05Z

@Mingun I'm going through commit-by-commit so this isn't a final review yet. I'll continue in a few hours.

The code looks great so far, I only have a few minor grammatical / wording nitpicks.

Mingun · 2022-05-05T19:10:17Z

Ok. Yeah, grammar should be checked because I'm not a native speaker :) I tried, but native speaker always can write it better

Mingun · 2022-05-06T15:33:12Z

@dralley, tell me when you finish with review. I'll plan to make a 0.23 release after merging this.

dralley

@Mingun Here's another round of review, I have some questions and some code suggestions

dralley · 2022-05-07T02:17:34Z

src/de/map.rs

@@ -41,7 +42,7 @@ where
    /// do not store reference to `Attributes` itself but instead create
    /// a new object on each advance of `Attributes` iterator, so we need
    /// to restore last position before advance.
-    position: usize,
+    iter: IterState,


I have some other questions about this (below) but the docstring needs to be updated.

Somehow forgot to save changes in the editor, so comment is changed outside of this PR in f9453b9

dralley · 2022-05-07T02:32:21Z

src/events/attributes.rs

-    /// if `with_checks`, contains the ranges corresponding to the
-    /// attribute names already parsed in this `Element`
-    consumed: Vec<Range<usize>>,
+    /// Iterator state, independent from the actual source of bytes


What is the rationale behind the change?

It leads to some weird interfaces below - e.g. all the methods of IterState take an external slice, and the caller could theoretically call iter.next(...) or other methods with two entirely different slices. The slice has to get passed around more often, the methods here are now delegating directly to self.state, etc.

Intuitively leaving everything in the Attributes struct as it was makes more sense to me, but I might be missing some details.

This is required for another iterator implementation to share the code. The problem is described in tafia/quick-xml#337 (comment): we have a borrowing iterator, but doesn't have an owned iterator. For example, for MapAccess we need an owned iterator and in this PR MapAccess de facto implements it. In the future we should expose that iterator as public API

In any case IterState is supposed to stay an internal detail and will never be exposed in public API. All internal usages will guaranties that they will pass correct slices

dralley · 2022-05-07T02:43:56Z

src/events/attributes.rs

-            }
-            Some((i, _)) => err!(AttrError::UnquotedValue(i)),
-            None => return attr!(start_key..end_key),
+        match self.state.next(self.bytes) {


This feels awkward because it's not the signature you'd expect from a next() method (even though this isn't trying to implement Iterator::next(), it's still the same thing conceptually). Usually iterators know what they're iterating over.

This is required for lifetime correctness. Owned iterator will store a Cow<'input, [u8]>, borrowing iterator stores a &'input [u8]. If you will try to move slice argument of next into IterState and store in a Cow<'input, [u8]>, even when borrowing iterator always will pass a Cow::Borrowed, borrow checker will not allow you to return slices with the 'input lifetime, because Cow potentially can be owned.

You also cannot store a &'input [u8] in the IterState, because owned iterator is supposed to be

struct OwnedAttributes<'input> { slice: Cow<'input, [u8]>, state: IterState, }

and you cannot store in that struct reference to a field of this struct (state would be referenced to a slice)

dralley · 2022-05-07T02:45:52Z

src/events/attributes.rs

+    /// [events]: crate::events::Event::Start
+    Unquoted(T, T),
+    /// Attribute without value. Attribute key provided. This is HTML-style attribute,
+    /// it can be returned only in HTML-mode parsing only. In XML mode


"only in HTML-mode parsing only" => I think you meant to remove one of the "only"s

dralley · 2022-05-07T03:09:47Z

src/events/attributes.rs

+impl<T> Attr<T> {
+    /// Maps an `Attr<T>` to `Attr<U>` by applying a function to a contained key and value.
+    #[inline]
+    pub fn map<U, F>(self, mut f: F) -> Attr<U>


(Regarding the whole implementation of Attr, not this specific method)

I see where you're coming from with all of this, but there are some downsides to doing it this way.

it would add a lot more branches to the generated code, because every access has to go through match

also general code bloat when it gets inlined everywhere that a dev uses .key(), .value(), etc.

it feels like a detail few users would care too much about, even ones parsing HTML

As far as I can tell, HTML doesn't have any attributes that are required to not have a value - attributes with no value, and attributes with an empty-string value are supposed to be treated identically.

A number of attributes are boolean attributes. The presence of a boolean attribute on an element represents the true value, and the absence of the attribute represents the false value.

If the attribute is present, its value must either be the empty string or a value that is an ASCII case-insensitive match for the attribute's canonical name, with no leading or trailing whitespace.

https://html.spec.whatwg.org/multipage/common-microsyntaxes.html#boolean-attributes

Just the attribute name. The value is implicitly the empty string.

https://html.spec.whatwg.org/multipage/syntax.html#syntax-attr-empty

So given that the spec says that all of these syntaxes are equivalent ways to express the attribute, I'm not sure that all of this is worth it, compared to just having a struct with key + value fields. I wouldn't have a problem with a third field to denote unquoted, empty, single-quote, double-quote though. It's just not important enough to treat like Option or Result and force dealing with specific variants.

That benchmarks shows only small 5 us regression (from ~75 to ~80 with checks, from ~44 to ~49 without checks, from ~155 to ~159 for try_get_attribute) in parsing which, I think, not a big pay for an additional info, that could be useful:

fast-xml/benches/bench.rs

Lines 256 to 333 in 6a3fd90

/// Benchmarks parsing attributes from events

fn attributes(c: &mut Criterion) {

let mut group = c.benchmark_group("attributes");

group.bench_function("with_checks = true", |b| {

b.iter(|| {

let mut r = Reader::from_reader(PLAYERS);

r.check_end_names(false).check_comments(false);

let mut count = criterion::black_box(0);

let mut buf = Vec::new();

loop {

match r.read_event(&mut buf) {

Ok(Event::Empty(e)) => {

for attr in e.attributes() {

let _attr = attr.unwrap();

count += 1

}

}

Ok(Event::Eof) => break,

_ => (),

}

buf.clear();

}

assert_eq!(count, 1041);

})

});

group.bench_function("with_checks = false", |b| {

b.iter(|| {

let mut r = Reader::from_reader(PLAYERS);

r.check_end_names(false).check_comments(false);

let mut count = criterion::black_box(0);

let mut buf = Vec::new();

loop {

match r.read_event(&mut buf) {

Ok(Event::Empty(e)) => {

for attr in e.attributes().with_checks(false) {

let _attr = attr.unwrap();

count += 1

}

}

Ok(Event::Eof) => break,

_ => (),

}

buf.clear();

}

assert_eq!(count, 1041);

})

});

group.bench_function("try_get_attribute", |b| {

b.iter(|| {

let mut r = Reader::from_reader(PLAYERS);

r.check_end_names(false).check_comments(false);

let mut count = criterion::black_box(0);

let mut buf = Vec::new();

loop {

match r.read_event(&mut buf) {

Ok(Event::Empty(e)) if e.name() == b"player" => {

for name in ["num", "status", "avg"] {

if let Some(_attr) = e.try_get_attribute(name).unwrap() {

count += 1

}

}

assert!(e

.try_get_attribute("attribute-that-doesn't-exist")

.unwrap()

.is_none());

}

Ok(Event::Eof) => break,

_ => (),

}

buf.clear();

}

assert_eq!(count, 150);

})

});

group.finish();

}

Besides, I'll also have a next PR with Span support and I want to expose the information from where that or another piece of data come. That is dependent of the attribute shape and useful for reporting errors.

bq. I wouldn't have a problem with a third field to denote unquoted, empty, single-quote, double-quote though. It's just not important enough to treat like Option or Result and force dealing with specific variants.

That wouldn't be idiomatic Rust. Usually you will use .key() and .value() methods, so you hardly likely will deal with the concrete variant. Moreover, I've introduced a new Attr type instead of modifying Attribute to allow users to not deal with attribute shape when it do not matter. There will only one conversion from Attr to Attribute that just moves shape erasing from internals of iterator to a user-defined place. That should not bloat code well and likely will be optimized by the compiler if you did that just after getting the attribute from the iterator.

That wouldn't be idiomatic Rust.

There are cases where it actually is idiomatic, or at least a common pattern. std::io::ErrorKind is used because all variants have the same signature (like is the case here). nom::error::ErrorKind is the same.

The problem with going purely off the benchmarks here is that depending on the consistency of the input, the branch predictor is going to get it right most of the time (and if it's a small enough input, the the branch predictor is going to learn it over the course of many iterations even if it's not consistent), so it's not going to look that expensive.

But it's very dependent on the input, and how much the extra bloat hurts depends a lot on what code the user is writing and how much cache their CPU has. It's just a lot of extra stuff to put in such a hot path.

Maybe it could look like this?

struct Attr { key: ..., value: ..., shape: AttrShape, } enum AttrShape { Unqoted(..span stuff..), DoubleQuoted(..span stuff..), SingleQuoted(..span stuff..), Empty(..span stuff..), }

And then the user could avoid the cost completely if they don't care about it, and the Span work still has an obvious place to go.

Actually, that what you suggesting, is implemented now:

your AttrShape is Attr<Range<usize>> -- that is the type that returned by IterState::next()

your Attr is the future Attribute

Because we in any case need to get spans for key and value, and because value can be missed, and because I want to keep only one parser implementation and not copy it for different types, the current IterState design seems to me the almost only possible (the other way is return (Range<usize>, Option<Range<usize>>) instead but I don't think that it gives any benefits while reducing code clarity).

I see that your point in possible code bloat and/or inconveniences to use match constantly, but that is impossible to trigger from the user code now, because the only way to create an Attr instance in the user code is to create it by yourself. IterState is a private type. The only place where conversion from Attr to Attribute is performed now is inside Attributes::next():

fast-xml/src/events/attributes.rs

Lines 335 to 346 in eaebf45

impl<'a> Iterator for Attributes<'a> {

type Item = Result<Attribute<'a>, AttrError>;

#[inline]

fn next(&mut self) -> Option<Self::Item> {

match self.state.next(self.bytes) {

None => None,

Some(Ok(a)) => Some(Ok(a.map(|range| &self.bytes[range]).into())),

Some(Err(e)) => Some(Err(e)),

}

}

}

I still believe that compiler is able to optimize all right there. Maybe that place seems too generic, because I hide conversion under map() + Into<Attribute>, but it is fairly straight-ford

dralley · 2022-05-07T03:18:32Z

src/events/attributes.rs

+            Attr::Unquoted(key, value) => (key, Some(value)),
+        }
+    }
+}


Basically all this code would be deleted up to my previous comment. key() and value() methods would stay but it would just be a simple accessor.

dralley · 2022-05-08T03:10:27Z

@Mingun Barring the two docstrings that need to be fixed, let's call this approved - I'd like to experiment for myself to either better understand the problems you've discussed and maybe gather some concrete data about the costs or lack thereof of the extra branching - but since that will take some time I won't block the PR on it. The tests pass at least.

Also fix incorrect references to `Attributes` instead of `Attribute`

…essages

.find(|&(_, &b)| b == b'=' || is_whitespace(b)) cannot return Some((_, b'"')) or Some((_, b'\'')) because `"` and `'` are not whitespaces

Because iterator over attributes is recovering, user can want to inspect possible errors and filtering out irrelevant errors should help him Co-authored-by: Daniel Alley <dalley@redhat.com>

Because now error type is PartialEq, we can use `assert_eq!` directly, which is especially nice when used with pretty_assertions crate

failures (24): events::attributes::html::duplicated::with_check::double_quoted events::attributes::html::duplicated::with_check::key_only events::attributes::html::duplicated::with_check::single_quoted events::attributes::html::duplicated::with_check::unquoted events::attributes::html::single::missed_value events::attributes::html::sparsed::double_quoted events::attributes::html::sparsed::key_contains_invalid events::attributes::html::sparsed::key_only events::attributes::html::sparsed::key_start_invalid events::attributes::html::sparsed::missed_value events::attributes::html::sparsed::single_quoted events::attributes::html::sparsed::unquoted events::attributes::xml::duplicated::with_check::double_quoted events::attributes::xml::duplicated::with_check::key_only events::attributes::xml::duplicated::with_check::single_quoted events::attributes::xml::duplicated::with_check::unquoted events::attributes::xml::duplicated::without_check::key_only events::attributes::xml::duplicated::without_check::unquoted events::attributes::xml::first::key_only events::attributes::xml::first::missed_value events::attributes::xml::first::unquoted events::attributes::xml::single::key_only events::attributes::xml::single::missed_value events::attributes::xml::sparsed::missed_value

Introduce new `Attr` type that stores not only an attribute content, but also its shape

Mingun · 2022-05-08T09:28:32Z

@dralley, thanks for your work on reviewing this PR!

This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [quick-xml](https://github.com/tafia/quick-xml) | dependencies | patch | `0.23.0-alpha3` -> `0.23.0` | | [quick-xml](https://github.com/tafia/quick-xml) | dependencies | minor | `0.22.0` -> `0.23.0` | --- ### Release Notes <details> <summary>tafia/quick-xml</summary> ### [`v0.23.0`](https://github.com/tafia/quick-xml/blob/HEAD/Changelog.md#0230----2022-05-08) - feat: add support for `i128` / `u128` in attributes or text/CDATA content - test: add tests for malformed inputs for serde deserializer - fix: allow to deserialize `unit`s from any data in attribute values and text nodes - refactor: unify errors when EOF encountered during serde deserialization - test: ensure that after deserializing all XML was consumed - feat: add `Deserializer::from_str`, `Deserializer::from_slice` and `Deserializer::from_reader` - refactor: deprecate `from_bytes` and `Deserializer::from_borrowing_reader` because they are fully equivalent to `from_slice` and `Deserializer::new` - refactor: reduce number of unnecessary copies when deserialize numbers/booleans/identifiers from the attribute and element names and attribute values - fix: allow to deserialize `unit`s from text and CDATA content. `DeError::InvalidUnit` variant is removed, because after fix it is no longer used - fix: `ElementWriter`, introduced in [#274](tafia/quick-xml#274) (0.23.0-alpha2) now available to end users - fix: allow lowercase `<!doctype >` definition (used in HTML 5) when parse document from `&[u8]` - test: add tests for consistence behavior of buffered and borrowed readers - fix: produce consistent error positions in buffered and borrowed readers - feat: `Error::UnexpectedBang` now provide the byte found - refactor: unify code for buffered and borrowed readers - fix: fix internal panic message when parse malformed XML ([#344](tafia/quick-xml#344)) - test: add tests for trivial documents (empty / only comment / `<root>...</root>` -- one tag with content) - fix: CDATA was not handled in many cases where it should - fix: do not unescape CDATA content because it never escaped by design. CDATA event data now represented by its own `BytesCData` type ([quick-xml#311](tafia/quick-xml#311)) - feat: add `Reader::get_ref()` and `Reader::get_mut()`, rename `Reader::into_underlying_reader()` to `Reader::into_inner()` - refactor: now `Attributes::next()` returns a new type `AttrError` when attribute parsing failed ([#4](Mingun/fast-xml#4)) - test: properly test all paths of attributes parsing ([#4](Mingun/fast-xml#4)) - feat: attribute iterator now implements `FusedIterator` ([#4](Mingun/fast-xml#4)) - fix: fixed many errors in attribute parsing using iterator, returned from `attributes()` or `html_attributes()` ([#4](Mingun/fast-xml#4)) </details> --- ### Configuration 📅 **Schedule**: At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about these updates again. --- - [ ] If you want to rebase/retry this PR, click this checkbox. --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). Co-authored-by: cabr2-bot <cabr2.help@gmail.com> Co-authored-by: crapStone <crapstone01@gmail.com> Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1377 Reviewed-by: crapStone <crapstone@noreply.codeberg.org> Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org> Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>

Mingun mentioned this pull request May 3, 2022

Small optimizations #3

Closed

Mingun force-pushed the fix-attributes branch from 9c4bfc1 to 0b1c983 Compare May 3, 2022 19:59

Mingun force-pushed the fix-attributes branch 2 times, most recently from 7d4131b to e090006 Compare May 5, 2022 17:09

Mingun commented May 5, 2022

View reviewed changes

dralley suggested changes May 5, 2022

View reviewed changes

dralley suggested changes May 7, 2022

View reviewed changes

Mingun and others added 8 commits May 8, 2022 14:02

Group all attribute-related methods in one impl block

bddd1a6

Also fix incorrect references to `Attributes` instead of `Attribute`

Group Attributes definition and impl block

149635a

Remove period from error messages for consistency and rephrase some m…

6b82fc3

…essages

Error kind NameWithQuote never triggered, remove it

e7dda45

.find(|&(_, &b)| b == b'=' || is_whitespace(b)) cannot return Some((_, b'"')) or Some((_, b'\'')) because `"` and `'` are not whitespaces

Use dedicated comparable error type for attribute parsing errors

f93259a

Because iterator over attributes is recovering, user can want to inspect possible errors and filtering out irrelevant errors should help him Co-authored-by: Daniel Alley <dalley@redhat.com>

Use AttrError in the Attributes iterator return type

317ab14

Because now error type is PartialEq, we can use `assert_eq!` directly, which is especially nice when used with pretty_assertions crate

Reimplement attributes parsing logic and fix all errors

c81d25c

Introduce new `Attr` type that stores not only an attribute content, but also its shape

Mingun force-pushed the fix-attributes branch from eaebf45 to c81d25c Compare May 8, 2022 09:21

Mingun merged commit 30e0c23 into master May 8, 2022

Mingun deleted the fix-attributes branch May 8, 2022 09:28

Mingun added a commit that referenced this pull request May 8, 2022

Follow up for #4 - forgot to save changes in the editor

f9453b9

Mingun added the bug Something isn't working label May 9, 2022

marco-c mentioned this pull request Jun 11, 2022

Bump quick-xml from 0.22.0 to 0.23.0 mozilla/grcov#834

Closed

plugwash mentioned this pull request Dec 22, 2022

patches to support newer quick-xml rust-x-bindings/rust-xcb#216

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix attribute parsing #4

Fix attribute parsing #4

Mingun commented May 3, 2022

dralley commented May 3, 2022

dralley commented May 3, 2022 •

edited

Mingun commented May 4, 2022

Mingun left a comment

Mingun May 5, 2022

Mingun May 5, 2022

dralley May 5, 2022

Mingun May 5, 2022

dralley commented May 5, 2022

Mingun commented May 5, 2022

Mingun commented May 6, 2022

dralley left a comment

dralley May 7, 2022

Mingun May 8, 2022

dralley May 7, 2022

Mingun May 7, 2022

dralley May 7, 2022

Mingun May 7, 2022

dralley May 7, 2022

Mingun May 7, 2022

dralley May 7, 2022 •

edited

Mingun May 7, 2022

dralley May 7, 2022 •

edited

Mingun May 7, 2022

dralley May 7, 2022

dralley commented May 8, 2022 •

edited

Mingun commented May 8, 2022

	/// Benchmarks parsing attributes from events
	fn attributes(c: &mut Criterion) {
	let mut group = c.benchmark_group("attributes");
	group.bench_function("with_checks = true", \|b\| {
	b.iter(\|\| {
	let mut r = Reader::from_reader(PLAYERS);
	r.check_end_names(false).check_comments(false);
	let mut count = criterion::black_box(0);
	let mut buf = Vec::new();
	loop {
	match r.read_event(&mut buf) {
	Ok(Event::Empty(e)) => {
	for attr in e.attributes() {
	let _attr = attr.unwrap();
	count += 1
	}
	}
	Ok(Event::Eof) => break,
	_ => (),
	}
	buf.clear();
	}
	assert_eq!(count, 1041);
	})
	});

	group.bench_function("with_checks = false", \|b\| {
	b.iter(\|\| {
	let mut r = Reader::from_reader(PLAYERS);
	r.check_end_names(false).check_comments(false);
	let mut count = criterion::black_box(0);
	let mut buf = Vec::new();
	loop {
	match r.read_event(&mut buf) {
	Ok(Event::Empty(e)) => {
	for attr in e.attributes().with_checks(false) {
	let _attr = attr.unwrap();
	count += 1
	}
	}
	Ok(Event::Eof) => break,
	_ => (),
	}
	buf.clear();
	}
	assert_eq!(count, 1041);
	})
	});

	group.bench_function("try_get_attribute", \|b\| {
	b.iter(\|\| {
	let mut r = Reader::from_reader(PLAYERS);
	r.check_end_names(false).check_comments(false);
	let mut count = criterion::black_box(0);
	let mut buf = Vec::new();
	loop {
	match r.read_event(&mut buf) {
	Ok(Event::Empty(e)) if e.name() == b"player" => {
	for name in ["num", "status", "avg"] {
	if let Some(_attr) = e.try_get_attribute(name).unwrap() {
	count += 1
	}
	}
	assert!(e
	.try_get_attribute("attribute-that-doesn't-exist")
	.unwrap()
	.is_none());
	}
	Ok(Event::Eof) => break,
	_ => (),
	}
	buf.clear();
	}
	assert_eq!(count, 150);
	})
	});
	group.finish();
	}

	impl<'a> Iterator for Attributes<'a> {
	type Item = Result<Attribute<'a>, AttrError>;

	#[inline]
	fn next(&mut self) -> Option<Self::Item> {
	match self.state.next(self.bytes) {
	None => None,
	Some(Ok(a)) => Some(Ok(a.map(\|range\| &self.bytes[range]).into())),
	Some(Err(e)) => Some(Err(e)),
	}
	}
	}

Fix attribute parsing #4

Fix attribute parsing #4

Conversation

Mingun commented May 3, 2022

dralley commented May 3, 2022

dralley commented May 3, 2022 • edited

Mingun commented May 4, 2022

Mingun left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dralley commented May 5, 2022

Mingun commented May 5, 2022

Mingun commented May 6, 2022

dralley left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dralley May 7, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dralley May 7, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dralley commented May 8, 2022 • edited

Mingun commented May 8, 2022

dralley commented May 3, 2022 •

edited

dralley May 7, 2022 •

edited

dralley May 7, 2022 •

edited

dralley commented May 8, 2022 •

edited