Rework BOM handling and encoding API #399

Mingun · 2022-06-19T19:22:38Z

Fixes #180 (encoding feature breaks API)
Fixes #191 (quick-xml captures UTF BOM as Event::Text). @TakaakiFuruse, could you check whether this solves your problem properly?
Fixes #262 (reduce code redundancy for the encoding feature)

This PR solves several tightly coupled problems that block the switch to safe namespace handling (#59). The main reason is that various decode methods accept a reference to the `Reader`, so if we move the namespace buffer inside the `Reader`, we would run into borrowing problems (in fact, the namespace buffer was decoupled from the `Reader` in #69 for exactly this reason).
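To illustrate the borrowing problem, here is a minimal std-only sketch of the pattern that avoids it: the reader hands out a small `Copy` decoder by value, so decoding never keeps the reader borrowed while `&mut self` methods run. `Reader`, `Decoder`, `next_chunk` and the UTF-8-only `decode` are hypothetical stand-ins, not quick-xml's actual types or signatures.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical decoder: a small `Copy` value, so holding it does not
/// keep the reader borrowed. Only UTF-8 is modelled here.
#[derive(Clone, Copy, Debug)]
pub struct Decoder {
    _private: (),
}

impl Decoder {
    /// Decodes a byte slice; the result borrows only `bytes`, never the reader.
    pub fn decode<'b>(&self, bytes: &'b [u8]) -> Result<Cow<'b, str>, Utf8Error> {
        std::str::from_utf8(bytes).map(Cow::Borrowed)
    }
}

/// Hypothetical reader that owns a mutable buffer, standing in for the
/// namespace buffer mentioned in the description.
pub struct Reader<'a> {
    input: &'a [u8],
    pos: usize,
    ns_buffer: Vec<u8>,
}

impl<'a> Reader<'a> {
    pub fn new(input: &'a [u8]) -> Self {
        Reader { input, pos: 0, ns_buffer: Vec::new() }
    }

    /// Returns the decoder by value: no borrow of `self` outlives this call.
    pub fn decoder(&self) -> Decoder {
        Decoder { _private: () }
    }

    /// Needs `&mut self` because it updates the internal buffer.
    /// Splits the input on spaces to keep the sketch short.
    pub fn next_chunk(&mut self) -> Option<&'a [u8]> {
        let input: &'a [u8] = self.input;
        if self.pos >= input.len() {
            return None;
        }
        let rest = &input[self.pos..];
        let len = rest.iter().position(|&b| b == b' ').unwrap_or(rest.len());
        self.pos += len + 1; // also skip the separating space
        self.ns_buffer.clear();
        self.ns_buffer.extend_from_slice(&rest[..len]);
        Some(&rest[..len])
    }
}
```

Because `decoder()` returns a copy, code can take `let d = reader.decoder();`, keep calling `reader.next_chunk()`, and still use `d.decode(...)`. If decoding instead required a `&Reader`, that shared borrow would conflict with the `&mut self` calls.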

First, I've fixed #191 by introducing a new `StartText` event, which is generated when the reader encounters text before the XML markup. In well-formed XML, that text may represent a BOM.
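The idea of a separate event for text before the first markup can be sketched with a tiny std-only tokenizer. The `Event` enum, its `Markup` variant and `Tokenizer` are hypothetical simplifications (only `<` and `>` are recognised), not quick-xml's real event types.

```rust
/// Hypothetical event type, reduced to what the sketch needs.
#[derive(Debug, PartialEq)]
pub enum Event<'a> {
    /// Text found before the first markup; this is where a BOM can appear.
    StartText(&'a [u8]),
    /// Text found between pieces of markup.
    Text(&'a [u8]),
    /// Any `<...>` construct, kept raw in this sketch.
    Markup(&'a [u8]),
    Eof,
}

/// Hypothetical tokenizer that only splits on `<` and `>`.
pub struct Tokenizer<'a> {
    input: &'a [u8],
    pos: usize,
    started: bool,
}

impl<'a> Tokenizer<'a> {
    pub fn new(input: &'a [u8]) -> Self {
        Tokenizer { input, pos: 0, started: false }
    }

    pub fn next_event(&mut self) -> Event<'a> {
        let input: &'a [u8] = self.input;
        let rest = &input[self.pos..];
        if rest.is_empty() {
            return Event::Eof;
        }
        // Remember whether this is the very first event of the stream.
        let is_first = !self.started;
        self.started = true;
        if rest[0] == b'<' {
            // Markup runs up to and including the next `>`.
            let end = rest.iter().position(|&b| b == b'>').map_or(rest.len(), |i| i + 1);
            self.pos += end;
            Event::Markup(&rest[..end])
        } else {
            // Text runs up to the next `<` (or the end of input).
            let end = rest.iter().position(|&b| b == b'<').unwrap_or(rest.len());
            self.pos += end;
            if is_first {
                Event::StartText(&rest[..end])
            } else {
                Event::Text(&rest[..end])
            }
        }
    }
}
```

Text later in the stream is reported as `Text`, so in this design any BOM handling only ever needs to look at `StartText`.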

Second, I've removed all functions that handle the BOM at an inappropriate time. For example, it makes no sense to handle a BOM when decoding attribute values, because a BOM never appears there. For that reason, I've moved all BOM handling code to the `BytesStartText` struct.
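Since a BOM can only occur at the very start of the stream, BOM removal only has to exist for the start text. A minimal std-only sketch, assuming UTF-8 input; `decode_start_text` is a hypothetical helper name, not quick-xml's API.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// UTF-8 encoding of U+FEFF, i.e. the UTF-8 byte order mark.
const UTF8_BOM: &[u8] = &[0xEF, 0xBB, 0xBF];

/// Hypothetical helper for text that precedes the first markup: drops a
/// leading UTF-8 BOM if present, then decodes the rest as UTF-8.
/// Text elsewhere (element content, attribute values) never needs this.
pub fn decode_start_text(start_text: &[u8]) -> Result<Cow<'_, str>, Utf8Error> {
    let content = start_text.strip_prefix(UTF8_BOM).unwrap_or(start_text);
    std::str::from_utf8(content).map(Cow::Borrowed)
}
```

Plain UTF-8 decoding of the same bytes would keep the BOM as a leading U+FEFF character, which is why the removal step matters for start text only.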

Because after that change a BOM can only be handled in the first event from the reader, I was able to move automatic BOM-based encoding detection into the reader itself, instead of triggering it by calling `BytesText::decode_without_bom`. (That name was also misleading: the function removes the BOM from the text and decodes the remaining content, so the correct name is `decode_with_bom_removal`.)
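Detecting the encoding once, when parsing starts, can be sketched as a reader that inspects the first bytes on its first read and stores the result. This is a std-only sketch under simplifying assumptions: only UTF-8 and UTF-16 BOMs are recognised, and `DetectedEncoding`, `detect_encoding` and `read_event` are hypothetical names, not quick-xml's real detection logic.

```rust
/// Encodings this sketch can recognise from a byte order mark.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DetectedEncoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Inspects only the first bytes of the stream for a BOM.
/// (UTF-32 and BOM-less detection via `<?xml` are ignored here.)
pub fn detect_encoding(bytes: &[u8]) -> Option<DetectedEncoding> {
    match bytes {
        [0xEF, 0xBB, 0xBF, ..] => Some(DetectedEncoding::Utf8),
        [0xFF, 0xFE, ..] => Some(DetectedEncoding::Utf16Le),
        [0xFE, 0xFF, ..] => Some(DetectedEncoding::Utf16Be),
        _ => None,
    }
}

/// Hypothetical reader that runs detection once, when parsing starts,
/// rather than whenever some text happens to be decoded.
pub struct Reader<'a> {
    input: &'a [u8],
    started: bool,
    encoding: DetectedEncoding,
}

impl<'a> Reader<'a> {
    pub fn new(input: &'a [u8]) -> Self {
        // Assume UTF-8 until the stream says otherwise.
        Reader { input, started: false, encoding: DetectedEncoding::Utf8 }
    }

    /// Stand-in for reading the next event; tokenizing itself is omitted.
    pub fn read_event(&mut self) {
        if !self.started {
            self.started = true;
            if let Some(enc) = detect_encoding(self.input) {
                self.encoding = enc;
            }
        }
    }

    pub fn encoding(&self) -> DetectedEncoding {
        self.encoding
    }
}
```

The design choice is that the encoding is fixed after the first read, so later decode calls never have to guess from arbitrary text.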

As the encoding API had already changed significantly, I've also solved #180 by unifying the function signatures with and without the `encoding` feature enabled.
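One way to keep a single signature for both configurations is to vary only the private contents and the function body with `cfg`, not the return type. A sketch, assuming an `encoding_rs` backend when the feature is on (that branch needs the `encoding_rs` crate; without the feature the code is std-only). `Decoder`, `DecodeError` and `decode` are hypothetical names; `Encoding::decode_without_bom_handling` is an `encoding_rs` method returning the text plus an error flag.

```rust
use std::borrow::Cow;

/// Hypothetical error type shared by both configurations.
#[derive(Debug)]
pub enum DecodeError {
    NonDecodable,
}

/// Hypothetical decoder; its contents differ by feature, its API does not.
#[derive(Clone, Copy, Debug)]
pub struct Decoder {
    #[cfg(feature = "encoding")]
    encoding: &'static encoding_rs::Encoding,
}

impl Decoder {
    /// Without the feature: input must be valid UTF-8.
    #[cfg(not(feature = "encoding"))]
    pub fn decode<'b>(&self, bytes: &'b [u8]) -> Result<Cow<'b, str>, DecodeError> {
        std::str::from_utf8(bytes)
            .map(Cow::Borrowed)
            .map_err(|_| DecodeError::NonDecodable)
    }

    /// With the feature: decode using the stored encoding.
    #[cfg(feature = "encoding")]
    pub fn decode<'b>(&self, bytes: &'b [u8]) -> Result<Cow<'b, str>, DecodeError> {
        let (text, had_errors) = self.encoding.decode_without_bom_handling(bytes);
        if had_errors {
            Err(DecodeError::NonDecodable)
        } else {
            Ok(text)
        }
    }
}
```

Callers write `decoder.decode(bytes)?` the same way in both builds, so enabling the feature does not break downstream code.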

codecov-commenter · 2022-06-19T19:31:12Z

Codecov Report

Merging #399 (d49a4b8) into master (3b37c0e) will decrease coverage by 0.42%.
The diff coverage is 75.20%.

@@            Coverage Diff             @@
##           master     #399      +/-   ##
==========================================
- Coverage   61.72%   61.29%   -0.43%     
==========================================
  Files          20       20              
  Lines       10233    10148      -85     
==========================================
- Hits         6316     6220      -96     
- Misses       3917     3928      +11

| Flag | Coverage Δ |
|---|---|
| unittests | `61.29% <75.20%> (-0.43%)` ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| src/de/escape.rs | `65.15% <ø> (-1.28%)` ⬇️ |
| src/errors.rs | `9.52% <0.00%> (-2.85%)` ⬇️ |
| src/events/attributes.rs | `94.19% <0.00%> (+3.52%)` ⬆️ |
| src/lib.rs | `21.09% <0.00%> (-4.92%)` ⬇️ |
| src/reader.rs | `88.38% <79.44%> (-0.02%)` ⬇️ |
| src/de/mod.rs | `76.14% <80.00%> (+0.87%)` ⬆️ |
| src/events/mod.rs | `75.57% <80.85%> (+1.23%)` ⬆️ |
| src/de/seq.rs | `91.83% <100.00%> (-0.76%)` ⬇️ |
| src/se/mod.rs | `93.81% <100.00%> (-0.01%)` ⬇️ |
| src/writer.rs | `90.36% <100.00%> (+0.02%)` ⬆️ |

... and 6 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3b37c0e...d49a4b8.


dralley

You do a great job with testing, documentation, splitting up commits and such - I can't find too much to complain about :)

All the changes here look sensible

src/reader.rs

Mingun · 2022-06-20T17:58:30Z

@dralley, I've tried to reword some sentences in the changelog as well as I can, and also added a couple of comments to `TagState`. Could you take a look and check whether everything is OK?

Changelog.md

dralley · 2022-06-20T18:28:56Z

I left some suggestions


Mingun · 2022-06-20T18:59:29Z

Thanks! In the final version I've updated the benchmarks (they had been broken since #393) and applied your suggestions. Merging now.

Mingun added the `bug` and `encoding` (Issues related to support of various encodings of the XML documents) labels Jun 19, 2022

Mingun requested a review from dralley June 19, 2022 19:22

Mingun added 3 commits June 20, 2022 00:32

Use Self instead of explicitly named type where applicable

22bbac1

Replace full path with import + short path

3febf08

Move Deref impls to the corresponding structs

0e60d2d

dralley reviewed Jun 19, 2022

Changelog.md (outdated, resolved)

dralley reviewed Jun 19, 2022

Changelog.md (outdated, resolved)

Mingun added 3 commits June 20, 2022 00:52

Split Reader implementation block into four sections: builders, getters, read methods and private methods

7cf5f57

Eliminate pointless string recoding in tests

7f2137d

Move decoding of a name into namespace_name to decrease redundancy in the code

aa802bf

dralley reviewed Jun 19, 2022

Changelog.md (outdated, resolved)

Mingun and others added 4 commits June 20, 2022 00:53

Remove unused Decoder::decode_owned function

493c75a

Because `Decoder` type was private, hardly ever that someone use it

tafia#180: Make Decoder struct public

2863674

The method `Reader::decoder()` is public anyway, but its result type is not, which means that it cannot be used as method argument, and that is not good
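The point of this commit, that a decoder is only useful as an argument if its type can be named, can be sketched with std-only Rust. The module layout, `decode_all` and the UTF-8-only `decode` are hypothetical; only the visibility idea comes from the commit message.

```rust
mod reader {
    /// Public type: callers outside this module can name it.
    #[derive(Clone, Copy, Debug)]
    pub struct Decoder {
        _private: (),
    }

    impl Decoder {
        /// UTF-8 only in this sketch.
        pub fn decode<'b>(&self, bytes: &'b [u8]) -> Result<&'b str, std::str::Utf8Error> {
            std::str::from_utf8(bytes)
        }
    }

    pub struct Reader {
        decoder: Decoder,
    }

    impl Reader {
        pub fn new() -> Self {
            Reader { decoder: Decoder { _private: () } }
        }

        pub fn decoder(&self) -> Decoder {
            self.decoder
        }
    }
}

use reader::{Decoder, Reader};

/// Hypothetical helper that takes a decoder as an argument. It compiles
/// only because `Decoder` is a public, nameable type.
fn decode_all<'b>(decoder: Decoder, chunks: &[&'b [u8]]) -> Result<Vec<&'b str>, std::str::Utf8Error> {
    chunks.iter().map(|c| decoder.decode(*c)).collect()
}
```

If `Decoder` were private to the `reader` module, `decode_all` could not spell out its parameter type, even though `Reader::decoder()` would still be callable.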

Remove all *_without_bom functions from Attributes struct

24fac69

BOM cannot arise in the attribute values - this is by definition a mark that can be located only at the begin of the stream and every attribute value is inside the stream

Remove reader.decode() in favor of reader.decoder().decode()

c8236e4

Holding reference to a reader prevents some usages in the future

Co-authored-by: Daniel Alley <dalley@redhat.com>

Mingun force-pushed the bom branch from d8dce78 to 61cd193 June 19, 2022 20:06

dralley approved these changes Jun 20, 2022

src/reader.rs (outdated, resolved)

Mingun force-pushed the bom branch from 61cd193 to 3368f58 June 20, 2022 17:57

dralley reviewed Jun 20, 2022

Changelog.md (outdated, resolved)

dralley reviewed Jun 20, 2022

Changelog.md (outdated, resolved)

Mingun and others added 5 commits June 20, 2022 23:40

tafia#191: Add new event StartText which contains text before the first XML element

9f1e655

Autodetect encoding automatically when start parsing instead of doing that on demand even at inappropriate time

15f5075

Fix tafia#262: Decrease code redundancy by merging encoding-related methods

8d5aaaa

Use ? operator instead of map_err

be9ec0c

Fix tafia#180: Eliminated the differences in the decoding API when feature `encoding` enabled and when it is disabled

d49a4b8

Co-authored-by: Daniel Alley <dalley@redhat.com>

Mingun force-pushed the bom branch from 3368f58 to d49a4b8 June 20, 2022 18:49

Mingun merged commit 8fa6f1e into tafia:master Jun 20, 2022

Mingun deleted the bom branch June 20, 2022 18:59
