Rework BOM handling and encoding API #399
Codecov Report
Coverage Diff
              master    #399      +/-
Coverage      61.72%    61.29%    -0.43%
Files         20        20
Lines         10233     10148     -85
Hits          6316      6220      -96
Misses        3917      3928      +11
…ters, read methods and private methods
Because the `Decoder` type was private, it is unlikely that anyone used it
The method `Reader::decoder()` is public anyway, but its return type is not, which means that type cannot be named in a function signature, so the returned value cannot be passed as a typed argument, and that is not good
A BOM cannot appear in attribute values: by definition it is a mark that can be located only at the beginning of the stream, and every attribute value is inside the stream
Holding a reference to a reader prevents some future usages
Co-authored-by: Daniel Alley <dalley@redhat.com>
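To illustrate the borrowing problem these commits address, here is a minimal std-only Rust sketch, assuming a reader that returns chunks borrowed from its own internal buffer. `Reader`, `Decoder`, `next_chunk` and `collect_chunks` are hypothetical names for illustration and do not reflect quick-xml's real API; only UTF-8 is handled.

```rust
use std::str::Utf8Error;

/// Hypothetical decoder: a small `Copy` value that turns bytes into text
/// without holding a reference to the reader. Only UTF-8 in this sketch.
#[derive(Clone, Copy, Debug, Default)]
pub struct Decoder;

impl Decoder {
    pub fn decode<'b>(&self, bytes: &'b [u8]) -> Result<&'b str, Utf8Error> {
        std::str::from_utf8(bytes)
    }
}

/// Hypothetical reader that copies each chunk into an internal buffer,
/// so every returned slice keeps the reader mutably borrowed.
pub struct Reader<'a> {
    input: &'a [u8],
    pos: usize,
    buf: Vec<u8>,
    decoder: Decoder,
}

impl<'a> Reader<'a> {
    pub fn new(input: &'a [u8]) -> Self {
        Reader { input, pos: 0, buf: Vec::new(), decoder: Decoder }
    }

    /// Returns a copy of the decoder; no borrow of `self` outlives this call.
    pub fn decoder(&self) -> Decoder {
        self.decoder
    }

    /// Returns either one `<...>` markup item or the text before the next `<`.
    pub fn next_chunk(&mut self) -> Option<&[u8]> {
        let input = self.input;
        if self.pos >= input.len() {
            return None;
        }
        let rest = &input[self.pos..];
        let len = if rest[0] == b'<' {
            // Markup: up to and including the closing `>` (or to the end).
            rest.iter().position(|&b| b == b'>').map_or(rest.len(), |i| i + 1)
        } else {
            // Text: up to, but not including, the next `<` (or to the end).
            rest.iter().position(|&b| b == b'<').unwrap_or(rest.len())
        };
        self.buf.clear();
        self.buf.extend_from_slice(&rest[..len]);
        self.pos += len;
        Some(self.buf.as_slice())
    }
}

/// Reads all chunks and decodes them with a detached decoder.
pub fn collect_chunks(input: &[u8]) -> Vec<String> {
    let mut reader = Reader::new(input);
    let decoder = reader.decoder();
    let mut out = Vec::new();
    while let Some(chunk) = reader.next_chunk() {
        // `chunk` keeps `reader` mutably borrowed here, so a decode method
        // taking `&self` on the reader could not be called at this point.
        // The detached `decoder` value has no such restriction.
        out.push(decoder.decode(chunk).expect("valid UTF-8").to_string());
    }
    out
}
```

The design choice in this sketch is that the decoder carries only the state needed for decoding, so obtaining it ends the borrow of the reader immediately, and the decoder's type is public so it can appear in user function signatures.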
You do a great job with testing, documentation, splitting up commits and such - I can't find too much to complain about :)
All the changes here look sensible
@dralley, I tried to reword some sentences in the changelog as best I could, and also added a couple of comments to the …
I left some suggestions
…irst XML element
doing that on demand, even at an inappropriate time
…ature `encoding` enabled and when it is disabled
Co-authored-by: Daniel Alley <dalley@redhat.com>
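A minimal Rust sketch of the idea of one signature for both configurations: callers get `Result<Cow<str>, _>` whether or not the `encoding` feature is enabled, so the same calling code compiles in both builds. `Decoder`, `DecodeError` and `utf8()` are hypothetical names, not quick-xml's API. The feature-enabled branch assumes the feature is backed by the `encoding_rs` crate; it is stripped by `#[cfg]` and not compiled unless that feature is turned on.

```rust
use std::borrow::Cow;

/// Hypothetical error type shared by both configurations.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    /// The input is not valid in the decoder's encoding.
    Invalid,
}

#[derive(Clone, Copy, Debug)]
pub struct Decoder {
    // Assumption: with the feature on, the decoder remembers an encoding_rs encoding.
    #[cfg(feature = "encoding")]
    encoding: &'static encoding_rs::Encoding,
}

impl Decoder {
    #[cfg(not(feature = "encoding"))]
    pub fn utf8() -> Self {
        Decoder {}
    }

    #[cfg(feature = "encoding")]
    pub fn utf8() -> Self {
        Decoder { encoding: encoding_rs::UTF_8 }
    }

    /// Without the feature only UTF-8 is supported; valid input is borrowed.
    #[cfg(not(feature = "encoding"))]
    pub fn decode<'b>(&self, bytes: &'b [u8]) -> Result<Cow<'b, str>, DecodeError> {
        std::str::from_utf8(bytes)
            .map(Cow::Borrowed)
            .map_err(|_| DecodeError::Invalid)
    }

    /// Same signature as above, so callers do not depend on the feature.
    #[cfg(feature = "encoding")]
    pub fn decode<'b>(&self, bytes: &'b [u8]) -> Result<Cow<'b, str>, DecodeError> {
        self.encoding
            .decode_without_bom_handling_and_without_replacement(bytes)
            .ok_or(DecodeError::Invalid)
    }
}
```

Returning `Cow` in both builds costs nothing for UTF-8 input (it stays borrowed) while leaving room for an owned result when real transcoding happens.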
Thanks! In the final version I've updated the benchmarks (they had been broken since #393) and applied your suggestions. Merging now
`encoding` feature #262

This PR solves several tightly coupled problems that block the changes for safe namespace handling (#59). The main reason is that various `decode` methods accept a reference to the `Reader`, but if we move the namespace buffer inside the `Reader`, we could run into borrowing problems (in fact, the namespace buffer was decoupled from the `Reader` for this very reason in #69).

First, I've fixed #191 by introducing a new `StartText` event, which is generated when the reader encounters "text" before the XML markup. In well-formed XML, that text can represent a BOM.

Second, I've removed all functions that handle the BOM at an inappropriate time. For example, it makes no sense to handle the BOM when decoding attribute values, because a BOM never appears there. For that reason I've moved all BOM handling code to the `BytesStartText` struct.

Since after this change a BOM can be handled only as the first event from the reader, I was able to move the automatic BOM-based encoding detection into the reader instead of triggering it by calling `BytesText::decode_without_bom` (whose name is also inaccurate: the function removes the BOM from the text and decodes the remaining content, so the correct name is `decode_with_bom_removal`).

As the encoding API has already changed a lot, I've also solved #180 by unifying the function signatures with and without the `encoding` feature enabled.
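A minimal std-only Rust sketch of the flow described above, under stated assumptions: the bytes before the first markup are surfaced as a separate start-text value, the encoding is detected once from a BOM at the very start of the stream, and a `decode_with_bom_removal`-style method strips a UTF-8 BOM before decoding. `Encoding`, `detect_bom`, `StartText` and `read_start` are hypothetical names for illustration, not quick-xml's actual types or functions, and only UTF-8 decoding is implemented.

```rust
use std::str::Utf8Error;

/// Hypothetical set of encodings this sketch recognises from a BOM.
/// UTF-32 is not distinguished: a UTF-32LE BOM starts with the UTF-16LE one.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Encoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Returns the encoding announced by a BOM at the very start of `bytes`,
/// together with the BOM length in bytes, or `None` if there is no BOM.
pub fn detect_bom(bytes: &[u8]) -> Option<(Encoding, usize)> {
    match bytes {
        [0xEF, 0xBB, 0xBF, ..] => Some((Encoding::Utf8, 3)),
        [0xFF, 0xFE, ..] => Some((Encoding::Utf16Le, 2)),
        [0xFE, 0xFF, ..] => Some((Encoding::Utf16Be, 2)),
        _ => None,
    }
}

/// Hypothetical analogue of the PR's `BytesStartText`: the bytes preceding
/// the first `<`, which is the only place a BOM may legally appear.
pub struct StartText<'a> {
    content: &'a [u8],
}

impl<'a> StartText<'a> {
    /// Removes a UTF-8 BOM, if present, and decodes the rest as UTF-8.
    /// UTF-16 content is out of scope here and yields a UTF-8 error.
    pub fn decode_with_bom_removal(&self) -> Result<&'a str, Utf8Error> {
        let content = self.content;
        let without_bom = match detect_bom(content) {
            Some((Encoding::Utf8, len)) => &content[len..],
            _ => content,
        };
        std::str::from_utf8(without_bom)
    }
}

/// Detects the encoding once, from the start of the stream, then splits
/// `input` at the first `<` byte. Returns the start text (if non-empty),
/// the detected encoding (if any) and the remaining markup.
/// Searching for the `<` byte assumes an ASCII-compatible encoding.
pub fn read_start(input: &[u8]) -> (Option<StartText<'_>>, Option<Encoding>, &[u8]) {
    let encoding = detect_bom(input).map(|(enc, _)| enc);
    let split = input.iter().position(|&b| b == b'<').unwrap_or(input.len());
    let (head, rest) = input.split_at(split);
    let start = if head.is_empty() {
        None
    } else {
        Some(StartText { content: head })
    };
    (start, encoding, rest)
}
```

Because detection happens in `read_start`, no later decode call (for example, of an attribute value) needs to look for a BOM at all.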