Incorrect serialization of XML attributes containing white space #284
Comments
(for reference, the affected characters which need to be escaped in attributes are |
update: this is an open specification bug in DOM Parsing: w3c/DOM-Parsing#59 |
@marrus-sh Thank you for reporting this, especially for finding and referencing the specification bug. I believe the fix needs to be applied at least on the parsing side, and maybe also to the serialization part that you linked, but I'm not sure about that. I'm not sure how soon I will be able to address it, even though I'm very curious to see the failing tests. Is there any workaround for this that you are aware of?
Just out of curiosity I started digging a bit. I finally understood why you are pointing at the serializer: by checking the browser implementations (Chrome and Firefox) I found out that the value we currently have in the DOM is correct.
There is the following problem: xmldom currently does not normalize line endings and only replaces character references with their literal values (instead of doing everything described in https://www.w3.org/TR/xml/#AVNormalize). So just converting literal characters in the DOM back to character references will also convert literal input to character references, even though those literals should have been converted to spaces.
I'm not sure if this is a huge issue. The snapshot tests that need to be updated when adding the fix look better than before and are closer to the "expected" values: master...karfau:284-serialize-whitespace-literals-to-character-references
Any opinions?
this definitely looks closer to expected output! i might not be understanding the code correctly, but i think if you add a Line 265 in 621fad8
entityReplacer , it should handle the attribute normalization on parsing well enough? [you want to replace literal tabs/newlines/etc with spaces, but not character references.]
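A minimal sketch of the parse-side behavior described here, assuming a CDATA-type attribute and following https://www.w3.org/TR/xml/#AVNormalize. The function name `normalizeAttributeValue` is hypothetical and is not xmldom's actual `entityReplacer`; named entity references such as `&amp;` are left out to keep it short.

```javascript
// Hypothetical helper for illustration only, not xmldom's real code.
function normalizeAttributeValue(raw) {
  // Literal line endings (\r\n or a lone \r) are first normalized to \n.
  const lineNormalized = raw.replace(/\r\n?/g, '\n');
  // Single pass: numeric character references become their literal
  // character, while literal tab/newline/carriage return become a space.
  // Doing both in one pass keeps characters that came from references
  // from being turned into spaces afterwards.
  return lineNormalized.replace(
    /&#x([0-9a-fA-F]+);|&#([0-9]+);|[\t\n\r]/g,
    (match, hex, dec) => {
      if (hex !== undefined) return String.fromCodePoint(parseInt(hex, 16));
      if (dec !== undefined) return String.fromCodePoint(parseInt(dec, 10));
      return ' ';
    }
  );
}
```

With this sketch, a literal newline in the source (`'a\nb'`) becomes `'a b'`, whereas the reference form (`'a&#xA;b'`) yields a real newline in the value.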
Yes, I had already looked at that place in the code. It's not the only line where this replacement took place, but I moved it into one place and added the replacement of white space literals.
this looks perfect, thank you!! no worries about it not landing right away; i’m just glad it’s getting fixed
The code for serializing attributes (here:
xmldom/lib/dom.js
Lines 1138 to 1140 in 6ce4700
This is a problem, because XML parses literal newlines as spaces. However, it allows newlines in attributes when they are provided as character references. (See: https://www.w3.org/TR/xml/#AVNormalize.)
So:
serializes as
which then gets parsed as
which is not equivalent.
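To illustrate the round trip described above, here is a rough sketch of serializer-side escaping. The helper name `escapeAttributeValue`, the element name `a`, and the attribute name `b` are assumptions made for this example; they are not the original sample from this report, and xmldom's real serializer is structured differently.

```javascript
// Hypothetical helper for illustration only, not xmldom's real serializer.
function escapeAttributeValue(value) {
  return value
    // Escape & first so the references added below are not double-escaped.
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/"/g, '&quot;')
    // White space that attribute-value normalization would otherwise
    // turn into a plain space when the output is parsed again.
    .replace(/\t/g, '&#x9;')
    .replace(/\n/g, '&#xA;')
    .replace(/\r/g, '&#xD;');
}

// A DOM attribute value that contains a real newline character.
const serialized = `<a b="${escapeAttributeValue('line1\nline2')}"/>`;
// serialized is: <a b="line1&#xA;line2"/>
```

Without the three white space replacements, the newline would be written literally and a conforming parser would read it back as a space, which is the non-equivalent result reported here.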