entity en/decoding different between parser and serializer #421
Comments
Thank you for this awesome bug report. Can you please check whether the behavior is still present in the version that is upcoming as part of #338? If that doesn't already solve the issue you are describing, I would ask you to either base your work on that PR (I can take care of rebasing your branch if required) or wait until it has been merged. I haven't found much time to work on this repo recently, but I want to get back to it in the coming weeks/months.
apos should NOT be encoded in attributes according to the latest XMLSerializer spec (https://w3c.github.io/DOM-Parsing/#dfn-serializing-an-attribute-value), so this is not a bug; it is expected behaviour.
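For illustration, the escaping rules in the linked spec section can be sketched roughly as follows. This is a minimal sketch, not xmldom's actual code; the function names `escapeAttributeValue` and `escapeText` are hypothetical and exist only for this example.

```javascript
// Sketch of the escaping rules from the DOM Parsing and Serialization spec.
// Function names are hypothetical, not part of xmldom or any browser API.

// Attribute values: escape &, ", < and >. The apostrophe is left alone
// because the serializer always wraps attribute values in double quotes.
function escapeAttributeValue(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Text nodes: only &, < and > are escaped.
function escapeText(data) {
  return data
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

console.log(escapeAttributeValue(`&<>'"`)); // &amp;&lt;&gt;'&quot;
console.log(escapeText(`&<>'"`));           // &amp;&lt;&gt;'"
```

Replacing `&` first matters here: doing it later would re-escape the ampersands introduced by the other replacements.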
as for “decoding entities in text nodes”, this is necessary for things like
const doc = new DOMParser().parseFromString("<root>&amp;&lt;&gt;&apos;&quot;</root>", 'text/xml');
doc.documentElement.textContent;
// should be &<>'"
new XMLSerializer().serializeToString(doc);
// should be <root>&amp;&lt;&gt;'"</root>
You can run these in your browser console and see that this is the expected result.
Hey @marrus-sh, thanks for taking a close look. I had only looked at the XML spec before .... 😅 Then I guess I'll have to change the HTML mode to decode and encode everything that looks like a known entity, and use it to modify the seemingly non-compliant XML files that I have to deal with. I'll check the linked spec and the behaviour in browsers. Thanks for pointing it out!
@sbresin @marrus-sh do I understand correctly that the current behavior of xmldom is what the spec expects and also what happens in browsers, so we can close this issue as wontfix? PS: in case we need to treat HTML and XML differently, we are now able to do that quite easily and reliably.
Description
According to the XML spec, `<`, `>`, `&` have to be encoded in attributes and text nodes. In attributes, `'` and `"` additionally have to be encoded.
The `XMLSerializer` does this encoding according to the spec (except for `'` in attributes, which is a bug, but super easily fixed).
The parser, on the other hand, decodes all 5 entities in attributes AND in text nodes.
I have to process XML files in which all 5 entities are also encoded in text content. Parsing, modifying and then serializing these files therefore changes all the text nodes.
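The effect described above can be sketched without any XML library. The helpers below are hypothetical stand-ins for what a parser and a serializer do to text content, assuming the parser decodes all five predefined entities and the serializer re-encodes only `&`, `<` and `>`; they are not xmldom's implementation.

```javascript
// Hypothetical helpers illustrating the round trip; not xmldom's implementation.
const PREDEFINED = { amp: '&', lt: '<', gt: '>', apos: "'", quot: '"' };

// What a parser does with text content: decode the five predefined entities.
// A single pass avoids double decoding (e.g. "&amp;lt;" becomes "&lt;").
function decodePredefined(raw) {
  return raw.replace(/&(amp|lt|gt|apos|quot);/g, (_, name) => PREDEFINED[name]);
}

// What the serializer does with a text node: escape only &, < and >.
function escapeText(data) {
  return data.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

const input = 'Tom &amp; Jerry&apos;s &quot;show&quot;';
const roundTripped = escapeText(decodePredefined(input));

console.log(roundTripped);           // Tom &amp; Jerry's "show"
console.log(roundTripped === input); // false
```

The text stays equivalent as XML, but it is no longer byte-identical to the input, which is the change in text nodes reported here.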
How to replicate
outputs this:
Solution
I am happy to open a PR for this, but first wanted to clarify the approach:
`&`, `<` and `>` for text nodes (here)