Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOMParser: Wrong data in PI element #42

Open
microshine opened this issue Mar 27, 2020 · 2 comments
Open

DOMParser: Wrong data in PI element #42

microshine opened this issue Mar 27, 2020 · 2 comments
Labels
help-wanted External contributions welcome needs investigation Information is missing and needs to be researched spec:DOM Level 3 https://www.w3.org/TR/DOM-Level-3-Core/ spec:DOM Living Standard https://dom.spec.whatwg.org/
Milestone

Comments

@microshine
Copy link
Contributor

  • Document childNodes mustnot show xml PI
  • PI node must have nodeName (instead of tagName) DOM 4
  • nodeValue doesn't have spaces at the end

Test code

const domParser = new DOMParser();
const xmlText = `<?xml version="1.0"?><?xml-stylesheet   href="doc.xsl"\n  type="text/xsl"   ?><test/>`;

const doc = domParser.parseFromString(xmlText, "application/xml");

console.log("Child nodes:", doc.childNodes.length);
console.log(`Node name: '${doc.childNodes[0].nodeName}'`);
console.log(`Node value: '${doc.childNodes[0].nodeValue}'`);

xmldom result

Child nodes: 3
Node name: 'undefined'
Node value: 'version="1.0"'

xmldom xml-stylesheet PI

Node name: 'undefined'
Node value: 'href="doc.xsl"
  type="text/xsl"'

Chrome result

Child nodes: 2
VM1167:7 Node name: 'xml-stylesheet'
VM1167:8 Node value: 'href="doc.xsl"
  type="text/xsl"   '
@brodybits brodybits added help-wanted External contributions welcome needs investigation Information is missing and needs to be researched labels Jun 21, 2020
@karfau karfau added spec:DOM Living Standard https://dom.spec.whatwg.org/ spec:DOM Level 3 https://www.w3.org/TR/DOM-Level-3-Core/ labels Jan 21, 2021
@karfau
Copy link
Member

karfau commented Jan 21, 2021

xmldom is currently targeting DOM Level 2, not the "Living standard" aka "DOM Level 4".
We will need to check to what degree those things are implemented.
Adding aliases/getters for the properties you propose would also make it compatible wit DOM Level 3 and 4. so it might be a low hanging fruit.

But the current mechanism to serialize all nodes back to a string, is that they are in childNodes. I see that browser implementations do not contain the <?xml version="1.0"?> in childNodes but still serialize it. I need to figure out how they store it/how they decide when/what to put as part of the serialized string.

PRs with tests are welcome but I would consider it a nice to have right now. (Discussion in #176 might change this.)

@karfau karfau added this to the before 1.0.0 milestone Aug 28, 2021
@karfau
Copy link
Member

karfau commented Aug 28, 2021

I just did some very lengthy research on this topic.
TLDR: It's a mess. You/We might want to split the different issues.

  1. <?xml version="1.0" [...]?> is not a processing instruction (even though it has the same syntax), but an XML declaration. It is optional and even browsers seem to not agree on preserving it when serializing a DOM. jsdom regularly has issues filed for it as well. There seems to be agreement that SAX parser should not fire a processing instruction event on this data, but our current implementation does. (But at some point in time we want to get rid of it: replace lib/sax.js? #55, in which case we will also have to deal with this issue in one way or the other.) We are currently doing quite a good job at preserving it, so no longer treating it as a procesing instruction would definitely be a breaking change, potentially even if we implement a different way to preserve it.
  • I guess it's currently less effort for downstreams to work around this by dropping or adding this "childnode" when needed.
  • By the way the only way to access the information about the XML declaration is DOM Level 3 Document attributes xmlVersion, xmlEncoding, xmlStandalone which have been removed from the DOM Living Standard (aka Level 4). So we would need to come up with something that is not present in Browsers/specs.
  • The only path that seems reasonable if somebody wants to invest time into this in the near future, would be to provide an optional handling of this, that users can enable.
  1. I agree that xmldom is obviously stripping data from processing instructions which it should not. I will need to dig deeper to see what we can do about it.

@microshine

  • Can we treat those issues separately?
  • Should this one just be about the missing data in PI, since the title would fit?

Any additional information you want to/can provide? (It looks like the linked issue xadesjs has been closed, by some workaround?)

@karfau karfau modified the milestone: before 1.0.0 Dec 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help-wanted External contributions welcome needs investigation Information is missing and needs to be researched spec:DOM Level 3 https://www.w3.org/TR/DOM-Level-3-Core/ spec:DOM Living Standard https://dom.spec.whatwg.org/
Projects
Status: Has Priority
Development

No branches or pull requests

3 participants