Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

& parsing wrong when xmlns points to XHTML #203

Closed
SmartLayer opened this issue Mar 30, 2021 · 8 comments · Fixed by #338
Closed

& parsing wrong when xmlns points to XHTML #203

SmartLayer opened this issue Mar 30, 2021 · 8 comments · Fixed by #338
Assignees
Labels
needs investigation Information is missing and needs to be researched spec:DOM Living Standard https://dom.spec.whatwg.org/ spec:DOM-Level-2 https://www.w3.org/TR/DOM-Level-2-Core/ spec:no standard
Milestone

Comments

@SmartLayer
Copy link

SmartLayer commented Mar 30, 2021

$ cat xmldom_bug_22.js 
const { DOMParser } = require('@xmldom/xmldom')

const items = [
  `<html xmlns="http://www.w3.org/1999/xhtml"><script>let message = " &amp; ETH";</script></html>`,
  `<html><script>let message = " &amp; ETH";</script></html>`,
];

for (const item of items) {
  const xml = new DOMParser().parseFromString(item, "application/xml");
  console.log(xml.documentElement.childNodes[0].textContent);
}
$ node xmldom_bug_22.js 
let message = " &amp; ETH";
let message = " & ETH";

Where both messages should be the same. Something trigged the misintrepreation when xmlns="http://www.w3.org/1999/xhtml" presents.

P.S. this is not a duplicate of #58

@karfau
Copy link
Member

karfau commented Mar 30, 2021

@colourful-land Thx for reporting this, providing a test scenario and linking all related issues.

  1. Just to be sure: What versions of xmldom have you tried this with?

  2. Did I understand correctly, that you expect the output to be let message = " & ETH"; in both cases?

@SmartLayer
Copy link
Author

1. Just to be sure: What versions of xmldom have you tried this with?

I first tried 0.5.0 - failing the test, I tried again with the master branch from github, still fails.

2. Did I understand correctly, that you expect the output to be `let message = " & ETH";` in both cases?

Yes, as it is the case if the code runs in a browser using the browser's dom parser.

@karfau karfau self-assigned this Mar 31, 2021
@karfau karfau added needs investigation Information is missing and needs to be researched spec:DOM Living Standard https://dom.spec.whatwg.org/ spec:DOM-Level-2 https://www.w3.org/TR/DOM-Level-2-Core/ spec:no standard labels Mar 31, 2021
@karfau karfau added this to the next release milestone Mar 31, 2021
karfau added a commit to karfau/xmldom that referenced this issue Apr 3, 2021
Before starting to work on fixing xmldom#203, where namespaces and mime types are going to be used in multiple places, I decided to gather them in a separate module.

**conventions.js**
- `freeze`: is a small wrapper around `Object.freeze`, that gives type hints
  and also "works" if Object.freeze should not be available at runtime
- `MIME_TYPE`: has all mime types that are supported by `DOMParser.parseFromString` (second argument)
- `MIME_TYPE.isHTML`: short way to check if a mime type is the only one that enables HTML parsing
- `NAMESPACE`: has all namespaces that we are currently using in the code base
- `NAMESPACE.isHTML`: to check if a given value is the XHTML namespace

**entities.js**
- deprecated `entityMap` (but it's still available as an alias to `HTML_ENTITIES`)
- `HTML_ENTITIES`: new name for `entityMap`, is now immutable
- `XML_ENTITIES`: immutable map for the predefined XML entities
karfau added a commit to karfau/xmldom that referenced this issue Apr 3, 2021
Before starting to work on fixing xmldom#203, where namespaces and mime types are going to be used in multiple places, I decided to gather them in a separate module.

**conventions.js**
- `freeze`: is a small wrapper around `Object.freeze`, that gives type hints
  and also "works" if `Object.freeze` should not be available at runtime
- `MIME_TYPE`: has all mime types that are supported by `DOMParser.parseFromString` (second argument)
- `MIME_TYPE.isHTML`: short way to check if a mime type is the only one that enables HTML parsing
- `NAMESPACE`: has all namespaces that we are currently using in the code base
- `NAMESPACE.isHTML`: to check if a given value is the XHTML namespace

**entities.js**
- deprecated `entityMap` (but it's still available as an alias to `HTML_ENTITIES`)
- `HTML_ENTITIES`: new name for `entityMap`, is now immutable
- `XML_ENTITIES`: immutable map for the predefined XML entities
@karfau
Copy link
Member

karfau commented Apr 3, 2021

FYI: I have a rough plan how to fix this issue (I already invested quite some time into a draft so far), but it's not a trivial one, and it's a breaking change.

The main issue is that xmldom currently "detects" the default HTML namespace when parsing and applies certain rules, that should only be applied when setting the second argument mimeType of DOMParser.parseFromString() to 'text/html'. (According to the most current spec.)

Since there will be quite some changes and quite some missing tests for the areas of code that I need to touch, I will file multiple PRs before we can resolve this issue.

@SmartLayer
Copy link
Author

Thank you very much @karfau. Right now I simply replace all occurrences of xhtml namespace in lib/sax.js and lib/dom.js to keep my project going. Hope to see a new release so I can rever to use unmodified xmldom in the near future!

@karfau
Copy link
Member

karfau commented Apr 5, 2021

@colourful-land Interesting idea. I hope it helps more than it does any damage. Maybe you run the tests to see what things change/break. Just to be sure this doesn't introduce any serious bugs.

Let m know in case any support is needed.

I'm also looking forward to the next release(s), but can not promise any timeline. 😄

@karfau
Copy link
Member

karfau commented Dec 21, 2021

I needed to shift this into the next breaking release. I already invested quite some time into it, but it's not done and I don't want to hold back 0.8.0 any longer, which already has other potentially breaking changes (and mixing up to many of them is also not so nice for people doing updates).

@karfau
Copy link
Member

karfau commented Feb 16, 2022

Great news: The version I have in #338 no longer has the issue you describe.

It still is quite a massive change, so it will still need some time to get the PR done.

With all the breaking changes I think I will try to create a pre-release, so you (and others can test it before we release it as 0.9.0

@karfau
Copy link
Member

karfau commented Oct 9, 2022

@weiwu-zhang I finally managed to merge and release the PR before it managed to be alive for a whole year.
I created a pre-releaseas mentioned, so it will only be installable as @xmldom/xmldom@next or @xmldom/xmldom@0.9.0-beta.1.

THank you fro your endurance and let me know in case you find any issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs investigation Information is missing and needs to be researched spec:DOM Living Standard https://dom.spec.whatwg.org/ spec:DOM-Level-2 https://www.w3.org/TR/DOM-Level-2-Core/ spec:no standard
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants