DOMParser does not normalize line breaks #49

Closed
victorandree opened this issue May 14, 2020 · 2 comments · Fixed by #314
Labels
bug (Something isn't working), spec:XML (https://www.w3.org/TR/xml11/)
Comments

@victorandree

If I parse an XML string containing newlines encoded as \r\n (#xD #xA), these are not normalized into \n (#xA). I think this violates the "End-of-Line Handling" section of the XML specification.

To simplify the tasks of applications, the XML processor MUST behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.
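A parser can meet this requirement by normalizing the decoded input once, before tokenizing. The TypeScript sketch below assumes the document is already available as a JavaScript string (as it is for `DOMParser.parseFromString`); `normalizeLineBreaksXml10` is a hypothetical helper, not necessarily how the fix in #314 works.

```typescript
// Sketch of XML 1.0 "End-of-Line Handling". The helper name is hypothetical;
// it is not part of xmldom's API.
function normalizeLineBreaksXml10(input: string): string {
  // "\r\n?" matches a CR together with an immediately following LF, so the
  // pair CR LF becomes a single LF; a CR with no LF after it also becomes LF.
  // All other characters, including #x85 and #x2028, are left untouched.
  return input.replace(/\r\n?/g, "\n");
}

const raw = "<root>\r\n  <a>x</a>\r  <b>y</b>\r\n</root>";
console.log(JSON.stringify(normalizeLineBreaksXml10(raw)));
// → "<root>\n  <a>x</a>\n  <b>y</b>\n</root>"
```

Because the translation happens before parsing, a character reference such as `&#xD;` in the markup is not affected and still produces a CR in the parsed text.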

@brodybits added the "needs investigation" label on Jun 21, 2020
@karfau added the "bug", "good first issue", "help-wanted" and "spec:XML" labels on Jan 21, 2021
@karfau added this to the 0.6.0 milestone on Jan 21, 2021
@karfau
Member

karfau commented Jan 21, 2021

The more recent XML 1.1 specification also contains such a section:

XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters CARRIAGE RETURN (#xD) and LINE FEED (#xA).
To simplify the tasks of applications, the XML processor must behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating all of the following to a single #xA character:

  • the two-character sequence #xD #xA
  • the two-character sequence #xD #x85
  • the single character #x85
  • the single character #x2028
  • any #xD character that is not immediately followed by #xA or #x85.

The characters #x85 and #x2028 cannot be reliably recognized and translated until an entity's encoding declaration (if present) has been read. Therefore, it is a fatal error to use them within the XML declaration or text declaration.
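The XML 1.1 rules can be sketched the same way, with one extra step: the check for #x85 and #x2028 inside the XML declaration has to run on the raw input, because normalization would already have turned them into LF. This TypeScript sketch assumes the input is an already-decoded string with any byte order mark stripped, so the recognition problem described above does not arise at this layer; it only inspects the document entity's XML declaration, not text declarations of external entities. All function names are hypothetical.

```typescript
// Sketch of XML 1.1 "End-of-Line Handling". Names are hypothetical.

// Translates CR LF, CR NEL, lone NEL, LINE SEPARATOR and any other lone CR to
// a single LF. "\r[\n\u0085]?" consumes a following LF or NEL together with
// the CR, so each two-character sequence becomes exactly one "\n".
function normalizeLineBreaksXml11(input: string): string {
  return input.replace(/\r[\n\u0085]?|[\u0085\u2028]/g, "\n");
}

// Returns true if the XML declaration at the start of the input contains
// NEL (#x85) or LINE SEPARATOR (#x2028), which the spec makes a fatal error.
function xmlDeclHasForbiddenLineBreak(input: string): boolean {
  // "<?xml" followed by whitespace (or a forbidden character) starts the XML
  // declaration; "<?xml-stylesheet ...?>" and similar are ordinary
  // processing instructions and are ignored here.
  if (!/^<\?xml[ \t\r\n\u0085\u2028]/.test(input)) return false;
  const end = input.indexOf("?>");
  // For an unterminated declaration, inspect whatever input there is.
  const decl = end === -1 ? input : input.slice(0, end + 2);
  return /[\u0085\u2028]/.test(decl);
}

// Runs the fatal-error check on the raw input, then normalizes.
function prepareXml11(input: string): string {
  if (xmlDeclHasForbiddenLineBreak(input)) {
    throw new Error("fatal: #x85 or #x2028 inside the XML declaration");
  }
  return normalizeLineBreaksXml11(input);
}

console.log(JSON.stringify(prepareXml11('<?xml version="1.1"?>\r\n<r>a\u0085b\u2028c</r>')));
// → "<?xml version=\"1.1\"?>\n<r>a\nb\nc</r>"
```

In an XML 1.0 document #x85 and #x2028 are ordinary characters and must be preserved, so a parser following this approach would pick between the 1.0 and 1.1 translations based on the version in the XML declaration.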

@karfau
Member

karfau commented Aug 29, 2021

We solved half of it as part of #303. I should not have created that duplicate...

@karfau modified the milestones: before 1.0.0, 0.8.0 on Aug 29, 2021
karfau added a commit that referenced this issue Aug 31, 2021
karfau added a commit that referenced this issue Sep 9, 2021
karfau added a commit that referenced this issue Sep 9, 2021
@karfau removed the "good first issue", "help-wanted" and "needs investigation" labels on Dec 22, 2021
Projects
Status: Done
3 participants