fix: Normalize line breaks according to spec #307

karfau · 2021-08-28T03:33:57Z

XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters CARRIAGE RETURN (#xD) and LINE FEED (#xA).

To simplify the tasks of applications, the XML processor must behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.

Where #xD == \r and #xA == \n, so
\r\n => \n
\n\r => \n\n
\n => \n
\r => \n

BREAKING CHANGE: Certain combinations of line break characters are normalized before parsing takes place and will no longer be preserved. For details see https://www.w3.org/TR/xml/#sec-line-ends

fixes #303
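The four cases listed in the description can all be covered by one regular expression. This is a minimal sketch for illustration, assuming the whole source string is available before parsing starts; `normalizeLineEndings` is a hypothetical name, and this is not necessarily how the PR implements it.

```javascript
// Hypothetical helper: XML 1.0 line-end normalization (https://www.w3.org/TR/xml/#sec-line-ends).
// Replaces the pair "\r\n" and any "\r" not followed by "\n" with a single "\n".
function normalizeLineEndings(source) {
	// "\r\n?" matches "\r\n" when the pair is present, otherwise a lone "\r".
	// A lone "\n" is never matched, so it is left as it is.
	return source.replace(/\r\n?/g, '\n')
}

// The four cases from the description:
console.log(JSON.stringify(normalizeLineEndings('\r\n'))) // "\n"
console.log(JSON.stringify(normalizeLineEndings('\n\r'))) // "\n\n"
console.log(JSON.stringify(normalizeLineEndings('\n'))) // "\n"
console.log(JSON.stringify(normalizeLineEndings('\r'))) // "\n"
```

Because the replacement runs once over the complete input, the parser afterwards only ever sees `\n` as a line break.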


karfau · 2021-08-28T05:58:08Z

@brodybits I would like to land this one next, to avoid rebasing #291 over and over.

brodybits · 2021-08-29T01:45:28Z

test/parse/test-doc-whitespace.test.js

+			'#x20#xa#xa',
+		],
+	])(
+		'should normalize "\\r" not followed by "\\n" %s',


minor nit: should this be tested in attribute as well?

It cannot easily be tested there, since the algorithm in the other PR would replace it with a space.
But since we are replacing all occurrences in one go before parsing, I don't think it matters too much.
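A minimal sketch of why attributes are hard to test here: it chains line-end normalization with the attribute-value normalization rule from XML 1.0 §3.3.3, where each literal whitespace character (#x20, #xD, #xA, #x9) becomes a single #x20. Both function names are hypothetical, and character/entity references (which §3.3.3 treats separately) are ignored. A lone `\r` ends up as one space with or without line-end normalization, so such a test could not tell the difference; only the `\r\n` pair changes (two spaces versus one).

```javascript
// Hypothetical helpers, for illustration only.
// Step 1: XML 1.0 line-end normalization, applied to the whole input before parsing.
const normalizeLineEndings = (source) => source.replace(/\r\n?/g, '\n')

// Step 2: attribute-value normalization for literal whitespace (XML 1.0 §3.3.3):
// every #x20, #xD, #xA or #x9 becomes one #x20. References are not handled here.
const normalizeAttributeWhitespace = (value) => value.replace(/[\x20\r\n\t]/g, ' ')

const raw = 'a\r\nb\rc'
// Without step 1, "\r\n" turns into two spaces, the lone "\r" into one.
console.log(JSON.stringify(normalizeAttributeWhitespace(raw))) // "a  b c"
// With step 1, "\r\n" is already a single "\n", so it becomes one space.
console.log(JSON.stringify(normalizeAttributeWhitespace(normalizeLineEndings(raw)))) // "a b c"
```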

brodybits · 2021-08-29T01:54:07Z

test/parse/test-doc-whitespace.test.js

@@ -29,3 +29,57 @@ describe('errorHandle', () => {
 		expect({ actual, ...errors }).toMatchSnapshot()
 	})
 })
+
+describe('whitespace', () => {
+	const whitespaceToHex = (str) =>


question: could this be a more general-purpose reusable function?

Or am I violating this by suggesting a too-hasty abstraction: https://kentcdodds.com/blog/aha-programming

I was also thinking about it. I think as soon as it is needed elsewhere we will find a proper place for it.
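The body of `whitespaceToHex` is cut off in the diff excerpt above. Going only by the notation used in the test table (`'#x20#xa#xa'`), a general-purpose variant might look like the following. This is a hypothetical sketch, not the PR's code; the name `toXmlHexNotation` and the choice to convert every character (not only whitespace) are assumptions.

```javascript
// Hypothetical helper, not the implementation from the PR:
// renders each character as "#x" followed by its lowercase hexadecimal code point,
// matching the notation used in the test table (e.g. '#x20#xa#xa').
const toXmlHexNotation = (str) =>
	// Array.from iterates by code point, so astral characters are not split.
	Array.from(str, (char) => `#x${char.codePointAt(0).toString(16)}`).join('')

console.log(toXmlHexNotation(' \n\n')) // #x20#xa#xa
console.log(toXmlHexNotation('\r\n')) // #xd#xa
```

Making it reusable would mostly be a matter of moving it to a shared test helper once a second test file needs it.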

brodybits

LGTM thanks!

I think it would be nice to take care of the minor nit, but I don't think it needs to be blocking.


For [0]. Any usage of @xmldom/xmldom >= 0.8.0 will normalize these. See [1] and [2]. The current xml-encryption (2.0.0) does not do this normalization, but will in 2.0.1 [3]. It's technically within the path of xmlenc.decrypt() [4], but this follows how assertions have been handled (not handling non-normalized whitespace). [0] https://github.com/Clever/saml2/blob/6da3e9c39c326a2f6793bb87c6d12c9ab4446585/lib/saml2.coffee#L242-L245 [1] xmldom/xmldom#307 [2] xmldom/xmldom#314 [3] auth0/node-xml-encryption#101 [4] https://github.com/auth0/node-xml-encryption/blob/291f3f10d5d1d571a3b6da2d411aa323398f5650/lib/xmlenc.js#L185

For [0]. Any usage of @xmldom/xmldom >= 0.8.0 will normalize these, see [1] and [2]. The current xml-encryption (2.0.0) does not do this normalization, but will in 2.0.1 [3]. It's technically within the path of xmlenc.decrypt() [4], but this follows how assertions have been handled (not handling non-normalized whitespace). For xml-crypto, this was changed in 3.0.0 with [5]. [0] https://github.com/Clever/saml2/blob/6da3e9c39c326a2f6793bb87c6d12c9ab4446585/lib/saml2.coffee#L242-L245 [1] xmldom/xmldom#307 [2] xmldom/xmldom#314 [3] auth0/node-xml-encryption#101 [4] https://github.com/auth0/node-xml-encryption/blob/291f3f10d5d1d571a3b6da2d411aa323398f5650/lib/xmlenc.js#L185 [5] node-saml/xml-crypto#261


karfau added the spec:XML and breaking change labels Aug 28, 2021

karfau requested a review from brodybits August 28, 2021 03:33

karfau mentioned this pull request Aug 28, 2021

fix: Normalize line breaks according to spec #305

Closed

karfau force-pushed the normalize-line-endings branch 2 times, most recently from 4801a70 to 02de7af August 28, 2021 04:18

karfau added this to the next release with breaking changes milestone Aug 28, 2021

karfau force-pushed the normalize-line-endings branch from 02de7af to d2fb058 August 28, 2021 05:57

brodybits reviewed Aug 29, 2021

brodybits approved these changes Aug 29, 2021

karfau merged commit 370a2ef into master Aug 29, 2021

karfau deleted the normalize-line-endings branch August 29, 2021 07:22

karfau added a commit that referenced this pull request Aug 31, 2021

fix: Normalize all line endings

9fb50d3

according to XML1.1 spec. Fixes #49 Since #307 only implemented XML 1.0 spec https://www.w3.org/TR/xml/#sec-line-ends https://www.w3.org/TR/xml11/#sec-line-ends
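Section 2.11 of the XML 1.1 specification extends the XML 1.0 rule: besides `#xD #xA` and a lone `#xD`, it also translates the sequence `#xD #x85`, a lone `#x85` (NEL) and `#x2028` (LINE SEPARATOR) to a single `#xA`, and a `#xD` followed by `#x85` is no longer treated as a lone `#xD`. A minimal sketch of that rule, with a hypothetical function name and no claim to match the code merged in #314:

```javascript
// Hypothetical helper illustrating XML 1.1 line-end normalization
// (https://www.w3.org/TR/xml11/#sec-line-ends); not the code merged in #314.
function normalizeLineEndings11(source) {
	// "\r" optionally followed by "\n" or NEL (U+0085) collapses to one "\n";
	// a NEL or LINE SEPARATOR (U+2028) on its own also becomes "\n".
	return source.replace(/\r[\n\u0085]?|[\u0085\u2028]/g, '\n')
}

console.log(JSON.stringify(normalizeLineEndings11('a\r\u0085b'))) // "a\nb"
console.log(JSON.stringify(normalizeLineEndings11('a\u2028b'))) // "a\nb"
console.log(JSON.stringify(normalizeLineEndings11('a\n\rb'))) // "a\n\nb"
```

For input that contains none of `#x85` or `#x2028`, the result is the same as with the XML 1.0 rule.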

karfau mentioned this pull request Aug 31, 2021

fix: Normalize all line endings #314

Merged

karfau added a commit that referenced this pull request Sep 9, 2021

fix: Normalize all line endings

093f20b

according to XML1.1 spec. Fixes #49 Since #307 only implemented XML 1.0 spec https://www.w3.org/TR/xml/#sec-line-ends https://www.w3.org/TR/xml11/#sec-line-ends

karfau added a commit that referenced this pull request Sep 9, 2021

fix: Normalize all line endings

6478b29

according to XML1.1 spec. Fixes #49 Since #307 only implemented XML 1.0 spec https://www.w3.org/TR/xml/#sec-line-ends https://www.w3.org/TR/xml11/#sec-line-ends

gkwang added a commit to auth0/node-xml-encryption that referenced this pull request Oct 18, 2022

release 3.0.0

a3ff9cd

add semgrep support, update xmldom; breaking changes in xmldom: xmldom/xmldom#310 xmldom/xmldom#314 xmldom/xmldom#307

gkwang mentioned this pull request Oct 18, 2022

release 3.0.0 auth0/node-xml-encryption#104

Merged

gkwang added a commit to auth0/node-xml-encryption that referenced this pull request Oct 18, 2022

release 3.0.0 (#104)

6ae0fcd

add semgrep support, update xmldom; breaking changes in xmldom: xmldom/xmldom#310 xmldom/xmldom#314 xmldom/xmldom#307