Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Normalize line breaks according to spec #307

Merged
merged 1 commit into from Aug 29, 2021
Merged

Conversation

karfau
Copy link
Member

@karfau karfau commented Aug 28, 2021

XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters CARRIAGE RETURN (#xD) and LINE FEED (#xA).

To simplify the tasks of applications, the XML processor must behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.

Where #xD == \r and #xA == \n, so
\r\n => \n
\n\r => \n\n
\n => \n
\r => \n

BREAKING CHANGE: Certain combination of line break characters are normalized before parsing takes place and will no longer be preserved. For details see https://www.w3.org/TR/xml/#sec-line-ends

fixes #303

@karfau karfau added spec:XML https://www.w3.org/TR/xml11/ breaking change Some thing that requires a version bump due to breaking changes labels Aug 28, 2021
@karfau karfau requested a review from brodybits August 28, 2021 03:33
@karfau karfau force-pushed the normalize-line-endings branch 2 times, most recently from 4801a70 to 02de7af Compare August 28, 2021 04:18
> XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters CARRIAGE RETURN (#xD) and LINE FEED (#xA).
>
> To simplify the tasks of applications, the XML processor must behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.

Where `#xD` == `\r` and `#xA` == `\n`, so
` \r\n ` => ` \n `
` \n\r ` => ` \n\n `
` \n  ` => ` \n `
` \r  ` => ` \n `

BREAKING CHANGE: Certain combination of line break characters are normalized before parsing takes place and will no longer be preserved. For details see https://www.w3.org/TR/xml/#sec-line-ends

fixes #303
@karfau
Copy link
Member Author

karfau commented Aug 28, 2021

@brodybits I would like to land this one next, to avoid rebasing #291 over and over.

'#x20#xa#xa',
],
])(
'should normalize "\\r" not followed by "\\n" %s',
Copy link
Member

@brodybits brodybits Aug 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: should this be tested in attribute as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can not easily, since the the algorithm in the other PR would replace it with a space.
But since we are replacing all occurrences in one go before parsing, I don't think it matters to much.

@@ -29,3 +29,57 @@ describe('errorHandle', () => {
expect({ actual, ...errors }).toMatchSnapshot()
})
})

describe('whitespace', () => {
const whitespaceToHex = (str) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: could this be a more general-purpose reusable function?

Or am I violating this by suggesting a too-hasty abstraction: https://kentcdodds.com/blog/aha-programming

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking about it. I think as soon as it is needed elsewhere we will find a proper place for it.

Copy link
Member

@brodybits brodybits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks!

I think it would be nice to take care of the minor nit, don't think it needs to be blocking though.

@karfau karfau merged commit 370a2ef into master Aug 29, 2021
@karfau karfau deleted the normalize-line-endings branch August 29, 2021 07:22
karfau added a commit that referenced this pull request Aug 31, 2021
karfau added a commit that referenced this pull request Sep 9, 2021
karfau added a commit that referenced this pull request Sep 9, 2021
mcab added a commit to Clever/saml2 that referenced this pull request Oct 15, 2022
For [0].

Any usage of @xmldom/xmldom >= 0.8.0 will normalize these. See [1] and [2].

The current xml-encryption (2.0.0) does not do this normalization, but will
in 2.0.1 [3]. It's technically within the path of xmlenc.decrypt() [4], but
this follows how assertions have been handled (not handling non-normalized
whitespace).

[0] https://github.com/Clever/saml2/blob/6da3e9c39c326a2f6793bb87c6d12c9ab4446585/lib/saml2.coffee#L242-L245
[1] xmldom/xmldom#307
[2] xmldom/xmldom#314
[3] auth0/node-xml-encryption#101
[4] https://github.com/auth0/node-xml-encryption/blob/291f3f10d5d1d571a3b6da2d411aa323398f5650/lib/xmlenc.js#L185
mcab added a commit to Clever/saml2 that referenced this pull request Oct 15, 2022
For [0].

Any usage of @xmldom/xmldom >= 0.8.0 will normalize these, see [1] and [2].

The current xml-encryption (2.0.0) does not do this normalization, but will
in 2.0.1 [3]. It's technically within the path of xmlenc.decrypt() [4], but
this follows how assertions have been handled (not handling non-normalized
whitespace).

For xml-crypto, this was changed in 3.0.0 with [5].

[0] https://github.com/Clever/saml2/blob/6da3e9c39c326a2f6793bb87c6d12c9ab4446585/lib/saml2.coffee#L242-L245
[1] xmldom/xmldom#307
[2] xmldom/xmldom#314
[3] auth0/node-xml-encryption#101
[4] https://github.com/auth0/node-xml-encryption/blob/291f3f10d5d1d571a3b6da2d411aa323398f5650/lib/xmlenc.js#L185
[5] node-saml/xml-crypto#261
mcab added a commit to Clever/saml2 that referenced this pull request Oct 15, 2022
For [0].

Any usage of @xmldom/xmldom >= 0.8.0 will normalize these, see [1] and [2].

The current xml-encryption (2.0.0) does not do this normalization, but will
in 2.0.1 [3]. It's technically within the path of xmlenc.decrypt() [4], but
this follows how assertions have been handled (not handling non-normalized
whitespace).

For xml-crypto, this was changed in 3.0.0 with [5].

[0] https://github.com/Clever/saml2/blob/6da3e9c39c326a2f6793bb87c6d12c9ab4446585/lib/saml2.coffee#L242-L245
[1] xmldom/xmldom#307
[2] xmldom/xmldom#314
[3] auth0/node-xml-encryption#101
[4] https://github.com/auth0/node-xml-encryption/blob/291f3f10d5d1d571a3b6da2d411aa323398f5650/lib/xmlenc.js#L185
[5] node-saml/xml-crypto#261
mcab added a commit to Clever/saml2 that referenced this pull request Oct 15, 2022
For [0].

Any usage of @xmldom/xmldom >= 0.8.0 will normalize these, see [1] and [2].

The current xml-encryption (2.0.0) does not do this normalization, but will
in 2.0.1 [3]. It's technically within the path of xmlenc.decrypt() [4], but
this follows how assertions have been handled (not handling non-normalized
whitespace).

For xml-crypto, this was changed in 3.0.0 with [5].

[0] https://github.com/Clever/saml2/blob/6da3e9c39c326a2f6793bb87c6d12c9ab4446585/lib/saml2.coffee#L242-L245
[1] xmldom/xmldom#307
[2] xmldom/xmldom#314
[3] auth0/node-xml-encryption#101
[4] https://github.com/auth0/node-xml-encryption/blob/291f3f10d5d1d571a3b6da2d411aa323398f5650/lib/xmlenc.js#L185
[5] node-saml/xml-crypto#261
gkwang added a commit to auth0/node-xml-encryption that referenced this pull request Oct 18, 2022
add semgrep support
uodate xmldom
breaking changes in xmldom:
xmldom/xmldom#310
xmldom/xmldom#314
xmldom/xmldom#307
gkwang added a commit to auth0/node-xml-encryption that referenced this pull request Oct 18, 2022
add semgrep support
uodate xmldom
breaking changes in xmldom:
xmldom/xmldom#310
xmldom/xmldom#314
xmldom/xmldom#307
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Some thing that requires a version bump due to breaking changes spec:XML https://www.w3.org/TR/xml11/
Projects
None yet
Development

Successfully merging this pull request may close these issues.

xmldom is not normalizing line endings before parsing
2 participants