Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix!: Avoid empty namespace value like xmlns:ds="" #168

Merged

Conversation

pdecat
Copy link
Contributor

@pdecat pdecat commented Dec 22, 2020

This PR resolves an issue where empty namespaces are added when parsing/serializing XML.

This a port of the old jindw/xmldom#236 PR from the previous upstream repository.

Description from the original PR:

to avoid empty namespace value like xmlns:ds="" (X3-88886): A namespace URL cannot be empty => or we produce an invalid XML document

@pdecat pdecat changed the title avoid empty namespace value like xmlns:ds="" Avoid empty namespace value like xmlns:ds="" Dec 22, 2020
Copy link
Member

@brodybits brodybits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It would be really helpful if you can add a test that demonstrates the problem that is resolved by this proposal.

@pdecat
Copy link
Contributor Author

pdecat commented Dec 22, 2020

Hi @brodybits, pushed the test that I had implemented locally before finding the old PR jindw/xmldom#236.

Without the code change, it would fail with:

 FAIL  test/parse/node.test.js
  ● XML Node Parse › prefixed without empty namespace

    expect(received).toStrictEqual(expected) // deep equality

    Expected: "<w:p><w:r>test1</w:r><w:r>test2</w:r></w:p>"
    Received: "<w:p xmlns:w=\"\"><w:r>test1</w:r><w:r>test2</w:r></w:p>"

       99 |             })
      100 |
    > 101 |             expect(documentElement.toString()).toStrictEqual(source)
          |                                                ^
      102 |     })
      103 |
      104 |     it('cdata comment', () => {

      at Object.<anonymous> (test/parse/node.test.js:101:38)

@brodybits
Copy link
Member

Thanks for the quick update. It does look good to me. I would appreciate a quick review from another committer if anyone has time.

@karfau
Copy link
Member

karfau commented Dec 23, 2020

@pdecat thank you for your contribution.
I just want to make sure I correctly understand the test you created:

  • Do I understand correctly, that using a namespace but not defining it's URI in the XML, is the way that xmldom produces the empty namespace (for w)?
  • Would it also be possible to create such a namespace using the API?
  • I'm wondering, if there would also be a way to avoid the empty namespace right away, not only when we try to serialize it?

@pdecat
Copy link
Contributor Author

pdecat commented Dec 23, 2020

@pdecat thank you for your contribution.
I just want to make sure I correctly understand the test you created:

  • Do I understand correctly, that using a namespace but not defining it's URI in the XML, is the way that xmldom produces the empty namespace (for w)?

Exactly.

  • Would it also be possible to create such a namespace using the API?

Yes, here is a new test for this:

diff --git a/test/dom/ns-test.test.js b/test/dom/ns-test.test.js
index 8fd1809..cfda2b3 100644
--- a/test/dom/ns-test.test.js
+++ b/test/dom/ns-test.test.js
@@ -67,4 +67,11 @@ describe('XML Namespace Parse', () => {
                        '<html test="a" xmlns="http://www.w3.org/1999/xhtml" xmlns:rmf="http://www.frankston.com/public"><rmf:foo hello="asdfa"><test xmlns="http://www.frankston.com/public" bar="valx"></test><test xmlns="http://rmf.vc/n2" bar="valx"></test></rmf:foo></html>'
                )
        })
+
+       it('prefixed tag without namespace', () => {
+               var document = new DOMParser().parseFromString('<xml/>', 'text/xml')
+               document.documentElement.appendChild(document.createElementNS("", "w:r"))
+
+               expect(document.toString()).toStrictEqual('<xml><w:r>test</w:r></xml>')
+       })
 })

Output from master:

   ● XML Namespace Parse › prefixed tag without namespace

    expect(received).toStrictEqual(expected) // deep equality

    Expected: "<xml><w:r>test</w:r></xml>"
    Received: "<xml><w:r xmlns:w=\"\">test</w:r></xml>"

      75 |              document.documentElement.appendChild(element)
      76 |
    > 77 |              expect(document.toString()).toStrictEqual('<xml><w:r>test</w:r></xml>')
         |                                          ^
      78 |      })
      79 | })
      80 |

      at Object.<anonymous> (test/dom/ns-test.test.js:77:31)
  • I'm wondering, if there would also be a way to avoid the empty namespace right away, not only when we try to serialize it?

That would probably be better indeed!

@pdecat
Copy link
Contributor Author

pdecat commented Dec 23, 2020

Same test is green from this branch.

Actually, isn't the issue really only in the serialization of the XML, where invalid XML is generated if the namespace URI is empty.

@karfau
Copy link
Member

karfau commented Dec 23, 2020

Thx for the fast replies.
When executing document.createElementNS("", "w:r") in the browser, it throws:

Uncaught DOMException: Failed to execute 'createElementNS' on 'Document': The namespace URI provided ('') is not valid for the qualified name provided ('w:r').

Which is also the behavior specified in the "Living standard" (which is not what xmldom is currently claiming to support, but is the most recent standard):

So I'm not sure how to decide wether it's a good idea to add that second test.

If possible we should check if the standard xmldom tries to implement allows such DOMs to be created, and if not we should also change xmldom to fail on them, but this would be a much bigger change in behavior than this PR.

Since xmldom currently supports creating such doms, your fix is totally valid and prevents output of invalid XML. So @brodybits I'm all for landing and releasing it.

@karfau karfau added this to the 0.5.0 milestone Jan 21, 2021
@pdecat
Copy link
Contributor Author

pdecat commented Mar 9, 2021

Hi @brodybits @karfau, I expected this PR would have been merged before the 0.5.0 release. Is the fixed CVE in that release the reason it was released in a hurry without going through all the milestone's PRs?

…ld_upstream_pr-236_avoid_empty_namespace
@brodybits brodybits changed the title Avoid empty namespace value like xmlns:ds="" refactor!: Avoid empty namespace value like xmlns:ds="" Mar 9, 2021
@brodybits
Copy link
Member

My apologies. I just pushed an update to merge from master and took the liberty to update the title according to "Conventional Commits". Note that the bang (!) is used to indicate that this should be considered a breaking change.

@karfau I think we should merge this now.

@pdecat I would like to ask for one more favor: please ping us again in case we do not merge this in the next 2-3 days. This has fallen though the cracks for way too long!!

@karfau karfau merged commit 82b0481 into xmldom:master Mar 10, 2021
@karfau
Copy link
Member

karfau commented Mar 10, 2021

@brodybits Merged it, but just realized that it's not a refactoring, which by definition never changes behavior.
I would say it is a fix.
And I never heard of the ! for breaking changes before.
Let us update the changelog accordingly and check if there are any more thing in the 0.5.0 milestone before releasing the next version.

@pdecat pdecat deleted the port_old_upstream_pr-236_avoid_empty_namespace branch March 10, 2021 06:53
@karfau karfau modified the milestones: 0.5.0, 0.6.0 Mar 10, 2021
@karfau
Copy link
Member

karfau commented Mar 10, 2021

@brodybits I checked the 0.5.0 release and cleaned it up, the only thing left is to update the changelog.

@karfau karfau changed the title refactor!: Avoid empty namespace value like xmlns:ds="" fix!: Avoid empty namespace value like xmlns:ds="" Apr 9, 2021
@pdecat
Copy link
Contributor Author

pdecat commented Jun 3, 2021

FTR, this fix is not complete as altering the added test case a bit reveals the issue is still present in some cases:

# git diff
diff --git a/test/parse/node.test.js b/test/parse/node.test.js
index 6c10dc8..acde64e 100644
--- a/test/parse/node.test.js
+++ b/test/parse/node.test.js
@@ -88,13 +88,13 @@ describe('XML Node Parse', () => {
        })

        it('prefixed without empty namespace', () => {
-               const source = '<w:p><w:r>test1</w:r><w:r>test2</w:r></w:p>'
+               const source = '<w:p><w:r><w:rPr><w:rFonts w:ascii=\"FreeSans\" w:hAnsi=\"FreeSans\"/><w:color w:val=\"FF0000\"/></w:rPr><w:t>test1</w:t></w:r><w:r><w:t>test2</w:t></w:r></w:p>'
                const { documentElement } = new DOMParser().parseFromString(source)

-               expect(documentElement.firstChild.firstChild).toMatchObject({
+               expect(documentElement.firstChild.lastChild.firstChild).toMatchObject({
                        nodeValue: 'test1',
                })
-               expect(documentElement.lastChild.firstChild).toMatchObject({
+               expect(documentElement.lastChild.firstChild.firstChild).toMatchObject({
                        nodeValue: 'test2',
                })

# ./node_modules/jest/bin/jest.js -t "prefixed without empty namespace"
 FAIL  test/parse/node.test.js
  ● XML Node Parse › prefixed without empty namespace

    expect(received).toStrictEqual(expected) // deep equality

    Expected: "<w:p><w:r><w:rPr><w:rFonts w:ascii=\"FreeSans\" w:hAnsi=\"FreeSans\"/><w:color w:val=\"FF0000\"/></w:rPr><w:t>test1</w:t></w:r><w:r><w:t>test2</w:t></w:r></w:p>"
    Received: "<w:p><w:r><w:rPr><w:rFonts xmlns:w=\"\" w:ascii=\"FreeSans\" w:hAnsi=\"FreeSans\"/><w:color w:val=\"FF0000\"/></w:rPr><w:t>test1</w:t></w:r><w:r><w:t>test2</w:t></w:r></w:p>"

       99 |             })
      100 |
    > 101 |             expect(documentElement.toString()).toStrictEqual(source)
          |                                                ^
      102 |     })
      103 |
      104 |     it('cdata comment', () => {

      at Object.<anonymous> (test/parse/node.test.js:101:38)


Summary of all failing tests
 FAIL  test/parse/node.test.js
  ● XML Node Parse › prefixed without empty namespace

    expect(received).toStrictEqual(expected) // deep equality

    Expected: "<w:p><w:r><w:rPr><w:rFonts w:ascii=\"FreeSans\" w:hAnsi=\"FreeSans\"/><w:color w:val=\"FF0000\"/></w:rPr><w:t>test1</w:t></w:r><w:r><w:t>test2</w:t></w:r></w:p>"
    Received: "<w:p><w:r><w:rPr><w:rFonts xmlns:w=\"\" w:ascii=\"FreeSans\" w:hAnsi=\"FreeSans\"/><w:color w:val=\"FF0000\"/></w:rPr><w:t>test1</w:t></w:r><w:r><w:t>test2</w:t></w:r></w:p>"

       99 |             })
      100 |
    > 101 |             expect(documentElement.toString()).toStrictEqual(source)
          |                                                ^
      102 |     })
      103 |
      104 |     it('cdata comment', () => {

      at Object.<anonymous> (test/parse/node.test.js:101:38)


Test Suites: 1 failed, 28 skipped, 1 of 29 total
Tests:       1 failed, 573 skipped, 574 total
Snapshots:   0 total
Time:        2.227 s, estimated 3 s
Ran all test suites with tests matching "prefixed without empty namespace".

@brodybits
Copy link
Member

Thanks for the altered test case. It would be extremely helpful if you could raise a new issue to help us track the problem.

@pdecat
Copy link
Contributor Author

pdecat commented Jun 4, 2021

Opened #243

@karfau
Copy link
Member

karfau commented Jul 26, 2021

Since this is the original issue I will put this information remarks here:

Just today I stumbled across two confusing details in the "Scoping" section of the XML namespaces spec:

The attribute value in a namespace declaration for a prefix may be empty. This has the effect, within the scope of the declaration, of removing any association of the prefix with a namespace name. Further declarations may re-declare the prefix again:

<?xml version="1.1"?>
<x xmlns:n1="http://www.w3.org">
    <n1:a/>               <!-- legal; the prefix n1 is bound to http://www.w3.org -->
    <x xmlns:n1="">
        <n1:a/>           <!-- illegal; the prefix n1 is not bound here -->
	<x xmlns:n1="http://www.w3.org">
            <n1:a/>       <!-- legal; the prefix n1 is bound again -->
        </x>
    </x>
</x>

So I again checked the section that we previously used to reason for the fix it says (emphasis mine):

[...] Furthermore, the attribute value in the innermost such declaration must not be an empty string.

I think the case described in the scoping section is really a niche case, so I'm not to worried for now.

But we just implemented

Furthermore, the attribute value in the innermost such declaration must not be an empty string.

so the example provided in the spec will not fail to be parsed and will even be valid when serializing into a string again, since we skip serializing the "redeclared empty value namespace".

I'm just putting it here for completeness sake and for people to find this information in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants