Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notations declared in external DTD subsets are reported as undefined #184

Open
nektarios-kitsios opened this issue Nov 24, 2023 · 6 comments

Comments

@nektarios-kitsios
Copy link

Validation fails with the error below when a NOTATION that is declared in an external DTD subset is referenced in an internal DTD subset:

om.ctc.wstx.exc.WstxValidationException:
1 referenced notation undefined: first one 'IMAGE'
 at [row,col {unknown-source}]: [1,109]
        at com.ctc.wstx.exc.WstxValidationException.create(WstxValidationException.java:50)
        at com.ctc.wstx.sr.StreamScanner.reportValidationProblem(StreamScanner.java:593)
        at com.ctc.wstx.sr.StreamScanner.reportValidationProblem(StreamScanner.java:601)
        at com.ctc.wstx.dtd.FullDTDReader._reportVCViolation(FullDTDReader.java:1989)
        at com.ctc.wstx.dtd.FullDTDReader._reportUndefinedNotationRefs(FullDTDReader.java:1967)
        at com.ctc.wstx.dtd.FullDTDReader.parseDTD(FullDTDReader.java:639)
        at com.ctc.wstx.dtd.FullDTDReader.readInternalSubset(FullDTDReader.java:427)
        at com.ctc.wstx.sr.ValidatingStreamReader.finishDTD(ValidatingStreamReader.java:277)
        at com.ctc.wstx.sr.BasicStreamReader.skipToken(BasicStreamReader.java:3466)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2089)
        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1180)
        at org.codehaus.stax.test.vstream.TestExternalSubset.testNotationReferenceInInternalSubset(TestExternalSubset.java:77)

The following example demonstrates this problem:

XML:

<!DOCTYPE root SYSTEM 'test.dtd' [ <!ENTITY gr2 SYSTEM "gr2" NDATA IMAGE> ]>
<root>&extEnt;</root> 

DTD:

<!ELEMENT root (#PCDATA)>
<!ENTITY extEnt 'just testing'>
<!NOTATION IMAGE       PUBLIC "-//AS//NOTATION image format//EN"
                     "http://www.test.com/xml/common/dtd/notation/image">

Note that Xerces and xmllint validate the above XML successfully. This is in accordance with the XML specification which does not impose any specific order in the declarations. The only requirement is the following:

If both the external and internal subsets are used, the internal subset must be considered to occur before the external subset. This has the effect that entity and attribute-list declarations in the internal subset take precedence over those in the external subset.

The problem seems to be that woodstox reads the internal subset before the external one, and reports the error at the time the internal subset is read when the notation declaration has not been read yet.

For the record there used to be an old issue reporting this problem here: https://web.archive.org/web/20150507153747/http://jira.codehaus.org/browse/WSTX-264

@nektarios-kitsios
Copy link
Author

Attached is a patch with a unit test that demonstrates this case and a potential fix: Issue_184_-_potential_fix_and_unit_test.patch.txt

@cowtowncoder
Copy link
Member

Thank you for reporting this issue. Interesting that this was not covered by the extensive XMLTest suite (I think maybe https://www.w3.org/XML/Test/ but it has been years) that had ~2000 tests, many focusing on DTD aspects.

I hope to find time to at least get test integrated and then have a look at proposed solution.

@nektarios-kitsios
Copy link
Author

Attached is a new version of the patch that fixes a problem with parameter entity overrides (demonstrated in a unit test). To solve this the external subset is read twice, but there may be better ways to fix this.

Issue_184_-_potential_fix_and_unit_test_v2.patch.txt

@cowtowncoder
Copy link
Member

@nektarios-kitsios Apologies for slow follow up here. Would it be possible to split patch in two parts: one for unit tests, another for actual fix? I should be able to merge tests first (and move under failing until fixed), and could see how fix itself would work.

@nektarios-kitsios
Copy link
Author

As requested, split the patch in two parts:

Issue_184_-_unit_tests.patch.txt
Issue_184_-_potential_fix.patch.txt

@cowtowncoder
Copy link
Member

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants