Invalid characters inside CDATA section #201

Open
marcinar opened this issue Mar 20, 2024 · 5 comments

Comments

@marcinar

Example:

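```java
import com.ctc.wstx.api.InvalidCharHandler.ReplacingHandler;
import com.ctc.wstx.api.WstxOutputProperties;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class CDataInvalidCharRepro {
    public static void main(String[] args) throws Exception {
        var factory = XMLOutputFactory.newInstance(); // assumes Woodstox is the StAX implementation on the classpath
        // Replace any invalid XML character with U+FFFD instead of writing it
        factory.setProperty(WstxOutputProperties.P_OUTPUT_INVALID_CHAR_HANDLER, new ReplacingHandler('\uFFFD'));

        try (var output = Files.newOutputStream(Paths.get("/tmp/test.xml"))) {
            XMLStreamWriter writer = factory.createXMLStreamWriter(output);
            writer.writeStartDocument();
            writer.writeStartElement("Test");
            // U+001A is not allowed in XML 1.0; it should be replaced, but is written as-is
            writer.writeCData("Text content\u001A...");
            writer.writeEndElement();
            writer.writeEndDocument();
            // XMLStreamWriter is not AutoCloseable, so close it explicitly
            writer.close();
        }
    }
}
```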

In this case, the writer doesn't replace (or even check for) any invalid characters. The output contains the 0x1A codepoint inside the CDATA section, which isn't valid XML.
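For reference, the characters XML 1.0 allows are defined by the spec's Char production, and U+001A falls outside it, which is why the output is not well-formed. Until the writer honors the handler for CDATA, one possible workaround is to sanitize the text before calling writeCData. This is a minimal stdlib-only sketch; isXml10Char and sanitize are hypothetical helper names, not part of Woodstox.

```java
public class Xml10Chars {
    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces every code point outside the Char production with `replacement`.
    // Iterating by code point keeps valid surrogate pairs (e.g. emoji) intact,
    // while an unpaired surrogate comes through as its own code point and is replaced.
    static String sanitize(String text, char replacement) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (isXml10Char(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(replacement);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "Text content\u001A...";
        System.out.println(isXml10Char(0x1A)); // false
        System.out.println(sanitize(raw, '\uFFFD').equals("Text content\uFFFD...")); // true
    }
}
```

The sanitized string can then be passed to writer.writeCData(...) in the example above.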

@cowtowncoder
Member

Hmmh. OK, the handler should definitely be called and used as expected.

Quick question: which implementation (which class) is the writer here?

@marcinar
Author

marcinar commented Mar 21, 2024

It's com.ctc.wstx.sw.SimpleNsStreamWriter. Its underlying outputter is com.ctc.wstx.sw.BufferingXmlWriter, which in turn writes through com.ctc.wstx.io.UTF8Writer. I think all stream writers that use this outputter are affected.
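To illustrate what the CDATA path of an outputter like BufferingXmlWriter would need to do, here is a rough stdlib-only sketch. It is not Woodstox code: CharReplacer is a hypothetical stand-in for the configured invalid-char handler (the same idea as ReplacingHandler, not its API), and writeCData is a made-up name. Every code point outside the XML 1.0 Char production goes through the handler, and any "]]>" in the content is then split across two CDATA sections so the first one cannot close early.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class CDataSketch {
    // Hypothetical stand-in for a configured invalid-char handler.
    // It is NOT the Woodstox InvalidCharHandler API, just the same idea.
    interface CharReplacer {
        char replace(int invalidCodePoint) throws IOException;
    }

    // XML 1.0 Char production (same ranges as the spec)
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Writes text as CDATA: invalid code points go through the handler first,
    // then every "]]>" is split across two sections.
    static void writeCData(Writer out, String text, CharReplacer handler) throws IOException {
        StringBuilder body = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isXml10Char(cp)) {
                body.appendCodePoint(cp);
            } else {
                body.append(handler.replace(cp));
            }
            i += Character.charCount(cp);
        }
        String safe = body.toString().replace("]]>", "]]]]><![CDATA[>");
        out.write("<![CDATA[");
        out.write(safe);
        out.write("]]>");
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        writeCData(out, "Text content\u001A...", cp -> '\uFFFD');
        System.out.println(out);
    }
}
```

Applying the handler before the "]]>" split means a replacement character that happens to complete "]]>" is still handled.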

@cowtowncoder
Member

There is probably a test for this feature, but it might not output CDATA sections. One thing to try would be to modify that test case to reproduce the issue.

@marcinar
Author

marcinar commented Mar 22, 2024

You're correct. There is a test for this exact thing, but the failure is suppressed:

doTestValid(f, evtType, "UTF-8", false);

Setting the last parameter to true reproduces the issue.
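Independently of the Woodstox test suite, one way to check a fix is to parse the written document back and confirm it is well-formed. A minimal sketch, assuming only the JDK's built-in StAX parser (XMLInputFactory); isWellFormed is a hypothetical helper, and the two strings roughly stand in for what the writer currently emits and what a working ReplacingHandler('\uFFFD') would be expected to emit.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class WellFormedCheck {
    // Parses the whole document and reports whether it is well-formed.
    static boolean isWellFormed(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next();
                // Reading the text forces lazily-parsing implementations to examine content
                if (reader.hasText()) {
                    reader.getText();
                }
            }
            reader.close();
            return true;
        } catch (XMLStreamException | RuntimeException e) {
            // Some implementations may surface parse errors as unchecked exceptions
            return false;
        }
    }

    public static void main(String[] args) {
        // Raw U+001A inside CDATA, as currently written
        String broken = "<Test><![CDATA[Text content\u001A...]]></Test>";
        // The same content with the invalid character replaced by U+FFFD
        String fixed = "<Test><![CDATA[Text content\uFFFD...]]></Test>";
        System.out.println(isWellFormed(broken));
        System.out.println(isWellFormed(fixed));
    }
}
```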

The test also refers to an old issue that describes the problems with BufferingXmlWriter: https://web.archive.org/web/20150507153750/http://jira.codehaus.org/browse/WSTX-173.

@cowtowncoder
Member

OK. Too bad the original issue (WSTX-173) was not transferred from the Codehaus Jira.

But at least there is a reproduction.

I don't think I will have much time to work on this in the near future, but if anyone else has the time and interest, I will be happy to do code reviews and help get the fix(es) merged.

Thank you for reporting this, @marcinar!
