Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack overflow found by OSS-Fuzz #158

Open
henryrneh opened this issue Feb 8, 2023 · 11 comments
Open

Stack overflow found by OSS-Fuzz #158

henryrneh opened this issue Feb 8, 2023 · 11 comments

Comments

@henryrneh
Copy link

henryrneh commented Feb 8, 2023

Dear dom4j developers,

Fuzzing has found a stack overflow in OSS-Fuzz with JVM Fuzzer Jazzer in dom4j. We have reviewed the finding and consider it security-related due to the potential of a denial of service.

Part of the crash stack trace:
== Java Exception: com.code_intelligence.jazzer.api.FuzzerSecurityIssueLow: Stack overflow (use '-Xss921k' to reproduce)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:237)
Caused by: java.lang.StackOverflowError at java.base/java.lang.String.encodeWithEncoder(String.java:837)
at java.base/java.lang.String.encode(String.java:833)
at java.base/java.lang.String.getBytes(String.java:1786)
at com.code_intelligence.jazzer.runtime.TraceDataFlowNativeCallbacks.encodeForLibFuzzer(TraceDataFlowNativeCallbacks.java:166)
at com.code_intelligence.jazzer.runtime.TraceDataFlowNativeCallbacks.traceStrstr(TraceDataFlowNativeCallbacks.java:82)
at com.code_intelligence.jazzer.runtime.TraceCmpHooks.indexOf(TraceCmpHooks.java:185)
at org.dom4j.tree.NamespaceStack.getQName(NamespaceStack.java:199)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:194)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:237)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:237)
...

We have included a reproducer.zip which contains a README that describes how to reproduce the issue.
We would appreciate if you could take a look into the findings. Do you see a risk that this might be exploited by untrusted input?

OSS-Fuzz Issue: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=51351
Hint: The provided OSS-Fuzz Issue links are only accessible if the issue gets fixed or if you are the maintainer of the OSS-Fuzz project.

Fuzz target: https://github.com/google/oss-fuzz/blob/master/projects/dom4j/DOMReaderFuzzer.java

@henryrneh henryrneh changed the title Stack overflow found by OSS-Fuzz in dom4j Stack overflow found by OSS-Fuzz Feb 8, 2023
@FilipJirsak
Copy link
Contributor

I think the problem is not related to dom4j. According to the stacktrace, there seems to be an overflow in the XML parser. Try parsing the input with the same parser without using dom4j, or try using a different XML parser.

@henryrneh
Copy link
Author

Hello @FilipJirsak, thank you for the message.

Could you maybe elaborate what do you mean by:

According to the stacktrace, there seems to be an overflow in the XML parser.

My understanding from this issue is that DOMReader recursively calls readTree and readElement, did I understand something wrong? There are many recursive calls of the following two lines in the full stacktrace.

at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:237)

Could you clarify a bit more about choosing different XML parser? Is there a way to configure the fuzz target using a different XML parser?

I would not expect a stack overflow error if the XML nesting is too bug, but rather a dom4j exception.

Just to clarify, there's a recursion in the following code of DOMReader:

at org.dom4j.io.DOMReader.readElement(DOMReader.java:194)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:237)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:237)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:237)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
...

@henryrneh
Copy link
Author

Ping @FilipJirsak

@0roman
Copy link

0roman commented Mar 21, 2023

@FilipJirsak we have client which is affected by the issue. Do you plan to fix the issue in the upcoming weeks?

@FilipJirsak
Copy link
Contributor

There is no dom4j error listed in this issue yet. It is probably an XML document with too deep structure that the XML parser cannot handle. A dom4j error would be if the XML parser could process a particular document, but dom4j couldn't create a DOM from it.

@0roman
Copy link

0roman commented Mar 21, 2023

The stacktrace does not seem to point to an issue in Openjdk javax.xml.parsers.DocumentBuilder parser. The issue is triggered by org.dom4j.io.DOMReader.read(), isn't it?

@FilipJirsak
Copy link
Contributor

org.dom4j.io.DOMReader.read() only calls selected XML parser.

@0roman
Copy link

0roman commented Mar 21, 2023

Does this mean that dom4j is not able to handle any malformed XML data by default and is out of responsibility here?

@FilipJirsak
Copy link
Contributor

dom4j is not XML parser. It uses XML parser from JRE or other selected parser. dom4j only builds DOM from XML parser events. There is no XML file attached to this issue that would cause any problem.

@ecki
Copy link

ecki commented Mar 21, 2023

It works recursively on documents. That’s a design decision which can be argued about. It would be good if you could limit document nesting for untrusted sources - maybe in the parser. But I guess jdom could do it easier for depth or overall size.

@carlosame
Copy link
Contributor

XML parsing is not involved in the stack overflow. The reproducer parses a relatively small XML file (178KB) without any usage of DOM4J, then produces a plain DOM document (from the internal JDK DOM) and finally attempts to convert the DOM object to a DOM4J document. It's that last step that fails.

It works recursively on documents. That’s a design decision which can be argued about.

Recursion is difficult to avoid when you are recursively importing nodes, it depends on how you do it. I'm the author of a competing DOM implementation, so I decided to check whether mine had any issue when importing the same document. In the W3C DOM API importNode(Node, true) would be the equivalent of DOM4J's DOMReader.read(), so I did that for both the JDK DOM and my DOM:

Element docElm = doc.getDocumentElement();

// Import the tree to a new JDK DOM Document
Document newDoc = builder.newDocument();
Node newRoot = newDoc.importNode(docElm, true);
newDoc.appendChild(newRoot);

// Now import with css4j's DOM
DOMDocument cssdoc = new CSSDOMImplementation().createDocument("", null, null);
Node newCssRoot = cssdoc.importNode(docElm, true);
cssdoc.appendChild(newCssRoot);

Both imports are completed fast and flawlessly, so the same document isn't causing issues for other DOM implementations (and mine does recursion). The problem is most likely related to the weird way that DOM4J handles namespaces (NamespaceStack in this case).

There are other issues caused by the design of this library (see #79 or #114 which is also related to namespace handling), but even if someone took the time to fix them, that implies changing design decisions made 25 years ago and break backwards compatibility. To anyone hit by those issues, I'd suggest to look for more modern alternatives.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants