Stack overflow found by OSS-Fuzz #158

henryrneh · 2023-02-08T17:49:40Z

Dear dom4j developers,

Fuzzing has found a stack overflow in OSS-Fuzz with JVM Fuzzer Jazzer in dom4j. We have reviewed the finding and consider it security-related due to the potential of a denial of service.

Part of the crash stack trace:
== Java Exception: com.code_intelligence.jazzer.api.FuzzerSecurityIssueLow: Stack overflow (use '-Xss921k' to reproduce)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:237)
Caused by: java.lang.StackOverflowError at java.base/java.lang.String.encodeWithEncoder(String.java:837)
at java.base/java.lang.String.encode(String.java:833)
at java.base/java.lang.String.getBytes(String.java:1786)
at com.code_intelligence.jazzer.runtime.TraceDataFlowNativeCallbacks.encodeForLibFuzzer(TraceDataFlowNativeCallbacks.java:166)
at com.code_intelligence.jazzer.runtime.TraceDataFlowNativeCallbacks.traceStrstr(TraceDataFlowNativeCallbacks.java:82)
at com.code_intelligence.jazzer.runtime.TraceCmpHooks.indexOf(TraceCmpHooks.java:185)
at org.dom4j.tree.NamespaceStack.getQName(NamespaceStack.java:199)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:194)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:237)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:237)
...

We have included a reproducer.zip which contains a README that describes how to reproduce the issue.
We would appreciate if you could take a look into the findings. Do you see a risk that this might be exploited by untrusted input?

OSS-Fuzz Issue: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=51351
Hint: The provided OSS-Fuzz Issue links are only accessible if the issue gets fixed or if you are the maintainer of the OSS-Fuzz project.

Fuzz target: https://github.com/google/oss-fuzz/blob/master/projects/dom4j/DOMReaderFuzzer.java

FilipJirsak · 2023-02-09T21:40:29Z

I think the problem is not related to dom4j. According to the stacktrace, there seems to be an overflow in the XML parser. Try parsing the input with the same parser without using dom4j, or try using a different XML parser.

henryrneh · 2023-02-14T12:16:05Z

Hello @FilipJirsak, thank you for the message.

Could you maybe elaborate what do you mean by:

According to the stacktrace, there seems to be an overflow in the XML parser.

My understanding from this issue is that DOMReader recursively calls readTree and readElement, did I understand something wrong? There are many recursive calls of the following two lines in the full stacktrace.

at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:237)

Could you clarify a bit more about choosing different XML parser? Is there a way to configure the fuzz target using a different XML parser?

I would not expect a stack overflow error if the XML nesting is too bug, but rather a dom4j exception.

Just to clarify, there's a recursion in the following code of DOMReader:

at org.dom4j.io.DOMReader.readElement(DOMReader.java:194)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:237)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:237)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:237)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:100)
...

henryrneh · 2023-02-27T09:29:24Z

Ping @FilipJirsak

0roman · 2023-03-21T09:11:44Z

@FilipJirsak we have client which is affected by the issue. Do you plan to fix the issue in the upcoming weeks?

FilipJirsak · 2023-03-21T13:49:26Z

There is no dom4j error listed in this issue yet. It is probably an XML document with too deep structure that the XML parser cannot handle. A dom4j error would be if the XML parser could process a particular document, but dom4j couldn't create a DOM from it.

0roman · 2023-03-21T20:10:18Z

The stacktrace does not seem to point to an issue in Openjdk javax.xml.parsers.DocumentBuilder parser. The issue is triggered by org.dom4j.io.DOMReader.read(), isn't it?

FilipJirsak · 2023-03-21T20:16:09Z

org.dom4j.io.DOMReader.read() only calls selected XML parser.

0roman · 2023-03-21T20:27:44Z

Does this mean that dom4j is not able to handle any malformed XML data by default and is out of responsibility here?

FilipJirsak · 2023-03-21T20:41:27Z

dom4j is not XML parser. It uses XML parser from JRE or other selected parser. dom4j only builds DOM from XML parser events. There is no XML file attached to this issue that would cause any problem.

ecki · 2023-03-21T22:45:09Z

It works recursively on documents. That’s a design decision which can be argued about. It would be good if you could limit document nesting for untrusted sources - maybe in the parser. But I guess jdom could do it easier for depth or overall size.

carlosame · 2023-03-31T11:40:32Z

XML parsing is not involved in the stack overflow. The reproducer parses a relatively small XML file (178KB) without any usage of DOM4J, then produces a plain DOM document (from the internal JDK DOM) and finally attempts to convert the DOM object to a DOM4J document. It's that last step that fails.

It works recursively on documents. That’s a design decision which can be argued about.

Recursion is difficult to avoid when you are recursively importing nodes, it depends on how you do it. I'm the author of a competing DOM implementation, so I decided to check whether mine had any issue when importing the same document. In the W3C DOM API importNode(Node, true) would be the equivalent of DOM4J's DOMReader.read(), so I did that for both the JDK DOM and my DOM:

Element docElm = doc.getDocumentElement();

// Import the tree to a new JDK DOM Document
Document newDoc = builder.newDocument();
Node newRoot = newDoc.importNode(docElm, true);
newDoc.appendChild(newRoot);

// Now import with css4j's DOM
DOMDocument cssdoc = new CSSDOMImplementation().createDocument("", null, null);
Node newCssRoot = cssdoc.importNode(docElm, true);
cssdoc.appendChild(newCssRoot);

Both imports are completed fast and flawlessly, so the same document isn't causing issues for other DOM implementations (and mine does recursion). The problem is most likely related to the weird way that DOM4J handles namespaces (NamespaceStack in this case).

There are other issues caused by the design of this library (see #79 or #114 which is also related to namespace handling), but even if someone took the time to fix them, that implies changing design decisions made 25 years ago and break backwards compatibility. To anyone hit by those issues, I'd suggest to look for more modern alternatives.

henryrneh changed the title ~~Stack overflow found by OSS-Fuzz in dom4j~~ Stack overflow found by OSS-Fuzz Feb 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stack overflow found by OSS-Fuzz #158

Stack overflow found by OSS-Fuzz #158

henryrneh commented Feb 8, 2023 •

edited

FilipJirsak commented Feb 9, 2023

henryrneh commented Feb 14, 2023

henryrneh commented Feb 27, 2023

0roman commented Mar 21, 2023

FilipJirsak commented Mar 21, 2023

0roman commented Mar 21, 2023

FilipJirsak commented Mar 21, 2023

0roman commented Mar 21, 2023

FilipJirsak commented Mar 21, 2023

ecki commented Mar 21, 2023

carlosame commented Mar 31, 2023

Stack overflow found by OSS-Fuzz #158

Stack overflow found by OSS-Fuzz #158

Comments

henryrneh commented Feb 8, 2023 • edited

FilipJirsak commented Feb 9, 2023

henryrneh commented Feb 14, 2023

henryrneh commented Feb 27, 2023

0roman commented Mar 21, 2023

FilipJirsak commented Mar 21, 2023

0roman commented Mar 21, 2023

FilipJirsak commented Mar 21, 2023

0roman commented Mar 21, 2023

FilipJirsak commented Mar 21, 2023

ecki commented Mar 21, 2023

carlosame commented Mar 31, 2023

henryrneh commented Feb 8, 2023 •

edited