Stefan Bodewig edited this page Jun 29, 2018 · 14 revisions

General

The public API of the core libraries will be very similar between the Java and .NET implementations. At the same time they will use the established idioms of the respective platform. Interfaces are prefixed by I in the .NET version but not the Java version, delegates replace single-method interfaces, properties replace getter/setter pairs and events replace addXYZListener methods.

In order to make the implemented algorithms similar, implementation differences between Java's and .NET's XML stacks are hidden behind helper methods in the org.xmlunit.util package and the Org.XmlUnit.Util namespace.

Specifying Input

The Java implementations of the validation, XSLT and XPath parts are based on JAXP packages that all use javax.xml.transform.Source to specify XML documents or schema definitions. This makes Source the natural choice for a unified input type for all parts of XMLUnit.

For .NET there is no such common type; most of the class library implementations are based on reader abstractions. javax.xml.transform.Source only specifies a getter/setter pair for a system ID; for .NET, Org.XmlUnit.ISource extends this with a read-only property for an XmlReader. The Org.XmlUnit.Input namespace provides ISource implementations similar to the Source implementations of the Java class library.

XMLUnit for Java 1.x had a few static properties that controlled how the documents are to be interpreted - whether whitespace or comments are significant. Rather than repeating a similar design these options are available as decorators for Source in XMLUnit 2.x - CommentLessSource uses XSLT to strip comments from an arbitrary (I)Source, for example. This way new "interpretations" can be added as new classes without touching the whole library - at the same time they are now available for all parts of XMLUnit, not restricted to comparisons.
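The decorator idea can be sketched with plain JAXP. The following is a minimal illustration, not XMLUnit's actual CommentLessSource: the class name CommentStrippingSketch and its methods are hypothetical, and the stylesheet is simply an identity transform with an extra empty template for comments.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of XMLUnit: wraps any Source in a comment-free view.
final class CommentStrippingSketch {
    // Identity transform; the higher-priority empty template swallows comment nodes.
    private static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"comment()\" priority=\"1\"/>"
        + "</xsl:stylesheet>";

    // Returns a new Source with the same content as the original minus comments.
    static Source withoutComments(Source original) {
        try {
            Transformer stripper = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            DOMResult result = new DOMResult();
            stripper.transform(original, result);
            DOMSource stripped = new DOMSource(result.getNode());
            stripped.setSystemId(original.getSystemId()); // keep the system ID
            return stripped;
        } catch (TransformerException e) {
            throw new IllegalStateException("could not strip comments", e);
        }
    }

    // Serializes a Source to a String without the XML declaration.
    static String serialize(Source source) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not serialize", e);
        }
    }

    public static void main(String[] args) {
        Source input = new StreamSource(
            new StringReader("<a><!-- note --><b>text</b></a>"));
        System.out.println(serialize(withoutComments(input)));
    }
}
```

Because the result is again a Source, it can be handed to validation, XPath or comparison code alike, which is the point of modelling "interpretations" as decorators.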

XSLT Convenience Layer

The Transformation class in org.xmlunit.transform provides a thin layer over TraX or System.Xml.Xsl respectively and really only exists to support specifying results of XSLT transformations as inputs. It is not supposed to be a general purpose XSLT API but will remain tailored to XMLUnit's needs.

Evaluating XPath Expressions

The XPathEngine interface is minimal as it is expected that more advanced features like expecting the outcome to correspond to a certain regular expression can better be implemented by matchers or constraints. In fact Hamcrest's StringMatcher and NUnit's StringAssert should be able to go a long way for testing.
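To illustrate why a minimal engine is enough, here is a sketch that uses only JAXP's javax.xml.xpath rather than XMLUnit's XPathEngine. The class XPathSketch and its method names are assumptions made for this example; the regular-expression check is layered on top of a plain "evaluate to string" call.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper built on JAXP only; XMLUnit's own interface may differ.
final class XPathSketch {
    // The minimal operation: evaluate an expression and return its string value.
    static String evaluate(String xpath, String xml) {
        try {
            XPath engine = XPathFactory.newInstance().newXPath();
            return engine.evaluate(xpath, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("cannot evaluate " + xpath, e);
        }
    }

    // The "advanced" feature lives outside the engine, as a matcher would.
    static boolean evaluatesToMatch(String xpath, String xml, String regex) {
        return Pattern.matches(regex, evaluate(xpath, xml));
    }

    public static void main(String[] args) {
        String xml = "<order id=\"A-42\"/>";
        System.out.println(evaluate("/order/@id", xml));
        System.out.println(evaluatesToMatch("/order/@id", xml, "[A-Z]-\\d+"));
    }
}
```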

Discussion

Should there be support for evaluating an XPath as a qualified name, the way XMLUnit for Java's assertXpathEvaluatesTo with a QualifiedName argument does, or is this better implemented in a matcher on top of selectNodes?

Validating Documents and Schemas

The primary target for validation support is XML Schema and it is well covered by JAXP and System.Xml.Schema. Still the API shall be open to be extended to other "schema" languages. The schema languages are identified by strings rather than enums as string values are needed at least for JAXP anyway and .NET enums are not very convenient if you want to do more than just enumerate a set of values.
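The following sketch shows how a string-identified schema language maps onto JAXP. It uses javax.xml.validation directly and is not XMLUnit's validation API; ValidationSketch and its methods are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper on top of JAXP; the schema language is just a string.
final class ValidationSketch {
    // Compiles a schema; SchemaFactory.newInstance rejects unsupported languages
    // with an IllegalArgumentException.
    static Schema compile(String schemaLanguage, Source schemaSource) {
        try {
            return SchemaFactory.newInstance(schemaLanguage).newSchema(schemaSource);
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema could not be compiled", e);
        }
    }

    // Validates an instance document; without a custom ErrorHandler the
    // validator throws a SAXException on the first validation error.
    static boolean isValid(Schema schema, Source instance) {
        try {
            schema.newValidator().validate(instance);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Source fromString(String xml) {
        return new StreamSource(new StringReader(xml));
    }

    public static void main(String[] args) {
        Schema schema = compile(XMLConstants.W3C_XML_SCHEMA_NS_URI, fromString(
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"root\" type=\"xs:int\"/></xs:schema>"));
        System.out.println(isValid(schema, fromString("<root>42</root>")));
        System.out.println(isValid(schema, fromString("<root>abc</root>")));
    }
}
```

Both the schema and the instance document are passed as Source, which matches the unified input type described earlier.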

The Java version supports DTDs via a validating SAXParser and any schema language JAXP supports - this includes Relax NG's XML syntax if all the required libraries are available. The .NET version supports DTD and the deprecated XDR - at least as long as the .NET Framework version still supports XDR.

Discussion

Should there be a way to register additional languages and custom validators for them?

The DifferenceEngine

The core DifferenceEngine only performs comparisons and provides hooks for all kinds of decision making. It must not modify inputs, ignore contents or stop the comparison by itself; this is the job of interface implementations the user can select. The compare method drives a single comparison of two inputs and provides information about the atomic comparisons it performs to the registered listeners.

Many of the "hooks" or "helpers" are similar to what XMLUnit for Java 1.x provided; some have changed their names or the method signatures of their interfaces.

The 1.x DifferenceListener interface had two responsibilities: recording the differences found and determining the severity of a difference. It is never informed of comparisons the DifferenceEngine considered equal; this is the job of MatchTracker, which cannot alter the comparison's outcome. Whether the comparison as a whole should continue or not is decided by a ComparisonController.

In 2.x the responsibilities are distributed differently. DifferenceEvaluator is responsible for determining the severity of all comparisons, even those that seem to be equal. The ComparisonListener is notified of any kind of comparison, and it is possible to selectively subscribe to comparisons whose outcome is equal, to those whose outcome is different, or to all comparisons. ComparisonController can halt the comparison as a whole; only its interface has changed compared to XMLUnit for Java 1.x.

The DifferenceEvaluators class contains a few implementations of DifferenceEvaluator, including the Default implementation, which uses rules similar to those of XMLUnit for Java 1.x's DifferenceEngine.

The ComparisonControllers class contains two trivial implementations, Default which behaves like DetailedDiff in 1.x and StopWhenDifferent which behaves like Diff.
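The division of labour between evaluator, listener and controller can be shown with a toy engine that compares two lists of strings. Everything here is a stand-in invented for illustration: ToyDifferenceEngine, Outcome, Evaluator, Listener and Controller only mimic the roles described above and are not the signatures of XMLUnit's interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the 2.x roles; all names are hypothetical.
final class ToyDifferenceEngine {
    enum Outcome { EQUAL, SIMILAR, DIFFERENT }

    // Decides the final severity of every comparison, equal ones included.
    interface Evaluator { Outcome evaluate(String control, String test, Outcome raw); }
    // Is told about each comparison but cannot change its outcome.
    interface Listener { void comparisonPerformed(String control, String test, Outcome outcome); }
    // Decides whether the comparison as a whole goes on.
    interface Controller { boolean stopAfter(Outcome outcome); }

    private final Evaluator evaluator;
    private final Controller controller;
    private final List<Listener> listeners = new ArrayList<>();

    ToyDifferenceEngine(Evaluator evaluator, Controller controller) {
        this.evaluator = evaluator;
        this.controller = controller;
    }

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    // Compares items pairwise and returns the number of comparisons performed.
    int compare(List<String> control, List<String> test) {
        int performed = 0;
        int count = Math.min(control.size(), test.size());
        for (int i = 0; i < count; i++) {
            String c = control.get(i);
            String t = test.get(i);
            Outcome raw = c.equals(t) ? Outcome.EQUAL : Outcome.DIFFERENT;
            Outcome outcome = evaluator.evaluate(c, t, raw); // engine never decides severity
            performed++;
            for (Listener listener : listeners) {
                listener.comparisonPerformed(c, t, outcome);
            }
            if (controller.stopAfter(outcome)) {
                break; // engine never stops on its own
            }
        }
        return performed;
    }

    public static void main(String[] args) {
        // Evaluator downgrades case-only differences; controller stops at the first real one.
        ToyDifferenceEngine engine = new ToyDifferenceEngine(
            (c, t, raw) -> raw == Outcome.DIFFERENT && c.equalsIgnoreCase(t)
                ? Outcome.SIMILAR : raw,
            outcome -> outcome == Outcome.DIFFERENT);
        engine.addListener((c, t, outcome) ->
            System.out.println(c + " vs " + t + ": " + outcome));
        engine.compare(Arrays.asList("a", "B", "c", "d"), Arrays.asList("a", "b", "X", "d"));
    }
}
```

A controller that never stops would correspond to the DetailedDiff-like behaviour, while the one used in main corresponds to the Diff-like behaviour.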

In order to deal with documents that differ in the order of the child nodes of a given parent, XMLUnit for Java 1.x allowed the algorithm that decided which child element of the test document is compared with which child element of the control document to be overridden by a custom ElementQualifier implementation. There was no way to influence the selection of pairs for non-element children like comments.

XMLUnit 2.x uses the NodeMatcher interface which is more general. Its default implementation DefaultNodeMatcher performs matching similar to XMLUnit for Java 1.x where ElementSelector replaces ElementQualifier. The additional NodeTypeMatcher interface allows nodes of different types to be compared with each other - the default implementation allows CDATA sections and text nodes to be compared with each other.
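As a rough picture of order-independent matching, the sketch below pairs child elements of two parents by element name using only the JDK's DOM API. It is a simplified stand-in and not the DefaultNodeMatcher algorithm; the class NameBasedMatchingSketch and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: pairs children by element name, ignoring document order.
final class NameBasedMatchingSketch {
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                result.add((Element) child);
            }
        }
        return result;
    }

    // Each control child is paired with the first unmatched test child of the
    // same name; each pair is {control, test}. Unmatched children are dropped.
    static List<Element[]> matchChildren(Element control, Element test) {
        List<Element> unmatched = childElements(test);
        List<Element[]> pairs = new ArrayList<>();
        for (Element c : childElements(control)) {
            Iterator<Element> candidates = unmatched.iterator();
            while (candidates.hasNext()) {
                Element t = candidates.next();
                if (c.getNodeName().equals(t.getNodeName())) {
                    pairs.add(new Element[] { c, t });
                    candidates.remove();
                    break;
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Element control = parseRoot("<r><a>1</a><b>2</b></r>");
        Element test = parseRoot("<r><b>2</b><a>1</a></r>");
        for (Element[] pair : matchChildren(control, test)) {
            System.out.println(pair[0].getNodeName() + ": "
                + pair[0].getTextContent() + " / " + pair[1].getTextContent());
        }
    }
}
```

The name comparison is the part a pluggable selector would replace, for example to also take attributes or nested text into account.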

XMLUnit 2.x will always ignore the order of attributes as the order is irrelevant according to the standard and XML parsers are free to modify the order as they see fit anyway. The default DifferenceEvaluator considers text nodes and CDATA sections similar as CDATA sections are really only serialization artifacts.

Discussion

DOMDifferenceEngine is a long class and the fact that it recurses into the structure makes it difficult to stop the comparison at an arbitrary level. The XMLUnit for Java 1.x version used exceptions for control flow, which didn't feel right. The first 2.x implementations were littered with

result = someComparison();
if (result == CRITICAL) {
    return CRITICAL;
}
result = nextComparison();
if (result == CRITICAL) {
    return CRITICAL;
}
result = ...

(at that time DifferenceEvaluator was responsible for stopping the comparison process with a special ComparisonResult)

which wasn't any better either. We even used code generation at one point, but it was ugly as well. Right now comparisons are chained in constructs that perform a certain deferred comparison only if the ComparisonController didn't signal to stop the whole comparison process. This doesn't really look pretty in Java without lambdas either. It would be good to find a nicer approach.
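For comparison, this is roughly what such a chain of deferred comparisons could look like if lambdas were available. It is a sketch of the idea only, not the DOMDifferenceEngine code; DeferredChain and its methods are hypothetical, and each step returns true when the controller would ask to stop.

```java
import java.util.function.BooleanSupplier;

// Hypothetical chain: a step runs only if no earlier step asked to stop.
final class DeferredChain {
    private boolean stopped;
    private int executed;

    // The comparison is deferred: it is not evaluated once the chain is stopped.
    DeferredChain andThen(BooleanSupplier comparisonThatMayStop) {
        if (!stopped) {
            executed++;
            stopped = comparisonThatMayStop.getAsBoolean();
        }
        return this;
    }

    boolean isStopped() {
        return stopped;
    }

    int executedCount() {
        return executed;
    }

    public static void main(String[] args) {
        DeferredChain chain = new DeferredChain()
            .andThen(() -> false)  // first comparison, controller lets it continue
            .andThen(() -> true)   // second comparison, controller says stop
            .andThen(() -> false); // never evaluated
        System.out.println(chain.executedCount() + " steps, stopped=" + chain.isStopped());
    }
}
```

Compared with the repeated result checks, the stop condition is tested in one place and the call sites read as a flat sequence.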

Fluent Builders

All parts of XMLUnit described so far provide traditional APIs that may be cumbersome to use in certain contexts, such as when formulating unit tests. This is particularly true when configuring a DifferenceEngine with various options and performing a comparison.

Builders using a fluent style are provided to create Sources from various inputs or perform XSLT transformations. There will be a builder for comparing XML documents and probably a related builder that helps configuring the node matching algorithms.