Can't match 2 xml with different node order and element order in nodes #123

Closed
DjerohN opened this issue May 31, 2018 · 8 comments
@DjerohN

DjerohN commented May 31, 2018

Hello!

I'm trying to match 2 xml responses, test and actual, with different node order and different element order in nodes.

Example:

Expected XML: 
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <ResponseHeader>
    <someHeaderElement>value</someHeaderElement>
    <someHeaderElement1>value1</someHeaderElement1>
    <someHeaderElement2>value2</someHeaderElement2>
    <someHeaderElement3>value3</someHeaderElement3>
    <someHeaderElement4>value4</someHeaderElement4>
  </ResponseHeader>
  <ResponseBody>
    <Node>
      <element1>nodeValue</element1>
      <element2>nodeValue1</element2>
      <element3>nodeValue2</element3>
    </Node>
    <Node>
      <element1>nodeValue3</element1>
      <element2>nodeValue4</element2>
      <element3>nodeValue5</element3>
    </Node>
  </ResponseBody>
</Response>
Actual XML: 
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <ResponseHeader>
    <someHeaderElement>value</someHeaderElement>
    <someHeaderElement1>value1</someHeaderElement1>
    <someHeaderElement2>value2</someHeaderElement2>
    <someHeaderElement3>value3</someHeaderElement3>
    <someHeaderElement4>value4</someHeaderElement4>
  </ResponseHeader>
  <ResponseBody>
    <Node>
      <element2>nodeValue4</element2>
      <element3>nodeValue5</element3>
      <element1>nodeValue3</element1>
    </Node>
    <Node>
      <element2>nodeValue1</element2>
      <element3>nodeValue2</element3>
      <element1>nodeValue</element1>
    </Node>
  </ResponseBody>
</Response>

Here is my code I'm trying to match with:

public static void main(String[] args) {
    Diff myDiff = DiffBuilder
        .compare(actual)
        .withTest(expected)
        .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byNameAndText))
        .checkForSimilar()
        .ignoreComments()
        .ignoreWhitespace()
        .build();
    StringBuilder message = new StringBuilder();
    Iterable<Difference> differences = myDiff.getDifferences();
    for (Difference difference : differences) {
      message.append(
          difference.toString()
              .replace("but was", "\nbut was ")
              .replace(" - comparing ", "\ncomparing ")
              .replace(" to ", "\nto ")
              + "\n\n\n"
      );
    }
    System.out.println(message.toString());
  }

As a result, I get many differences between the two XML documents:

Expected child 'element1' 
but was  'null'
comparing <element1...> at /Response[1]/ResponseBody[1]/Node[1]/element1[1]
to <NULL> (DIFFERENT)


Expected child 'element2' 
but was  'null'
comparing <element2...> at /Response[1]/ResponseBody[1]/Node[1]/element2[1]
to <NULL> (DIFFERENT)

Please tell me what I'm doing wrong.

@bodewig
Member

bodewig commented May 31, 2018

You are not telling XMLUnit which Node element to select. byNameAndText would help if there were multiple siblings with the same name that can be identified by their nested text - this is not the case in your example. For the Nodes it seems to be necessary to look into the nested text of a child element instead.

The correct solution really depends on what your real XML uses to identify the Node. Is it the text nested into the element1 child? If so, something like

        .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.conditionalBuilder()
            .whenElementIsNamed("Node").thenUse(ElementSelectors.byXPath("./element1", ElementSelectors.byNameAndText))
            .elseUse(ElementSelectors.byName)
            .build()))

should work.

@DjerohN
Author

DjerohN commented Jun 1, 2018

Thank you for your fast response and explanation. Your code works in this particular example.
The broader problem is that I have many different XMLs with complex structure (Nodes nested in Nodes and so on, with elements inside) and random node and element order. Is there a more general solution for matching those XMLs while ignoring order? Writing selector rules for each case would be really painful.

@bodewig
Member

bodewig commented Jun 1, 2018

Technically one could just try all permutations and use the one "that works". There is no built-in solution that would do that, yet. This is what #45 is about. One thing to take note of is this is going to be very inefficient, even more so on big documents - which at the same time would be the documents that would benefit from such an approach the most.

In the absence of a brute-force "try all paths" approach, XMLUnit needs your help. It cannot know what is expected to provide the "identifier" for any given XML element. This is why you'd need to spell out all the "identifying logic" explicitly right now.

At least I don't see any middle ground between "try all" and "configure explicitly". I'd be happy if anybody can provide a better idea.
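
As a rough illustration of what an order-insensitive comparison involves, the sketch below sidesteps permutations by a different technique: it reduces each element to a canonical string in which attributes and child elements are sorted, then compares the strings. It uses only the JDK's DOM API and is not XMLUnit code; the class `OrderInsensitiveXml` and its methods are hypothetical names. It assumes deep equality is an acceptable criterion, ignores comments and mixed-content text positions, knows nothing about NodeFilters, DifferenceEvaluators or ElementSelectors, and answers only yes or no instead of listing differences.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper (not part of XMLUnit): order-insensitive equality via canonical strings. */
final class OrderInsensitiveXml {

    /** Parses both documents and compares their order-independent canonical forms. */
    static boolean sameIgnoringOrder(String controlXml, String testXml) {
        String control = canonical(parse(controlXml).getDocumentElement());
        String test = canonical(parse(testXml).getDocumentElement());
        return control.equals(test);
    }

    /** Builds a string for an element with its attributes and child elements sorted. */
    static String canonical(Node element) {
        List<String> attributes = new ArrayList<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; attrs != null && i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            attributes.add(attr.getNodeName() + "=\"" + attr.getNodeValue() + "\"");
        }
        Collections.sort(attributes);

        List<String> children = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        NodeList nodes = element.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node child = nodes.item(i);
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                children.add(canonical(child));
            } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
            // Comments and processing instructions are deliberately skipped.
        }
        // Sorting the canonical forms makes sibling order irrelevant at every level.
        Collections.sort(children);

        return "<" + element.getNodeName() + " " + attributes + ">"
                + text.toString().trim() + String.join("", children)
                + "</" + element.getNodeName() + ">";
    }

    /** Parses an XML string, wrapping checked exceptions for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Calling `OrderInsensitiveXml.sameIgnoringOrder(expected, actual)` on the two documents from the first comment would treat the reordered Node elements and their reordered children as equal, provided both inputs are well-formed. The cost is that a mismatch gives no hint about where the documents differ, which is exactly the information XMLUnit's explicit configuration preserves.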

bodewig closed this as completed Jun 9, 2018

@Nihilum

Nihilum commented Jul 2, 2018

If I may suggest a workaround for now, until #45 gets implemented. You could add a custom DifferenceEvaluator and then perform a simple-minded brute force check by comparing all parent nodes in the control tree against the nodes in the actual tree. Something like this did the trick for me:

Diff differences = DiffBuilder.compare(control).withTest(actual)
        .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byName))
        .withDifferenceEvaluator((comparison, outcome) -> {
            if (outcome == ComparisonResult.DIFFERENT) {
                outcome = ascendingBruteForceParentNodesComparison(comparison, outcome);
            }
            return outcome;
        })
        .checkForSimilar()
        .build();

And the comparison itself:

ComparisonResult ascendingBruteForceParentNodesComparison(Comparison comparison, ComparisonResult outcome) {
        LOGGER.info("Using BruteForce to test difference: " + comparison.toString());


        List<Node> parentTestNodes = constructParentNodesHierarchy(comparison.getTestDetails().getTarget());

        Node controlNode = comparison.getControlDetails().getTarget();

        for (Node parentTestNode : parentTestNodes) {
            Node foundNode = findEqualNodeInHierarchy(controlNode, parentTestNode);

            if (foundNode != null && areNodesAtTheSameHierarchyLevel(controlNode, foundNode)) {
                LOGGER.info("Found testNode '" + foundNode.getNodeName() + "' at a different index." +
                        " Overriding ComparisonResult to SIMILAR.");
                return ComparisonResult.SIMILAR;
            }
        }

        LOGGER.info("Unable to find testNode '" + controlNode.getNodeName()
                + "'. Leaving previous ComparisonResult.");

        return outcome;
    }

    boolean areNodesAtTheSameHierarchyLevel(Node controlNode, Node testNode) {
        List<Node> parentControlNodes = constructParentNodesHierarchy(controlNode);
        List<Node> parentTestNodes = constructParentNodesHierarchy(testNode);

        if (parentControlNodes.size() != parentTestNodes.size()) {
            LOGGER.warn("Size of parent test nodes: " + parentTestNodes.size() +
                    ", size of parent control nodes: " + parentControlNodes.size()
                    + ". XML files are considered different because of different target nodes placement in the hierarchy.");
            return false;
        }

        for (int i = 0; i < parentControlNodes.size(); ++i) {
            Node parentControlNode = parentControlNodes.get(i);
            Node parentTestNode = parentTestNodes.get(i);

            if (!parentControlNode.getNodeName().equals(parentTestNode.getNodeName())) {
                LOGGER.warn("Parent control node '" + parentControlNode.getNodeName()
                        + "' at URI: " + parentControlNode.getBaseURI() + " is different than parent test node '"
                        + parentTestNode.getNodeName() + "' at URI: " + parentTestNode.getBaseURI()
                        + ". Comparison failed.");
                return false;
            }
        }

        return true;
    }

    Node findEqualNodeInHierarchy(Node controlNode, Node testNode) {
        if (controlNode.isEqualNode(testNode)) {
            return testNode;
        }

        NodeList children = testNode.getChildNodes();

        if (children == null) {
            return null;
        }

        for (int i = 0; i < children.getLength(); ++i) {
            Node childNode = children.item(i);

            Node foundNode = findEqualNodeInHierarchy(controlNode, childNode);

            if (foundNode != null) {
                return foundNode;
            }
        }

        return null;
    }

    List<Node> constructParentNodesHierarchy(Node node) {
        List<Node> parentNodes = new ArrayList<>();

        Node localParentNode = node.getParentNode();

        while (localParentNode != null) {
            parentNodes.add(localParentNode);
            localParentNode = localParentNode.getParentNode();
        }

        return parentNodes;
    }

@bodewig
Member

bodewig commented Jul 10, 2018

Thank you for sharing @Nihilum. This uses Node.isEqualNode which is probably good enough in most cases.

What is making #45 more complicated is that you want to take NodeFilters and DifferenceEvaluators and likely ElementSelectors into account as well. Also this implementation might match the same test node to more than one control node if I understand it correctly, something which the DifferenceEngine wouldn't allow.
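
The one-to-one constraint mentioned above can be sketched with the JDK's DOM API alone. This is a minimal illustration, not how XMLUnit's DifferenceEngine is implemented; `OneToOneChildMatcher` and its methods are hypothetical names. It pairs each element child of a control parent with a distinct, deeply equal element child of the test parent by marking test nodes as used once they are claimed. Because `Node.isEqualNode` is itself order-sensitive for nested children, the sketch only tolerates reordering at a single level.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of XMLUnit): one-to-one matching of sibling elements. */
final class OneToOneChildMatcher {

    /** True if every element child of controlParent pairs with a distinct equal child of testParent. */
    static boolean childrenMatchOneToOne(Node controlParent, Node testParent) {
        List<Node> controls = elementChildren(controlParent);
        List<Node> tests = elementChildren(testParent);
        if (controls.size() != tests.size()) {
            return false;
        }
        // used[i] records that tests.get(i) has already been claimed by a control node.
        boolean[] used = new boolean[tests.size()];
        for (Node control : controls) {
            boolean matched = false;
            for (int i = 0; i < tests.size() && !matched; i++) {
                if (!used[i] && control.isEqualNode(tests.get(i))) {
                    used[i] = true;
                    matched = true;
                }
            }
            if (!matched) {
                return false;
            }
        }
        return true;
    }

    /** Collects the element children of a node, skipping text and comments. */
    static List<Node> elementChildren(Node parent) {
        List<Node> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                result.add(nodes.item(i));
            }
        }
        return result;
    }
}
```

Taking the first unused equal node is sufficient here because deep equality is an equivalence relation, so any equal candidate is interchangeable with another. With a looser matching rule, such as an ElementSelector, a greedy choice could block a later match and a real implementation would need backtracking.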

@DjerohN
Author

DjerohN commented Aug 28, 2018

Thank you so much, @Nihilum. I tried your solution. The trouble begins when you have a situation like this:

Expected XML: 
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <ResponseHeader>
    <someHeaderElement>value</someHeaderElement>
    <someHeaderElement1>value1</someHeaderElement1>
    <someHeaderElement2>value2</someHeaderElement2>
    <someHeaderElement3>value3</someHeaderElement3>
    <someHeaderElement4>value4</someHeaderElement4>
  </ResponseHeader>
  <ResponseBody>
    <Node>
      <element1>nodeValue</element1>
      <element2>nodeValue1</element2>
      <element3>nodeValue2</element3>
    </Node>
    <Node>
      <element1>nodeValue3</element1>
      <element3>nodeValue5</element3>
    </Node>
  </ResponseBody>
</Response>
Actual XML: 
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <ResponseHeader>
    <someHeaderElement>value</someHeaderElement>
    <someHeaderElement1>value1</someHeaderElement1>
    <someHeaderElement2>value2</someHeaderElement2>
    <someHeaderElement3>value3</someHeaderElement3>
    <someHeaderElement4>value4</someHeaderElement4>
  </ResponseHeader>
  <ResponseBody>
    <Node>
      <element3>nodeValue5</element3>
      <element1>nodeValue3</element1>
    </Node>
    <Node>
      <element2>nodeValue1</element2>
      <element3>nodeValue2</element3>
      <element1>nodeValue</element1>
    </Node>
  </ResponseBody>
</Response>

The main difference is that one Node lacks the optional element2 and the order is mixed. I get an NPE in the method constructParentNodesHierarchy, at the line Node localParentNode = node.getParentNode(); because the node passed in is null, so there is no parent to look up :(
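
One plausible cause, stated as an assumption: when a child exists on only one side, the comparison reported for it seems to carry a null target for the missing side, which matches the "to <NULL>" entries in the output earlier in this thread. A minimal null-safe variant of the helper could look like the sketch below. It uses only the JDK's DOM API, and the wrapping class name `ParentHierarchy` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Node;

/** Hypothetical wrapper class for a null-safe version of the helper from the workaround. */
final class ParentHierarchy {

    /** Returns the chain of ancestors of node, nearest first, or an empty list for null. */
    static List<Node> constructParentNodesHierarchy(Node node) {
        List<Node> parentNodes = new ArrayList<>();
        if (node == null) {
            // A missing node has no ancestors; returning early avoids the NPE.
            return parentNodes;
        }
        Node localParentNode = node.getParentNode();
        while (localParentNode != null) {
            parentNodes.add(localParentNode);
            localParentNode = localParentNode.getParentNode();
        }
        return parentNodes;
    }
}
```

The guard alone only moves the problem: ascendingBruteForceParentNodesComparison also calls controlNode.getNodeName() on a target that may be null. It is probably simpler for that method to return the original outcome straight away when either target is null, so that a genuinely missing element such as element2 is still reported as DIFFERENT.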

@mohannune

Hi, I am trying to compare XMLs using XMLUnit. I have 5 nodes in the control XML and I removed node 3 in the test XML. The comparison results are not what I expected: node 3 in the control XML is compared with node 4 in the test XML. Can anybody suggest the correct code to compare the exact node?

Here is my Java Code:

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.LinkedHashSet;

import org.xmlunit.builder.DiffBuilder;
import org.xmlunit.diff.DefaultNodeMatcher;
import org.xmlunit.diff.Diff;
import org.xmlunit.diff.Difference;
import org.xmlunit.diff.ElementSelectors;

public class XMLDifference {
    public static void main(String[] args) {
        // Backslashes must be escaped inside Java string literals.
        File actual = new File("D:\\workspace\\XMLUnitCompare\\src\\source.xml");
        File expected = new File("D:\\workspace\\XMLUnitCompare\\src\\target.xml");
        String control = null;
        String test = null;
        try {
            control = new String(Files.readAllBytes(Paths.get(actual.toString())));
            test = new String(Files.readAllBytes(Paths.get(expected.toString())));
        } catch (IOException e) {
            e.printStackTrace();
        }

        Diff documentDiff = DiffBuilder.compare(control)
                .ignoreWhitespace()
                .ignoreComments()
                .normalizeWhitespace()
                .withTest(test)
                .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byNameAndText))
                .checkForSimilar()
                .build();

        int i = 0;
        Iterator<Difference> d = documentDiff.getDifferences().iterator();
        LinkedHashSet set = new LinkedHashSet();
        if (documentDiff.hasDifferences()) {
            // Print each difference with a running number.
            while (d.hasNext()) {
                System.out.println((++i) + ") " + d.next());
            }
        }
    }
}

Here are my Xmls.

Control XML:

Gambardella, Matthew <title>XML Developer's Guide</title> Computer 44.95 2000-10-01 An in-depth look at creating applications with XML. Ralls, Kim <title>Midnight Rain</title> Fantasy 5.95 2000-12-16 A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world. Corets, Eva <title>Maeve Ascendant</title> Fantasy 5.95 2000-11-17 After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society. Corets, Eva <title>Oberon's Legacy</title> Fantasy 5.95 2001-03-10 In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant.

Test XML:

Gambardella, Matthew <title>XML Developer's Guide</title> Computer 44.95 2000-10-01 An in-depth look at creating applications with XML. Ralls, Kim <title>Midnight Rain</title> Fantasy 5.95 2000-12-16 A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world. Corets, Eva <title>Oberon's Legacy</title> Fantasy 5.95 2001-03-10 In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant. Corets, Eva <title>The Sundered Grail</title> Fantasy 5.95 2001-09-10 The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon's Legacy.

Result:

  1. Expected attribute value 'bk103' but was 'bk104' - comparing <book name="bk103"...> at /catalog[1]/book[3]/@name to <book name="bk104"...> at /catalog[1]/book[3]/@name (DIFFERENT)
  2. Expected child 'title' but was 'null' - comparing <title...> at /catalog[1]/book[3]/title[1] to <NULL> (DIFFERENT)
  3. Expected child 'publish_date' but was 'null' - comparing <publish_date...> at /catalog[1]/book[3]/publish_date[1] to <NULL> (DIFFERENT)
  4. Expected child 'description' but was 'null' - comparing <description...> at /catalog[1]/book[3]/description[1] to <NULL> (DIFFERENT)
  5. Expected child 'null' but was 'title' - comparing <NULL> to <title...> at /catalog[1]/book[3]/title[1] (DIFFERENT)
  6. Expected child 'null' but was 'publish_date' - comparing <NULL> to <publish_date...> at /catalog[1]/book[3]/publish_date[1] (DIFFERENT)
  7. Expected child 'null' but was 'description' - comparing <NULL> to <description...> at /catalog[1]/book[3]/description[1] (DIFFERENT)
  8. Expected attribute value 'bk104' but was 'bk105' - comparing <book name="bk104"...> at /catalog[1]/book[4]/@name to <book name="bk105"...> at /catalog[1]/book[4]/@name (DIFFERENT)
  9. Expected child 'title' but was 'null' - comparing <title...> at /catalog[1]/book[4]/title[1] to <NULL> (DIFFERENT)
  10. Expected child 'publish_date' but was 'null' - comparing <publish_date...> at /catalog[1]/book[4]/publish_date[1] to <NULL> (DIFFERENT)
  11. Expected child 'description' but was 'null' - comparing <description...> at /catalog[1]/book[4]/description[1] to <NULL> (DIFFERENT)
  12. Expected child 'null' but was 'title' - comparing <NULL> to <title...> at /catalog[1]/book[4]/title[1] (DIFFERENT)
  13. Expected child 'null' but was 'publish_date' - comparing <NULL> to <publish_date...> at /catalog[1]/book[4]/publish_date[1] (DIFFERENT)
  14. Expected child 'null' but was 'description' - comparing <NULL> to <description...> at /catalog[1]/book[4]/description[1] (DIFFERENT)

@bodewig
Member

bodewig commented Dec 30, 2019

@mohannune your comment is not related to this issue. Please create a new issue if you've got a new question.
