Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relationship between DiffBuilder and JAXPXPathEngine #255

Closed
wiedehoeft opened this issue Nov 25, 2022 · 10 comments
Closed

Relationship between DiffBuilder and JAXPXPathEngine #255

wiedehoeft opened this issue Nov 25, 2022 · 10 comments

Comments

@wiedehoeft
Copy link

I have currently trouble because of very slow comparisons with the DiffBuilder class.
Changing the used xpath engine could be helpful but I am struggling with the relationship between DiffBuilder and JAXPXPathEngine.
Into the second class I can pass a custom XPathFactory (like Saxon), but I do not see a possibility to propagate this to the DiffBuilder.
Am I missing something? What is the relationship between the two classes, or is there just no relationship?
Then I would need a different approach.

Thanks for help,
I am currently a little bit confused.

@bodewig
Copy link
Member

bodewig commented Nov 25, 2022

well, the DifferenceEngine itself doesn't use XPath at all, it just walks the DOM tree - and this is slow if your document is big.

If - and only if - you use something like ElementSelectors.byXpath then the XPath engine will be used. And right now this is not configurable at all: https://github.com/xmlunit/xmlunit/blob/main/xmlunit-core/src/main/java/org/xmlunit/diff/ElementSelectors.java#L401

@wiedehoeft
Copy link
Author

The things I know so far:

  1. The comparisons gets stucked in Diffbuilder.build(). The comparison needs sometimes hours for one document.
  2. Because of this the xmlunit-library was cloned into the project and at every position where the JAXPXPathEngine was used the constructor was changed to:
    new JAXPXPathEngine(new net.sf.saxon.xpath.XPathFactoryImpl.XPathFactoryImpl()). This improved performances drastically; from hours to minutes.

For using the official xmlunit version again one solution could be to make the JAXPXPathEngine configurable?

Or the DiffBuilder is misused at this place and a complete new solution should be found?

@bodewig
Copy link
Member

bodewig commented Nov 29, 2022

If you can tell me why/where your DiffBuilder uses new JAXPXPathEngine() at all - so far I can only guess it is ElementSelectors.byXpath you use - I can answer your questions :-)

Maybe you can provide a snippet of your code that creates and sets up the DiffBuilder masking element names and other stuff that may be sensitive.

If you are using ElementSelectors.byXpath then we can easily add an overload that allows an XPathFactory as additional argument and you'd be able to configure it.

Of course it is possible you are using byXpath even though there'd be an easier solution. ElementSelector is an interface that can be implemented via Java which may allow things to be sped up quite a bit at the price of writing some code.

@wiedehoeft
Copy link
Author

This is the class the comparison is executed:

package xxx

import java.util.Date;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.apache.commons.lang3.time.StopWatch;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.xmlunit.builder.DiffBuilder;
import org.xmlunit.diff.DefaultNodeMatcher;
import org.xmlunit.diff.ElementSelector;
import org.xmlunit.diff.ElementSelectors;
import org.xmlunit.diff.NodeMatcher;


class XmlUnitComparer {

      private final XmlComparePropertiesProvider xmlComProps;
      private final NodeMatcher nodeMatcher;

      private static final Logger LOG = LoggerFactory.getLogger(XmlUnitComparer.class);

      public XmlUnitComparer(XmlComparePropertiesProvider xmlComProps) {
            this.xmlComProps = xmlComProps;
            this.nodeMatcher = new DefaultNodeMatcher(addSelectors(xmlComProps));
      }

      public CompareResult compare(XmlUnitInputProvider input) {
            LOG.debug("Comparing input:: {}", input);
            StopWatch stopWatch = StopWatch.createStarted();
            DiffBuilder builder = DiffBuilder.compare(input.getBaselineMessage());
            builder.withTest(input.getTestMessage());

            builder.withNodeMatcher(nodeMatcher);

            if (xmlComProps.getIgnoreWhitespace()) {
                  builder.ignoreWhitespace();
            }
            if (xmlComProps.getIgnoreComments()) {
                  builder.ignoreComments();
            }
            if (xmlComProps.getIgnoredTags() != null && xmlComProps.getIgnoredTags().length > 0) {
                  builder.withNodeFilter(new IgnoreTagsNodeFilter(xmlComProps.getIgnoredTags()));
            }

            XmlFieldInformationExtractor fieldInformationExtractor = new XmlFieldInformationExtractor(xmlComProps.getAdditionalErrorInformationProperties(), xmlComProps.getSimilarMapping());
            builder.withComparisonListeners(fieldInformationExtractor);

            builder.checkForIdentical();

            LOG.debug("Before build");

            builder.build();
            LOG.debug("After build");

            stopWatch.stop();

            LOG.debug("Started comparison at:: " + new Date(stopWatch.getStartTime()));
            LOG.debug("Finished comparison in:: " + stopWatch.getTime(TimeUnit.SECONDS) + " seconds.");
            LOG.debug("Finished comparison in:: " + stopWatch.getTime(TimeUnit.MINUTES) + " minutes.");

            List<XmlCompareDetailResult> allResults = fieldInformationExtractor.getDetailResults();
            allResults.stream().forEach(detailResult -> detailResult.setStatus(fieldInformationExtractor.getOverallCompareStatus()));

            return new CompareResult(allResults, fieldInformationExtractor.getOverallCompareStatus());
      }

      private static ElementSelector[] addSelectors(XmlComparePropertiesProvider xmlComProps) {
            ElementSelector[] groupMatching = GroupMatchingElementSelector.from(xmlComProps.getGroupMatchingProperties());
            if (groupMatching.length == 0)
                  return new ElementSelector[] { ElementSelectors.byName };

            ElementSelector[] ret = new ElementSelector[groupMatching.length + 1];
            for (int i = 0; i < groupMatching.length; i++) {
                  ret[i] = groupMatching[i];
            }
            ret[groupMatching.length] = ElementSelectors.byName;
            return ret;

      }

      static class CompareResult {
            private final List<XmlCompareDetailResult> detailResults;
            private final XmlCompareResultStatus status;

            public CompareResult(final List<XmlCompareDetailResult> detailResults, final XmlCompareResultStatus status) {
                  this.detailResults = detailResults;
                  this.status = status;
            }

            public List<XmlCompareDetailResult> getDetailResults() {
                  return detailResults;
            }

            public XmlCompareResultStatus getStatus() {
                  return status;
            }
      }
}

XmlUnitInputProvider is following:

package xxx

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;

import org.w3c.dom.Node;
import org.xmlunit.builder.Input;
import org.xmlunit.util.Convert;
import org.xmlunit.util.DocumentBuilderFactoryConfigurer;

/**
 * This class wraps any {@link XmlCompareInputProvider} for xml unit since we need {@link Node} elements to evaluate xpaths for additional informations later
 *
 * 
 */
public class XmlUnitInputProvider implements XmlCompareInputProvider<Node> {

      private Node baselineMessage;
      private Node testMessage;
      private DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactoryConfigurer.Default.configure(DocumentBuilderFactory.newInstance());

      public XmlUnitInputProvider(XmlCompareInputProvider<?> provider) throws XmlCompareInputException {
            try {
                  if (provider.getBaselineMessage() != null) {
                        baselineMessage = getNodeFromInput(provider.getBaselineMessage(), documentBuilderFactory);
                  }
            } catch (Exception e) {
                  throw new XmlCompareInputException("Error creating xml baseline source", e, InputType.BASELINE);
            }
            try {
                  if (provider.getTestMessage() != null) {
                        testMessage = getNodeFromInput(provider.getTestMessage(), documentBuilderFactory);
                  }
            } catch (Exception e) {
                  throw new XmlCompareInputException("Error creating xml test source", e, InputType.TEST);
            }
      }

      @Override
      public Node getBaselineMessage() {
            return baselineMessage;
      }

      @Override
      public Node getTestMessage() {
            return testMessage;
      }

      protected static <T> Node getNodeFromInput(T input, DocumentBuilderFactory documentBuilderFactoryIn) {
            DocumentBuilderFactory documentBuilderFactory;
            if (documentBuilderFactoryIn == null) {
                  documentBuilderFactory = DocumentBuilderFactoryConfigurer.Default.configure(DocumentBuilderFactory.newInstance());
                  documentBuilderFactory.setNamespaceAware(true);
            } else {
                  documentBuilderFactory = documentBuilderFactoryIn;
            }
            Source test = Input.from(input).build();
            return Convert.toNode(test, documentBuilderFactory);
      }
}

All xmlunit dependencies are resolved by custom copies of the projects.
The customized classes are:

  • org.xmlunit.diff.ElementSelectors (~ line 400)
  • org.xmlunit.matchers.EvaluateXPathMatcher (~line 137)
  • org.xmlunit.matchers.HasXPathMatcher (~ line 83)

The customization always is: JAXPXPathEngine engine = new JAXPXPathEngine(new XPathFactoryImpl());

The Saxon library used is:

    <dependency>
            <groupId>net.sf.saxon</groupId>
            <artifactId>Saxon-HE</artifactId>
            <version>9.9.1-3</version>
      </dependency>

And the class JAXPXPathEngine was changed to:

/**
 * Simplified access to JAXP's XPath API.
 */
public class JAXPXPathEngine implements XPathEngine {
    private final XPath xpath;

    public JAXPXPathEngine(XPathFactory fac) {
        try {
                  xpath = fac.newXPath();
        } catch (Exception e) {
            throw new ConfigurationException(e);
        }
    }

    /**
     * Create an XPathEngine that uses JAXP's default XPathFactory
     * under the covers.
     */
    public JAXPXPathEngine() {
            this(XPathFactory.newInstance());
    }

     public Iterable<Node> selectNodes(String xPath, Source s) {
            try {
                  Object nodes = xpath.evaluate(xPath, Convert.toInputSource(s),
                              XPathConstants.NODESET);
                  if (nodes instanceof NodeList) {
                        return new IterableNodeList((NodeList) nodes);
                  }
                  if (nodes instanceof Iterable) {
                        return (Iterable<Node>) nodes;
                  }
                  return null;
            } catch (XPathExpressionException ex) {
                  throw new XMLUnitException(ex);
            }
    }

    public String evaluate(String xPath, Source s) {
        try {
            return xpath.evaluate(xPath, Convert.toInputSource(s));
        } catch (XPathExpressionException ex) {
            throw new XMLUnitException(ex);
        }
    }

    public Iterable<Node> selectNodes(String xPath, Node n) {
            try {
                  Object nodes = xpath.evaluate(xPath, n, XPathConstants.NODESET);
                  if (nodes instanceof NodeList) {
                        return new IterableNodeList((NodeList) nodes);
                  }
                  if (nodes instanceof Iterable) {
                        return (Iterable) nodes;
                  }
                  return null;
            } catch (XPathExpressionException ex) {
                  throw new XMLUnitException(ex);
            }
    }

    public String evaluate(String xPath, Node n) {
        try {
            return xpath.evaluate(xPath, n);
        } catch (XPathExpressionException ex) {
            throw new XMLUnitException(ex);
        }
    }

    public void setNamespaceContext(Map<String, String> prefix2Uri) {
        xpath.setNamespaceContext(Convert.toNamespaceContext(prefix2Uri));
    }

}

Thanks a lot for your help up to here.

@bodewig
Copy link
Member

bodewig commented Dec 5, 2022

GroupMatchingElementSelector most likely is what I'd need to see (it is code not provided by XMLUnit) - this one probably uses the XPathEngine in some way. If your project forks XMLUnit anyway it would probably be easier to write that against against the XPath engine you want to use directly.

In your code I do not see any usage of any Hamcrest matcher at all, so I'm not sure how/where EvaluateXPathMatcher or HasXPathMatcher are involved. Both of them contain withXPathFactory methods that would allow you to use a custom XPathFactory in order to use Saxon right out of the box.

The JAXPXPathEngine code you show is not very different from the one in XMLUnit. I must admit I don't understand how if (nodes instanceof NodeList) can ever be false, though.

I will add new overloads to ElementSelectors.byXpath that accept an XPathEngine parameter as this is clearly missing right now. Without knowing what GroupMatchingElementSelector may look like I can only speculate. It may be quite possible that you'd get a bigger performance boost by not using XPath at all rather than by switching the XPath implementation, though.

@bodewig
Copy link
Member

bodewig commented Dec 5, 2022

I will add new overloads to ElementSelectors.byXpath

added with a6bb601

@bodewig bodewig added this to the 2.9.1 milestone Dec 31, 2022
@bodewig
Copy link
Member

bodewig commented Dec 31, 2022

the changed byXpath signatures are now available as SNAPSHOTs. Unless I get any bad feedback I'm going to cut a new release early next year and consider this issue done.

@wiedehoeft
Copy link
Author

the changed byXpath signatures are now available as SNAPSHOTs. Unless I get any bad feedback I'm going to cut a new release early next year and consider this issue done.

I would like to test it, but it will be a litte more complicated to use a snapshot version. I will check if I find a possibility in the company I am currently working for and give feedback during next week.

Thank you for fast answers and solutions!

@wiedehoeft
Copy link
Author

I was able to test the new SNAPSHOT-version and the result was great!

I passed the saxon XPathFactoryImpl to ElementSelector.byXPath(...) and a customized version of the JAXPXPathEngine:

import javax.xml.transform.Source;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

import org.xmlunit.ConfigurationException;
import org.xmlunit.XMLUnitException;
import org.xmlunit.util.Convert;
import org.xmlunit.util.IterableNodeList;
import org.xmlunit.xpath.JAXPXPathEngine;

/**
* Override two methods, for performance improvements.
*/
public class CustomJAXPXPathEngine extends JAXPXPathEngine {

     private final XPath xpath;

     public CustomJAXPXPathEngine(final XPathFactory fac) {
         try {
              xpath = fac.newXPath();
         } catch (final Exception e) {
              throw new ConfigurationException(e);
         }
     }

     @Override
     public Iterable<Node> selectNodes(final String xPath, final Source s) {
         try {
              final Object nodes = xpath.evaluate(xPath, Convert.toInputSource(s),
                        XPathConstants.NODESET);
              if (nodes instanceof NodeList) {
                   return new IterableNodeList((NodeList) nodes);
              }

              if (nodes instanceof Iterable) {
                   return (Iterable<Node>) nodes;
              }
              return null;
         } catch (final XPathExpressionException ex) {
              throw new XMLUnitException(ex);
         }
     }

     @Override
     public Iterable<Node> selectNodes(final String xPath, final Node n) {
         try {
              final Object nodes = xpath.evaluate(xPath, n, XPathConstants.NODESET);

              if (nodes instanceof NodeList) {
                   return new IterableNodeList((NodeList) nodes);
              }

              if (nodes instanceof Iterable) {
                   return (Iterable) nodes;
              }

              return null;

         } catch (final XPathExpressionException ex) {
              throw new XMLUnitException(ex);
         }
     }
}

Now it is as fast as our customized version and with the new release we are able to use xmlunit as is again :-)

@bodewig
Copy link
Member

bodewig commented Jan 10, 2023

Thank you for verifying. I've just published 2.9.1.

@bodewig bodewig closed this as completed Jan 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants