Skip to content

A simple, fast asynchronous XML framework for parsing large XML files.

License

Notifications You must be signed in to change notification settings

scriptkittie/saxPath

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

saxPath

Build Status

Overview

saxPath is an rule based SAX parser built for processing large XML files. saxPath performs the same functions as a typical SAX parser, while maintaining low memory usage for files of all sizes by only invoking callbacks for specific XML user determined XPATH expressions, therefore avoiding large heap growth for larger sized files. Once the parser reaches an XML path that matches an XPATH expression, it will load the XML only for that sub-tree that was matched. Text Content is only parsed in sub-trees, and not processed if a sub-tree is not being processed. You can choose to process each rule asynchronously, with the sacrifice of guaranteed order for each rule processed. Due to the nature this parsing and to maintain a low level of memory usage, only XPATH attributes and elements are able to be triggered, XPATH expressions such as the examples below will not work to trigger XPATH rules. However, all XPATH expressions will work in sub-tree's.

//text()[. = 'Text Content']

Or

//text()[contains(.,'Text Content that is not parsed')]

Expressions such the below will work:

//title[@lang]
/bookstore/book[price>35.00]
/bookstore/book[@test='title']/test123
/bookstore/book[@test='title'][@test2]/bookings

Once an XPATH expression has been matched, it is loaded into a special class called a SaxNode, which is a wrapper for the node with helper methods to query deeper into the subtree.

Changelog

Open spoiler to view changelog

1.0.0

  • Initial release.

Installation

Install from Maven Central

Just add the following dependencies to your maven pom.xml

<dependency>
    <groupId>io.laniakia</groupId>
    <artifactId>saxPath</artifactId>
    <version>1.0.0</version>
</dependency>

Example Usage

All rules have to have an XPATH expression specified to be triggered when the SAX parser reaches that path in the XML.

Create a Rule

Below is the skeleton of a basic Rule

import io.laniakia.rule.Rule;
...
public class RuleImpl extends Rule
{
	public static boolean processed = false;
	
	public RuleImpl(String path) throws Exception 
	{
		super(path);
	}

	@Override
	public void processRule(SaxNode saxNode) throws Exception 
	{
		//Do things here
	}
}

Setup for Rule Usage

String testXML = "<bookstore>ExcludeText<test>test</test><book>testingText<testing123></testing123></book></bookstore>";
XMLProcessor test = new XMLProcessor();
//Your XPATH expression to trigger this rule on goes here
test.addRule(new RuleImpl("/bookstore/book"));
test.process(new ByteArrayInputStream(testXML.getBytes(StandardCharsets.UTF_8)));

Process Rules Asynchronously

import io.laniakia.rule.Rule;
...
test.process(new ByteArrayInputStream(testXML.getBytes(StandardCharsets.UTF_8)), true);

Query SaxNode with XPATH Expression

import io.laniakia.rule.Rule;
...
@Override
public void processRule(SaxNode saxNode) throws Exception 
{
	SaxNode resultNode = saxNode.xPathQueryNode("/location/subTree");
}

Query SaxNode LIST with XPATH Expression

import io.laniakia.rule.Rule;
...
@Override
public void processRule(SaxNode saxNode) throws Exception 
{
	List<SaxNode> resultNode = saxNode.xPathQueryNodeList("/location/subTree");
}

Query Text of SaxNode

This method of extracting text uses getTextContent() of the resulting Node

import io.laniakia.rule.Rule;
...
@Override
public void processRule(SaxNode saxNode) throws Exception 
{
	String text = saxNode.getAllNodeText("/textpath");
}

Query Text of SaxNode V2

This method of extracting text uses the XPath method of extracting Node text, rather than getting the Node value of each node.

import io.laniakia.rule.Rule;
...
@Override
public void processRule(SaxNode saxNode) throws Exception 
{
	String text = saxNode.getXPathNodeText("/textpath");
}

Install from Source

Clone from remote repository then mvn install. All of the modules will be installed to your local maven repository.

git clone https://github.com/scriptkittie/saxPath.git
cd saxPath
mvn install

Issues/Forks

Please report any issues to the issues section & as always if you have any functionality requests go ahead and open an issue containing your suggestions.

If you have an addition to the project, fork it and submit a pull request. Any type of contributions are welcome.

Credits

Package written by StCypher

About

A simple, fast asynchronous XML framework for parsing large XML files.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages