ICU XPath Bindings

This project provides XPath bindings for the ICU library to handle common Unicode tasks. It is based on the ICU library for Java (ICU4J) and can be used with the Saxon XSLT/XQuery processor.

The bindings cover only a small subset of the ICU library. Other parts may be added in the future if they are needed. XPath functions are provided for the following tasks:

  • normalization
  • transliteration

XPath Functions

The namespace name of the XPath extension functions is https://unicode-org.github.io/icu/. This documentation uses the prefix icu bound to this namespace: xmlns:icu="https://unicode-org.github.io/icu/".

Getting started

To get started, have a look at the example sections in the transliteration and normalization documentation.

Installation

oXygen XML Editor

Installation for the oXygen XML Editor is simple. You only have to enter the following URL in the installation dialog opened via Help -> Install new add-ons...:

https://scdh.github.io/icu-xpath-bindings/descriptor.xml

Note: As we don't have a key for signing the extension, you will have to choose to proceed anyway at some stage of the installation process.

After the installation, you can use the new XPath functions everywhere in oXygen. You don't need to clone this repo.

Usage with Saxon's command line interface

tl;dr: Run mvn package and use the xslt.sh or saxon.sh shell wrappers with the option -config:saxon-config.xml.

Two things are necessary:

  1. Tell Saxon about the XPath extension functions. This can be done via a Saxon configuration file. Such a configuration is provided in saxon-config.xml. You can use it from the Saxon command line interface via the argument -config:saxon-config.xml.

  2. Add a jar file to the classpath, so that the Java classes that define the functions are available to Saxon. On the releases page, you can find jar files for each release. Use either icu-xpath-bindings-VERSION-with-dependencies.jar or icu-xpath-bindings-VERSION.jar. The former bundles everything except Saxon. With the latter, the dependencies, such as ICU4J, also have to be added to the classpath:

  • icu4j
  • icu4j-charset
  • icu4j-localespi
  • slf4j-api

You can get the dependency jar files manually through Maven Central or you can clone this git repository and run the Maven build process, which downloads and builds everything for you automatically:

mvn package

After you have run mvn package, all the required jar files are present within the project:

  • bindings/target/icu-xpath-bindings-VERSION.jar
  • bindings/target/lib/icu4j-VERSION.jar
  • bindings/target/lib/icu4j-charset-VERSION.jar
  • bindings/target/lib/icu4j-localespi-VERSION.jar
  • bindings/target/lib/slf4j-api-VERSION.jar
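
If you want to set the classpath by hand, the jar files listed above can be joined into a single classpath string. The following is only a minimal sketch under some assumptions: build_classpath is a hypothetical helper name, the Saxon jar is obtained separately and passed in as the second argument, and the ":" separator assumes a Unix-like system. Globs are used so that no version numbers have to be hard-coded. Note that the first pattern may also match other jars with the same prefix in bindings/target, so you may want to narrow it.

```shell
#!/bin/sh
# Sketch: join the jars produced by `mvn package` into one classpath string.
# build_classpath is a hypothetical helper, not part of this project.
build_classpath() {
    project_root="$1"   # root folder of the cloned repository
    saxon_jar="$2"      # assumption: path to a Saxon jar obtained separately
    classpath="$saxon_jar"
    for jar in "$project_root"/bindings/target/icu-xpath-bindings-*.jar \
               "$project_root"/bindings/target/lib/*.jar; do
        # An unmatched glob stays a literal pattern; skip it.
        [ -f "$jar" ] || continue
        classpath="$classpath:$jar"
    done
    printf '%s\n' "$classpath"
}

# Hypothetical usage (input.xml and stylesheet.xsl are placeholder names,
# SAXON_JAR is assumed to point to your Saxon jar):
#   java -cp "$(build_classpath . "$SAXON_JAR")" net.sf.saxon.Transform \
#        -config:saxon-config.xml -s:input.xml -xsl:stylesheet.xsl
```

The function only prints the classpath, so you can inspect the result before passing it to java.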

For convenience, running mvn package also generates the shell scripts xslt.sh and saxon.sh in the repo's root folder. They are shell wrappers around Saxon that set the classpath correctly.

Java

When using the bindings from Java, have a look at IcuXPathFunctionRegistry.register(Processor). In addition, the classes with the function definitions are registered for loading through the SPI.

Building locally

You can build and test the project locally. You can also install the oXygen plugin from a local build. To do so, run

mvn -Drelease.url="" package

Then, you can provide the descriptor file under oxygen/target/descriptor.xml to the oXygen add-on installation dialog.

Further Reading

License

MIT License

Copyright (c) 2023 SCDH, Westfälische Wilhelms-Universität Münster