This repository has been archived by the owner on Jun 6, 2019. It is now read-only.

JATS4R/validator

JATS4R Validator

This is the code repository for the JATS4R client-side validator, deployed to http://jats4r.org/validator/.

This was the subject of a paper presented at Balisage, 2015: A client-side JATS4R validator using Saxon-CE.

Prerequisites:

  • Python - at least version 3.5
  • The instructions here assume you're working in a Unix environment with the bash shell, but they should be easily adaptable to other environments.

Files and directories

The following are the directories and files in this repository, and what they are for.

  • assets/ - static resources
  • bin/ - shell scripts and XSLT files
  • jats/ - scripts to download and prepare NLM and JATS schema definition files.
  • samples/ - sample XML documents
  • schema/ - Schematron source files; see schema sources, below.
  • test/test-files/ - XML files used for testing.
  • LICENSE
  • README.md - This file
  • index.html, validate.js, validate.css - the main validator files

The following directories are generated in the course of building the validator.

FIXME double-check these

  • venv - Python virtual environment
  • lib - Third party libraries and tools. See Dependencies, below.
  • jats-schema - Flattened versions of all of the NLM and JATS DTDs
  • generated-xsl - This contains XSLT versions of the Schematron files. The contents here should not be edited directly. See Generating XSLTs from Schematron sources, below

Quick start

Here are the steps to get a working validator on your system.

The validator is deployed as a static web site, so you'll need to have access to a system with a web server such as Apache. Find a convenient location served by that server, and execute the following:

git clone --recursive https://github.com/JATS4R/validator.git
cd validator

# Initializes python's virtualenv; sets bash environment
. bin/setenv.sh

# extracts libraries, etc., and processes schematron
bin/setup.sh

Then, open the index.html page in your browser, through the web server on your system, and you should have a working validator.

Validation setup

Here is some more detailed information.

Whenever you open a new shell to work on this tool, configure its environment with:

. bin/setenv.sh

After initially cloning the repository, in order to configure the necessary tools, you'll need to run

bin/setup.sh

This does the following:

  1. Extracts several third party libraries into the lib directory,
  2. Builds flattened versions of the JATS DTDs and RNGs, writing them into the jats-schema directory, and
  3. Processes the schematron files, writing the results into generated-xsl.
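
The three steps above each write into a known directory, so a quick check that setup finished is to confirm those directories exist. The following is a minimal sketch; `check_setup_outputs` is a hypothetical helper, not a script in this repository, and it only tests for the directories' presence, not their contents.

```shell
# Hypothetical helper (not part of the repository).
# Reports which of the directories that setup is documented to create
# are missing under the given root (default: current directory).
check_setup_outputs() {
    root="${1:-.}"
    missing=0
    for dir in lib jats-schema generated-xsl; do
        if [ ! -d "$root/$dir" ]; then
            echo "missing: $dir"
            missing=1
        fi
    done
    # Non-zero status when at least one directory is absent
    return "$missing"
}
```

Run from the repository root, `check_setup_outputs` prints nothing and returns 0 when all three directories are present.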

If any changes are made to the Schematron files, you can rebuild them, without rerunning the entire setup, by running

bin/process-schematron.sh

To clean up the working directory, and start from scratch, just run

bin/clean.sh

Validating from the command line

To validate a JATS file, use the script validate.sh. For example,

validate.sh samples/minimal.xml

This reports how well the input file minimal.xml complies with all topics (math and permissions). By default, only errors are reported. For a full report (info, warnings, and errors), enter:

validate.sh samples/minimal.xml info

Use the -h switch to get a list of all the possible arguments.

If your setup requires an OASIS catalog file to resolve the DTDs for JATS documents, you can use the environment variable JATS_CATALOG to point to that.

For example, you can download the JATS Bundle, which contains the DTDs for several versions and flavors of JATS (up to NISO JATS draft version 0.4), unzip it, and then set the JATS_CATALOG environment variable to point to its master catalog:

cd ~
wget http://jatspan.org/downloads/jats-core-bundle-0.8.zip
unzip jats-core-bundle-0.8.zip
export JATS_CATALOG=~/jatspacks/catalog.xml

Now, when you run validate.sh, it will automatically use that catalog file to resolve any DTDs. For example:

validate.sh samples/sample.xml
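
A mistyped catalog path can be hard to spot, so it may help to confirm that JATS_CATALOG points at a real file before validating. This is a minimal sketch; `check_catalog` is a hypothetical helper, not part of the repository, and it does not inspect the catalog's contents.

```shell
# Hypothetical helper (not part of the repository).
# Confirms that JATS_CATALOG is set and names a readable regular file.
check_catalog() {
    if [ -z "${JATS_CATALOG:-}" ]; then
        echo "JATS_CATALOG is not set"
        return 1
    fi
    if [ ! -f "$JATS_CATALOG" ] || [ ! -r "$JATS_CATALOG" ]; then
        echo "JATS_CATALOG is not a readable file: $JATS_CATALOG"
        return 1
    fi
    echo "using catalog: $JATS_CATALOG"
}
```

One possible use is `check_catalog && validate.sh samples/sample.xml`, which skips validation when the catalog cannot be found.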

Schema sources

The master schema files are in Schematron format, in the schema subdirectory.

The "master" Schematron file, which determines conformance or non-conformance, is jats4r.sch. This includes all topics, but only the "error level" tests.

There are two other "master" Schematron files, which break down the tests in two different ways: one by message severity (info, warnings, and errors) and one by topic (math and permissions).

The test files themselves are broken down into separate modules, by topic and by severity level. So, for example, permissions-errors.sch, permissions-warnings.sch, and permissions-info.sch define the tests for the permissions topic. All three run tests on permissions, but the permissions-errors reports only those things that are errors.

In summary, the master Schematron files are:

  • jats4r.sch - all topics, error level only
  • jats4r-level.sch - groups tests by message severity level. Using this with phase=info (or phase=#ALL) will run all of the tests.
  • jats4r-topic.sch - groups tests by topic. So, for example, when you run this with the phase=math, you will run just the math tests.

The generated-xsl subdirectory contains XSLT 2.0 files that have been generated from the Schematron files by the process-schematron.sh script. These XSLT files must not be edited directly. If a Schematron file changes, regenerate the XSLT with process-schematron.sh.

When run against an instance document, these XSLT files generate a report in Schematron Validation Report Language (SVRL) XML.
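
In an SVRL report, each assertion that did not hold appears as a `failed-assert` element. Given a report saved to a file, the failures can be counted roughly as follows. This is a sketch only: `count_failed_asserts` is a hypothetical helper, and it assumes the report uses the conventional `svrl` namespace prefix. A namespace-aware XML tool would be more robust than this text match.

```shell
# Hypothetical helper (not part of the repository).
# Counts opening <svrl:failed-assert> tags in an SVRL report file.
# Assumption: the report binds the SVRL namespace to the prefix "svrl".
count_failed_asserts() {
    # grep -o prints one line per match, so wc -l counts occurrences;
    # "|| true" keeps a report with no failures from being treated as an error.
    { grep -o '<svrl:failed-assert' "$1" || true; } | wc -l | tr -d ' '
}
```

Closing tags begin with `</`, so they are not counted.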

Generating XSLTs from Schematron sources

To generate new XSLT files in the generated-xsl directory, first source the bin/setenv.sh script into your shell, as described above.

Then, use the script process-schematron.sh to convert the Schematron files into XSLT:

bin/process-schematron.sh

You can optionally pass this script an input-type (level or topic) and a phase (which depends on the input-type). Enter bin/process-schematron.sh -h for usage information.

This writes the output files into the generated-xsl directory.

Testing

To test the online validator, use the files in the test/test-files directory.

Automated tests coming soon. See issue 8.

How it works

For information on the implementation of this tool, see the paper we submitted to Balisage 2015, "A client-side JATS4R validator using Saxon-CE". The following is the data-flow diagram from that paper, which shows in compact form what is happening under the hood.

Data flow diagram

Limitations

Since the validator runs on the client and does not have access to all the features of libxml, the first phase, in which the tool looks for <?xml-model?> processing instructions and the doctype declaration, is done with a custom parser that is not very robust. Therefore, some constructs will be mishandled. For example, if a processing instruction (PI) is "commented out", this validator will not notice the comment delimiters and will treat the PI as though it were still active.
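
The commented-out case is easy to reproduce with any scanner that matches text without tracking XML comments. The sketch below is an illustration only: `find_xml_model_pis` is a hypothetical stand-in that uses grep, not the validator's actual parser.

```shell
# Hypothetical stand-in for a naive text scan (not the validator's parser).
# Prints every <?xml-model ...?> processing instruction found in a file,
# with no awareness of surrounding <!-- ... --> comment delimiters.
find_xml_model_pis() {
    # In a basic regular expression "?" is a literal character, so this
    # matches from "<?xml-model" up to the next "?>" on the same line.
    { grep -o '<?xml-model[^>]*?>' "$1" || true; }
}
```

For a file containing both `<!-- <?xml-model href="old.sch"?> -->` and `<?xml-model href="jats4r.sch"?>`, this scan reports both PIs, although an XML parser would see only the second. The file names `old.sch` and `jats4r.sch` here are just sample values.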

Dependencies

This tool has a number of dependencies. Some are system tools that are present on many Unix systems or can be installed easily; others are fetched and installed by the bin/setup.sh script.

System tools

You'll need to make sure that your system has the following.

  • wget
  • Java version 7 or later
  • Python 3
  • The Python pyyaml module

ECMAScript 6 Promise polyfill

Polyfill for the Promise feature.

This was downloaded from the GitHub jakearchibald/es6-promise repo, specifically, this version, and put into assets/es6-promise.min.js.

In setup.sh, this is copied into lib/es6-promise.min.js.

Fetch polyfill

Polyfill for the browser fetch API. From the GitHub github/fetch repo, this version was downloaded and put into the assets directory.

In setup.sh, this is copied to lib/fetch.js.

Saxon CE 1.1

Open-source client-side XSLT 2.0 processor.

Can be downloaded from this page.

In setup.sh, this is downloaded and extracted to lib/Saxonce.

xmltool.js

For now, this is saved in this repository in the assets directory. This is generated from the code in the jats4r/xml.js repository, which was forked from Alf Eaton's hubgit/xml.js repo, which was in turn forked from kripken/xml.js.

The setup.sh script copies this into lib.

NLM and JATS DTDs

The setup.sh script downloads these from the NCBI/nlm-dtd and NCBI/niso-jats repositories, and extracts them under the lib directory.

DtdAnalyzer

This tool is used by the flatten.py script, to flatten each of the various JATS DTDs.

In setup.sh, this is downloaded into the lib/DtdAnalyzer-0.5 directory.

Saxon Home Edition

The saxon9he.jar file is included with the DtdAnalyzer.

Schematron schema

This is the schema that defines what is a valid Schematron file. This version is from 2005, and is included here as assets/isoSchematron.rng.

The setup.sh script copies this into lib.

Jing

This was originally downloaded from here, and was added to the assets directory of this repository.

The setup.sh script extracts this into lib/jing-20081028.

Schematron XSLT

Downloaded from here on 2015-04-02, and included as assets/iso-schematron-xslt2.zip.

The setup.sh script extracts this into lib/iso-schematron-xslt2.

Apache Commons OASIS catalog resolver

Downloaded from here on 2015-04-02, and included as assets/xml-commons-resolver-1.2.zip.

The setup.sh script extracts this into lib/xml-commons-resolver-1.2.

Prism

For syntax highlighting. This was downloaded from the Prism site with:

  • Languages: only "markup"

Chosen jquery library

For the select "sample articles" dropdown menu. See their home page.

Acknowledgements

Portions of this software were borrowed from the following sources: