elifesciences/bot-lax-adaptor


bot-lax-adaptor

This application:

  1. listens for messages from the elife-bot
  2. downloads XML from S3 via HTTP
  3. converts XML to a mostly complete representation of our article-json schema
  4. sends article-json to Lax to be ingested
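The four steps above can be pictured as a small pipeline. The following is only a sketch, not the project's actual code: `handle_message`, the `location` message field, and the injected `download`, `convert` and `send` callables are hypothetical names chosen for illustration.

```python
import json
from typing import Callable


def handle_message(
    raw_message: str,
    download: Callable[[str], bytes],
    convert: Callable[[bytes], dict],
    send: Callable[[dict], None],
) -> dict:
    """Process one bot message: download the XML, convert it, send the result.

    All names here are hypothetical; the real message format may differ.
    """
    message = json.loads(raw_message)    # 1. a message received from the elife-bot
    xml = download(message["location"])  # 2. fetch the XML over HTTP
    article_json = convert(xml)          # 3. XML -> article-json
    send(article_json)                   # 4. hand the article-json to Lax
    return article_json
```

Passing the three steps in as callables keeps the sketch independent of any particular queue, HTTP client or Lax interface.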

installation

$ ./install.sh

web interface

The bot-lax-adaptor comes with a simple web interface that allows uploading eLife JATS XML, generating article-json from it and then validating it.

$ ./web.sh

See example-upload-file-to-api.sh.

conversion

$ source venv/bin/activate
$ python src/main.py /path/to/a/jats.xml

Output at time of writing looks like this.

convert specific article

Thin wrapper around the above command:

$ ./scrape-article.sh ./article-xml/articles/elife-09560-v1.xml

convert random article

Converts a random article to article-json:

$ ./scrape-random-article.sh

convert all articles

Converts all articles in the ./article-xml/articles/ directory, writing the results to ./article-json/. This script makes use of all available cores:

$ ./generate-article-json.sh
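As a rough illustration of converting a directory of articles in parallel, the sketch below runs one converter process per XML file, with as many running at once as there are cores. It is not the contents of generate-article-json.sh. It assumes that `src/main.py` prints the article-json to stdout, and the helper names `output_path_for`, `convert_one` and `convert_all` are invented for this example. The `.xml.json` output naming follows the `elife-09560-v1.xml.json` example used elsewhere in this README.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def output_path_for(xml_path, json_dir):
    """Map .../foo.xml to <json_dir>/foo.xml.json."""
    return Path(json_dir) / (Path(xml_path).name + ".json")


def convert_one(xml_path, json_dir, command):
    """Run the converter on one file and save whatever it prints to stdout."""
    result = subprocess.run(
        [*command, str(xml_path)], capture_output=True, text=True, check=True
    )
    target = output_path_for(xml_path, json_dir)
    target.write_text(result.stdout)
    return target


def convert_all(xml_dir, json_dir, command=(sys.executable, "src/main.py")):
    """Convert every .xml file in xml_dir, running up to one process per core."""
    Path(json_dir).mkdir(parents=True, exist_ok=True)
    xml_files = sorted(Path(xml_dir).glob("*.xml"))
    # The threads only wait on child processes, so the actual conversion
    # work is spread across cores by the operating system.
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
        return list(pool.map(lambda p: convert_one(p, json_dir, command), xml_files))
```

Under these assumptions, `convert_all("./article-xml/articles", "./article-json")` would be the equivalent call from the repository root with the virtualenv active.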

validation

The article-json generated by this application is structured according to the eLife json-schema article specification.

Because the XML contains only a partial representation of an article, validation also involves filling in certain gaps whose values can only be provided by Lax.

$ source venv/bin/activate
$ python src/validate.py /path/to/an/article.json
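The gap-filling step can be illustrated roughly as follows. This is a simplified sketch and not what `src/validate.py` actually does: the field names (`published`, `statusDate`), the placeholder values and the helper names are assumptions made up for the example, and a plain required-key check stands in for real json-schema validation.

```python
import copy

# Hypothetical placeholders for values the XML cannot supply (assumed field names).
PLACEHOLDERS = {
    "published": "1970-01-01T00:00:00Z",
    "statusDate": "1970-01-01T00:00:00Z",
}


def fill_gaps(article, placeholders=PLACEHOLDERS):
    """Return a copy of article with missing fields set to placeholder values."""
    filled = copy.deepcopy(article)
    for field, value in placeholders.items():
        filled.setdefault(field, value)  # never overwrite a value that was scraped
    return filled


def missing_fields(article, required):
    """Stand-in for schema validation: list the required fields that are absent."""
    return [field for field in required if field not in article]
```

Working on a copy means the scraped article-json is left untouched, so placeholder values never leak into what is later sent to Lax.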

validate specific article-json

Thin wrapper around the above command:

$ ./validate-json.sh ./article-json/elife-09560-v1.xml.json

validate all article-json

Validates all article-json in the ./article-json/ directory. This script makes use of all available cores:

$ ./validate-all-json.sh

backfill

populating a Lax installation

This generates and validates article-json, then performs an ingest --force to Lax, for each article in the article-xml repository.

$ ./backfill.sh

The generation, validation and ingest actions happen in separate steps for greater parallelism.

updating a small subset of articles in Lax

This reads a list of article IDs from a file, then generates, validates and performs an ingest --force to Lax for each article sequentially. It can be quite slow for a large number of articles.

$ ./backfill-many.sh
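The sequential flow described above might look something like the sketch below. It is not the contents of backfill-many.sh. The ID file format (one ID per line, with blank lines and `#` comments ignored), the choice to continue after a failure, and the `generate`, `validate` and `ingest` callables are all assumptions made for illustration.

```python
def read_article_ids(lines):
    """Parse article IDs, assuming one ID per line; skip blanks and '#' comments."""
    ids = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids


def backfill_many(article_ids, generate, validate, ingest):
    """Process each article in turn; a failure is recorded and the loop moves on."""
    failures = {}
    for article_id in article_ids:
        try:
            article_json = generate(article_id)
            validate(article_json)
            ingest(article_json, force=True)  # mirrors `ingest --force`
        except Exception as exc:  # keep going so one bad article does not stop the run
            failures[article_id] = str(exc)
    return failures
```

Because each article passes through all three steps before the next one starts, the run time grows linearly with the number of IDs, which is why this approach suits small subsets only.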

listening/sending

receiving messages from an AWS SQS queue

This is quite eLife-specific but can be modified easily if you're a developer:

$ ./bot-lax-listener.sh

testing

$ ./test.sh

Copyright & Licence

Copyright 2023 eLife Sciences. Licensed under the GPLv3.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.