Würzburg glosses extraction

This project allows one to extract grammatical information on individual glosses from the Würzburg glosses lexicon (Kavanagh 2001).

Usage

Starting from the PDF version of the lexicon, first use pdf2html.py to convert it to HTML (using PDFMiner), and then use cleanhtml.py to remove the HTML tags (using BeautifulSoup).
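The two preprocessing steps can be sketched as follows. This is a minimal illustration, not the code in pdf2html.py or cleanhtml.py. It assumes the pdfminer.six and beautifulsoup4 packages, and the function names `pdf_to_html` and `html_to_text` are hypothetical.

```python
# Hypothetical sketch of the preprocessing pipeline: PDF -> HTML -> plain text.
# Assumes pdfminer.six and beautifulsoup4 are installed; names are illustrative.
from bs4 import BeautifulSoup


def pdf_to_html(pdf_path: str, html_path: str) -> None:
    """Convert a PDF file to HTML using pdfminer.six."""
    # Imported lazily so the cleaning step can be used without pdfminer.six.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    # pdfminer writes encoded bytes, so the output file is opened in binary mode.
    with open(pdf_path, "rb") as inf, open(html_path, "wb") as outf:
        extract_text_to_fp(inf, outf, output_type="html", laparams=LAParams())


def html_to_text(html: str) -> str:
    """Remove all tags, putting each stripped text fragment on its own line."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator="\n", strip=True)
```

For example, `html_to_text(open("lexicon.html").read())` would yield the tag-free text that the extraction step works on. The file name here is likewise only an example.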

After preprocessing, run.py extracts the grammatical information from the cleaned lexicon.
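The entry format that run.py parses is not described here. The following is therefore only a hypothetical sketch of pattern-based extraction. It assumes that a line contains a gloss reference in the usual Würzburg folio/column/gloss form (e.g. `5c9`) followed by abbreviated grammatical labels. The class `GlossAnalysis`, the function `analyse`, and the small set of recognised abbreviations are all illustrative assumptions.

```python
# Hypothetical sketch: pull a gloss reference and a few grammatical labels
# out of one line of cleaned lexicon text. The line layout is an assumption.
import re
from dataclasses import dataclass
from typing import Optional

# Folio number, column letter (a-d), gloss number, e.g. "5c9".
GLOSS_REF = re.compile(r"\b(\d{1,2})([a-d])(\d{1,2})\b")
# Person and number, e.g. "3 sg." or "1pl." (illustrative subset).
PERSON_NUMBER = re.compile(r"\b([123])\s?(sg|pl)\.")
# Tense abbreviations (illustrative subset).
TENSE = re.compile(r"\b(pres|pret|fut|impf)\.")


@dataclass
class GlossAnalysis:
    ref: Optional[str]
    person: Optional[int]
    number: Optional[str]
    tense: Optional[str]


def analyse(line: str) -> GlossAnalysis:
    """Return whatever grammatical labels the patterns find; None if absent."""
    ref = GLOSS_REF.search(line)
    pn = PERSON_NUMBER.search(line)
    tense = TENSE.search(line)
    return GlossAnalysis(
        ref="".join(ref.groups()) if ref else None,
        person=int(pn.group(1)) if pn else None,
        number=pn.group(2) if pn else None,
        tense=tense.group(1) if tense else None,
    )
```

Returning `None` for missing fields keeps partial analyses usable, which matters when some glosses need manual correction afterwards.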

Web application

This project comes with a small web application (built in Flask) that allows you to run the extraction on a single gloss, for instance when something went wrong during the automatic extraction. Start the web application by running web.py.
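A single-gloss endpoint of this kind might look roughly like the sketch below. It is not the code in web.py. The route `/extract`, the form field `gloss`, and the stand-in `extract_single` function are hypothetical placeholders for the project's own extraction logic.

```python
# Hypothetical sketch of a Flask endpoint that runs extraction on one gloss.
from flask import Flask, jsonify, request

app = Flask(__name__)


def extract_single(gloss: str) -> dict:
    """Placeholder for the project's real single-gloss extraction."""
    return {"gloss": gloss, "tokens": gloss.split()}


@app.route("/extract", methods=["POST"])
def extract_endpoint():
    gloss = request.form.get("gloss", "").strip()
    if not gloss:
        # Reject empty submissions instead of running the extractor on nothing.
        return jsonify(error="missing 'gloss' field"), 400
    return jsonify(extract_single(gloss))
```

Such an app can be served with Flask's `flask run` command or exercised without a server through `app.test_client()`.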

Licence

This work is shared under a BSD 3-Clause licence. See LICENSE for more information.

Citation

To cite this repository, please use the metadata provided in CITATION.cff.

Contact

Würzburg glosses extraction is developed by Martijn van der Klis and the Research Software Lab at the Centre for Digital Humanities, Utrecht University.

For questions or suggestions, contact the Centre for Digital Humanities or open an issue in this repository.

References

Kavanagh, Seamus (2001). A lexicon of the Old Irish glosses in the Würzburg Manuscript of the Epistles of St. Paul. Edited by Dagmar S. Wodtko. Österreichische Akademie der Wissenschaften.
