OCA.py - Visualizing the word confidence of OCR results (ALTO-XML)

by Michael Kubina

OCA.py stands for OCR Confidence Analysis, a script written in Python.

The script is the result of a graduation project for the 2022 Data Librarian Certificate Course at the Technical University Cologne. It was published in August 2022. The corresponding Jupyter notebook, with additional insights, accompanies it.

OCA.py is licensed under the GNU General Public License v3 (https://www.gnu.org/licenses/gpl-3.0.en.html).

Requirements

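The following Python libraries are required (os, pprint, and shutil are part of Python's standard library and need no installation; the others are third-party packages):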

requests
BeautifulSoup
pandas
os
numpy
pprint
matplotlib
pillow
shutil
seaborn

Install them with pip install -r requirements.txt.
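
To confirm that the third-party packages are available after installation, a small check like the following can help. This is a minimal sketch and not part of OCA.py: the function name `missing_modules` is hypothetical, and the list uses import names, which differ from the package names for BeautifulSoup (`bs4`) and pillow (`PIL`).

```python
from importlib.util import find_spec

# Import names of the third-party requirements listed above.
# Note: BeautifulSoup is imported as "bs4" and pillow as "PIL".
REQUIRED_MODULES = [
    "requests", "bs4", "pandas", "numpy", "matplotlib", "PIL", "seaborn",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

If any module is reported as missing, rerunning pip install -r requirements.txt in the active environment is the first thing to try.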

This software also uses Bootstrap (https://getbootstrap.com/)

Usage

In the graduation_work branch, the script is tailored to the METS file location used by the Staats- und Universitätsbibliothek Hamburg, so you only need to provide the record identifier to use it. This also means that you can currently test it only on objects from this library. Other METS files require refactoring.
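
The core idea behind a word-confidence analysis can be illustrated without any network access. In ALTO-XML, each recognized word is a `String` element whose optional `WC` attribute holds the word confidence as a value between 0 and 1. The sketch below is not taken from OCA.py: the function names, the sample document, and the 0.5 threshold are assumptions for illustration, and it uses the standard library's `xml.etree.ElementTree` rather than BeautifulSoup to stay self-contained.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Tiny hand-written ALTO sample (illustrative data, not real OCR output).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Hamburg" WC="0.96"/>
    <String CONTENT="Bibliothek" WC="0.54"/>
    <String CONTENT="anno" WC="0.30"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def word_confidences(alto_xml):
    """Return (word, confidence) pairs for every String element with a WC attribute."""
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        # Compare the local tag name so any ALTO namespace version is accepted.
        if element.tag.rsplit("}", 1)[-1] != "String":
            continue
        wc = element.get("WC")
        if wc is None:
            continue  # WC is optional in ALTO; skip words without it
        words.append((element.get("CONTENT", ""), float(wc)))
    return words


def low_confidence_words(words, threshold=0.5):
    """Return the pairs whose confidence lies below the (assumed) threshold."""
    return [(text, wc) for text, wc in words if wc < threshold]


if __name__ == "__main__":
    words = word_confidences(SAMPLE_ALTO)
    print("Mean word confidence:", round(mean(wc for _, wc in words), 2))
    print("Low-confidence words:", low_confidence_words(words))
```

Matching on the local tag name keeps the sketch independent of the ALTO schema version, since the namespace URI differs between versions.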
