Karl Bartel edited this page Aug 2, 2020 · 2 revisions

Processing Steps

Getting from Wiktionary to WikDict dictionaries that end users can use involves many processing steps. This page summarizes the steps driven by the code in this repository. The short names used for the processing steps in the source code are given in parentheses.

Load RDF data into Virtuoso

dbnary converts the Wiktionary markup into machine-readable RDF triples. To query that data, it must first be loaded into an RDF database server, in our case the open-source edition of OpenLink Virtuoso.

Query RDF data into sqlite (raw)

While RDF is very flexible, querying it is less efficient than querying SQL databases, and the tooling is less mature. This step runs SPARQL queries on the RDF data to extract all relevant data into tables in SQLite databases. No later step touches the RDF data.
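As a rough illustration of this step, the sketch below copies one SPARQL result set into an SQLite table. It assumes the result is already available in the standard W3C SPARQL JSON results format. The function name `store_bindings`, the table name `translation`, and the column names are hypothetical and are not taken from the WikDict source code.

```python
import sqlite3


def store_bindings(conn, table, sparql_json):
    """Copy SPARQL JSON result bindings into an SQLite table.

    Hypothetical helper for illustration only; table and column names
    come from the caller and the query's variable list.
    """
    columns = sparql_json["head"]["vars"]
    col_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')

    placeholders = ", ".join("?" for _ in columns)
    rows = [
        # Variables that are unbound in a solution are stored as NULL.
        tuple(b[c]["value"] if c in b else None for c in columns)
        for b in sparql_json["results"]["bindings"]
    ]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Hand-written sample data, not real dbnary output.
sample = {
    "head": {"vars": ["written_rep", "trans"]},
    "results": {
        "bindings": [
            {
                "written_rep": {"type": "literal", "value": "house"},
                "trans": {"type": "literal", "value": "Haus"},
            },
            # Second solution leaves "trans" unbound.
            {"written_rep": {"type": "literal", "value": "cat"}},
        ]
    },
}

conn = sqlite3.connect(":memory:")
inserted = store_bindings(conn, "translation", sample)
```

Once the rows are in SQLite, later steps can work with plain SQL and never need the RDF server again.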

Postprocess queried data (process)

This step cleans up the raw data and normalizes differences between the different languages.
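The exact cleanup rules are not described here. As a purely illustrative sketch, one plausible kind of cleanup is bringing written forms into a single Unicode normalization form and collapsing stray whitespace. The function `clean_written_form` is an assumption made for this example and does not come from the WikDict code.

```python
import unicodedata


def clean_written_form(text):
    """Illustrative cleanup: NFC-normalize and collapse whitespace.

    Assumed example only; the real process step may apply different rules.
    """
    normalized = unicodedata.normalize("NFC", text)
    # split() without arguments drops leading, trailing and repeated whitespace.
    return " ".join(normalized.split())


# "e" followed by a combining acute accent becomes the single character "é".
cleaned = clean_written_form("  cafe\u0301  au   lait ")
```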

Infer missing translations (infer)

Create target-agnostic dictionary DBs (generic)

Export dictionaries for specific target applications

WikDict web dictionaries (wdweb)

TEI P5 files for FreeDict (tei)

Downloadable dictionaries via pyglossary