Bitextor generates translation memories from multilingual websites
Python · 282 stars · 43 forks
Bicleaner is a parallel corpus classifier/cleaner that aims to detect noisy sentence pairs in a parallel corpus
Python · 146 stars · 21 forks
Tool to fix bitexts and tag near-duplicates for removal
Python · 27 stars · 3 forks
Utility that helps you ROAM (Random Omit, Anonymize and Mix) your parallel corpus
Python · 8 stars · 2 forks
PDF parser and converter to HTML
Java · 80 stars · 14 forks
Extracts plain text, identifies the language, and collects further metadata from WARC records
C++ · 18 stars · 5 forks
Bicleaner fork that uses neural networks
Pre-filtering step for bicleaner
Repository for storing testing outputs from Bitextor