Greeklish to greek converter and spell checker in ruby (fuzzy search)

AlexAvlonitis/greeklish


Greeklish

Converts Greeklish (Greek written in Latin characters), e.g. 'ti kanete', to Greek, 'τι κάνετε', and applies spell-checking to the result. It is an experiment with the Levenshtein edit-distance algorithm.

High-level algorithm explanation

> Convert greeklish to greek

>> Build a BK-tree out of the greek dictionary

On the first run, it builds a BK-tree from a 500,000-word Greek dictionary (greek.dic) and keeps it in memory. It also writes the tree to a local "bk_tree.yml" file, so subsequent runs can load it directly instead of rebuilding it, which makes startup faster.

What is a bk-tree
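
As a rough illustration of the build-and-cache step, here is a minimal sketch. It assumes the tree is a nested Hash keyed by edit distance and that the dictionary is already loaded as an array of words; the method names (levenshtein, bk_insert, build_bk_tree, load_or_build) are hypothetical and not taken from the project's code.

```ruby
require 'yaml'

# Levenshtein edit distance using two rows of the dynamic-programming table.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Nodes are plain Hashes so the tree round-trips through YAML without
# custom classes. Children are keyed by their distance to the parent word.
def bk_insert(node, word)
  loop do
    d = levenshtein(word, node['word'])
    return if d.zero? # word is already in the tree

    child = node['children'][d]
    if child
      node = child # descend along the edge labelled d
    else
      node['children'][d] = { 'word' => word, 'children' => {} }
      return
    end
  end
end

def build_bk_tree(words)
  root = { 'word' => words.first, 'children' => {} }
  words.drop(1).each { |w| bk_insert(root, w) }
  root
end

# Load the cached tree if the file exists; otherwise build it and cache it.
def load_or_build(cache_path, words)
  return YAML.load_file(cache_path) if File.exist?(cache_path)

  tree = build_bk_tree(words)
  File.write(cache_path, tree.to_yaml)
  tree
end

# Hypothetical usage, assuming one word per line in the dictionary file:
#   tree = load_or_build('bk_tree.yml', File.readlines('greek.dic', chomp: true))
```

Building the tree costs one distance computation per node visited on every insert, which is why caching the result pays off for a dictionary of this size.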

>> One-to-one Latin to Greek mapping

This is simply a one-to-one mapping of each Latin letter to its Greek equivalent, as specified in the en.yml file, with only a few special cases for diphthongs.
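
The mapping step could look roughly like the sketch below. The tables here are a hypothetical excerpt written for illustration; the real mapping is defined in en.yml and may differ, and the names DIGRAPHS, LETTERS and to_greek are assumptions.

```ruby
# Hypothetical two-letter special cases, tried before single letters.
DIGRAPHS = { 'th' => 'θ', 'ps' => 'ψ', 'ks' => 'ξ' }.freeze

# Hypothetical single-letter table (an assumption, not the contents of en.yml).
LETTERS = {
  'a' => 'α', 'b' => 'β', 'g' => 'γ', 'd' => 'δ', 'e' => 'ε', 'z' => 'ζ',
  'h' => 'η', 'i' => 'ι', 'k' => 'κ', 'l' => 'λ', 'm' => 'μ', 'n' => 'ν',
  'o' => 'ο', 'p' => 'π', 'r' => 'ρ', 's' => 'σ', 't' => 'τ', 'u' => 'υ',
  'f' => 'φ', 'x' => 'χ', 'w' => 'ω'
}.freeze

# Alternation is tried left to right, so listing the digraphs first makes
# 'th' win over 't' followed by 'h'. Unmapped characters pass through.
MAPPING_PATTERN = Regexp.union(DIGRAPHS.keys + LETTERS.keys)

def to_greek(text)
  text.downcase.gsub(MAPPING_PATTERN) { |m| DIGRAPHS[m] || LETTERS[m] }
end

puts to_greek('ti kanete') # prints "τι κανετε"
```

Note that this step alone produces unaccented output ('κανετε' rather than 'κάνετε'); supplying the accent is left to the spell-checking stage.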

>> Greek approximate spell checking / fuzzy search

Each word, once mapped to its Greek equivalent, is looked up in the BK-tree and replaced by the closest dictionary words found within the edit distance set by the DIST_THRESHOLD constant (default 1). In other words, it returns either the exact matches (distance 0) or the first 3 matches at distance 1. Example: αυπο -> "αυτό/αυγό". The larger the DIST_THRESHOLD, the slower the search.

Understanding Levenshtein edit distance (article)

Understanding Levenshtein edit distance (video)
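
The fuzzy lookup described in this section can be sketched as follows. This is an illustration under assumptions, not the project's implementation: only DIST_THRESHOLD is a name from this README, while MAX_SUGGESTIONS, edit_distance, add_word, search and suggest are hypothetical. The sketch uses plain Levenshtein distance, which treats accented and unaccented vowels as different characters, so the query below is written with an accent; how the project itself handles accents is not covered here.

```ruby
DIST_THRESHOLD = 1   # default value mentioned in this README
MAX_SUGGESTIONS = 3  # hypothetical name for the "first 3 matches" limit

# Levenshtein edit distance using two rows of the dynamic-programming table.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Insert a word; children are keyed by their distance to the parent word.
def add_word(node, word)
  d = edit_distance(word, node[:word])
  return if d.zero?

  if node[:children][d]
    add_word(node[:children][d], word)
  else
    node[:children][d] = { word: word, children: {} }
  end
end

# Collect [distance, word] pairs within the threshold. By the triangle
# inequality, only children whose edge label is within `threshold` of the
# current distance can contain a match, so other subtrees are skipped.
def search(root, query, threshold = DIST_THRESHOLD)
  matches = []
  stack = [root]
  until stack.empty?
    node = stack.pop
    d = edit_distance(query, node[:word])
    matches << [d, node[:word]] if d <= threshold
    node[:children].each do |edge, child|
      stack << child if (edge - d).abs <= threshold
    end
  end
  matches
end

# Exact matches win; otherwise return up to MAX_SUGGESTIONS near matches.
def suggest(root, query)
  matches = search(root, query)
  exact = matches.select { |d, _| d.zero? }.map(&:last)
  return exact unless exact.empty?

  matches.map(&:last).first(MAX_SUGGESTIONS)
end

root = { word: 'αυτό', children: {} }
%w[αυγό καλό καλά].each { |w| add_word(root, w) }
puts suggest(root, 'αυπό').join('/')
```

Raising the threshold widens the range of edge labels that must be followed at every node, so more of the tree is visited, which is consistent with the slowdown noted above.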

How to run

irb

require './lib/greeklish'

Greeklish.convert('ti kanete')
