edit_distance() does not compute the actual Damerau-Levenshtein distance when transpositions is set to True #2734

Closed
avena554 opened this issue Jun 22, 2021 · 0 comments · Fixed by #2736
avena554 commented Jun 22, 2021

I recently noticed that edit_distance(s1, s2, transpositions=True) in nltk.metrics.distance does not compute the Damerau-Levenshtein distance between s1 and s2, i.e. the minimal number of insertions, deletions, substitutions and transpositions of adjacent characters required to transform s1 into s2. As an example,

edit_distance("ca", "abc", transpositions=True) returns 3, whereas the actual distance is 2 (it can be achieved by first transposing c and a, then inserting b in the middle).
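For reference, the value 2 can be checked with a minimal sketch of the unrestricted Damerau-Levenshtein recurrence, following the dynamic-programming formulation described on the Wikipedia page. The function name `damerau_levenshtein` is hypothetical and this is not NLTK's code; it is only meant to illustrate the expected result.

```python
def damerau_levenshtein(a, b):
    """Unrestricted Damerau-Levenshtein distance (illustrative sketch, not NLTK's code)."""
    max_dist = len(a) + len(b)
    # d has one extra sentinel row and column filled with max_dist,
    # so d[i + 1][j + 1] holds the distance between a[:i] and b[:j].
    d = [[0] * (len(b) + 2) for _ in range(len(a) + 2)]
    d[0][0] = max_dist
    for i in range(len(a) + 1):
        d[i + 1][0] = max_dist
        d[i + 1][1] = i
    for j in range(len(b) + 1):
        d[0][j + 1] = max_dist
        d[1][j + 1] = j

    last_row = {}  # last (1-based) position in a where each character was seen
    for i in range(1, len(a) + 1):
        last_match_col = 0  # last (1-based) column in b that matched a[i - 1]
        for j in range(1, len(b) + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_match_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_match_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,   # substitution (or match)
                d[i + 1][j] + 1,  # insertion
                d[i][j + 1] + 1,  # deletion
                # transposition, plus the deletions and insertions between the swapped pair
                d[k][l] + (i - k - 1) + 1 + (j - l - 1),
            )
        last_row[a[i - 1]] = i
    return d[len(a) + 1][len(b) + 1]


print(damerau_levenshtein("ca", "abc"))  # 2
```

The transposition term is what lets the algorithm swap c and a and still insert b between them afterwards.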

I am unsure whether this is intended behavior (apologies if it is). It seems that what the function currently computes is what Wikipedia calls the 'Optimal String Alignment Distance' (https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance), which imho does not have a natural or appealing interpretation. As a side note, despite its name, Optimal String Alignment is not a distance in the mathematical sense of the term (it does not satisfy the triangle inequality), which contradicts a comment in the header of nltk.metrics.distance.
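The triangle inequality failure can be reproduced with a short sketch of the Optimal String Alignment recurrence as described on the Wikipedia page. The name `osa_distance` is hypothetical, and the code is an independent illustration rather than a copy of what edit_distance does internally.

```python
def osa_distance(a, b):
    """Optimal String Alignment distance (illustrative sketch, not NLTK's code)."""
    # d[i][j] is the OSA distance between a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition is only allowed for two adjacent characters that
            # are not edited again afterwards.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


direct = osa_distance("ca", "abc")                            # 3
via_ac = osa_distance("ca", "ac") + osa_distance("ac", "abc")  # 1 + 1 = 2
print(direct, via_ac, direct <= via_ac)  # 3 2 False
```

Going from "ca" to "abc" through "ac" costs 2 in total, yet the direct OSA value is 3, so the triangle inequality is violated.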

Best,
Antoine
