I recently noticed that edit_distance(s1, s2, transpositions=True) in nltk.metrics.distance does not compute the Damerau-Levenshtein distance between s1 and s2, i.e. the minimal number of operations (insertion, deletion, substitution, or transposition of two adjacent characters) required to transform s1 into s2. As an example,
edit_distance("ca", "abc", transpositions=True) returns 3, whereas the Damerau-Levenshtein distance is actually 2 (it can be achieved by first transposing c and a, then inserting b in the middle).
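For reference, here is a minimal sketch of the unrestricted Damerau-Levenshtein distance, following the standard dynamic-programming formulation with a sentinel row and column. The function name `damerau_levenshtein` is hypothetical and is not part of NLTK; the sketch assumes unit cost for every operation and is only meant to show the value I would expect for the example above.

```python
def damerau_levenshtein(s1, s2):
    """Unrestricted Damerau-Levenshtein distance with unit costs.

    Hypothetical helper for illustration; not an NLTK function.
    """
    len1, len2 = len(s1), len(s2)
    max_dist = len1 + len2  # upper bound, used as a sentinel value

    # The table is shifted by one: row 0 and column 0 hold the sentinel,
    # so d[i + 1][j + 1] is the distance between s1[:i] and s2[:j].
    d = [[max_dist] * (len2 + 2) for _ in range(len1 + 2)]
    for i in range(len1 + 1):
        d[i + 1][1] = i
    for j in range(len2 + 1):
        d[1][j + 1] = j

    last_row = {}  # last 1-based position in s1 where each character occurred
    for i in range(1, len1 + 1):
        last_match_col = 0  # last 1-based position in s2 matching s1[i - 1]
        for j in range(1, len2 + 1):
            k = last_row.get(s2[j - 1], 0)
            col = last_match_col
            if s1[i - 1] == s2[j - 1]:
                cost = 0
                last_match_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,       # substitution (or match)
                d[i + 1][j] + 1,      # insertion
                d[i][j + 1] + 1,      # deletion
                # transposition, with deletions/insertions in between
                d[k][col] + (i - k - 1) + 1 + (j - col - 1),
            )
        last_row[s1[i - 1]] = i
    return d[len1 + 1][len2 + 1]


print(damerau_levenshtein("ca", "abc"))  # 2
```

The key difference from a restricted variant is the last term of the `min`: it allows other edits between the two transposed characters, which is what makes "transpose c and a, then insert b" a valid two-step path.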
I am unsure whether this is intended behavior (apologies if it is). It seems that what the function currently computes is what Wikipedia calls the 'Optimal String Alignment Distance' (https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance), which in my opinion does not have a natural or appealing interpretation. As a side note, despite its name, Optimal String Alignment is not a distance in the mathematical sense of the term (it does not satisfy the triangle inequality), which contradicts a comment in the heading of nltk.metrics.distance.
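To make the triangle-inequality point concrete, here is a small sketch of the textbook Optimal String Alignment recurrence. The function name `osa_distance` is hypothetical, unit costs are assumed, and this is not NLTK's actual code, only the recurrence as it is usually stated.

```python
def osa_distance(s1, s2):
    """Optimal String Alignment distance with unit costs.

    Hypothetical helper for illustration; not an NLTK function.
    """
    len1, len2 = len(s1), len(s2)
    # d[i][j] is the OSA distance between s1[:i] and s2[:j].
    d = [[0] * (len2 + 1) for _ in range(len1 + 1)]
    for i in range(len1 + 1):
        d[i][0] = i
    for j in range(len2 + 1):
        d[0][j] = j

    for i in range(1, len1 + 1):
        for j in range(1, len2 + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # A transposition is only allowed between adjacent characters,
            # and the swapped pair may not be edited again afterwards.
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len1][len2]


direct = osa_distance("ca", "abc")                             # 3
detour = osa_distance("ca", "ac") + osa_distance("ac", "abc")  # 1 + 1
print(direct, detour, direct <= detour)  # 3 2 False
```

Going through the intermediate string "ac" costs 2 in total, yet the direct value is 3, so the triangle inequality fails for this triple.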
Best,
Antoine