Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug on edit distance align #3180

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

yzhaoinuw
Copy link
Contributor

This pull request attempts to fix the issue raised by @yzhangcs's comment as a follow-up to #3017. In short, when the substitution cost is greater than 2, the path generated by _edit_dist_backtrace() still includes substitution. This is incorrect because a deletion followed by an insertion incur a cost of 2, which is lower than substitution, and thus should be preferred over substitution.

In the following test case due to @yzhangcs, the current _edit_dist_backtrace() gives the following output

from nltk.metrics.distance import edit_distance_align

>>> edit_distance_align('a', 'b', float('inf')) 
[(0, 0), (1, 1)]

After the fix it correctly outputs [(0, 0), (0, 1), (1, 1)]. I tried to add the test case provided by @yzhangcs, but didn't figure out how. Could the maintainers of NLTK look into this and add this test case?

…nf, the path it returned still included substitutions. Modified it to check for substitution cost first before listing the directions to check. If substitution cost is inf, it will not add the direction that's obtained by substitution, thus preventing the bug.
…nf, the path it returned still included substitutions. Modified it to check for substitution cost first before listing the directions to check. If substitution cost is inf, it will not add the direction that's obtained by substitution, thus preventing the bug.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant