fix(suggestions): Replace wrong Jaro-Winkler #4668

Merged
merged 1 commit into from Jan 23, 2023

Conversation

corneliusroemer
Contributor

The implementation of Jaro-Winkler similarity in the dguo/strsim-rs crate
is wrong: any two strings with a common prefix of length >= 10
are considered perfect matches.

Using Jaro from the same crate instead fixes this issue.
Favoring long common prefixes helps when matching common names,
but not when detecting typos,
so using Jaro instead of Jaro-Winkler is acceptable.

The confidence threshold is adjusted so that `bar` is still suggested for `baz`.
Since Jaro is never greater than Jaro-Winkler,
such an adjustment is expected and acceptable.
While exact suggestions may change, the net change will be positive.
Suggestions are purely decorative, so this should not count as a breaking change.

Fixes #4660
Also see rapidfuzz/strsim-rs#53
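
To make the failure mode concrete, here is a minimal, self-contained sketch of Jaro plus a Winkler prefix boost. It is not the strsim code: the function names `jaro` and `winkler`, the `max_prefix` parameter, and the 0.1 scaling factor are assumptions chosen for illustration. Passing `usize::MAX` as `max_prefix` mimics an uncapped prefix boost, which is one way a common prefix of 10 or more characters can push every score to 1.0; passing 4 gives the conventional cap.

```rust
/// Jaro similarity in [0.0, 1.0] (illustrative sketch, not the strsim code).
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Characters only count as matching within this distance of each other.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches = 0usize;
    for i in 0..a.len() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_matched[j] && a[i] == b[j] {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Walk the matched characters of both strings in order and count
    // the positions where they disagree (each transposition is seen twice).
    let mut k = 0;
    let mut out_of_order = 0usize;
    for i in 0..a.len() {
        if a_matched[i] {
            while !b_matched[k] {
                k += 1;
            }
            if a[i] != b[k] {
                out_of_order += 1;
            }
            k += 1;
        }
    }
    let m = matches as f64;
    let t = out_of_order as f64 / 2.0;
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Jaro plus a Winkler prefix boost. `max_prefix` is a hypothetical knob:
/// 4 is the conventional cap, `usize::MAX` imitates an uncapped boost.
fn winkler(a: &str, b: &str, max_prefix: usize) -> f64 {
    let j = jaro(a, b);
    let prefix = a
        .chars()
        .zip(b.chars())
        .take_while(|(x, y)| x == y)
        .count()
        .min(max_prefix);
    (j + 0.1 * prefix as f64 * (1.0 - j)).min(1.0)
}
```

Under these assumptions, `jaro("alignmentScorr", "alignmentScore")` is about 0.952 and `jaro("alignmentScorr", "alignmentStart")` is about 0.857, so Jaro ranks the intended candidate first, while `winkler(.., usize::MAX)` returns roughly 1.0 for both because the shared prefix `alignmentS` is 10 characters long. The same sketch also shows why a threshold shift is needed: `jaro("bar", "baz")` is about 0.778, whereas the capped Winkler score is about 0.822, so a cutoff near 0.8 (an illustrative value, not taken from this PR) would keep the suggestion under Jaro-Winkler but drop it under plain Jaro.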

@epage
Member

epage commented Jan 23, 2023

Thanks for following through on this!

@corneliusroemer
Contributor Author

Thanks for the fast review and merge! It was quite fun tracing it back to the algorithm implementation.

In the future, it would be good to have a better, more realistic test set of `possible_values` and input values to check that the threshold makes sense.

@corneliusroemer corneliusroemer deleted the fix-4660 branch January 23, 2023 22:41
Linked issue (#4660): Wrong "Did you mean" suggestion: for alignmentScorr clap suggests alignmentStart rather than alignmentScore