Missing expected results in fuzzy search (no stemming) #375

lucaong · 2018-09-27T10:27:51Z

Performing fuzzy search seems to miss some words within the given edit distance.
Here is one example (disabling stemming and all other pipeline functions to ensure that we are only observing the behavior of fuzzy search):

const lunr = require('lunr')

const l = lunr(function () {
  this.field('txt')
  this.pipeline.remove(lunr.stemmer)
  this.pipeline.remove(lunr.trimmer)
  this.pipeline.remove(lunr.stopWordFilter)
  this.searchPipeline.remove(lunr.stemmer)
  this.searchPipeline.remove(lunr.trimmer)
  this.searchPipeline.remove(lunr.stopWordFilter)

  ;[
    { id: 1, txt: 'coscienza' },
    { id: 2, txt: 'scienza' },
    { id: 3, txt: 'conoscienza' },
    { id: 4, txt: 'coscienzaxx' },
  ].forEach(line => this.add(line))
})

l.search('coscienza~2')
// => [ { ref: '3', score: ... }, { ref: '1', score: ... } ]

In the example above, I would expect the words scienza and coscienzaxx to also match, as they are at an edit distance of 2 from the query term coscienza (two deletions at the start and two insertions at the end, respectively).
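To double-check the distances claimed above, here is a plain Levenshtein distance function. It is a standard dynamic-programming sketch that counts insertions, deletions and substitutions at cost 1 each. It has nothing to do with lunr's internals and is only used to confirm the expected edit distances for the four indexed words.

```javascript
// Standard dynamic-programming Levenshtein distance
// (insertion, deletion and substitution each cost 1).
function levenshtein(a, b) {
  // At the start of iteration i, prev[j] is the distance between
  // the first i-1 characters of a and the first j characters of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(
        prev[j] + 1,        // delete a[i-1]
        curr[j - 1] + 1,    // insert b[j-1]
        prev[j - 1] + cost  // substitute (or keep a matching character)
      )
    }
    prev = curr
  }
  return prev[b.length]
}

for (const word of ['coscienza', 'scienza', 'conoscienza', 'coscienzaxx']) {
  console.log(word, levenshtein('coscienza', word))
}
// coscienza 0
// scienza 2
// conoscienza 2
// coscienzaxx 2
```

All three non-identical words are exactly 2 edits away, so a `~2` query would be expected to return all four documents.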

This is also visible if one observes the fuzzy TokenSet expansion for the term coscienza:

lunr.TokenSet.fromFuzzyString("coscienza", 2).toArray()
// => the result contains `*scienza` and `coscienza`, but not `scienza` or `coscienza**`
// (in the context of fuzzy search the * token is not linked to itself, so it matches exactly 1 character)

I am not sure if this is a bug or the intended behavior of fuzzy search. In the latter case, maybe it would deserve a mention in the documentation.

Thanks again for the great work!


olivernn · 2018-10-16T19:00:04Z

Sorry for taking a while to get to this...

Looks like a bug to me. I put together a simplified reproduction on jsfiddle.

It looks like, for some reason, trailing characters only match if they are the same as the last character in the fuzzy string. Weird! This also explains why the test is passing.

I'll dig into this a bit and come up with a fix, thanks for reporting.

hoelzro · 2018-10-25T03:56:10Z

Looking at q.toArray() from @olivernn's example, I see the following output:

[ '*oo',
  '*foo',
  'oo',
  'ofo',
  'f*o',
  'f*oo',
  'fo',
  'fo*',
  'fo*o',
  'foo' ]

Would I be incorrect in thinking that foo* should be in there as well? The presence of fo*o explains why fooo is in the intersection, while the absence of foo* would explain why food is not.
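The reasoning above can be checked with a simplified model that treats the expansion as a flat list of patterns in which `*` matches exactly one arbitrary character. This is only an approximation: lunr intersects TokenSet automata rather than testing a list of strings, and the assumption here is that the printed list is the expansion of `foo` at edit distance 1, which is what its contents suggest. The helper names `matchesPattern` and `matchesAny` are hypothetical.

```javascript
// The expansion printed above (assumed to be 'foo' at edit distance 1).
// '*' stands for exactly one arbitrary character.
const expansion = ['*oo', '*foo', 'oo', 'ofo', 'f*o', 'f*oo', 'fo', 'fo*', 'fo*o', 'foo']

// True if `word` matches `pattern`, where each '*' matches exactly one character.
function matchesPattern(pattern, word) {
  if (pattern.length !== word.length) return false
  for (let i = 0; i < pattern.length; i++) {
    if (pattern[i] !== '*' && pattern[i] !== word[i]) return false
  }
  return true
}

// True if `word` matches at least one pattern in the list.
function matchesAny(patterns, word) {
  return patterns.some(p => matchesPattern(p, word))
}

console.log(matchesAny(expansion, 'fooo'))              // true  (via 'f*oo' and 'fo*o')
console.log(matchesAny(expansion, 'food'))              // false (no 4-character pattern ends in '*')
console.log(matchesAny([...expansion, 'foo*'], 'food')) // true once 'foo*' is added
```

Under this model, fooo is reachable through an insertion in the middle of the word, but food can only be reached through an insertion after the last character, which is exactly the pattern missing from the list.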


hoelzro · 2018-10-26T00:43:56Z

I've created a PR at #382 that I believe fixes this issue.
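To illustrate what "insertions at the end of a fuzzy string" means, here is a hypothetical enumerator of wildcard patterns within a given number of edits, covering deletion, insertion, substitution and adjacent transposition, with `*` standing for one arbitrary character. It is a sketch for illustration only: lunr builds a TokenSet automaton rather than a list of strings, and `fuzzyPatterns` and its `insertAtEnd` option are made up here. With `insertAtEnd: false`, its output for `('foo', 1)` is the same ten patterns printed earlier in the thread; enabling it adds `foo*`.

```javascript
// Hypothetical, simplified model of fuzzy expansion (not lunr's TokenSet code).
// Returns every pattern within `maxEdits` edits of `str`, where '*' stands for
// exactly one arbitrary character (an inserted or substituted character).
function fuzzyPatterns(str, maxEdits, { insertAtEnd = true } = {}) {
  const out = new Set()

  function walk(prefix, i, editsLeft) {
    if (i === str.length) {
      out.add(prefix)
      // Insertion after the last character: the case that was missing before the fix
      if (insertAtEnd && editsLeft > 0) walk(prefix + '*', i, editsLeft - 1)
      return
    }
    walk(prefix + str[i], i + 1, editsLeft)   // keep str[i] unchanged
    if (editsLeft === 0) return
    walk(prefix, i + 1, editsLeft - 1)        // delete str[i]
    walk(prefix + '*', i, editsLeft - 1)      // insert one character before str[i]
    walk(prefix + '*', i + 1, editsLeft - 1)  // substitute str[i]
    if (i + 1 < str.length) {
      // transpose str[i] and str[i + 1]
      walk(prefix + str[i + 1] + str[i], i + 2, editsLeft - 1)
    }
  }

  walk('', 0, maxEdits)
  return [...out].sort()
}

console.log(fuzzyPatterns('foo', 1, { insertAtEnd: false }).includes('foo*')) // false
console.log(fuzzyPatterns('foo', 1).includes('foo*'))                         // true
```

Every recursive call either advances through the string or consumes an edit, so the enumeration always terminates. Only the `i === str.length` branch differs between the two modes, which mirrors the description of the fix.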


olivernn · 2018-10-30T18:39:45Z

I've just pushed 2.3.5, which includes the fix from @hoelzro.

hoelzro added a commit to hoelzro/lunr.js that referenced this issue Oct 26, 2018

Make sure fuzzy strings have fuzziness at the end

c1e0f24

Fixes GH olivernn#375 Before, insertions were not made at the end of a fuzzy string for token sets

hoelzro mentioned this issue Oct 26, 2018

Make sure fuzzy strings have fuzziness at the end #382

Merged

olivernn pushed a commit that referenced this issue Oct 29, 2018

Make sure fuzzy strings have fuzziness at the end

cc8876d

Fixes GH #375 Before, insertions were not made at the end of a fuzzy string for token sets

olivernn closed this as completed Oct 30, 2018

justinrummel mentioned this issue Dec 29, 2018

lunr 2.3.5 update mmistakes/minimal-mistakes#2010

Merged

lucaong mentioned this issue Feb 11, 2019

Missing results within edit distance in fuzzy search #390

Closed
