Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search hangs when searching a repeated substring with wildcard #270

Closed
meltuhamy opened this issue May 30, 2017 · 10 comments
Closed

Search hangs when searching a repeated substring with wildcard #270

meltuhamy opened this issue May 30, 2017 · 10 comments

Comments

@meltuhamy
Copy link

See the jsFiddle here: https://jsfiddle.net/bwe0f44g/

var idx = lunr(function () {
  this.ref('name')
  this.field('text')
  this.add({
  "name": "long",
  "text": "Zlah_testZlah_Zlah_testZlah_testZlah_testZlah_testZlah_testZlah_testZlah_testZlah_testZlah"
});
})

window.search = function search(){
	var t0 = performance.now();
  var result = idx.search('*Zlah*');
  var t1 = performance.now();
  alert('Found '+result.length+' results in ' + (t1-t0) + 'ms');
}
@meltuhamy meltuhamy changed the title Search hangs when searching a repeated string with wildcard Search hangs when searching a repeated substring with wildcard May 30, 2017
@meltuhamy
Copy link
Author

It seems that if the search includes a * at the beginning of the query, the search hangs.

idx.search('*Zlah*') // hangs
idx.search('Zlah*') // ok

@olivernn
Copy link
Owner

On my machine that search eventually completes after 12 seconds, something is clearly not right though.

The test data looks fairly pathological. Out of interest, how did you discover it?

My guess is that the leading wildcard is causing many iterations through the graph that represents the token, I'll need to take a closer look with a debugger to see where its getting stuck though.

@meltuhamy
Copy link
Author

I'm using wildcards to allow substring searches. If there are a lot of repeated items in the result, it will take longer and longer to perform the search. For now, I'm using my own pipeline function to expand all tokens into their substrings so the user never needs to use a wildcard character (I'm using lunr in the context of a loose auto complete search scenario).

@olivernn
Copy link
Owner

Searches with leading wildcards are considerably more expensive, that said, I wouldn't expect it to take this long. I'll have a look through the relevant code but its a bit involved and I wrote it over a year ago now.

Perhaps there is a better way of implementing what you are trying to achieve though, can you give an example of the kind of documents you are searching within, and the kind of query you are trying? Is string in the document multiple tokens merged together into a string? If I understood what you're trying to achieve better I might be able to suggest a more performant way of implement it.

@meltuhamy
Copy link
Author

I'm doing a search auto complete feature where it finds all documents which have a substring of the search term. e.g.

the words

hello world
blah
word
nice
dice

and the search term d

should get the results

hello world
word
dice

because d is a substring of the two reuslts.

To begin with, I was able to achieve this using the wildcard method (i.e. search for *d*) but found the issue above and performance problems.

I'm open to ideas as to what's a better way of doing it, but for now I have a working solution which is a pipeline function to expand all terms to their subsets.

@olivernn
Copy link
Owner

olivernn commented Jun 3, 2017

If you have a solution that is working then thats great. I'll still spend a bit of time trying to understand the particular case you originally posted, perhaps there is some optimisation or bug causing the long run time.

@meltuhamy
Copy link
Author

Yes it seems that is quite a nasty bug that is preventing me from using wildcards. Thanks for your help!

@nerumo
Copy link

nerumo commented Sep 25, 2017

@meltuhamy how exactly did you do the workaround with the pipeline function? I'm afraid that expanding every keyword to subsets would make my index burst.

I currently had to disable leading wildcards because of this issue (>20s with leading wildcard compared to <400ms without).

@olivernn
Copy link
Owner

olivernn commented Sep 3, 2018

I've pushed 2.3.3 which should resolve the long search times with some wildcard searches, let me know if it solves the issue, thanks.

@olivernn olivernn closed this as completed Sep 3, 2018
@meltuhamy
Copy link
Author

Confirmed this fixes it. Thanks @olivernn !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants