Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leading wildcards on certain search terms cause 100% CPU, freezing browser #368

Closed
kspearrin opened this issue Aug 11, 2018 · 11 comments
Closed

Comments

@kspearrin
Copy link

kspearrin commented Aug 11, 2018

See the following example: https://jsfiddle.net/8s0dzp2c/22/

Something about the amazon URL being stored in this url field causes the browser to freeze and become unresponsive when searching by a wildcard query.

https://www.amazon.com/ap/signin?_encoding=utf8&openid.assoc_handle=usflex&openid.claimed_id=http%3a%2f%2fspecs.openid.net%2fauth%2f2.0%2fidentifier_select&openid.identity=http%3a%2f%2fspecs.openid.net%2fauth%2f2.0%2fidentifier_select&openid.mode=checkid_setup&openid.ns=http%3a%2f%2fspecs.openid.net%2fauth%2f2.0&openid.ns.pape=http%3a%2f%2fspecs.openid.net%2fextensions%2fpape%2f1.0&openid.pape.max_auth_age=0&openid.return_to=https%3a%2f%2fwww.amazon.com%2f%3fref_%3dnav_custrec_signin

If I trim the URL to something like https://www.amazon.com, there is no issue. See here: https://jsfiddle.net/8s0dzp2c/24/

Also, if I remove the wildcard altogether, it doesn't freeze anymore.

Is there some character in the URL that is causing things to bug out or something?

@olivernn
Copy link
Owner

It's entirely possible that the wildcard handling code is sub-optimal in Lunr. I think that is more likely the cause rather than there being special characters in that string.

What are you trying to search for in the URL? You could try splitting it into more meaningful parts, e.g domain, subdomain, path, query etc?

@kspearrin
Copy link
Author

@olivernn The example jsfiddle I posted above shows what I am searching for: "am"

Another thing I noticed is that if I just search for "a", the problem does not occur.

Unfortunately the nature of this field comes from user input and could have anything in it. It is not well structured. This URL was just an example I found during testing.

@kspearrin
Copy link
Author

kspearrin commented Aug 17, 2018

@olivernn Trying to debug this more today since we have more users reporting the problem now.

The problem is definitely showing in lunr.TokenSet.intersect as shown by the following profile:

image

image

This was searching for *am* with the following test:

<!DOCTYPE html>
<html>
<head>
<title>lunr test</title>
<script src="https://cdnjs.cloudflare.com/ajax/libs/lunr.js/2.3.2/lunr.js"></script>
</head>
<body>

<input type="text" id="query">
<button type="button" id="search">
    Search
</button>

<script>

const idx = lunr(function() {
    this.field('key');
    this.add({
        "key": "https://www.amazon.com/ap/signin?_encoding=utf8&openid.assoc_handle=usflex&openid.claimed_id=http%3a%2f%2fspecs.openid.net%2fauth%2f2.0%2fidentifier_select&openid.identity=http%3a%2f%2fspecs.openid.net%2fauth%2f2.0%2fidentifier_select&openid.mode=checkid_setup&openid.ns=http%3a%2f%2fspecs.openid.net%2fauth%2f2.0&openid.ns.pape=http%3a%2f%2fspecs.openid.net%2fextensions%2fpape%2f1.0&openid.pape.max_auth_age=0&openid.return_to=https%3a%2f%2fwww.amazon.com%2f%3fref_%3dnav_custrec_signin",
        "id": "1"
    });
});

document.getElementById('search').addEventListener('click', (event) => {
    const query = document.getElementById('query').value;
    console.log('Searching "' + query + '"');
    console.time('Searched');
    const results = idx.search(query);
    console.log('Results: ' + results.length);
    console.timeEnd('Searched');
}, false);

</script>

</body>
</html>

@kspearrin
Copy link
Author

Some more findings with this particular use case:

The full value of the URL that causes the problem here is:

https://www.amazon.com/ap/signin?_encoding=utf8&openid.assoc_handle=usflex&openid.claimed_id=http%3a%2f%2fspecs.openid.net%2fauth%2f2.0%2fidentifier_select&openid.identity=http%3a%2f%2fspecs.openid.net%2fauth%2f2.0%2fidentifier_select&openid.mode=checkid_setup&openid.ns=http%3a%2f%2fspecs.openid.net%2fauth%2f2.0&openid.ns.pape=http%3a%2f%2fspecs.openid.net%2fextensions%2fpape%2f1.0&openid.pape.max_auth_age=0

This takes ~30000 ms to complete the *am* search query on.

If I reduce the value of the URL to the following, the same *am* search query completes in 30ms.

https://www.amazon.com/ap/signin?_encoding=utf8&openid.assoc_handle=usflex&openid.claimed_id=http%3a%2f%2fspecs.openid.net%2fauth%2f2.0%2fidentifier_select&openid.identity=http%3a%2f%2fspecs.openid.net%2fauth%2f2.0%2fidentifier_select&openid.mode=checkid_setup&openid.ns=http%3a%2f%2fspecs.openid.net%2fauth%2f2.0&openid.ns.pape=

I have only removed the following from the tail of the URL:

http%3a%2f%2fspecs.openid.net%2fextensions%2fpape%2f1.0&openid.pape.max_auth_age=0

@kspearrin
Copy link
Author

So I found another term that isn't a crazy long URL that will also cause this same problem. It appears that the issue always occurs with a leading wildcard. Trailing wildcards do not cause a problem.

Indexed value: ffffffffffffffffffdfgsdfgdsfgsdfgsdfgsdfgsdfgsdfgsdfgertsdfgsd

Search query: *ff

Takes 47 seconds to complete on my powerful desktop machine.


<!DOCTYPE html>
<html>
<head>
<title>lunr test</title>
<script src="https://cdnjs.cloudflare.com/ajax/libs/lunr.js/2.3.2/lunr.js"></script>
</head>
<body>

<input type="text" id="query">
<button type="button" id="search">
    Search
</button>

<script>

const idx = lunr(function() {
    this.field('key');
    this.add({
        "key": "ffffffffffffffffffdfgsdfgdsfgsdfgsdfgsdfgsdfgsdfgsdfgertsdfgsd",
        "id": "1"
    });
});

document.getElementById('search').addEventListener('click', (event) => {
    const query = document.getElementById('query').value;
    console.log('Searching "' + query + '"');
    console.time('Searched');
    const results = idx.search(query);
    console.log('Results: ' + results.length);
    console.timeEnd('Searched');
}, false);

</script>

</body>
</html>
Searching "*ff"
Results: 0
Searched: 47188.121826171875ms

Searching "ff*"
Results: 1
Searched: 7.14501953125ms

@kspearrin kspearrin changed the title Something about this URL field freezes the browser on wildcard query Leading wildcards on certain search terms cause 100% CPU, freezing browser Aug 20, 2018
@olivernn
Copy link
Owner

Sorry for the delay in responding, and thanks for taking the time to dig a bit more into this issue. This seems similar to #270. I think this is a case of 'catastrophic backtracking'. Lunr isn't using a regex but the process of matching on wildcards is similar.

I think the issue will be in lunr.TokenSet#intersect but will need to spend a bit of time trying to figure out if there is a way to protect against these kind of cases.

@kspearrin
Copy link
Author

Indeed, it does seem to have something to do with repeating values. Here is an even simpler example:

<!DOCTYPE html>
<html>
<head>
<title>lunr test</title>
<script src="https://cdnjs.cloudflare.com/ajax/libs/lunr.js/2.3.2/lunr.js"></script>
</head>
<body>

<input type="text" id="query">
<button type="button" id="search">
    Search
</button>

<script>

const idx = lunr(function() {
    this.field('key');
    this.add({
        "key": "ffffffffffffffffffffffffffffff",
        "id": "1"
    });
});

document.getElementById('search').addEventListener('click', (event) => {
    const query = document.getElementById('query').value;
    console.log('Searching "' + query + '"');
    console.time('Searched');
    const results = idx.search(query);
    console.log('Results: ' + results.length);
    console.timeEnd('Searched');
}, false);

</script>

</body>
</html>
Searching "ff"
Results: 0
Searched: 1.626220703125ms
Searching "ff*"
Results: 1
Searched: 0.557861328125ms
Searching "*ff"
Results: 1
Searched: 13159.626953125ms

Let me know if there is anything more I can do to help.

@olivernn
Copy link
Owner

So, I did a bit of digging and I found this. I cannot remember why it was there in the first place, plus the TODO comment seems to imply that at some point in the past I was also unsure about it. All the current tests pass without it, plus, the particular case you mention in this issue is no longer a problem. The performance for all wildcard matching also improves (slightly) without it.

It seems like removing it should be fine, but I'm a bit cautious. I need to convince myself that it isn't serving any purpose. I'm going to write some more tests and try and trace through the implementation to try and remember what I was thinking 3 years ago when adding that code!

@kspearrin
Copy link
Author

Just another observation: It seems that the more repeat letters that exist, the worse the problem becomes. For example with the previous test, fffffffffffffffffffffffffffffffffffffffffffffffff performs much worse than fffffff.

@olivernn
Copy link
Owner

olivernn commented Sep 3, 2018

I've just published 2.3.3 which includes a fix for this issue. Let me know if it works for you, thanks.

@olivernn olivernn closed this as completed Sep 3, 2018
@kspearrin
Copy link
Author

Seems to be working well now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants