Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

100% CPU stuck in levenshtein calculation #1087

Closed
atombender opened this issue Oct 25, 2018 · 17 comments · May be fixed by novalina26/anchor#1
Closed

100% CPU stuck in levenshtein calculation #1087

atombender opened this issue Oct 25, 2018 · 17 comments · May be fixed by novalina26/anchor#1

Comments

@atombender
Copy link

Description

Browser gets stuck in 100% CPU for a long time whenever reloading after a code change, via Webpack.

Expected behavior

Shouldn't do that.

Actual behavior

Gets stuck in haveTextSimilarity calling levenshtein.get() on two strings that are 96227 characters long.

Suffice to say, the Levenshtein implementation in fast-levenshtein might be "fast", but it's not fast. If I modify haveTextSimilarity to return false on huge strings, it no longer gets bogged down.

Why are the strings being compared so big? I don't know; I'm not using Webpack in any unusual way. My components are importing CSS such as ionicons and which end up containing Base64-encoded source maps, which certainly contributes to the size.

Environment

React Hot Loader version: 4.3.11

  1. Operating system: macOS
  2. Browser and version: Chrome 69.0.3497.100
@theKashey
Copy link
Collaborator

Levenshtein should be O(n) on the similar texts, and could increase complexity, as difference grew. But it should be a super small on hot update.

I could not disable it, as long it would lead to false positive replaces, but you can mitigate the problem by "registering" component - babel plugin does it for "top level" variables.

Is your component defined on a top level?

@atombender
Copy link
Author

Not sure what you mean by registering. The component that made me discover this issue isn't the top level one, but it's embedded two levels down or so from the root.

@theKashey
Copy link
Collaborator

That's not about module structure, only about you ability to "export" some component from a file.

//anyFile.js

// module top level variable
const iAmOnTopLevel = () => <div>aha</div>;

// component would be created inside HOC, ie not "top level". 
const likeAHOC = (Component) => () => <div>and this one is hidden inside</div>

Levenshtein tests is applied only for that "hidden" inside another functions components. Should be called veeeery rarely.

@atombender
Copy link
Author

atombender commented Oct 26, 2018

I see. The component in question is within two HOCs:

import React from 'react';
import PropTypes from 'prop-types';
import {connect} from 'react-redux';
import {withRouter} from 'react-router';

export default withRouter(
  connect(state => ({
    loginSessionState: state.loginSession,
    feeds: state.feeds.models
  }))(
    class GlobalNavigation extends React.Component {
      // ...
    }));

@theKashey
Copy link
Collaborator

As long it is not created inside HOC, but would be a given some props by HOC - just extract class declaration outside of connect, and call it a day

// a top level variable!
class GlobalNavigation extends React.Component {
      // ...
 }

export default withRouter(
  connect(state => ({
    loginSessionState: state.loginSession,
    feeds: state.feeds.models
  }))(GlobalNavigation));

@VinSpee
Copy link

VinSpee commented Oct 26, 2018

Hi! I'm seeing some 100% CPU issues as well when hot-reloading.

@meinders
Copy link

meinders commented Dec 21, 2018

I'm experiencing something similar here (react-hot-loader@4.3.12), but on a component that shouldn't be 'hidden' as far as I can tell, since it's a default export:

export default class PropertyPage extends React.Component
{
	render()
	{
		// about 52k of code (after babel/webpack)
	}
}

When reloading, haveTextSimilarity calculates the Levenshtein distance on the old and new code of the render function (why?). Both strings are about 52k characters long (non-minified).

From what I read the Levenshtein distance is not O(n) but O(n^2). I did a quick benchmark with two variations of a random string, 52k characters long, with a Levenshtein distance of 1. This takes 45 seconds, run from Node.

It doesn't seem like a good idea to use such a costly function when hot reloading large pieces of code.

@atombender
Copy link
Author

atombender commented Dec 21, 2018

Indeed. I haven't had time to go back to this issue, but it's bit me several times recently, and I'd like to see it fixed.

A stupid but very effective fix would be to revert to a string equality check if the combined size of the strings exceeds a certain number. The danger of false positives is moot when you can't even get a result.

Levenshtein has different complexities depending on what algorithm is used. I doubt the one used here is O(n), not just because that's difficult to achieve, but also because clearly the performance drop we are seeing isn't linear.

The matrix method, which requires n * m bytes of space to hold the matrix, is the cheapest, as good as O(n + m), I believe.

But if you're only interested in checking if the distance exceeds a specific number, and you only need an approximation, there is a much faster algorithm, which should be O(n^(1+d)) in time, where d is the maximum distance. No idea if anyone has implemented in JS.

@kromit
Copy link

kromit commented Jan 10, 2019

I can confirm that it is not linear. I have a large form that takes 6.5 seconds to check with haveTextSimilarity. Any half of the same form is almost instantly re-rendered.

Thinking about about removing RHL entirely and go the webpack HMR way.

@theKashey
Copy link
Collaborator

Sound like it's time to reconsider this moment. I'll try to use another algo, as long as I dont need a real levenshtein here - just some (short) distance metric

@petesantos
Copy link

petesantos commented Aug 21, 2019

I am not sure if this is an issue with the algorithm or an issue with the browser. I am not seeing this behavior in Firefox. I did some profiling and it looks like Chrome is churning at this line of code in fast-levenshtein:

strCmp = str1.charCodeAt(i) === str2Char[j];

I created some snippets that reproduce the issue, but the reproduction is highly dependent on the contents of the strings being compared. I can reproduce this issue with a long string of source code, but cannot reproduce it with randomly generated strings. I also took the source strings in question and randomly re-arranged the the lines of code and the issue disappeared. I also appended random junk to the end of the strings and the issue disappeared. I made arbitrary changes to the strings (removing all instances of a given character) and the issue disappears.

One change I made that I think would be harmless is to remove all the leading whitespace on every line, then run the comparison. This resolves the issue on chrome, but I am not sure why this is. It looks like a bug in V8, but i need to better isolate the repro steps.

If you could put a workaround in for this issue, it would be appreciated.
This workaround works for me:

var haveTextSimilarity = function haveTextSimilarity(a, b) {
    return (
        // equal or slight changed
        a === b || levenshtein.get(a.replace(/\n(\s+)/g, '\n'), b.replace(/\n(\s+)/g, '\n')) < a.length * 0.2
     );
};

@theKashey
Copy link
Collaborator

theKashey commented Aug 21, 2019

That make a sense! String management in Chrome is a bit special, especially for the big strings (and script is a big string) - even substring are big - https://bugs.chromium.org/p/v8/issues/detail?id=2869

Your hack is doing a small string from a big one, thus fixing the issue.

@petesantos
Copy link

Thanks @theKashey!

Will the hack I proposed or something similar be added to the code to enable faster reloads in chrome?

@theKashey
Copy link
Collaborator

Sure. I read some stuff around the problem, as found the best way(in terms of speed first, and then in simplicity) to solve the problem -

function clearStringFast(str) {
  return str.length < 12 ? str : (' ' + str).slice(1);
}

//

a === b || levenshtein.get(clearStringFast(a), clearStringFast(b)) < a.length * 0.2

@petesantos - I would greatly appreciate if confirm that this code solves your problem as well. In that case I'll release a new version just in an hour.

@petesantos
Copy link

@theKashey Your fix does resolve the issue - thanks again!
Would you mind posting links to some of the resources you read that identify the problem?

@theKashey
Copy link
Collaborator

  1. My main source https://habr.com/ru/post/449368/ - not in English, but it has a long and detailed discussion.
  2. Adventures in the land of substrings and RegExps. guevara/read-it-later#2353 - (in English) and with pictures :)
  3. Chrome/FireFox(it's also affected) bug tracker.

@theKashey
Copy link
Collaborator

Released within 4.12.12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants