Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf: Pre-merge global elements at normalization time #99

Open
matwilko opened this issue Jul 13, 2023 · 9 comments
Open

Perf: Pre-merge global elements at normalization time #99

matwilko opened this issue Jul 13, 2023 · 9 comments

Comments

@matwilko
Copy link

Since global elements will always end up being applied to every file (if they are going to be processed at all), then all the global elements { !files } are going to need to be merged again and again for every file config fetched (that's not cached).

Rather than repeat this for potentially every file, I'd propose that during the normalization process, all global elements be pre-merged backwards through the array and eliminated.

This obviously has the benefit of each individual file having to process fewer array elements when its config is fetched, and the number of merge operations therefore required. And depending on the merge function of individual schemas, it could significantly reduce the work they have to do overall if the global elements apply things that reduce the amount of processing required.

It does have some upfront cost, but we're using a O(n) operation once at normalisation time to reduce the n in a O(n * fileCount) at getConfig time.

This process could also be used to gather all the global ignores and eliminate those elements as well, doing the work that this.ignores usually has to do for "free".

So, to be more concrete, here's some very quick and dirty example code of what this might look like:

function preMergeGlobals(configs) {
    const [...preMergedConfigs, globalIgnores] = [...preMergeGlobalsInner(configs.reverse())];
    this.length = 0;
    this.push(...preMergedConfigs.reverse());
    
    // TODO: Figure out where to put the global ignores so that `this.ignores` doesn't have to traverse the array
}

function* preMergeGlobalsInner(reversedConfigs) {
    const globalIgnores = [];
    let currentGlobalToMerge = undefined;
    for (const config in reversedConfigs) {

        if (!config.files && !config.ignores) {
            currentGlobalToMerge = currentGlobalToMerge
                ? this[ConfigArraySymbol.schema].merge(config, currentGlobalToMerge) // We're iterating backwards, so apply the earlier config on top of this one
                : config;
        } else if (config.ignores && Object.keys(config).length === 1) {
            globalIgnores.push(config.ignores);
        } else {
            yield currentGlobalToMerge
                ? this[ConfigArraySymbol.schema].merge(config, currentGlobalToMerge) // We're iterating backwards, so apply the earlier config on top of this one
                : config;
        }
    }

    yield { ignores: globalIgnores.reverse().flat() };
}

Happy to put together a PR with properly thought out code for this if you think it's worth implementing.

@nzakas
Copy link
Contributor

nzakas commented Jul 13, 2023

I looked into this initially and it's more complicated than it seems. Even though global configs are all merged together for every file, the order in which they appear matters because you might have a global config followed by a config with files followed by a global config, and depending on if files matches, it could change the result because that second config is applying changes first and then the last global config needs to apply its changes on top of that. So, A+B+C now vs. A+C+B if we pre-merged the global configs.

We could definitely pre-merge global configs that are next to each other, though.

@matwilko
Copy link
Author

matwilko commented Jul 13, 2023

Sure, I saw that scenario as well, and the code I demo-ed above does actually account for that 😄

As we go down the array, any global configs are merged together, but that merged global is then merged into any subsequent non-global element in the array - the globals don't "skip" any of the elements.

In effect, what we're trying to accomplish is:

  • Extract and remove all { ignores: [..] } elements
  • While there are any global config elements remaining, find the last one, and merge it into every preceding element, then remove it
  • Insert a new global ignores element at the beginning of the array with the merged global ignores

That of course is extremely inefficient (quadratic at least). The algorithm I described above is linear, but I believe accomplishes the same effect.

To work through an example step by step:

[
    { files: ['**/*.js'] },
    { ignores: ['.git'] },
    { rules: { a: 1 } },
    { files: ['tests/**/*.js'], rules: { a: 2, b: 1 } },
    { rules: { b: 2, c: 1 } },
    { rules: { c: 2 } },
    { files: { a: 2 } },
    { ignores: ['node_modules', 'dist'] }
]
//----------
[
    { files: ['**/*.js'] },
    { ignores: ['.git'] },
    { rules: { a: 1 } },
    { files: ['tests/**/*.js'], rules: { a: 2, b: 1 } },
    { rules: { b: 2, c: 1 } },
    { rules: { c: 2 } },
    { files: { a: 2 } },
    //--{ ignores: ['node_modules', 'dist'] }-- //<--- global ignores, remove and add to globalIgnores, globalIgnores = [['node_modules', 'dist']], currentGlobalToMerge = undefined;
]
//----------
[
    { files: ['**/*.js'] },
    { ignores: ['.git'] },
    { rules: { a: 1 } },
    { files: ['tests/**/*.js'], rules: { a: 2, b: 1 } },
    { rules: { b: 2, c: 1 } },
    { rules: { c: 2 } },
    { files: { a: 2 } } //<--- not global, keep with no merge as !currentGlobalToMerge, globalIgnores = [['node_modules', 'dist']], currentGlobalToMerge = undefined;
]
//----------
[
    { files: ['**/*.js'] },
    { ignores: ['.git'] },
    { rules: { a: 1 } },
    { files: ['tests/**/*.js'], rules: { a: 2, b: 1 } },
    { rules: { b: 2, c: 1 } },
    //--{ rules: { c: 2 } },-- //<--- global, remove and merge into currentGlobalToMerge, globalIgnores = [['node_modules', 'dist']], currentGlobalToMerge = { rules: { c: 2 } };
    { files: { a: 2 } }
]
//----------
[
    { files: ['**/*.js'] },
    { ignores: ['.git'] },
    { rules: { a: 1 } },
    { files: ['tests/**/*.js'], rules: { a: 2, b: 1 } },
    //--{ rules: { b: 2, c: 1 } },-- //<--- global, remove and merge into currentGlobalToMerge, globalIgnores = [['node_modules', 'dist']], currentGlobalToMerge = { rules: { b: 2, c: 2 } };
    { files: { a: 2 } }
]
//----------
[
    { files: ['**/*.js'] },
    { ignores: ['.git'] },
    { rules: { a: 1 } },
    { files: ['tests/**/*.js'], rules: { a: 2, b: 2, c: 2 } }, //<--- not global, keep and merge currentGlobalToMerge in, globalIgnores = [['node_modules', 'dist']], currentGlobalToMerge = { rules: { b: 2, c: 2 } };
    { files: { a: 2 } }
]
//----------
[
    { files: ['**/*.js'] },
    { ignores: ['.git'] },
    //--{ rules: { a: 1 } },-- //<--- global, remove and merge into currentGlobalToMerge, globalIgnores = [['node_modules', 'dist']], currentGlobalToMerge = { rules: { a: 1, b: 2, c: 2 } };
    { files: ['tests/**/*.js'], rules: { a: 2, b: 2, c: 2 } },
    { files: { a: 2 } }
]
//----------
[
    { files: ['**/*.js'] },
    //--{ ignores: ['.git'] },-- //<--- global ignores, remove and add to globalIgnores, globalIgnores = [['node_modules', 'dist'], ['.git']], currentGlobalToMerge = { rules: { a: 1, b: 2, c: 2 } };
    { files: ['tests/**/*.js'], rules: { a: 2, b: 2, c: 2 } },
    { files: { a: 2 } }
]
//----------
[
    { files: ['**/*.js'], rules: { a: 1, b: 2, c: 2 } }, //<--- not global, keep and merge currentGlobalToMerge in, globalIgnores = [['node_modules', 'dist'], ['.git']], currentGlobalToMerge = { rules: { a: 1, b: 2, c: 2 } };
    { files: ['tests/**/*.js'], rules: { a: 2, b: 2, c: 2 } },
    { files: { a: 2 } }
]
//----------
[
    { ignores: ['.git', 'node_modules', 'dist'] }, //<-- insert the merged global ignores in their correct order
    { files: ['**/*.js'], rules: { a: 1, b: 2, c: 2 } },
    { files: ['tests/**/*.js'], rules: { a: 2, b: 2, c: 2 } },
    { files: { a: 2 } }
]
//----------

Hopefully that all makes sense!

@nzakas
Copy link
Contributor

nzakas commented Jul 13, 2023

Ah gotcha -- lots of issues to go through so it's tough to get deep into the details on each one.

I think I follow what you're proposing. My concern is that we are introducing complexity for a theoretical performance advantage, and those tradeoffs are usually tricky.

If you're up for creating a prototype to see what kind of perf advantage we might see, I'd be happy to evaluate it.

@matwilko
Copy link
Author

matwilko commented Aug 7, 2023

TLDR: We get at least a 60% speedup with this approach 😄

OK, so, I've thrown together a prototype that merges the globals as described, and at the same time calculates the global ignores, get files(), and pre-calculates the universal/non-universal patterns arrays for each element with files in it.

Obviously there's some work that could be done to make this a bit nicer, and a lot of ifs/ternarys can go away if all the optimizations are taken, but hopefully the idea is clear enough.

I've confirmed all tests are passing.

I've benchmarked against two sets of data, both derived from the ESLint codebase. First, the set of files that are linted by ESLint in the ESLint codebase (derived by basically console.log'ing in the right spot during file enumeration), and second, a full file listing of every file in the ESLint codebase (excluding node_modules). The benchmarks below show the total time to getConfig() all of the files in the respective lists, so it's a reasonable estimate for the amount of time taken during a full ESLint run in resolving file configurations. The ESLint (#) line is stock ESLint as loaded from npm for reference.

Platform info:
==============
   Windows_NT 10.0.19044 x64
   Node.JS: 18.16.0
   V8: 10.2.154.26-node.26
   CPU: AMD Ryzen 5 Microsoft Surface (R) Edition × 12
   Memory: 7 GB

Suite: Files linted by eslint
   ESLint (#)                                                                                   0%              (9 rps)   (avg: 109ms)
   mergeGlobals: true, globalIgnores: false, fileGlobs: false, universalGlobs: false       +73.12%             (16 rps)   (avg: 63ms)
   mergeGlobals: true, globalIgnores: false, fileGlobs: false, universalGlobs: true        +75.82%             (16 rps)   (avg: 62ms)
   mergeGlobals: true, globalIgnores: false, fileGlobs: true, universalGlobs: false        +74.38%             (16 rps)   (avg: 63ms)
   mergeGlobals: true, globalIgnores: false, fileGlobs: true, universalGlobs: true         +74.73%             (16 rps)   (avg: 62ms)
   mergeGlobals: true, globalIgnores: true, fileGlobs: false, universalGlobs: false        +82.68%             (17 rps)   (avg: 60ms)
   mergeGlobals: true, globalIgnores: true, fileGlobs: false, universalGlobs: true          +84.7%             (17 rps)   (avg: 59ms)
   mergeGlobals: true, globalIgnores: true, fileGlobs: true, universalGlobs: false         +79.21%             (16 rps)   (avg: 61ms)
   mergeGlobals: true, globalIgnores: true, fileGlobs: true, universalGlobs: true          +84.42%             (17 rps)   (avg: 59ms)
-----------------------------------------------------------------------

Suite: All files in eslint dir
   ESLint (#)                                                                                   0%              (8 rps)   (avg: 122ms)
   mergeGlobals: true, globalIgnores: false, fileGlobs: false, universalGlobs: false       +59.17%             (13 rps)   (avg: 76ms)
   mergeGlobals: true, globalIgnores: false, fileGlobs: false, universalGlobs: true        +61.71%             (13 rps)   (avg: 75ms)
   mergeGlobals: true, globalIgnores: false, fileGlobs: true, universalGlobs: false        +59.84%             (13 rps)   (avg: 76ms)
   mergeGlobals: true, globalIgnores: false, fileGlobs: true, universalGlobs: true         +60.31%             (13 rps)   (avg: 76ms)
   mergeGlobals: true, globalIgnores: true, fileGlobs: false, universalGlobs: false        +65.53%             (14 rps)   (avg: 73ms)
   mergeGlobals: true, globalIgnores: true, fileGlobs: false, universalGlobs: true         +68.43%             (14 rps)   (avg: 72ms)
   mergeGlobals: true, globalIgnores: true, fileGlobs: true, universalGlobs: false         +66.75%             (14 rps)   (avg: 73ms)
   mergeGlobals: true, globalIgnores: true, fileGlobs: true, universalGlobs: true          +68.95%             (14 rps)   (avg: 72ms)
-----------------------------------------------------------------------

So, for files that we know have a configuration (i.e. case 1), we're seeing somewhere between a 75-85% speedup in config resolution (and this is with all the regular caching in place).

For all files, including a huge number that are to be ignored (and might not even be sent through FlatConfigArray if ESLint's file enumeration already excluded them), we're still seeing between 60-68% speedup.

Also, I know we're talking about ~50ms on the ESLint codebase here, but every little helps with performance stuff, and for larger codebases, the absolute number would probably grow very quickly (ESLint is very conservative with the number of files it seems to use - many codebases I've seen would split the same code into a much larger set of files!)

I'm happy to turn this into a proper implementation if you'd like to move forward.

(I should note that the code on the branch is not usable as-is, as it's been written on my employer's time/resources, and I don't have authorization for it to be licensed for OSS use, so please don't move forward with it directly. I'll have to go through some internal processes to get the right approvals to implement this properly and allow the release/licensing of any code 😄)

@nzakas
Copy link
Contributor

nzakas commented Aug 9, 2023

Interesting, thanks for doing that. It's definitely worth exploring further.

Can you share the absolute numbers for each run? That would help put these results in context.

@matwilko
Copy link
Author

matwilko commented Aug 9, 2023

The total millisecond times are at the end of the line if you scroll to the right - or is there a different measure you'd like to see?

@nzakas
Copy link
Contributor

nzakas commented Aug 11, 2023

Ah sorry, I was misreading.

What I was looking for was a before/after comparison. If I'm reading this correctly, it seems like all of the examples are pre-merging configs? I'd like to see that side by side with the current version's numbers.

@matwilko
Copy link
Author

Yep, the ESLint (#) line at the top of both runs is the reference, it's testing against FlatConfigArray directly from stock ESLint (i.e. require-ed directly from node_modules) 😄

The other lines use a local checkout of eslint with a small update to FlatConfigArray to allow passing through the optimization params.

@nzakas
Copy link
Contributor

nzakas commented Aug 14, 2023

Ah! Thanks. I totally missed that there was a number all the way over on the right for that line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants