
Watch mode performance issue in seal process #8126

Closed
MLoughry opened this issue Oct 2, 2018 · 19 comments
@MLoughry
Contributor

MLoughry commented Oct 2, 2018

Feature request

What is the expected behavior?
Watch mode rebuilds would re-use, as much as feasible, the chunk graph from the previous rebuild, to optimize build speed.

What is the motivation or use case for adding/changing the behavior?
Our project has >12k modules, and using the latest webpack (without the monkey-patching alluded to in #7546) results in 75-90s rebuild times. Earlier 4.x builds would rebuild in 15-20s.

When profiling a run where the only change made to trigger a rebuild is adding a newline to a code file, 79% of the time is spent in Compilation.seal(), and another 11% in garbage collection. Compilation.processDependenciesBlocksForChunkGroups() alone accounts for 76% of the rebuild time (including GC).

How should this be implemented in your opinion?
I have no idea; I'd need guidance from @sokra or someone else more familiar with this area. I looked at simply optimizing processDependenciesBlocksForChunkGroups, but couldn't get a good enough understanding of the code to make any meaningful change. I'm also not sure how a rebuild Compilation re-uses data from a previous compilation, so I had no idea how to even begin investigating that route.

Are you willing to work on this yourself?
Yes; I've been given a couple of dev weeks to work on our team's build performance. However, I don't think I can make any meaningful progress without some guidance.

@MLoughry
Contributor Author

MLoughry commented Oct 2, 2018

I've hosted the CPU profiles for the above stats @ https://github.com/MLoughry/webpack-8126-profiles

@sokra
Member

sokra commented Oct 3, 2018

[screenshot of the uploaded CPU profile]

Could you remove the breakpoints while profiling?

@MLoughry
Contributor Author

MLoughry commented Oct 3, 2018

🤦 Sorry about that. I've pushed new profiles without breakpoints.

The regression is much less dire, but the areas for improvement remain the same. In a 21.7s rebuild, 11.7s is spent in processDependenciesBlocksForChunkGroups (including GC), doing work that is unneeded in this particular rebuild, and likely in many or most rebuilds.

@sokra
Member

sokra commented Oct 4, 2018

Thanks.

I think the slowness is related to this piece of code:

webpack/lib/Compilation.js

Lines 1828 to 1845 in 4c461e2

// 1. Get minimal available modules
// It doesn't make sense to traverse a chunk again with more available modules.
// This step calculates the minimal available modules and skips traversal when
// the list didn't shrink.
let minAvailableModules = minAvailableModulesMap.get(chunkGroup);
if (minAvailableModules === undefined) {
	minAvailableModulesMap.set(chunkGroup, new Set(availableModules));
} else {
	let deletedModules = false;
	for (const m of minAvailableModules) {
		if (!availableModules.has(m)) {
			minAvailableModules.delete(m);
			deletedModules = true;
		}
	}
	if (!deletedModules) continue;
	availableModules = minAvailableModules;
}

or this:

const filteredDeps = deps.filter(filterFn);

The algorithm tracks paths through the ChunkGroups of the application and finds the minimal modules available at each ChunkGroup. Depending on the number of possible paths, it can be expensive.

The first snippet is entered for each ChunkGroup with the availableModules from its parent. When no information is available for the entered ChunkGroup, the list is copied. If some information is already available, we calculate the intersection of the available modules. In your profile the first part seems to be responsible for 2.5s and the second part maybe for 7.5s. A low-hanging fruit would be: in step 8, add a needCopy flag which is false when nextChunkGroups.size === 1, and avoid copying the list in step 1.
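
Roughly what I mean, as an untested sketch (the needCopy flag is new; the shape of the queue entries is simplified compared to the real code):

// Step 8 (sketch): a copy is only required when siblings share the set.
for (const nextChunkGroup of nextChunkGroups) {
	queue2.enqueue({
		chunkGroup: nextChunkGroup,
		availableModules: newAvailableModules,
		needCopy: nextChunkGroups.size > 1
	});
}

// Step 1 (sketch): reuse the parent's set directly when no copy is needed
// (needCopy would come from the dequeued queue entry).
if (minAvailableModules === undefined) {
	minAvailableModulesMap.set(
		chunkGroup,
		needCopy ? new Set(availableModules) : availableModules
	);
}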

Or the second snippet is responsible for the 7.5s (the areModulesAvailable call in filterFn). It tries to remove edges in the graph where all modules are already available. A low-hanging fruit would be: newAvailableModules can only shrink for an edge, so once an edge is not filtered out it will never be filtered out. We can store this, e.g. on the edge, and skip the calculation next time.
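
Sketched (untested; storing an alwaysNeeded flag directly on the dependency entry is just illustrative):

// Once filterFn has kept an edge, availableModules only shrinks afterwards,
// so the edge will be kept on every later pass: cache that result and skip
// the areModulesAvailable check next time.
const filteredDeps = [];
for (const dep of deps) {
	if (dep.alwaysNeeded || filterFn(dep)) {
		dep.alwaysNeeded = true;
		filteredDeps.push(dep);
	}
}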

Another easy idea is to merge steps 4 and 5 and filter deps while iterating. There is no real need to create a new array with filter. GC is pretty heavy here, so avoiding object creation could help.
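
I.e. something like this (sketch; processDep is a hypothetical stand-in for whatever the step-5 body does per dependency):

// Merged step 4 + 5 (sketch): no intermediate filteredDeps array is
// allocated, so there is nothing extra for the GC to collect.
for (const dep of deps) {
	if (!filterFn(dep)) continue; // was: deps.filter(filterFn)
	processDep(dep);
}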

Let's start with measuring, to figure out whether these snippets are actually responsible for the slowness. The low-hanging fruits can be tackled anyway (send a PR).

@MLoughry
Contributor Author

MLoughry commented Oct 4, 2018

I continue to find boneheaded mistakes on my part. Apparently, I was profiling against a version of webpack from May, when I last contributed, rather than the most recent. I've re-profiled, and updated the repo.

I've also made changes as I believe you intended, but they're not leading to perf gains as expected. It's entirely probable I misunderstood your suggestions. Either way, the profiles for those are posted to the same repo (appended with -changes).

I've opened #8143 for conversation and comments. From my limited profiling, I'm not seeing any improvements. I'll try running more rigorous benchmarks tomorrow, but I wanted to get something back to you before EU Friday.

@sokra
Member

sokra commented Oct 5, 2018

  • Use the latest Node.js version
  • Use the latest webpack version
  • Try node --max-old-space-size=8192 (maybe you are at the memory limit and GC gets slow there)

If that doesn't help, you could try to make queue2 a priority queue and process the shortest available-modules list first. This should reduce the number of loops needed.
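
For example something like this (untested sketch; it assumes each queue entry carries an availableModules set):

// Minimal binary min-heap keyed by availableModules.size, so the entry
// with the smallest available-modules set is processed first.
class PriorityQueue {
	constructor() {
		this.heap = [];
	}
	get length() {
		return this.heap.length;
	}
	enqueue(item) {
		const heap = this.heap;
		heap.push(item);
		// sift up
		let i = heap.length - 1;
		while (i > 0) {
			const parent = (i - 1) >> 1;
			if (heap[parent].availableModules.size <= heap[i].availableModules.size)
				break;
			[heap[parent], heap[i]] = [heap[i], heap[parent]];
			i = parent;
		}
	}
	dequeue() {
		const heap = this.heap;
		const top = heap[0];
		const last = heap.pop();
		if (heap.length > 0) {
			heap[0] = last;
			// sift down
			let i = 0;
			for (;;) {
				const left = 2 * i + 1;
				const right = left + 1;
				let smallest = i;
				if (
					left < heap.length &&
					heap[left].availableModules.size < heap[smallest].availableModules.size
				)
					smallest = left;
				if (
					right < heap.length &&
					heap[right].availableModules.size < heap[smallest].availableModules.size
				)
					smallest = right;
				if (smallest === i) break;
				[heap[i], heap[smallest]] = [heap[smallest], heap[i]];
				i = smallest;
			}
		}
		return top;
	}
}

Processing the smallest sets first should make the intersections in step 1 shrink the minAvailableModules sets earlier, so fewer re-traversals are needed.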

Maybe also track some statistics for the algorithm, like the number of inputChunkGroups, the number of loops, and how often the loop is exited here:

if (!deletedModules) continue;


Technically we could leave the loop before step 2:

// 2. Get the edges at this point of the graph

when the same chunkGroup is already queued up again. Maybe track a count of how often a chunkGroup is currently queued up, and exit early when the count > 0.
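
A sketch of that (untested; the queuedCounts map is new):

const queuedCounts = new Map();

// whenever a chunkGroup is enqueued:
queuedCounts.set(chunkGroup, (queuedCounts.get(chunkGroup) || 0) + 1);

// in the loop body, after step 1 but before step 2:
const stillQueued = queuedCounts.get(chunkGroup) - 1;
queuedCounts.set(chunkGroup, stillQueued);
if (stillQueued > 0) continue; // queued up again anyway; traverse later in one go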

@MLoughry
Contributor Author

Sorry, I got sidetracked on other projects, but I'm looping back around to this.

Maybe also track some statistics the the algorithm, like the number of inputChunkGroups, the number of loops and how often the loop is exited here:

I have the stats, but I'm not sure how to interpret them. In our build, on an arbitrary recent commit, we have 5 inputChunkGroups and do 9027 total loops in the "Part 2" while loop. 8563 (~95%) of the loops exit at the deletedModules check.

I implemented a priority queue for the first suggestion in your last post @ https://github.com/MLoughry/webpack/tree/priority-queue. It seems to result in a minor perf gain, but it pales in comparison to the monkey patch I mention in #7546.

Running a benchmark of 20 rebuilds:

test run                            avg
webpack@4.21.0                      6.78s
priority-queue branch               6.44s
webpack@4.21.0 with monkey-patch    4.82s

However, either I don't fully understand your second suggestion, or it results in a few snapshot test failures.

test/StatsTestCases.test.js
  ● StatsTestCases › should print correct stats for module-deduplication-named

    expect(value).toMatchSnapshot()

    Received value does not match stored snapshot "StatsTestCases should print correct stats for module-deduplication-named 1".

    - Snapshot
    + Received

    @@ -1,11 +1,11 @@
      "    Asset       Size  Chunks             Chunk Names
    -     e1.js   9.28 KiB       0  [emitted]  e1
    +     e1.js   9.26 KiB       0  [emitted]  e1
          e2.js    9.3 KiB       1  [emitted]  e2
    -     e3.js   9.32 KiB       2  [emitted]  e3
    - async1.js  820 bytes       3  [emitted]  async1
    - async2.js  820 bytes       4  [emitted]  async2
    +     e3.js   9.31 KiB       2  [emitted]  e3
    + async1.js  803 bytes       3  [emitted]  async1
    + async2.js  753 bytes       4  [emitted]  async2
      async3.js  820 bytes       5  [emitted]  async3
      Entrypoint e1 = e1.js
      Entrypoint e2 = e2.js
      Entrypoint e3 = e3.js
      chunk    {0} e1.js (e1) 144 bytes >{3}< [entry] [rendered]
    @@ -15,23 +15,22 @@
       [5] ./e1.js 108 bytes {0} [built]
       [6] ./c.js 9 bytes {0} [built]
      chunk    {1} e2.js (e2) 144 bytes >{4}< [entry] [rendered]
       [0] ./b.js 9 bytes {0} {1} {2} [built]
       [1] ./a.js 9 bytes {0} {1} {2} [built]
    -  [3] ./f.js 9 bytes {1} {4} [built]
    +  [4] ./f.js 9 bytes {1} [built]
       [7] ./e2.js 108 bytes {1} [built]
       [8] ./e.js 9 bytes {1} [built]
      chunk    {2} e3.js (e3) 144 bytes >{5}< [entry] [rendered]
        [0] ./b.js 9 bytes {0} {1} {2} [built]
        [1] ./a.js 9 bytes {0} {1} {2} [built]
    -   [4] ./h.js 9 bytes {2} {5} [built]
    +   [3] ./h.js 9 bytes {2} {5} [built]
        [9] ./e3.js 108 bytes {2} [built]
       [10] ./g.js 9 bytes {2} [built]
    - chunk    {3} async1.js (async1) 89 bytes <{0}> <{5}> >{4}< [rendered]
    + chunk    {3} async1.js (async1) 89 bytes <{0}> <{5}> [rendered]
        [2] ./d.js 9 bytes {0} {3} [built]
       [11] ./async1.js 80 bytes {3} [built]
    - chunk    {4} async2.js (async2) 89 bytes <{1}> <{3}> >{5}< [rendered]
    -   [3] ./f.js 9 bytes {1} {4} [built]
    + chunk    {4} async2.js (async2) 80 bytes <{1}> >{5}< [rendered]
       [12] ./async2.js 80 bytes {4} [built]
      chunk    {5} async3.js (async3) 89 bytes <{2}> <{4}> >{3}< [rendered]
    -   [4] ./h.js 9 bytes {2} {5} [built]
    +   [3] ./h.js 9 bytes {2} {5} [built]
       [13] ./async3.js 80 bytes {5} [built]"

@sokra
Member

sokra commented Oct 18, 2018

hmm... I need to look into that...

@sokra
Member

sokra commented Oct 18, 2018

webpack@4.21.0 with monkey-patch

btw. this is not a valid solution, since it's broken. It's shortcutting part of the calculation because it accidentally modifies the array.

@sokra
Member

sokra commented Oct 18, 2018

hmm... I tried your branch, but it completes all StatsTestCases just fine.

I'll create a PR from your branch to let the CI take a look.

@sokra
Member

sokra commented Oct 18, 2018

#8238

@MLoughry
Contributor Author

The priority-queue branch doesn't include the second suggestion (which caused the failure). I hadn't committed those changes (and they're currently sitting on my home computer while I'm at work).

I agree the monkey-patch is not a valid general-purpose solution, as the change you made that my monkey patch undoes was to fix a legitimate bug. However, if a compilation never had that original issue (as ours doesn't), then the "incorrectness" is irrelevant.

@sokra
Member

sokra commented Oct 19, 2018

Could you try this branch: https://github.com/webpack/webpack/tree/perf/chunk-graph

@ChristianIvicevic

I am having performance issues with this as well, since webpack is going through all unrelated files too, as described in #9020.

@SleepWalker

I've found an interesting thing. If I use babel-plugin-dynamic-import-node-babel-7, which replaces all import()s with require()s, the execution time of processDependenciesBlocksForChunkGroups drops from 3-4s to 500ms (most of the work is performed in the while (queue2.length) { loop).

As a result, I've decided to use that plugin for both node and browser targets in dev mode for faster builds (1.5-2s incremental builds). It may be something in my webpack configuration, but I can't imagine which setting could slow down import processing.
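
For reference, this is roughly how I enable it (sketch; gating it to development is specific to my setup):

// babel.config.js
module.exports = {
	env: {
		development: {
			// resolves to babel-plugin-dynamic-import-node-babel-7
			plugins: ["dynamic-import-node-babel-7"]
		}
	}
};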

@webpack-bot
Contributor

This issue had no activity for at least half a year.

It's subject to automatic issue closing if there is no activity in the next 15 days.

@alexander-akait
Member

Should be fixed by #8242. Anyway, if you still have a problem with perf, please open a new issue with examples, thanks

@sibelius

Is this fixed for you, @SleepWalker?

@SleepWalker

@sibelius, sorry, I can't recheck it. I left that project more than half a year ago and no longer have access to it :(
