Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change forEach to forEachLimit and purge cache to fix memory consumption issues #8609

Closed
wants to merge 8 commits into from
Closed

Change forEach to forEachLimit and purge cache to fix memory consumption issues #8609

wants to merge 8 commits into from

Conversation

dav-is
Copy link
Contributor

@dav-is dav-is commented Jan 9, 2019

I did some memory profiling on a large app to find the issue. The app was crashing with memory usage of around 1.4GB.

The timelines are only from the "additional chunk assets processing (91%)" stage plus or minus some time. As a note, the profiling slows down the process considerably and I started the profiling manually close to the crash, so don't pay much attention to the lengths of time shown in the timelines.

Before applying this PR

Note that it allocates 550MB all at once (this is not the heap usage, only inside the profiling scope)

beforepr

When applying the forEachLimit(15) change

I noticed the same amount of memory being allocated, except now it was adding 34MB of memory usage every frame. Note that each time it is storing _cachedSource

usingeachlimit

When setting source._cachedSource = undefined with forEachLimit

The heap memory usage went down drastically and stopped running out of memory. The _cachedSource should be removed after a file is emitted.

afterpr

Performance Concerns

I ran time on two processes.

Before PR using max_old_space_size so node didn't run out of memory: 2m9s with 1.4GB peak memory usage
After PR using standard memory limts: 2m15s without running out of memory (other processes in the build use a lot of memory so the peak usage wouldn't be a useful metric here)

Accessing Private Variables

When I spoke to @sokra about this PR he had issue with the fact that I was using private variables and I can't depend on source being a CachedSource. This doesn't matter because I'm not accessing the variable, I'm just setting it to undefined. If the source isn't a CachedSource then it will have no effect and will not cause any errors.

The alternative of this is adding a method to CachedSource to purge the cache and checking that source is an instance of CachedSource and calling the method. There's no such thing as private variables in JavaScript and webpack doesn't use TypeScript, so I don't really see a reason to do this.

If anyone has another solution, I'd be open to suggestions.

The forEachLimit limit

I set the limit to 15 and it seems reasonable, but might need to be tweaked.

Thanks to @timneutkens for his help in diagnosing this issue with me.

@jsf-clabot
Copy link

jsf-clabot commented Jan 9, 2019

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@webpack-bot
Copy link
Contributor

For maintainers only:

  • This need to be documented (issue in webpack/webpack.js.org will be filed when merged)

@sokra
Copy link
Member

sokra commented Jan 9, 2019

You need to add the types in the declaration.d.ts file as we have no typings for neo-async

@webpack-bot
Copy link
Contributor

Thank you for your pull request! The most important CI builds succeeded, we’ll review the pull request soon.

@timneutkens
Copy link
Contributor

From testing so far it seems like this doesn't solve our issue.

@webpack-bot
Copy link
Contributor

@dav-is Thanks for your update.

I labeled the Pull Request so reviewers will review it again.

@sokra Please review the new changes.

@dav-is
Copy link
Contributor Author

dav-is commented Jan 16, 2019

@sokra I added more details to the initial post

@dav-is dav-is changed the title Change forEach to forEachLimit to fix memory consumption issues Change forEach to forEachLimit and purge cache to fix memory consumption issues Jan 16, 2019
lib/Compiler.js Outdated
@@ -340,6 +341,7 @@ class Compiler extends Tapable {
source.existsAt = targetPath;
source.emitted = true;
this.outputFileSystem.writeFile(targetPath, content, callback);
source._cachedSource = undefined;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writeFile is async and potentially may fail; does it change how _cachedSource should be cleared? (for ex., only on successful write, or only after the async operation is complete)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If writeFile fails, wouldn't the whole process fail?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an assumption this piece of code should not make.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this code currently seems to assume that writeFile uses _cachedSource synchronously which may not be the case depending on writeFile implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say a single file's cache wasn't cleared, it would have the memory footprint of a single file (not a big deal).

Say a single file's cache was prematurely pruged, webpack knows how to repopulate the cache so it wouldn't lose the source (it's only a cache)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should be clear is that the cache is never purged and piles up to a GB+. Removing it right after it's written is the obvious end of life for a cache element.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes you think that writeFile could be asynchronous? The forEachLimit is the async part

The fact that a callback is passed to it. Is the writeFile call guaranteed to finish all its work synchronously despite receiving a callback?

What should be clear is that the cache is never purged and piles up to a GB+. Removing it right after it's written is the obvious end of life for a cache element.

Yes, I get that. That's why I'm trying to clarify when this "after it's written" event happens. Is it guaranteed to happen synchronously within the writeFile call? If so, the code is correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's getting pretty late here (2am) I'll take another look at what you're saying in the morning 👍

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's getting pretty late here (2am) I'll take another look at what you're saying in the morning 👍

Yes, it's 12am on my end. Thanks for addressing the memory issue and paying attention to my comments!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sompylasar I looked into it and it makes sense to purge the cache before writeFile and I tested it and the fix still works. Thanks for your feedback 👍

lib/Compiler.js Outdated Show resolved Hide resolved
@sokra
Copy link
Member

sokra commented Jan 16, 2019

I can't easily take this change, as it would be a breaking change. While it probably improves memory/performance for you, it could affect other plugin negatively. Plugins could access the assets in the afterEmit/done hook. Purging the cache in emitAssets would make these plugins run the source generation again.

We need to put this behind an opt-in flag and enable it by default in the next major.

Instead of removing the _cachedSource it makes more sense (and is cleaner) to replace the source in assets with i. e. a SizeOnlySource (which only allows size() and throws on source()). This also has the benefit that the real Source can be GC'd and not only the cache of it. We should get rid of the existsAt and emitted hacks at the same time. I think a WeakMap<Source, { existsAt, sizeOnlySource }> plus a emittedAssets: Set<string> would work here. As removing existsAt and emitted is a breaking change, this should be moved to the next major.

@dav-is
Copy link
Contributor Author

dav-is commented Jan 17, 2019

Closing in favor of #8642

@dav-is dav-is closed this Jan 17, 2019
@dav-is dav-is deleted the change-foreach-to-foreachlimit branch January 17, 2019 18:57
filipesilva added a commit to filipesilva/angular-cli that referenced this pull request Jan 22, 2019
filipesilva added a commit to filipesilva/angular-cli that referenced this pull request Jan 22, 2019
mgechev pushed a commit to angular/angular-cli that referenced this pull request Jan 22, 2019
timneutkens pushed a commit to vercel/next.js that referenced this pull request Jan 24, 2019
ikjelle pushed a commit to ikjelle/angular-cli that referenced this pull request Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants