Manage memory issues by introducing --maxWorkerJobs #9070

Closed

antonoberg opened this issue Oct 21, 2019 · 11 comments

@antonoberg
🚀 Feature Proposal: TL;DR

One way to manage a lot of the reported memory issues could be to add a new parameter --maxWorkerJobs=x that forces the tests to run with a worker pool and respawns each worker in a new process after it has completed x jobs.

I can provide a PR for this if you believe it's a good idea. Before I provide that PR, I would love to get feedback on the idea.

BACKGROUND

We have experienced memory issues in tests that import a larger internal library, similar to the issues described in #8832.

When using jest 24.9.0 (and latest master), running integration tests in band results in an ever-increasing heap size and failing tests in our CI. Running the tests locally in watch mode also results in an out-of-memory crash after a while.

When running the test suite from #8832 in watch mode, the heap size increases with each iteration until it stabilizes around 1000 MB. With our internal library, however, it does not stabilize but ends in a memory crash. Our internal library might potentially do some questionable things, but I believe the memory issues come from importing other libs that are known for causing memory issues in jest tests.

When we downgrade jest one version at a time, the issue disappears at jest 23.6.0. When running the #8832 tests in watch mode on 23.6.0, the heap size is reset between each iteration.

Comparing Jest 24.9.0 and 23.6.0

Second run of #8832 test suite in watch mode with jest 24.9.0

[screenshot: heap usage, jest 24.9.0]

Second run of #8832 test suite in watch mode with jest 23.6.0

[screenshot: heap usage, jest 23.6.0]

We can see that in 24.9.0 the heap size starts from 517 MB, while in 23.6.0 it starts from 58 MB (we can also see that the 24.9.0 run is much faster).

The difference is in the logic that decides whether or not to use the worker pool

We went through the commits between 23.6.0 and 24.0.0 and found that the commit that introduced this issue is: 3c7b110

This commit modifies the logic that determines whether the worker pool is used. We evaluated how this affected how the tests were run by logging the PID. Below you can see a modified version of a single test from the #8832 test suite where the PID is logged.

[screenshot: test code logging the PID]
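
In essence, the modification is just logging process.pid from inside the test, roughly like this (a minimal sketch; the real #8832 test also imports the heavy library, which is omitted here):

```ts
// pid.test.ts - minimal sketch, not the actual #8832 test file.
// If the logged PID changes between watch-mode iterations, a new worker process was spawned.
test('logs the PID of the worker process', () => {
  console.log(`running in process ${process.pid}`);
  expect(true).toBe(true);
});
```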

Before 3c7b110:

Iteration 1:

[screenshot: test output, iteration 1]

Iteration 2:

[screenshot: test output, iteration 2]

Note that the PID is different.

After 3c7b110:

Iteration 1:

[screenshot: test output, iteration 1]

Iteration 2:

[screenshot: test output, iteration 2]

Note that the PID is the same.

What this commit does is cause our tests to be run without the worker pool. Before this commit, when running in watch mode, a new worker in a new process was created on each iteration, which prevented the heap from growing too big (in most cases).

Proposed solution

We can use the fact that terminating a process reliably releases the memory it allocated.

We can introduce a new flag --maxWorkerJobs=x that does 2 things:

  1. It forces the tests to use the worker pool.
  2. It kills a worker process and spawns a new one after the worker has performed x jobs.

This might not be a proper solution to the memory issues but a way to manage them. Since the worker pool already respawns workers if they die, the implementation should be fairly easy.

Below follows a proof of concept (passing the maxWorkerJobs prop is not included; also, only the change in ChildProcessWorker is shown, not the one in NodeThreadsWorker):

1. Forces the tests to use the worker pool.

[screenshot: proof-of-concept change forcing the worker pool]
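
Conceptually, this part of the change amounts to something like the following sketch of the run-in-band decision (the real logic lives in jest-core/src/testSchedulerHelper and has a different signature; maxWorkerJobs is the proposed new option):

```ts
// Sketch only - not Jest's actual scheduling code. If the proposed maxWorkerJobs
// option is set, never fall back to in-band execution, because worker recycling
// requires the worker pool to exist in the first place.
export function shouldRunInBand(
  testCount: number,
  maxWorkers: number,
  maxWorkerJobs?: number, // proposed new option
): boolean {
  if (maxWorkerJobs != null) {
    return false;
  }
  // ...the existing heuristics (single test, timings, watch mode, etc.) would go here.
  return maxWorkers <= 1 || testCount <= 1;
}
```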

2. Kills a worker after x jobs

[screenshot: proof-of-concept change killing a worker after x jobs]
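
The idea can also be sketched independently of jest-worker's actual ChildProcessWorker (the class and member names below are illustrative only):

```ts
import {ChildProcess, fork} from 'child_process';

// Conceptual sketch of "kill a worker after x jobs", not jest-worker's real code.
// The pool already respawns workers that die, so the only new piece is counting
// completed jobs and recycling the process once the limit is reached.
class RecyclableWorker {
  private child: ChildProcess;
  private jobsPerformed = 0;

  constructor(
    private readonly workerScript: string,
    private readonly maxWorkerJobs: number, // value of the proposed flag
  ) {
    this.child = fork(workerScript);
  }

  onJobCompleted(): void {
    this.jobsPerformed++;
    if (this.jobsPerformed >= this.maxWorkerJobs) {
      // Killing the child releases everything it allocated; forking a fresh
      // process gives the next jobs a clean heap.
      this.child.kill();
      this.child = fork(this.workerScript);
      this.jobsPerformed = 0;
    }
  }
}
```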

Now running the tests with --maxWorkerJobs=5 results in the following heap behaviour:

[screenshot: heap usage with --maxWorkerJobs=5]

As you can see the heap size drops after 5 tests (compare to the screenshots in #8832).

This will prevent the tests from crashing when running in watch mode and it will also prevent the heap from growing too big in general.

Both running with the worker pool and restarting workers add a time overhead, so this should definitely be opt-in behavior. If you don't have tests that import something that causes memory issues, you should not use this option. However, I believe that if you are experiencing memory issues, setting this limit (no lower than you need) will be a lot better than having memory crashes.

What do you think? Is this a good idea?

@jeysal
Contributor

jeysal commented Oct 21, 2019

Thanks for the suggestion and detailed writeup!
I'm not entirely comfortable with making it so easy for people to work around leaks, and then having to maintain this feature.
Idea: What if there was a signal/way to tell the worker farm from inside the worker to not reuse this worker? (Maybe there already is one, although I can't spontaneously think of a loophole allowing you to do that)
Then a custom OneShotEnvironment test environment could do that in its teardown.

@grantila

@jeysal please describe how you define a leak.

Note that we're not talking about code (when run normally - not through jest) that leaks in the sense that while running it is eating more and more memory, but rather code that expects to run in a traditional Node.js environment and not be re-run / re-require'd and not be subject to require cache manipulations, etc.

A package that works perfectly fine for everyone should not have to be "jest compatible".

@antonoberg
Author

@jeysal thanks for your reply. I think I can't use teardown, but I can potentially patch process.send in a custom environment and use that to kill the worker. If I combine that with keeping track of how many runs each worker has done in temporary files whose names include the PID (removed in some global cleanup), I think I can achieve what I want - reset a worker after x runs - without having to modify jest. Is it ideal? Likely not, but it might do the trick. I have a proof of concept that is working.

However I think the concern that @grantila raised is valid.

@jeysal
Contributor

jeysal commented Oct 22, 2019

please describe how you define a leak.

In this context, anything that requires a workaround like restarting workers after a few test files.


jest compatible

I don't think this is a good term for this property - Jest is not the only module that requires things multiple times (and it is indeed one that I think has exceptionally good reasons to do it). You can do it with Node's module APIs directly.


Anyway, this topic does not seem relevant to this thread, because it's about avoiding memory issues in the first place, not about a workaround for when they occur.

@jeysal
Contributor

jeysal commented Oct 22, 2019

use that to kill the worker

I don't think jest-worker will like that. I believe that this will require some changes, but only to jest-worker.

in temporary files with names that includes the pid

You shouldn't need to do that - in a test environment, you can access the actual global of the worker and use that to store this information.

@antonoberg
Author

Thanks!

Putting the following in a custom test environment seems to work fine:
[screenshot: custom test environment code]
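
Roughly, the idea is to count runs on the worker's real global (as suggested above) and patch process.send so the worker exits only after reporting its result. A sketch with illustrative names (OneShotEnvironment, MAX_RUNS, __testFileRuns), not the exact code from the screenshot:

```ts
import NodeEnvironment from 'jest-environment-node';

// Sketch only - illustrative names, not the exact code from the screenshot.
const MAX_RUNS = 5; // recycle the worker after this many test files

type WorkerGlobal = typeof globalThis & {__testFileRuns?: number};

class OneShotEnvironment extends NodeEnvironment {
  async setup(): Promise<void> {
    await super.setup();

    // The environment module runs in the worker's real context, so this global
    // survives across test files executed by the same worker process.
    const g = globalThis as WorkerGlobal;
    g.__testFileRuns = (g.__testFileRuns ?? 0) + 1;

    if (g.__testFileRuns >= MAX_RUNS && typeof process.send === 'function') {
      const originalSend = process.send.bind(process) as (...args: Array<any>) => boolean;
      // Let the worker report its result first, then exit with a code that is
      // neither 0 nor SIGTERM_EXIT_CODE/SIGKILL_EXIT_CODE, so a new worker is spawned.
      process.send = (message: any, ...rest: Array<any>): boolean => {
        const sent = originalSend(message, ...rest);
        setImmediate(() => process.exit(1));
        return sent;
      };
    }
  }
}

export default OneShotEnvironment;
```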

I have not experienced any objections from jest-worker.

As long as the exit code differs from 0, SIGTERM_EXIT_CODE, and SIGKILL_EXIT_CODE, it will start a new worker.

jest-worker/src/workers/ChildProcessWorker

[screenshot: exit-code handling in ChildProcessWorker]

This results in the following:

[screenshot: resulting test run output]

What I would need is the possibility to force jest to use the worker pool. @jeysal - what do you think about introducing a parameter like --forceWorkerPool or --minWorkers or something similar that is used in jest-core/src/testSchedulerHelper to force the tests not to run in band but to always use the worker pool?

@grantila

If module magic is necessary, a safe solution would be to not re-use the same process ever, but to always create a new process for each file. Run-in-band would then mean to have parallelism 1 while the default could be cpus().length or whatever.

For performance reasons, jest could make sure to pre-spawn a few processes which can quickly be used as other test files have finished.

For Node 12 and later, worker_threads could be used which spawns a bit faster.

@jeysal
Contributor

jeysal commented Oct 28, 2019

@antonoberg Awesome! Maybe an explicit --runInBand=false could be used to force workers? Would also help with some hacks we did :D

Run-in-band would then mean to have parallelism 1

runInBand has other benefits (e.g. debugging), so we don't want to replace it with a simple maxWorkers=1.

For performance reasons, jest could make sure to pre-spawn a few processes which can quickly be used as other test files have finished.

I don't see how that could improve performance; best case it's the same (just more spawning at the beginning and more test execution at the end), worst case you spawn processes you'll never need.

For Node 12 and later, worker_threads could be used which spawns a bit faster.

There is an implementation of this, but it's not currently enabled because there were issues.

@antonoberg
Author

We have solved it on our end, so there is no need for the --runInBand=false flag. Thanks again for your attention @jeysal! I will close this issue now.

@klippx

klippx commented Mar 10, 2020

What happened in the end, what was your fix?

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

github-actions bot locked as resolved and limited conversation to collaborators May 11, 2021