
Memory allocation failure when running lots of tests with --runInBand in limited env (Travis) #2179

Closed
davidtheclark opened this issue Nov 29, 2016 · 38 comments


@davidtheclark

This is a specific and concrete problem (I think) related to the performance concerns I voiced in #2171.

There's a recommendation in #1742 and #2162 to use --runInBand when running tests on Travis (or, presumably, other slower environments), in order to avoid memory problems that cause slowdowns and failures. When I use --runInBand on a ton of tests (for stylelint), I do avoid the problems I have without that flag; however, I just end up running into another failure. Apparently Jest consumes all the memory and crashes:

 PASS  src/rules/time-no-imperceptible/__tests__/index.js
<--- Last few GCs --->
  115688 ms: Scavenge 1389.0 (1458.0) -> 1389.0 (1458.0) MB, 0.5 / 0 ms (+ 1.0 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
  117175 ms: Mark-sweep 1389.0 (1458.0) -> 1388.9 (1458.0) MB, 1486.6 / 0 ms (+ 2.4 ms in 2 steps since start of marking, biggest step 1.4 ms) [last resort gc].
  118452 ms: Mark-sweep 1388.9 (1458.0) -> 1344.5 (1458.0) MB, 1277.1 / 0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x3c92d50b4629 <JS Object>
    2: /* anonymous */(aka /* anonymous */) [/home/travis/build/stylelint/stylelint/node_modules/esprima/esprima.js:35] [pc=0x2c59275e252] (this=0x3c92d50041b9 <undefined>,root=0x8ad703dc279 <an Object with map 0x2df09a5073e1>,factory=0x8ad703de6c9 <JS Function (SharedFunctionInfo 0x15effa467661)>)
    3: /* anonymous */ [/home/travis/build/stylelint/stylelint/node_modules/esprima/esprima.js:39...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted (core dumped)
@cpojer
Member

cpojer commented Nov 29, 2016

Hey! I'm sorry I'm just getting back from a vacation and am pretty swamped. Please note that all of this is on top of my mind and I'll get back to it soon but it may take a bit more than a week until I can dedicate real time for this. I'm sure we can fix most of these issues.

@davidtheclark
Author

@cpojer 👍 I'm just trying to document what I can while it's on my mind. Good luck catching up after vacation!

@cpojer
Member

cpojer commented Nov 29, 2016

Thanks! I really appreciate that you are doing this – if you can find any significant problems inside of Jest that we can turn into a PR, that would definitely make this process faster.

Jest doesn't do well on resource-capped machines (low memory, low CPU, slow disk access), and things like code coverage generate a ton of data. Generally we don't see the same kind of issues you're experiencing in Facebook's projects that also use Travis, so we'll have to figure out what's going on.

@AdrienFromToulouse

AdrienFromToulouse commented Nov 29, 2016

Hi, @cpojer

I just faced the same kind of issue with a slow test run with Jest. I made a repo to illustrate the use case: https://github.com/AdrienFromToulouse/docker-nodejs-mysql-tester/tree/master
The master branch runs Jest; the mocha branch runs Mocha.

Tested locally on my recent MacBook Pro, but not in any CI environment yet.

comments:

  • The Jest version is slower and does not exit automatically after the tests complete.
  • The Mocha version is way faster, and the container exits after the tests complete.

I may have done something wrong, though.

P.S.: it runs a MySQL server, so during the very first yarn test there is a delay in the docker-compose file to wait for the server to come up. Subsequent calls to yarn test will be faster.

Hope it helps :)
Cheers!

@damonbauer

damonbauer commented Nov 29, 2016

I see similar things in Travis:

<--- Last few GCs --->
  490987 ms: Mark-sweep 1272.7 (1457.8) -> 1272.7 (1457.8) MB, 3018.6 / 0 ms [last resort gc].
  490990 ms: Scavenge 1272.8 (1457.8) -> 1272.8 (1457.8) MB, 3.2 / 0 ms [allocation failure].
  490992 ms: Scavenge 1272.8 (1457.8) -> 1272.8 (1457.8) MB, 2.2 / 0 ms [allocation failure].
  494057 ms: Mark-sweep 1272.8 (1457.8) -> 1272.7 (1457.8) MB, 3065.1 / 0 ms [last resort gc].
  497047 ms: Mark-sweep 1272.7 (1457.8) -> 1272.7 (1457.8) MB, 2989.6 / 0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0xf80ab5b4629 <JS Object>
    1: Join(aka Join) [native array.js:154] [pc=0x2abf92213938] (this=0xf80ab5041b9 <undefined>,o=0x34eb19ee5ff9 <JS Array[7]>,v=7,C=0xf80ab583201 <String[2]: , >,B=0x13a88c992ef9 <JS Function ConvertToString (SharedFunctionInfo 0xf80ab552599)>)
    2: InnerArrayJoin(aka InnerArrayJoin) [native array.js:331] [pc=0x2abf922127aa] (this=0xf80ab5041b9 <undefined>,C=0xf80ab583201 <String[2]: , >,o=0x...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory

This log comes from a test that uses mount from enzyme, and react-intl, so it's got a decent amount of DOM assertions, e.g.:

it('should show the VIP header when isVip is true', function () {
  let props = Object.assign({}, DEFAULT_PROPS, {isVip: true});
  element = mountWithIntl(<WelcomeVoucherModal {...props} />);
  let header = element.ref('header');
  expect(header.find({id: 'settings.welcomeVipPlusMember'})).toBeTruthy();
});

I've got ~1800 tests running and this log always appears during the last two tests. Here's the command I'm running (I had to use -w='2' instead of -i because it would run out of memory really quickly... messing around with different values for -w was the only way I could get Travis to get through all of my tests):

jest -w='2' --no-cache --coverage

@thymikee
Collaborator

May be related: #1893

@damonbauer

@thymikee - it very well could be related. My project is using babel-polyfill.

@jeddy3

jeddy3 commented Dec 3, 2016

Please note that all of this is on top of my mind ... I'm sure we can fix most of these issues.

This is excellent news, thanks!

As an update to our progress in migrating to jest in stylelint:

Using node --expose-gc ./node_modules/.bin/jest --logHeapUsage on my local machine, the first 4 tests have heap sizes around 50 MB:

 PASS  lib/utils/__tests__/isCustomProperty.test.js (40 MB heap size)
 PASS  lib/rules/selector-no-qualifying-type/__tests__/index.js (44 MB heap size)
 PASS  lib/rules/selector-max-empty-lines/__tests__/index.js (60 MB heap size)
 PASS  lib/rules/at-rule-empty-line-before/__tests__/index.js (61 MB heap size)

The heap size then creeps up, until finally the last 4 tests (before it runs out of memory) have heap sizes around 1500 MB:

PASS  lib/utils/__tests__/validateObjectWithStringArrayProps.test.js (1371 MB heap size)
 PASS  lib/utils/__tests__/isOnlyWhitespace.test.js (1381 MB heap size)
 PASS  lib/utils/__tests__/findAtRuleContext.test.js (1387 MB heap size)
 PASS  lib/utils/__tests__/nextNonCommentNode.test.js (1388 MB heap size)
<--- Last few GCs --->

[55400:0x102801600]   214980 ms: Mark-sweep 1397.5 (1525.0) -> 1397.4 (1526.0) MB, 1828.0 / 0.0 ms  allocation failure GC in old space requested
[55400:0x102801600]   217100 ms: Mark-sweep 1397.4 (1526.0) -> 1397.2 (1489.0) MB, 2118.8 / 0.0 ms  last resort gc
[55400:0x102801600]   218967 ms: Mark-sweep 1397.2 (1489.0) -> 1397.2 (1489.0) MB, 1866.3 / 0.0 ms  last resort gc

May be related: #1893 ("babel-polyfill cause memory leak (?)")

In the past week we moved away from needing babel, in favour of writing node@4-compatible code. As such, I think we can rule out babel-polyfill as the source of our particular out-of-memory crashes.

@cpojer We'll keep poking around at our end as we're keen to move to jest for the excellent developer experience, but we'll keep our fingers crossed that there's a solution within jest for the out-of-memory crashes we're having.

@cpojer
Member

cpojer commented Dec 3, 2016

Wow, this is great, thank you so much for the investigation. Here are a few more things to think about:

  • Jest parallelizes tests, so you should try to detect leaks in a single process using -i; otherwise all the workers will just split the leak between them and take longer to crash.
  • What has helped me in the past is to use v8-profiler, capture a heap snapshot at certain times, and then load it in Chrome and take a look at the retained references. Again, this should be done using -i because the leak currently seems to happen in the worker processes and not in the main process, so if you run without -i you'll only capture the heap of the main process without any test runs. See https://github.com/node-inspector/v8-profiler and the sketch right after this list.
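
A minimal sketch of capturing a heap snapshot with v8-profiler (assuming it is installed as a dev dependency; where you call dumpHeap from, e.g. an afterAll hook while running with -i, is up to you):

// heap-snapshot.js: call dumpHeap('some-label') at the point you want to
// inspect, then load the resulting .heapsnapshot file in Chrome DevTools.
const fs = require('fs');
const profiler = require('v8-profiler');

function dumpHeap(name) {
  const snapshot = profiler.takeSnapshot(name);
  return new Promise((resolve, reject) => {
    snapshot
      .export()
      .pipe(fs.createWriteStream(`${name}.heapsnapshot`))
      .on('finish', () => {
        snapshot.delete(); // free the snapshot's own memory when done
        resolve();
      })
      .on('error', reject);
  });
}

module.exports = dumpHeap;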

From the looks of it, it seems like one of two things, both equally likely:

  • You guys aren't cleaning up after your tests, retaining more memory than you should. This may even be a node bug with the vm module – I've had issues with that years ago on a different service I was building at FB. For some reason an "Error" object in a vm context was retaining every single JS module at FB and crashing our service after 100 requests. This was tough to track down.
  • Jest is retaining information or the vm context after a test has completed in a process and the test data has already been sent back.

One thing I thought could be an issue is code coverage – that definitely requires a ton of memory, and at FB we had to bump the memory that node can consume. @DmitriiAbramov did a fantastic job rewriting code coverage support, but to make it really efficient we may need support from the community to make it more "streamy": store coverage on the file system and then process it as a stream at the end of a test run. Alternatively, it may also be possible to merge coverage information in the main process after each test has finished running. I don't know how feasible either of them is, though.
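
To illustrate the merging idea concretely, here is a rough sketch using istanbul-lib-coverage (which, as I understand it, is what the rewritten coverage support is built on); the hook names here are hypothetical, not Jest's actual reporter API:

// Sketch: keep one merged coverage map in the main process and drop each
// test's raw coverage as soon as it has been merged.
const libCoverage = require('istanbul-lib-coverage');

const aggregate = libCoverage.createCoverageMap({});

// Hypothetical hook, called once per finished test file.
function onTestResult(testResult) {
  if (testResult.coverage) {
    aggregate.merge(testResult.coverage);
    delete testResult.coverage; // free the per-test data right away
  }
}

// Hypothetical hook, called once at the end of the run; only the merged
// map is kept in memory and reported.
function onRunComplete() {
  return aggregate.toJSON();
}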

At FB we currently have one repo with about 5000 test suites, and even with coverage we aren't running into any big memory leaks. That would hint at a leak in your code, but I'm not entirely convinced and think it may be a Jest bug.

@davidtheclark
Author

It's worth pointing out here that the rule test files, which are causing this memory failure, do not look like standard Jest tests. In order to try switching to Jest without rewriting all of the tests (which would be either a huge manual pain or a moderate jscodeshift pain; I'd have to try it out), I wrote a setup file that exposes a global.testRule function that then calls describe and it. I wonder if that's some kind of antipattern, and whether Jest makes assumptions that it violates.
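
For context, the setup file is essentially a thin wrapper along these lines (a simplified sketch, not the actual stylelint code; the schema shape is illustrative):

// jest-setup.js (simplified sketch): expose a global testRule() helper
// that expands a declarative schema into regular describe/it blocks.
const stylelint = require('./lib/index');

global.testRule = (ruleName, schema) => {
  describe(ruleName, () => {
    (schema.accept || []).forEach(testCase => {
      it(`accepts: ${testCase.code}`, () => {
        return stylelint
          .lint({
            code: testCase.code,
            config: {rules: {[ruleName]: schema.config}},
          })
          .then(output => {
            expect(output.results[0].warnings.length).toBe(0);
          });
      });
    });
  });
};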

@cpojer
Member

cpojer commented Dec 3, 2016

We have the same thing at FB, actually, so I don't think your usage of describe/it is a problem.

@cpojer
Member

cpojer commented Dec 3, 2016

btw. this is where Jest creates workers: https://github.com/facebook/jest/blob/master/packages/jest-cli/src/TestRunner.js#L324. While the slowdown of killing workers and starting a new one for each individual test would not be tenable, it is something you can try changing to see if it avoids the memory leak problem (i.e. it will show whether the leak comes from a specific set of tests or from state accumulating across the whole suite over time).

@davidtheclark
Author

I think I have some evidence that the memory problem is likely within Jest, rather than stylelint, because it looks to me like the same problem occurs with very simple tests.

I created 200 test files, each with 200 test cases, and each test case nothing more than:

describe(`file ${testFileNo}, case ${testCaseNo}`, () => {
  it(`file ${testFileNo}, case ${testCaseNo}, spec`, () => {
    expect(true).toBe(true)
  })
})
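
(For reproduction purposes, a generator for files like these can be as simple as the following sketch; not the exact script I used, but equivalent in spirit.)

// generate-tests.js: writes 200 test files with 200 trivial cases each.
const fs = require('fs');
const path = require('path');

const outDir = path.join('tmp', 'tests');
fs.mkdirSync(outDir, {recursive: true});

for (let testFileNo = 0; testFileNo < 200; testFileNo++) {
  let source = '';
  for (let testCaseNo = 0; testCaseNo < 200; testCaseNo++) {
    source +=
      `describe('file ${testFileNo}, case ${testCaseNo}', () => {\n` +
      `  it('file ${testFileNo}, case ${testCaseNo}, spec', () => {\n` +
      `    expect(true).toBe(true)\n` +
      `  })\n` +
      `})\n\n`;
  }
  fs.writeFileSync(path.join(outDir, `test-file-${testFileNo}.test.js`), source);
}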

Running those tests, I see the same heap growth we saw in stylelint's tests:

node --expose-gc ./node_modules/.bin/jest --logHeapUsage --runInBand --testPathPattern tmp/tests/.*
 
 PASS  tmp/tests/test-file-189.test.js (47 MB heap size)
 PASS  tmp/tests/test-file-117.test.js (66 MB heap size)
 PASS  tmp/tests/test-file-118.test.js (83 MB heap size)
 PASS  tmp/tests/test-file-119.test.js (101 MB heap size)
 PASS  tmp/tests/test-file-116.test.js (119 MB heap size)
 PASS  tmp/tests/test-file-122.test.js (135 MB heap size)
 PASS  tmp/tests/test-file-123.test.js (153 MB heap size)
 PASS  tmp/tests/test-file-124.test.js (171 MB heap size)
 PASS  tmp/tests/test-file-125.test.js (189 MB heap size)
 PASS  tmp/tests/test-file-126.test.js (206 MB heap size)
...

@davidtheclark
Author

Hmm ... never mind. When running those tests I forgot to remove our jest-setup.js script. When I did remove that script (good news!), the heap size stayed constant at 33 MB every time.

So that's good news for Jest; it suggests that our problem is somewhere in stylelint's own files or its dependencies.

@cpojer
Member

cpojer commented Dec 4, 2016

That's great progress. Can you share your setup script? Does it do anything with global objects, timers, etc. – anything from the environment that it attaches to that might stick around and can't be GC'd?
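
For example, the kind of thing I mean (hypothetical names; state attached to globals or long-lived timers that nothing tears down):

// Hypothetical example: environment state set up by a test or setup script
// that survives unless it is explicitly torn down.
const cache = new Map();
global.__bigCache = cache;                          // reference hung on the global object
const timer = setInterval(() => cache.clear(), 50); // timer keeps its closure alive

afterAll(() => {
  // Without this cleanup, the timer and the cache outlive the test file.
  clearInterval(timer);
  delete global.__bigCache;
});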

@davidtheclark
Author

Looks like all I need in the setup script to cause the heap to grow is const stylelint = require("./lib/index"). So probably our problem is within stylelint's code, not the setup script itself.

Never debugged a memory leak before. Here we go, I guess ...

@jeddy3

jeddy3 commented Dec 4, 2016

Can you share your setup script?

@cpojer You'll find the setup script here.

@davidtheclark
Author

@cpojer: Here's a stylelint update that's partly good, partly not so good ... We found that a nested dependency on graceful-fs was causing the memory usage to grow indefinitely, as garbage collections were inadequately cleaning up. After removing that dependency, we are able to run all 22000+ rule tests on Travis in Node 4 (--runInBand) without running out of memory.

However, if we add coverage reporting we still run out of memory — you did mention above that this demands a lot of memory.

Jest is kind of "off the hook" for the rule tests, then (:+1:); but the fact remains that Jest demands so much memory that problems arise where they don't with other runners. I wonder if other people experiencing memory problems also have some code somewhere in their app causing a slight memory leak that only becomes a problem when running Jest.

@cpojer
Member

cpojer commented Dec 7, 2016

Yeah, you definitely have a special case: massive amounts of tests but low memory. At FB, we have both massive amounts of tests and insane amounts of memory, so it is generally not a problem. I'd love to reduce coverage memory usage, and I believe the way to do it is through either:

  • Merge coverage results in the main process after each test file and get rid of the per-test-result coverage data.
  • Write coverage results to a file and process it as a stream at the end of a run.

Maybe @DmitriiAbramov can help identify how much work that is. This is definitely something we'll need some help with.

@davidtheclark
Author

davidtheclark commented Dec 7, 2016

Write coverage results to a file and process it as a stream at the end of a run.

I don't have experience with workers but have used this method before. Can't make promises around the holiday season, of course, but if you can point me in the direction of some relevant code I'll try to help out with this.

@aaronabramov
Contributor

@cpojer we do merge coverage results after each test. Maybe the easy fix would be to delete testResult.coverage after each test. That should free up a lot of memory

@cpojer
Member

cpojer commented Dec 7, 2016

@DmitriiAbramov wait we do!? I didn't realize that. We still need to extract coverage data per file for phabricator; if we delete it from individual tests, will we still be able to do this? I think this would dramatically reduce memory usage of Jest.

@kentaromiura might be interested as we'd have to change the coverage processor around a bit.

@aaronabramov
Contributor

@cpojer yeah, it should work.
I would do something like

testResult.coverage = 'coverage object was removed to reduce memory consumption, run with -keepCoverage to keep it'

or something like that.
Phabricator should still work; I think we only use the end result to generate the coverage JSON to feed in.

@cpojer
Member

cpojer commented Dec 7, 2016

I don't see the value that coverage from one individual test run provides, so I'd prefer to just delete it completely after it has been merged. IMHO only the aggregate really matters after a run. If you'd like to know file-specific coverage, run the test in isolation. What do you think?

@aaronabramov
Contributor

I think it doesn't matter in 99% of cases, unless someone wants to build a custom reporter for some very specific use case. I'll just remove it and we'll see if it helps.

@ooHmartY

Hi all,
Just chiming in after doing some more investigation: it seems like Runtime is a bit out of control, and the issue might not be related to coverage. Even running a limited suite within the Jest repository quickly exposes this issue.

Using the following command you can attach to the node process and take heap dumps:

node --inspect=5858 ./packages/jest-cli/bin/jest.js runtime --runInBand --logHeapUsage

I'm hoping that someone has more context as to how Runtime comes into play during a test run.

Here are my findings (last column in each screenshot is "Retained Size"):

It looks like there is a Runtime object created for every suite (just making an assumption based on the # of runtime objects).
[screenshot]

The _moduleRegistry and _resolver properties in each Runtime grow quickly and are never released.
[screenshot]

These properties also contain information about modules that are not relevant to the current suite (not sure if this is by design).
[screenshot]

The context in the majority of these objects seems to point to graceful-fs and gracefulify.
[screenshot]

This issue seems like a deal breaker for teams that have a large number of tests. I'm hoping we can find a path to resolution soon.

@cpojer
Member

cpojer commented Dec 14, 2016

@ooHmartY thanks for the analysis. "Runtime" is what runs each test in a vm context in isolation. This is where it gets tricky: why are these objects retained? It is entirely possible that you are doing something in your tests that causes all your test code to be retained. While I'm not entirely sure if this is a Jest issue, I'd appreciate it if you could profile this again with test suites as simple as test('works', () => expect(1).toBe(1)) and let me know if you can repro this leak.

At Facebook we run thousands of test files in one single run and don't ever run into memory issues like you are describing. I highly doubt that you have a test suite that's bigger than what we have, which hints at this problem being at least partially in the user code that runs within Jest.

One simple thing I can think of is attaching listeners or similar to global objects; for example, if you did require('fs')[Math.random()] = global inside of a test, that would make everything from that test file be retained in the parent context.

I'm really hoping we can figure out further fixes here. Does my description help?

@ooHmartY

ooHmartY commented Dec 14, 2016

Hi @cpojer thanks for the quick reply.

Just wanted to clarify that this analysis was run in the actual Jest repo by running the following commands:

git clone git@github.com:facebook/jest.git
cd ./jest
yarn install
node --inspect=5858 ./packages/jest-cli/bin/jest.js runtime --runInBand --logHeapUsage

The last command was lifted from package.json

Looks like there is a repo with some simple tests created for #1893.
I'll try again using a simple test and post my findings.

@thymikee
Collaborator

Turned out it's an issue with jsdom leaking Window objects. We're tracking it in this thread: #1893

Here's the relevant issue in JSDOM repo: jsdom/jsdom#1682
We'd definitely ❤️ any help on that, but it requires debugging jsdom itself.

Closing this.

@azz
Contributor

azz commented Jan 4, 2018

Just FYI, we had a very similar issue in the Prettier repo; graceful-fs was pulled in via read-pkg-up.


This commit works around the problem by aliasing graceful-fs to regular fs: prettier/prettier@9eb3e0a

Ref: prettier/prettier#3624 (comment)

@cpojer
Member

cpojer commented Jan 4, 2018

Ohhhh, this is pretty interesting actually because when you require('fs') in Jest, you actually get graceful-fs (otherwise you'd run out of file handles pretty quickly at the scale of FB). I think the problem here may be that when you are requiring graceful-fs inside of a test itself, it tries to require fs which is a copy of graceful-fs. Maybe we should alias graceful-fs to the one from the parent context.

@azz
Contributor

azz commented Jan 4, 2018

This gets more complex when you're testing bundled code (in our case we're using Rollup). Maybe this issue should be documented?

@bertho-zero
Contributor

I have the same issue with version 22.4.4.

jest --no-cache fixes this.

@jfairley

jfairley commented Apr 9, 2019

As @bertho-zero mentioned, jest --no-cache does fix this, but for me it makes my tests run so much slower that it's not a great solution.


I have implemented the solution from @azz and @cpojer with success!
#2179 (comment)

jest.config.js:

module.exports = {
  moduleNameMapper: {
    // Jest wires `fs` to `graceful-fs`, which causes a memory leak when
    // `graceful-fs` does `require('fs')`.
    // Ref: https://github.com/facebook/jest/issues/2179#issuecomment-355231418
    "graceful-fs": "<rootDir>/test/helpers/fs.js"
  }
};

test/helpers/fs.js:

module.exports = require('fs');

@SimenB
Member

SimenB commented May 8, 2020

This should be fixed in Jest 25.5 btw, via #9443

@felipeloha

I still have this issue with jest 26.6.3

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 11, 2021