Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC regression in v12.5.0 #28722

Closed
theninj4 opened this issue Jul 16, 2019 · 25 comments
Closed

GC regression in v12.5.0 #28722

theninj4 opened this issue Jul 16, 2019 · 25 comments
Labels
memory Issues and PRs related to the memory management or memory footprint. v8 engine Issues and PRs related to the V8 dependency.

Comments

@theninj4
Copy link

  • Version: v12.6.0
  • Platform: Linux xxxxxxx 4.15.0-45-generic Improve repo description #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I'm working on some "big data" type stuff and figured I'd have a crack at it with Node.js. I'm effectively streaming a huge data set from disk into memory - millions of Set()s holding hundreds of Numbers. I'm starting Node.js as follows (that's not a typo, 142GB):

$ node --max-old-space-size=142000 ./path/to/code.js

Now, I've been using v11.15.0 for a while, and this all works remarkably well all things considered. I reinstalled my server last week and ended up with the latest v12 release - I'd heard there'd been some changes to how the memory side of things works and figured I'd give it a whirl. I'm using the same flags as v11, the identical command as above.

First up, v12 runs around 4 times slower than v11 when loading my data set into memory. Secondly, when I pass ~13GB of heap usage (as reported by process.memoryUsage().heapUsed) the process prints the following message and dies:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Let me know if I can help in any way.

@theninj4
Copy link
Author

theninj4 commented Jul 16, 2019

Here's an example that demonstrates the issue:

const bigData = [ ]
const startDate = new Date()

while (true) {
  const someData = new Set()
  for (let i = 0; i < 128; i++) {
    someData.add(i)
  }
  bigData.push(someData)
  if (bigData.length % 1000 === 0) {
    process.stdout.write([
      `\r${bigData.length.toLocaleString()}`,
      `@`,
      `${Math.floor(process.memoryUsage().heapUsed / (1024 * 1024)).toLocaleString()}MB`,
      `+`,
      `${((new Date()) - startDate).toLocaleString()}ms`
    ].join(' '))
  }
}

Node v11.15.0: note how fast it is I killed it around 6.6m / 16.8GB - you'll have to trust me that this runs all the way to the 40GB limit and dies.

$  node --max-old-space-size=40000 test.js                                                         
6,627,000 @ 16,823MB + 154,639ms
^C

Node v12.6.0: note the duration (the speed of iteration remains consistent throughout)

$ node --max-old-space-size=40000 test.js                                                         
6,459,000 @ 16,268MB + 263,427ms
<--- Last few GCs --->

[4630:0x44e2fd0]   263748 ms: Scavenge 16270.1 (16414.7) -> 16269.9 (16414.7) MB, 84.1 / 0.0 ms  (average mu = 0.996, current mu = 0.997) allocation failure 
[4630:0x44e2fd0]   263836 ms: Scavenge 16270.1 (16414.7) -> 16269.9 (16414.7) MB, 83.4 / 0.0 ms  (average mu = 0.996, current mu = 0.997) allocation failure 
[4630:0x44e2fd0]   263923 ms: Scavenge 16270.1 (16414.7) -> 16270.1 (16414.7) MB, 84.1 / 0.0 ms  (average mu = 0.996, current mu = 0.997) allocation failure 


<--- JS stacktrace --->
Cannot get stack trace in GC.
FATAL ERROR: Scavenger: semi-space copy Allocation failed - JavaScript heap out of memory
 1: 0x9aedf0 node::Abort() [node]
 2: 0x9aff86 node::OnFatalError(char const*, char const*) [node]
 3: 0xb078ce v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xb07c49 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xce4ae5  [node]
 6: 0xd4527a  [node]
 7: 0xd47252 void v8::internal::BodyDescriptorBase::IteratePointers<v8::internal::IterateAndScavengePromotedObjectsVisitor>(v8::internal::HeapObject, int, int, v8::internal::IterateAndScavengePromotedObjectsVisitor*) [node]
 8: 0xd476d0 void v8::internal::BodyDescriptorBase::IterateJSObjectBodyImpl<v8::internal::IterateAndScavengePromotedObjectsVisitor>(v8::internal::Map, v8::internal::HeapObject, int, int, v8::internal::IterateAndScavengePromotedObjectsVisitor*) [node]
 9: 0xd47bb1 v8::internal::Scavenger::IterateAndScavengePromotedObject(v8::internal::HeapObject, v8::internal::Map, int) [node]
10: 0xd48e4b v8::internal::Scavenger::Process(v8::internal::OneshotBarrier*) [node]
11: 0xd4c47d v8::internal::ScavengingTask::RunInParallel() [node]
12: 0xd00a42 v8::internal::ItemParallelJob::Task::RunInternal() [node]
13: 0xd00e3c v8::internal::ItemParallelJob::Run() [node]
14: 0xd4a7d5 v8::internal::ScavengerCollector::CollectGarbage() [node]
15: 0xcdecdb v8::internal::Heap::Scavenge() [node]
16: 0xcf0353 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
17: 0xcf1047 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
18: 0xcf3b78 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
19: 0xcbd7e8 v8::internal::Factory::NewFixedArrayWithFiller(v8::internal::RootIndex, int, v8::internal::Object, v8::internal::AllocationType) [node]
20: 0xcbd8f0 v8::internal::Handle<v8::internal::FixedArray> v8::internal::Factory::NewFixedArrayWithMap<v8::internal::FixedArray>(v8::internal::RootIndex, int, v8::internal::AllocationType) [node]
21: 0xe97e57 v8::internal::OrderedHashTable<v8::internal::OrderedHashSet, 1>::Allocate(v8::internal::Isolate*, int, v8::internal::AllocationType) [node]
22: 0xe97f23 v8::internal::OrderedHashTable<v8::internal::OrderedHashSet, 1>::Rehash(v8::internal::Isolate*, v8::internal::Handle<v8::internal::OrderedHashSet>, int) [node]
23: 0xe9849b v8::internal::OrderedHashTable<v8::internal::OrderedHashSet, 1>::EnsureGrowable(v8::internal::Isolate*, v8::internal::Handle<v8::internal::OrderedHashSet>) [node]
24: 0xf77c20 v8::internal::Runtime_SetGrow(int, unsigned long*, v8::internal::Isolate*) [node]
25: 0x13162b9  [node]
[1]    4630 abort (core dumped)  node --max-old-space-size=40000 test.js

@theninj4
Copy link
Author

theninj4 commented Jul 16, 2019

Oh, here are the tests without any memory flags.

v12.6.0: (it stutters significantly as it runs out of memory, crawls along for a few more seconds and explodes):

$ node test.js                                                                                    
812,000 @ 2,048MB + 15,396ms
<--- Last few GCs --->

[5123:0x4782ef0]    14817 ms: Mark-sweep 2048.8 (2083.7) -> 2047.1 (2070.7) MB, 213.2 / 0.0 ms  (+ 325.4 ms in 107 steps since start of marking, biggest step 12.2 ms, walltime since start of marking 565 ms) (average mu = 0.099, current mu = 0.071) finaliz[5123:0x4782ef0]    15419 ms: Mark-sweep 2052.5 (2070.7) -> 2045.4 (2092.0) MB, 585.2 / 0.0 ms  (average mu = 0.061, current mu = 0.028) allocation failure scavenge might not succeed


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x13162b9]
Security context: 0x36684c280911 <JSObject>
    1: /* anonymous */ [0x3f35c048ea9] [/home/ninj4/test.js:~1] [pc=0x2f7b3e80933f](this=0x03f35c048fd9 <Object map = 0x19094bbc0459>,0x03f35c048fd9 <Object map = 0x19094bbc0459>,0x03f35c048f99 <JSFunction require (sfi = 0x83a9b651df1)>,0x03f35c048f09 <Module map = 0x19094bbfa1a9>,0x083a9b64eaf1 <String[#19]: /home/ninj4/test.js>,0x03f35c048ee9 <String[11]: /home...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x9aedf0 node::Abort() [node]
 2: 0x9aff86 node::OnFatalError(char const*, char const*) [node]
 3: 0xb078ce v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xb07c49 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xce4ae5  [node]
 6: 0xcf032b v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
 7: 0xcf1047 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 8: 0xcf278f v8::internal::Heap::HandleGCRequest() [node]
 9: 0xc8ab64 v8::internal::StackGuard::HandleInterrupts() [node]
10: 0xf93ba7 v8::internal::Runtime_StackGuard(int, unsigned long*, v8::internal::Isolate*) [node]
11: 0x13162b9  [node]
[1]    5123 abort (core dumped)  node test.js

v11.15.0 (slams into the limit pretty hard and explodes):

node test.js                                                                                    
554,000 @ 1,400MB + 8,330ms
<--- Last few GCs --->
n [5726:0x3913010]     8388 ms: Mark-sweep 1397.9 (1427.6) -> 1397.6 (1430.1) MB, 118.4 / 0.0 ms  (+ 33.7 ms in 36 steps since start of marking, biggest step 8.0 ms, walltime since start of marking 160 ms) (average mu = 0.178, current mu = 0.082) allocation[5726:0x3913010]     8578 ms: Mark-sweep 1400.4 (1430.1) -> 1397.8 (1428.6) MB, 181.6 / 0.0 ms  (+ 0.0 ms in 1 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 182 ms) (average mu = 0.118, current mu = 0.043) allocation f

<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x16e6c6cfc5d]
Security context: 0x19125689d9f1 <JSObject>
    1: add [0x191256890799](this=0x28f1ca7ca161 <Set map = 0x206897f83d81>,64)
    2: /* anonymous */ [0x12dfb9c8df51] [/home/ninj4/test.js:~1] [pc=0x16e6c772618](this=0x12dfb9c8e089 <Object map = 0x206897f82521>,0x12dfb9c8e089 <Object map = 0x206897f82521>,0x12dfb9c8e049 <JSFunction require (sfi = 0x2d41e4effbc9)>,0x12dfb9c8dfb9 <Module map = 0x206897fc5c81>...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x95bd00 node::Abort() [node]
 2: 0x95cc46 node::OnFatalError(char const*, char const*) [node]
 3: 0xb3dbde v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xb3de14 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xf3ce52  [node]
 6: 0xf3cf58 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [node]
 7: 0xf49678 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
 8: 0xf4a18b v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 9: 0xf4cec1 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [node]
10: 0xf16df6 v8::internal::Factory::AllocateRawArray(int, v8::internal::PretenureFlag) [node]
11: 0xf177ba v8::internal::Factory::NewFixedArrayWithFiller(v8::internal::Heap::RootListIndex, int, v8::internal::Object*, v8::internal::PretenureFlag) [node]
12: 0xf17860 v8::internal::Handle<v8::internal::FixedArray> v8::internal::Factory::NewFixedArrayWithMap<v8::internal::FixedArray>(v8::internal::Heap::RootListIndex, int, v8::internal::PretenureFlag) [node]
13: 0x10bd257 v8::internal::OrderedHashTable<v8::internal::OrderedHashSet, 1>::Allocate(v8::internal::Isolate*, int, v8::internal::PretenureFlag) [node]
14: 0x10c07c1 v8::internal::OrderedHashTable<v8::internal::OrderedHashSet, 1>::Rehash(v8::internal::Isolate*, v8::internal::Handle<v8::internal::OrderedHashSet>, int) [node]
15: 0x10c0d3b v8::internal::OrderedHashTable<v8::internal::OrderedHashSet, 1>::EnsureGrowable(v8::internal::Isolate*, v8::internal::Handle<v8::internal::OrderedHashSet>) [node]
16: 0x11ae6ac v8::internal::Runtime_SetGrow(int, v8::internal::Object**, v8::internal::Isolate*) [node]
17: 0x16e6c6cfc5d 
[1]    5726 abort (core dumped)  node test.js

@theninj4
Copy link
Author

...sorry the timings differ a bit. The regression was introduced in v12.5.0

v12.1.0

~ ➤ nvm install 12.1.0                                                                              
Now using node v12.1.0 (npm v6.9.0)
~ ➤ node --max-old-space-size=40000 test.js                                                         
7,468,000 @ 18,877MB + 200,636ms
^C

v12.2.0

~ ➤ nvm install 12.2.0                                                                              
Now using node v12.2.0 (npm v6.9.0)
~ ➤ node --max-old-space-size=40000 test.js                                                         
7,467,000 @ 18,876MB + 200,328ms
^C

v12.3.0

~ ➤ nvm install 12.3.0                                                                              
Now using node v12.3.0 (npm v6.9.0)
~ ➤ node --max-old-space-size=40000 test.js                                                         
7,823,000 @ 19,768MB + 215,159ms
^C

v12.4.0

~ ➤ nvm install 12.4.0                                                                              
Now using node v12.4.0 (npm v6.9.0)
~ ➤ node --max-old-space-size=40000 test.js                                                         
8,084,000 @ 20,425MB + 227,961ms
^C

v12.5.0 (this is where the performance degraded)

~ ➤ nvm install 12.5.0                                                                              
Now using node v12.5.0 (npm v6.9.0)
~ ➤ node --max-old-space-size=40000 test.js                                                         
5,983,000 @ 15,076MB + 223,417ms
^C

v12.6.0

~ ➤ nvm install 12.6.0                                                                              
v12.6.0 is already installed.
Now using node v12.6.0 (npm v6.9.0)
~ ➤ node --max-old-space-size=40000 test.js                                                         
6,017,000 @ 15,157MB + 223,319ms
^C

@theninj4 theninj4 changed the title Possible memory regression in v12 GC regression in v12.5.0 Jul 16, 2019
@theninj4
Copy link
Author

https://github.com/nodejs/node/releases/tag/v12.5.0
I'm guessing this relates to:

Updated V8 to 7.5.288.22

... @psmarshall I don't suppose you're still involved with V8 Garbage Collection?

@jasnell
Copy link
Member

jasnell commented Jul 16, 2019

Hmmm... I'm wondering if this is the same issue I'm seeing... using valgrind with ./node -pe "console.log('hi')" on Ubuntu is yielding a massive number of issues....

See: https://gist.github.com/jasnell/001808188c36a9936bdbddb69817e84f

@jasnell
Copy link
Member

jasnell commented Jul 17, 2019

@joyeecheung @targos ... there's definitely something wrong happening. Valgrind is showing a massive number of issues on latest master. I haven't had a chance to bisect the issues, but look at the gist above to see where the failures are happening.

@jasnell jasnell added memory Issues and PRs related to the memory management or memory footprint. v8 engine Issues and PRs related to the V8 dependency. labels Jul 17, 2019
@jasnell
Copy link
Member

jasnell commented Jul 17, 2019

ping @nodejs/v8 @nodejs/v8-update

@joyeecheung
Copy link
Member

joyeecheung commented Jul 17, 2019

Could be related to #28558

@hashseed The warnings in https://gist.github.com/jasnell/001808188c36a9936bdbddb69817e84f seem to mostly about using uninitialized memory within v8::internal::Snapshot::Initialize and v8::Context::FromSnapshot, is that expected?

@jasnell
Copy link
Member

jasnell commented Jul 17, 2019

Potentially related, yes, but just to be certain, building with the --without-node-snapshot option has no effect. Valgrind still shows the warnings.

@targos
Copy link
Member

targos commented Jul 17, 2019

@jasnell do you also see the warnings with d8 ?

You would need to apply #28733 and ./configure --enable-d8.
The binary ends up in out/Release/d8.

@jasnell
Copy link
Member

jasnell commented Jul 17, 2019

No, it doesn't happen with d8. Looking it over, it potentially appears like one of the allocators is not being initialized appropriately. Still investigating tho.

@jasnell
Copy link
Member

jasnell commented Jul 17, 2019

Working on bisecting the commits. Build... Test... Bisect... Build... Test... Bisect... going to take a while.

@targos
Copy link
Member

targos commented Jul 17, 2019

@jasnell, if it can help, I could narrow down the range of commits to this: 9bead0c...779a243

@targos
Copy link
Member

targos commented Jul 17, 2019

This is what I find:

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
4c8fe4a
7cb8981
We cannot bisect more!

@jasnell
Copy link
Member

jasnell commented Jul 17, 2019

Both of those would make sense. All of the failures appear to originate in the random number generation for EntropySource. Everything else seems to snowball out from there.

@jasnell
Copy link
Member

jasnell commented Jul 17, 2019

/cc @sam-github

@targos
Copy link
Member

targos commented Jul 17, 2019

Is there still a chance that this is related to the original issue or should we open a new one?

@jasnell
Copy link
Member

jasnell commented Jul 17, 2019

It's really difficult to say. I'm going to try reverting the three commits from that PR and see where we end up.

@jasnell
Copy link
Member

jasnell commented Jul 17, 2019

I can definitely confirm that reverting that PR resolves the flood of valgrind issues I'm seeing. I'll open a PR that folks can experiment with to see if it resolves the original issue discussed here.

@jasnell
Copy link
Member

jasnell commented Jul 17, 2019

PR here: #28739

@theninj4
Copy link
Author

theninj4 commented Jul 17, 2019

I've pulled down @jasnell branch, built it and tested it - bad news I'm afraid:

node ➤ git log -n 1
commit ef5c8a8145d1cbb3e949e06bf717e6b1cd64ed29 (HEAD -> master, jasnell/revert-28211)
Author: James M Snell <jasnell@gmail.com>
Date:   Wed Jul 17 13:03:55 2019 -0700

    Revert "deps: updated openssl upgrade instructions"
    
    This reverts commit 5990c4d453b8fc8453a26566bb7ac680a76bd83f.
node ➤ ./out/Release/node --version
v13.0.0-pre
node ➤ ./out/Release/node --max-old-space-size=40000 ../../test.js
5,643,000 @ 14,219MB + 223,671ms
6,459,000 @ 16,270MB + 280,029msterminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
[1]    11431 abort (core dumped)  ./out/Release/node --max-old-space-size=40000 ../../test.js

The performance issue is still present, and it still runs out of memory.

@NeoPhi
Copy link

NeoPhi commented Aug 15, 2019

We recently switched from using 12.4.0 to 12.7.0 and have seen changes in our GC profile. Our 95th GC pause went from 8ms in 12.4.0 to 15ms in 12.7.0 under a similar workload.

@theninj4 were you able to track down anything related to the V8 upgrade that may point to the root cause?

@theninj4
Copy link
Author

I haven't, no. The behaviour I'm seeing may well be more of a feature than a regression - I very much doubt my use case of rapidly loading tens of GB of data onto the heap is one that the V8 team are interested in. I've stopped loading my huge data set onto the heap and I've moved it to Redis, where I found 12.5.0 is more performant than 12.4.0 at marshalling the data in+out of Redis.

@theninj4
Copy link
Author

The latest changes in v12.9.0 specifically around JSON parse performance have been amazing. As such, it's now bizarrely more performant to shuffle all my data in/out of Redis than it is to try and wrestle with the heap + garbage collection. I'll close this issue 👍

@jdmarshall
Copy link

jdmarshall commented Nov 12, 2020

@NeoPhi did you get anywhere with this? I think I am seeing a similar problem. Wish @theninj4 hadn't closed this when he found a workaround :(

I think we may be seeing more aggressive GC happening in node 12. Max memory stays lower for a similar workload, but that could also be due to throttling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
memory Issues and PRs related to the memory management or memory footprint. v8 engine Issues and PRs related to the V8 dependency.
Projects
None yet
Development

No branches or pull requests

6 participants