Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory allocation failure from V8: JavaScript heap out of memory #354

Closed
kevinlul opened this issue Aug 13, 2023 · 13 comments
Closed

Memory allocation failure from V8: JavaScript heap out of memory #354

kevinlul opened this issue Aug 13, 2023 · 13 comments
Labels
bug Something isn't working production Incidents from the running Bastion#1870 instance

Comments

@kevinlul
Copy link
Contributor

Between 1e175ea (image, deployed July 9 8:16 UTC) and 6f5d636 (image, deployed Aug 13 9:46 UTC), something was introduced, likely in the base Docker Node image, that causes Node.js to crash after a few hours of operation.

This change has been rolled back in production.

git diff in case. (Maybe an updated dependency has some type of memory leak.)

@kevinlul kevinlul added bug Something isn't working production Incidents from the running Bastion#1870 instance labels Aug 13, 2023
@kevinlul
Copy link
Contributor Author

kevinlul commented Aug 13, 2023

First crash: bot was ready at 8:48 UTC. Crashes after almost seven hours at 15:36.

<--- Last few GCs --->
[1:0x63ff3a0] 20965805 ms: Mark-sweep 989.1 (1014.8) -> 988.0 (1014.8) MB, 2570.9 / 0.0 ms  (average mu = 0.484, current mu = 0.519) allocation failure; scavenge might not succeed
[1:0x63ff3a0] 20971142 ms: Mark-sweep 989.0 (1014.8) -> 988.1 (1015.0) MB, 3132.6 / 0.0 ms  (average mu = 0.447, current mu = 0.413) allocation failure; scavenge might not succeed
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb83f50 node::Abort() [node]
 2: 0xa94834  [node]
 3: 0xd647c0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xd64b67 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xf42265  [node]
 6: 0xf5474d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 7: 0xf2ee4e v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 8: 0xf30217 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 9: 0xf113ea v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
10: 0x12d674f v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
11: 0x17035b9  [node]

@kevinlul
Copy link
Contributor Author

Second crash: bot restarted at 15:37 UTC. Crashes after almost four hours at 19:19.

<--- Last few GCs --->
[1:0x73ee3a0] 13339141 ms: Mark-sweep 989.4 (1014.8) -> 988.2 (1014.8) MB, 2584.3 / 0.0 ms  (average mu = 0.638, current mu = 0.518) allocation failure; scavenge might not succeed
[1:0x73ee3a0] 13344466 ms: Mark-sweep 989.3 (1014.8) -> 988.4 (1015.0) MB, 3224.2 / 0.0 ms  (average mu = 0.536, current mu = 0.394) allocation failure; scavenge might not succeed
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb83f50 node::Abort() [node]
 2: 0xa94834  [node]
 3: 0xd647c0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xd64b67 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xf42265  [node]
 6: 0xf5474d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 7: 0xf2ee4e v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 8: 0xf30217 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 9: 0xf10760 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
10: 0xf081d4 v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [node]
11: 0xf0a762 v8::internal::FactoryBase<v8::internal::Factory>::NewDescriptorArray(int, int, v8::internal::AllocationType) [node]
12: 0x11a4eb7 v8::internal::DescriptorArray::CopyUpTo(v8::internal::Isolate*, v8::internal::Handle<v8::internal::DescriptorArray>, int, int) [node]
13: 0x118dfc3 v8::internal::Map::EnsureDescriptorSlack(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, int) [node]
14: 0x1190ffa v8::internal::Map::ShareDescriptor(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::DescriptorArray>, v8::internal::Descriptor*) [node]
15: 0x11916b4 v8::internal::Map::CopyWithField(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::FieldType>, v8::internal::PropertyAttributes, v8::internal::PropertyConstness, v8::internal::Representation, v8::internal::TransitionFlag) [node]
16: 0x1193367 v8::internal::Map::TransitionToDataProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::PropertyConstness, v8::internal::StoreOrigin) [node]
17: 0x1181122 v8::internal::LookupIterator::PrepareTransitionToDataProperty(v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::StoreOrigin) [node]
18: 0x119eef2 v8::internal::Object::TransitionAndWriteDataProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::Maybe<v8::internal::ShouldThrow>, v8::internal::StoreOrigin) [node]
19: 0x11b7223 v8::internal::Object::AddDataProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::Maybe<v8::internal::ShouldThrow>, v8::internal::StoreOrigin, v8::internal::EnforceDefineSemantics) [node]
20: 0x11b8076 v8::internal::Object::SetProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) [node]
21: 0x12e04b5 v8::internal::Runtime::SetObjectProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) [node]
22: 0x1152264 v8::internal::JSReceiver::SetOrCopyDataProperties(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertiesEnumerationMode, v8::base::ScopedVector<v8::internal::Handle<v8::internal::Object> > const*, bool) [node]
23: 0x12e433d v8::internal::Runtime_SetDataProperties(int, unsigned long*, v8::internal::Isolate*) [node]
24: 0x17035b9  [node]

@kevinlul
Copy link
Contributor Author

The preview bot has not been affected by this, but it's only in a couple of servers and sees very little traffic.

@kevinlul
Copy link
Contributor Author

kevinlul added a commit that referenced this issue Aug 15, 2023
https://nodejs.org/en/docs/guides/diagnostics/memory/using-gc-traces

#354: identify whether a base image change is responsible for the memory allocation crashes
@kevinlul kevinlul pinned this issue Aug 15, 2023
@kevinlul
Copy link
Contributor Author

Original master moved to https://github.com/DawnbrandBots/bastion-bot/tree/master-354. New commits to master will have --trace-gc to identify the root cause of the crash. 07be126 should identify whether a base image change is responsible for the memory allocation crashes.

@kevinlul
Copy link
Contributor Author

Also needed to cherry-pick ad348f2 ea1a711 d646a1b due to the IP change

@kevinlul
Copy link
Contributor Author

From over 13 hours in operation, it's most likely not due to a base image change. No crashes. Memory usage is stable with several good mark-sweep cycles.

@kevinlul
Copy link
Contributor Author

Proceeding to cherry-pick all non-dependency code changes for tonight. d2ca81d is now the last-known good commit.

1e175ea...master-354

@kevinlul
Copy link
Contributor Author

Still stable! Leaving this to run another day while I cherry-pick all dependency updates minus the http2-wrapper force update 6a22f75

@kevinlul
Copy link
Contributor Author

The leak might just be in http2-wrapper: szmarczak/http2-wrapper#96 szmarczak/http2-wrapper#97

@kevinlul
Copy link
Contributor Author

Cases from newer versions of got that by default use the newer http2-wrapper: sindresorhus/got#1516 sindresorhus/got#2160 sindresorhus/got#2225 sindresorhus/got#2288

@kevinlul
Copy link
Contributor Author

Crash occurred again with a95b43f (deployment, rollback)

Discord.js just released 14.13.0 suggesting that it is responsible for the leak. See also discordjs/discord.js#9759 discordjs/discord.js#9763

@kevinlul
Copy link
Contributor Author

The run with a95b43f was characterized by a high number of scavenge attempts per actual bot trigger processed and elevated memory usage. Presumably, this is the garbage collector desperately hunting for memory as its all consumed by the Discord.js message caches.

kevinlul added a commit that referenced this issue Aug 19, 2023
@kevinlul kevinlul unpinned this issue Aug 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working production Incidents from the running Bastion#1870 instance
Projects
None yet
Development

No branches or pull requests

1 participant