OOM - JavaScript heap out of memory when trying to create resources #11476

Closed
vtereshyn opened this issue Nov 28, 2022 · 19 comments
Assignees
Labels
area/language-host Runtime that executes user programs area/sdks Pulumi language SDKs blocked kind/bug Some behavior is incorrect or out of spec language/javascript resolution/fixed This issue was fixed
Milestone

Comments

@vtereshyn

vtereshyn commented Nov 28, 2022

What happened?

We use pulumi and pulumi/azure-native to create Frontdoor resources. After upgrading to the latest TypeScript, Pulumi, and Node.js versions and moving the project to ECMAScript modules, we started getting OOM errors whenever we ran pulumi preview or pulumi up.

Not sure if the issue is in @pulumi/azure-native or @pulumi/pulumi package, but this is a huge blocker for us. We can't proceed with other stacks because they have references to this stack which is broken.

Please note I tried to do the same with Common JS instead of ESM, old Typescript, and Node JS versions, but the issue is still present.

Steps to reproduce

(Repository contains README.md file with instructions)

Expected Behavior

Pulumi should create resources without OOM

Actual Behavior

Getting OOM error:

<--- Last few GCs --->
    [84542:0x7fe6c8078000]   336061 ms: Scavenge (reduce) 16096.1 (16429.9) -> 16095.9 (16430.9) MB, 36.1 / 0.0 ms  (average mu = 0.105, current mu = 0.014) allocation failure;
    [84542:0x7fe6c8078000]   354901 ms: Mark-sweep (reduce) 16096.9 (16430.9) -> 16096.5 (16431.4) MB, 18766.4 / 0.0 ms  (+ 203.0 ms in 45 steps since start of marking, biggest step 6.4 ms, walltime since start of marking 19127 ms) (average mu = 0.060, curren
    <--- JS stacktrace --->
    FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

Output of pulumi about

CLI          
Version      3.48.0
Go Version   go1.19.3
Go Compiler  gc

Plugins
NAME          VERSION
azure-native  1.87.0
azure-native  1.87.0
nodejs        unknown

Host     
OS       darwin
Version  12.6.1
Arch     x86_64

This project is written in nodejs: executable='/Users/.nvm/versions/node/v18.12.0/bin/node' version='v18.12.0'

Current Stack: org/test

Found no resources associated with org/test

Found no pending operations associated with org/test

Additional context

Link to repository: https://github.com/vtereshyn/pulumi-fw-issue

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

@vtereshyn vtereshyn added kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Nov 28, 2022
@XBeg9

XBeg9 commented Nov 28, 2022

@mikhailshilkov any suggestions here, we are kind of blocked due to it :( if you need more info happy to provide it here or over Slack. Thanks!

@Frassle
Member

Frassle commented Nov 29, 2022

After upgrading to the latest TypeScript, Pulumi, and Node.js versions and moving the project to ECMAScript modules, we started getting OOM errors whenever we ran pulumi preview or pulumi up.

That's quite a lot changed all at once. Can you make one change at a time and repro which part of that overall change actually causes the OOM errors to start?

@vtereshyn
Author

vtereshyn commented Nov 29, 2022

After upgrading to the latest TypeScript, Pulumi, and Node.js versions and moving the project to ECMAScript modules, we started getting OOM errors whenever we ran pulumi preview or pulumi up.

That's quite a lot changed all at once. Can you make one change at a time and repro which part of that overall change actually causes the OOM errors to start?

As I mentioned before, this issue appeared with Common JS, Node JS v16.17.1 and the old Typescript version. Here is a branch from my repository with CJS instead of ESM, the old Node JS version, but the latest Pulumi - https://github.com/vtereshyn/pulumi-fw-issue/tree/node_16.17.1-cjs

@vtereshyn
Author

Here is the last working version: https://github.com/vtereshyn/pulumi-fw-issue/tree/last_working_version

"@pulumi/azure-native": "1.67.0",
"@pulumi/pulumi": "3.17.1"

@Frassle
Member

Frassle commented Nov 29, 2022

Checking that out seems to run, but if I update it to:

    "@pulumi/azure-native": "1.87.0",
    "@pulumi/pulumi": "3.48.0",
    "ts-node": "10.9.1",
    "typescript": "4.9.3"

It still seems to work fine.

I suspect the problem is that Pulumi isn't handling ES modules correctly, but as far as I know it never has.

@Frassle Frassle added area/sdks Pulumi language SDKs language/javascript area/language-host Runtime that executes user programs and removed needs-triage Needs attention from the triage team labels Nov 29, 2022
@vtereshyn
Author

Have you checked my repository or tried on yours? We have other stacks that are working fine. But the stack with Azure Firewall breaks the other stacks that depend on it. That stack is where we get the OOM @Frassle

@Frassle
Member

Frassle commented Nov 29, 2022

I checked out your repo at the last_working_version branch and ran a pulumi preview.

@vtereshyn
Author

vtereshyn commented Nov 29, 2022

Sounds good. But upgrading only the Pulumi dependencies (without the CJS-to-ESM move or the TypeScript upgrade) breaks preview.

UPD: just checked. I updated the Pulumi dependencies in the last_working_version branch and it doesn't work.

@Frassle
Member

Frassle commented Nov 29, 2022

As above I can run pulumi preview on last_working_version + the changes:

    "@pulumi/azure-native": "1.87.0",
    "@pulumi/pulumi": "3.48.0",
    "ts-node": "10.9.1",
    "typescript": "4.9.3"

Are you saying that if you update last_working_version to those versions (with no other changes) that it starts OOMing on preview?

@vtereshyn
Author

vtereshyn commented Nov 29, 2022

@Frassle yes, exactly

When you change the versions, to make it "clean" please remove node_modules, .pnpm-store, and dist, then install the dependencies from scratch.

image

@Frassle
Member

Frassle commented Nov 29, 2022

Right, OK. I'll do some more checks. I haven't used pnpm before, so I might have missed something there.

@Frassle Frassle self-assigned this Nov 29, 2022
@Frassle
Member

Frassle commented Nov 29, 2022

I think this is due to #8627 which was added in 3.21.0. We're currently working on a replacement for this (#11206).

I don't think you'll be able to update past 3.20 with the current code until that's fixed.
Alternatively, rework the code so that resources are not parented to other resources.

@vtereshyn
Author

@Frassle does it mean that ESM support is available after 3.21.0, but all past versions are broken and won't work with parent?

@Frassle
Member

Frassle commented Nov 29, 2022

It does look that way, 3.21.1 added es module support but 3.21.0 broke parents (specifically it consumes huge amounts of memory as it tries to calculate the cartesian product of all aliases up the parent chain, and azure-native resources have a lot of type aliases)
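To make the scale of that cartesian product concrete, here is a rough sketch. It is a simplified, hypothetical model and not Pulumi's actual alias code: the `ResourceNode` type, the `candidateCount` and `chain` helpers, and the figure of 30 type aliases per resource are all assumptions chosen for illustration.

```typescript
// Simplified, hypothetical model of alias growth along a parent chain.
// This is NOT Pulumi's implementation; names and numbers are illustrative.

interface ResourceNode {
  name: string;
  typeAliases: number; // how many alternate type names this resource has
  parent?: ResourceNode;
}

// A child's identity embeds its parent's identity, so every candidate
// identity of the parent is paired with every candidate of the child.
function candidateCount(node: ResourceNode): number {
  const own = node.typeAliases + 1; // the aliases plus the current type
  const inherited = node.parent ? candidateCount(node.parent) : 1;
  return own * inherited;
}

// Build a chain of `depth` resources (depth >= 1), each parented to the
// previous one, all with the same number of type aliases.
function chain(depth: number, typeAliases: number): ResourceNode {
  let current: ResourceNode = { name: "res0", typeAliases };
  for (let i = 1; i < depth; i++) {
    current = { name: `res${i}`, typeAliases, parent: current };
  }
  return current;
}

for (const depth of [1, 2, 3, 4]) {
  console.log(`depth ${depth}: ${candidateCount(chain(depth, 30))} candidates`);
}
```

Under these assumptions the count is 31 for a single resource, 961 at depth 2, 29,791 at depth 3, and 923,521 at depth 4. Growth of that kind, multiplied across many resources, is consistent with the heap exhaustion reported above.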

@vtereshyn
Author

vtereshyn commented Nov 29, 2022

@Frassle is there a way to prioritize the PR you mentioned? Looks like it is a huge thing, but it hasn't been fixed for about 27 minor versions (from 3.21.0 to 3.48.0)

@Frassle
Member

Frassle commented Nov 29, 2022

CC @kpitzen as he's been working on it.
It hasn't been fixed in 27 minor revisions because a performance issue was noticed early on, and fixing it in a way that is both correct and performant has ended up being a lot of work. We've been chipping away at this for a few months now, and it is getting done.

@Frassle Frassle removed their assignment Dec 6, 2022
@RobbieMcKinstry
Contributor

RobbieMcKinstry commented Jan 31, 2023

Just to give an update here, @kpitzen landed a fix for alias explosion (linked issue: #11062) in NodeJS.

@vtereshyn can you upgrade to v3.44.3 or newer and give that a try? I think we'll be able to close this issue.

@vtereshyn
Author

vtereshyn commented Feb 2, 2023

@RobbieMcKinstry I used v3.48.0 as described in the issue. I can upgrade to the latest one and test it one more time.

UPD: yes, looks like it has been resolved in one of the latest versions. Here is my pulumi about:

❯ pulumi about  
CLI          
Version      3.53.1
Go Version   go1.19.5
Go Compiler  gc

Plugins
NAME          VERSION
azure-native  1.87.0
nodejs        unknown

Host     
OS       darwin
Version  12.6.2
Arch     x86_64

@RobbieMcKinstry RobbieMcKinstry added the resolution/fixed This issue was fixed label Feb 2, 2023
@RobbieMcKinstry
Contributor

🎉 🎉 🎉 I'm very happy this issue has been resolved! I'm sorry this experience was so frustrating. This was a gnarly bug, and it took a lot of mental effort from @kpitzen to fix. In fact, the work to repair this bug in other languages is still ongoing (the fixes for Go, Python, and Java are all landing in the next few weeks thanks to @Zaid-Ajaj).

@lukehoban lukehoban added this to the 0.84 milestone Feb 4, 2023

5 participants