Increase in npm ERR! cb() never called! errors #33424

Closed

desrosj opened this issue Jul 14, 2021 · 16 comments
Labels
[Type] Automated Testing: Testing infrastructure changes impacting the execution of end-to-end (E2E) and/or unit tests.

Comments

@desrosj
Contributor

desrosj commented Jul 14, 2021

Description

There has been an increase in failures when running npm install and npm ci over the last two months. This has mainly been visible in GitHub Actions workflows, but some folks have encountered it locally.

Restarting the workflow or re-running npm install usually fixes the issue.

This is an issue to track this problem, document findings, and hopefully find a solution.
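
Since the failures appear transient, one workaround for anyone hitting this in their own CI is to retry the install before giving up. A minimal sketch, assuming a Bash step (this is not something our workflows currently do):

```sh
#!/usr/bin/env bash
# Retry `npm ci` a few times before failing the job, since the
# cb() never called! error usually goes away on a re-run.
set -u
for attempt in 1 2 3; do
  if npm ci; then
    exit 0
  fi
  echo "npm ci failed (attempt ${attempt}); retrying..." >&2
  sleep 10
done
echo "npm ci failed after 3 attempts" >&2
exit 1
```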

desrosj self-assigned this Jul 14, 2021
@desrosj
Contributor Author

desrosj commented Jul 14, 2021

Here's a knowledge dump of the actions I've taken so far, and my findings.

Actions so far:

The TL;DR of the cb() never called! error is that every occurrence of it is unique. It's essentially a catch-all error for a number of different issues that could be in either npm or a dependency.

This page in the npm wiki describes exactly what is going on. There's also a long-running issue on the npm GitHub repo (npm/cli#417). It seems to be pretty common in projects with a large list of dependencies.

There has been some exploration into whether the cache keys used within GHA workflows are the cause, because one way to potentially fix the error is to force-clean the cache. I think we can rule that out. This is more likely a network issue, or a bug within one or more of our dependencies that happens on occasion.
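
For reference, the force-clean mentioned above amounts to roughly the following (a local troubleshooting sketch, not a workflow change):

```sh
# Check the integrity of the local npm cache, then wipe it and reinstall
# from scratch to rule out a corrupted cache.
npm cache verify
npm cache clean --force
rm -rf node_modules
npm install
```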

As of #33190, npm caching is handled entirely by actions/setup-node. The only workflow still manually using actions/cache is the project management workflow. That's because that workflow runs npm install within a subdirectory, so only a small number of dependencies are installed. If the cache key were updated by that workflow, we would lose the caching benefits in all other workflows.
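
If anyone wants to double-check which workflows still cache manually versus relying on actions/setup-node, a quick search from the repository root should surface them (illustrative, assuming the standard .github/workflows layout):

```sh
# Workflow files that still reference actions/cache directly.
grep -rln "actions/cache" .github/workflows/
# Workflow files that use actions/setup-node (which now handles npm caching).
grep -rln "actions/setup-node" .github/workflows/
```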

@gziolo
Member

gziolo commented Jul 15, 2021

It's worth noting that we are still using npm 6. It's hard to tell whether things will get any better when we switch to npm 7 on CI.
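
If we want to trial npm 7 on a branch before a full migration, a CI step could bump the bundled npm explicitly. A sketch (not something we have committed to):

```sh
# Upgrade the globally installed npm to the latest v7 release and confirm.
npm install -g npm@7
npm --version
```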

talldan added the [Type] Automated Testing label Jul 16, 2021
@talldan
Contributor

talldan commented Sep 6, 2021

One observation: I've been seeing this error with incredible frequency on this PR: #34533.

Notably, that PR contains changes to package-lock.json because it changes the dependencies of one of the packages. I wonder if these kinds of changes could be a trigger for the problem in our CI.

@gziolo
Member

gziolo commented Sep 6, 2021

@talldan, you linked this issue. Is it related to the fix for the mobile dependency that is a fork of a package?

@talldan
Contributor

talldan commented Sep 6, 2021

So I did! I've edited my comment.

@gziolo
Member

gziolo commented Sep 8, 2021

It looks like every time package-lock.json gets updated, this error strikes back. The most recent example is eb4c8e3.

[Screenshots of the failing workflow runs, 2021-09-08]

All checks on the PR were green.

@desrosj
Contributor Author

desrosj commented Sep 13, 2021

Because this has only been happening when the package-lock.json file is updated (or so it seems), I think we can rule out caching issues.

When the lock file changes, the cache key changes and a 100% fresh npm install takes place. So I think this is definitely something inside npm that occurs with large dependency trees.

In chatting with @gziolo just now, he mentioned that there are a few dependencies loaded through git+ URLs (he pointed to the react-native-editor package). My understanding is that when this versioning format is used, an actual Git clone is performed to retrieve the needed source.

Since most of the references to the error message seem to suggest network issues, I'm thinking we should try removing all git+ format dependencies to troubleshoot and hopefully eliminate this problem.

As an experiment, we could create a fork, change the versions to the standard x.y.z format, and make several updates to the lock file to try to reproduce the error.
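
To make the experiment concrete, here's a rough sketch of finding the git+ entries and swapping one for a registry release (example-package@1.2.3 below is a placeholder, not a real dependency of ours):

```sh
# 1. Find dependencies that are resolved from git URLs instead of the registry.
grep -n "git+" package.json package-lock.json

# 2. Replace a git-based dependency with a published x.y.z version.
npm uninstall example-package
npm install --save-exact example-package@1.2.3
```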

@talldan
Contributor

talldan commented Sep 14, 2021

One recent observation is that it does seem to happen more regularly with React Native jobs (check recent failures on trunk to see this trend). Those workflows don't seem to do anything different from other workflows when it comes to installing npm dependencies, but they do run on macOS instead of Linux.

Those workflows have repeatedly failed to do a fresh npm install. It seems like macOS is, for some reason, more likely to fail than Linux, so I wondered whether that operating system results in more work happening during an install, or whether the hardware might not be quite as capable.

@gziolo
Member

gziolo commented Sep 14, 2021

I'm trying to migrate from npm v6 to v7 in #33892. In the process, I discovered a few issues with React Native dependencies that are tracked in #34801. As a temporary step, I also decided to remove appium from the dev dependencies and remove all references to @wordpress/react-native-editor. That helped me move forward locally, but I'm still trying to fix other issues. It makes me wonder if we need to investigate that area first.

@gziolo
Member

gziolo commented Sep 14, 2021

I think I was able to narrow the issue down in #34809 to package dependencies in the @wordpress/react-native-editor package. When you remove all dependencies that point to git URLs, there are no cb() errors reported even though the lock file was updated.
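
For anyone who wants to verify the same thing locally, a quick (illustrative) check of the lock file is:

```sh
# List lock-file entries that still point at git URLs; no output means every
# dependency now resolves from the npm registry.
grep -n "git+" package-lock.json
```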

@youknowriad
Contributor

I just wanted to share that these npm install issues are getting too frequent and are making our release process a lot harder (due to breakage...). It would be great if we could find a way forward.

How can we improve the situation here? @fluiddot @hypest @dcalhoun

@hypest
Contributor

hypest commented Sep 16, 2021

> It would be great if we could find a way forward.

Indeed @youknowriad. As we speak, @ceyhun is working on a spike to replace the npm deps defined as git-dependencies, since it looks like that helps with the cb() never called! error.

@desrosj
Contributor Author

desrosj commented Oct 1, 2021

Wanted to check in to see if the issue is happening less frequently after #34886. Happy to dig further and try more things if it's still an issue.

I'm not heavily involved day to day in this repo, but it does seem that the issue has improved some.

@gziolo
Member

gziolo commented Oct 4, 2021

I'm back to my regular work schedule. I will check this week whether #34886 unblocks the npm 7 migration.

@fluiddot
Contributor

fluiddot commented Oct 4, 2021

> Wanted to check in to see if the issue is happening less frequently after #34886. Happy to dig further and try more things if it's still an issue.
>
> I'm not heavily involved day to day in this repo, but it does seem that the issue has improved some.

The changes introduced in #34886 should have fixed the cb() never called! error entirely. After checking the recent commits on the trunk branch, I see it hasn't been reproduced since the PR was merged 🎊. In fact, I think we could close this issue unless there's something else we'd like to double-check before considering it solved.

gziolo closed this as completed Oct 4, 2021
@desrosj
Contributor Author

desrosj commented Nov 3, 2021

Just noting for reference that I have seen the error a couple of times over the last 2-3 weeks. One instance: https://github.com/WordPress/gutenberg/runs/4089434918?check_suite_focus=true#step:4:4.

They do seem to be more random, though, and are accompanied by other failures referencing separate network issues.
