Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify caching of npm / Yarn / pnpm dependencies #304

Closed
borekb opened this issue Jul 23, 2021 · 9 comments
Closed

Clarify caching of npm / Yarn / pnpm dependencies #304

borekb opened this issue Jul 23, 2021 · 9 comments

Comments

@borekb
Copy link

borekb commented Jul 23, 2021

I was excited to see the blog post GitHub Actions: Setup-node now supports dependency caching. However, it's not clear to me what this caching actually means. From the README:

README.md

Caching yarn dependencies:

steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v2
  with:
    node-version: '14'
    cache: 'yarn'
- run: yarn install
- run: yarn test

Yarn caching handles both yarn versions: 1 or 2.

Caching can mean many things when it comes to Yarn 2, for example, these questions come to mind:

  • Is the node_modules folder cached?
  • When Yarn runs in the PnP mode, is .yarn/cache cached instead?
  • Or is it the local cache of downloaded tarballs (~/.cache/yarn) that is cached?
  • How is the caching done? Is it a local filesystem that is re-attached for future runs of the job or is the "slow" approach of @actions/cache that just uploads / downloads files to some S3-like blob storage? (I think we tested it and it wasn't even faster than downloading dependencies from registry.npmjs.com over and over.)

Some clarifications would be appreciated 🙏.

@maxim-lobanov
Copy link
Contributor

Hey @borekb , caching means exactly the same what actions/cache does.
We have integrated actions/cache functionality into actions/setup-node to simplify configuration and using for new customers.

We cache global package manager cache on machine (npm config get cache & yarn cache dir). Exactly the same like actions/cache docs recommend.
If you are interesting in some technical details, I think you can find them in ADR.

I think we tested it and it wasn't even faster than downloading dependencies from registry.npmjs.com over and over

It depends on number of dependencies in project. For small projects with 5-10 dependencies, cache doesn't bring much benefits (that fact that VMs can have a bit different connection speed, this random factor will neutralize caching profit.
For projects with big number of dependencies, actions/cache can give significant advantage.
We have posted some measurement results here.

@borekb
Copy link
Author

borekb commented Jul 25, 2021

Thanks, that is great info, I was mostly unsure about what a "dependency" means here. Caching the cache makes sense 👍.

Two possible suggestions for README:

  • Clarify the relation with actions/cache
  • What is being cached

... or maybe just point to the ADR that contains it all.

Thanks again!

@dominikg
Copy link

dominikg commented Jul 25, 2021

Also got questions regarding caching (please let me know it if doesnt belong here and i'll open a separate ticket):

Is it ok to use the same cache for different node versions?

I notice you don't use node-version as part of the cache key:

const primaryKey = `node-cache-${platform}-${packageManager}-${fileHash}`;
.

I see 2 potential issues with this:

  1. different runners of a matrix might want to store with the same key at the same time
  2. if a dependency had a node version specific post-install script, a runner from a different node version could end up with something meant for a different version

how can we tell if a cache hit occured?

If the cache was hit, installation can be done in offline mode, eg for pnpm pnpm install --frozen-lockfile --offline

@maxim-lobanov
Copy link
Contributor

@borekb makes sense to me. Let me think how we can improve readme

@dominikg , since we cache "global cache" instead "node_modules", Node.js version doesn't matter and cache is identical.
This way we cache only "download" step for packages and not "install" step. It is how actions/cache recommends.

The caching results of "post-install" scripts is not recommended because they can have side affects (change anything on machine outside of "node_modules" folder) or produce different results based on VM state (Image version on VM is updated every week and if result of some scripts depend on machine state, build can be broken after machine update)

@Shinigami92
Copy link

What about optionally pass a manual cache key?

I'm willing to contribute this feature if it gets accepted

@maxim-lobanov
Copy link
Contributor

@Shinigami92 , it is not something that we would like to support in scope of setup-node for now. I would suggest to use actions/cache directly for this use-case.

@akd-io
Copy link

akd-io commented Jul 30, 2021

Do I understand currectly that setup-node doesn't cache node_modules but instead the different cache directories?

If so, wouldn't that be a good argument for taking a look at #103 again?

@sandstrom
Copy link

sandstrom commented Aug 24, 2021

I recently tried to switch from using (actions/cache)[https://github.com/actions/cache] over to actions/setup-node using the new cache functionality.

However, it doesn't seem to share the cache across branches (which actions/cache does), because the first run on every branch will take ~5x longer now.

"The cache is scoped to the key and branch. The default branch cache is available to other branches."
https://github.com/actions/cache#cache-scopes

Do you have any plans on changing this?

With actions/cache, this is the logic we used:

- name: Cache node modules
  uses: actions/cache@v2
  env:
    cache-name: cache-node-modules
  with:
    path: ~/.npm
    key: ${{ runner.os }}-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-${{ env.cache-name }}-
      ${{ runner.os }}-

Let me know if this is better as a separate issue.

@dmitry-shibanov
Copy link
Contributor

Hello @akd-io. You're right. It does not cache node_modules directory. Action uses npm config get cache to get global cache directory where all dependencies are cached. I think we can accept this pull request because as it was mentioned not everyone can have committed package-lock.json.

Hello @sandstrom. As I checked, the default branch cache is available to other branches in setup-node. Could you please create the separate issue and provide repro steps. I can suppose that on branches you change lock files, that's why action can't get previous primary key.

For now I'm closing the issue. If you have any other question feel free to ping me in the thread or create separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants
@borekb @sandstrom @dominikg @Shinigami92 @maxim-lobanov @dmitry-shibanov @akd-io and others