CI: speed up CI time by improving yarn install and caches (iteration 1 / >30%) #16581
Conversation
Working with measurements will require running the CI. Open to discuss.

- name: 📥 Install dependencies
  shell: bash
  run: yarn install --immutable --inline-builds
Added --inline-builds on CI because otherwise some logs (node-gyp builds, for example) are stored in a file...
Also note that yarn 4.0.0-rc.43 will give a small perf boost (I found that the latest 4.0.0 RCs are really stable). Open to try that as well - especially the supportedArchitectures option, which seems to fit here. Can be done in a separate PR if wanted (the action/cache keys would need tuning).
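For reference, a sketch of what the supportedArchitectures option looks like in .yarnrc.yml - the os/cpu/libc values below are illustrative assumptions and would need to match the CI runners actually in use:

```yaml
# .yarnrc.yml - sketch only; the exact os/cpu/libc values are assumptions.
# Restricting architectures avoids fetching native packages for platforms
# the CI never runs on, which shrinks the cache and speeds up the fetch step.
supportedArchitectures:
  os: [linux]
  cpu: [x64]
  libc: [glibc]
```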
Just to keep a reference: the test workflow - https://github.com/strapi/strapi/actions/runs/4852553521. I've just merged main into this PR. @gu-stav can you approve the workflows again? I'd like to see what a (full) warm-cache situation gives and tune based on what I find out.
This looks very promising. Thanks for all the work you put in already! Two smaller questions.
Overall I don't see any problems and I think this is good to merge, but I'd like different opinions from the team.
uses: actions/cache@v3
with:
  path: '**/node_modules'
  key: yarn-nm-cache-${{ runner.os }}-${{ steps.yarn-config.outputs.CURRENT_NODE_VERSION }}-${{ steps.yarn-config.outputs.CURRENT_BRANCH }}-${{ hashFiles('yarn.lock', '.yarnrc.yml') }}
Just wondering if that is a bit over-optimized. Can we assume that CURRENT_NODE_VERSION and hashFiles are enough? Why do you think CURRENT_BRANCH should be part of the key?
Note that this is based on the strapi repo: the node_modules folder isn't cached unless requested otherwise - it's false by default. In other words it's not enabled; we just cache the yarn zip archives.
This is based on a previous benchmark made in this branch.
The reason the node version is added is to be future-proof. Imagine someone adds a library that runs a node-gyp postinstall build to create a binary... for example sharp when vips-dev isn't installed. There are other examples in the wild; I've chosen sharp because you use it. It saves the compiled binary in the node_modules folder and... 💥
But the most interesting reason to keep this: staying in line with the original gist, which comes with documentation.
https://gist.github.com/belgattitude/042f9caf10d029badbde6cf9d43e400a
In a next iteration I plan to add more speed-ups. I prefer to restart from a situation I know well.
Gu-stav is right @belgattitude. I think that CURRENT_BRANCH is not needed, since the node version and packages.yml already solve this, which would mean fewer cold starts.
What @Boegie19 said :) CURRENT_NODE_VERSION looks useful to me - just confused about CURRENT_BRANCH :)
Ah yes, haven't explained this...
The idea is that yarn/pnpm works in multiple phases:
- Resolution - seems affected by the presence of install_state
- Fetch - only affected by zip cache presence
- Link - seems affected by the presence of node_modules / install_state

So why is it there?
Over time in a repo, it's always nice to re-assess the cost/benefit perf between the two possible strategies:
- Just cache *.zip
- Cache *.zip + install-state + node_modules.
And that depends mostly on the packages installed in the repo (i.e. if you install esbuild, vite, vitest at some point, the balance is different).
In the strapi repo:
- The strategy that makes sense is to just save *.zip
- Whereas in one of my repos, it's better to save everything
See https://github.com/belgattitude/nextjs-monorepo-example/actions/runs/4888683373/jobs/8726574932 (brings a 1-minute install with strategy 1 down to a 25-second install) - not totally precise, but that's the idea...
The drawback: node_modules / install-state cannot leak between branches (long to explain). The best option for now is to add the branch to the key.
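To make the two strategies concrete, a minimal sketch of the corresponding cache steps - the key prefixes are illustrative, and the steps.yarn-config outputs mirror the ones used elsewhere in this PR:

```yaml
# Sketch - key names are illustrative assumptions, not the exact keys in this PR.

# Strategy 1: cache only the yarn zip archives. Safe to share across branches,
# since the archives are content-addressed by the lock file.
- uses: actions/cache@v3
  with:
    path: .yarn/cache
    key: yarn-dl-cache-${{ runner.os }}-${{ hashFiles('yarn.lock', '.yarnrc.yml') }}

# Strategy 2: additionally cache node_modules + install state. Faster warm
# installs, but the key must include the branch so state never leaks between
# branches.
- uses: actions/cache@v3
  with:
    path: |
      **/node_modules
      .yarn/install-state.gz
    key: yarn-nm-cache-${{ runner.os }}-${{ steps.yarn-config.outputs.CURRENT_BRANCH }}-${{ hashFiles('yarn.lock', '.yarnrc.yml') }}
```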
Yes, it's over-optimization in the current strapi repo... and it might look hard to read :) With another repo this optimization is valid (and it might make sense here in the future).
But the main idea is to not diverge too much from the reference gist: https://gist.github.com/belgattitude/042f9caf10d029badbde6cf9d43e400a
This is optimization, after all - better to have a reference doc.
I see your point. What do you think would be the danger in sharing a cache across branches? We don't bundle/inline any secrets, so I don't see a problem there - but maybe there are caches (thinking e.g. of eslint / prettier) that might be shared and shouldn't be?
Not a security concern. The link step and whatever will happen there + tons of things... but too long to explain today. As it's not enabled, let's keep it as close to the reference gist as possible, if you agree.
Wdyt? Or do you want more details?
Take your time - I'd like to know before we merge what the reason for that is, because otherwise we won't be able to get rid of it again.
I see... It seems to cause a lot of questioning. Let's say it's for ensuring stability / correctness / reproducibility.
I've added it based on experience and my current understanding. The thing is, without it you can end up in really hard-to-debug situations.
When saving node_modules + install state, yarn (will/might/can) totally skip the link step (the postinstall won't be run - in all situations). I wrote (will/might/can) because it depends on other factors.
Some examples:
- By default, prisma will generate your db types on postinstall based on the defined schema. The lock file does not change, the cache is valid, you change the schema... then the postinstall is not run and the generated types are out of sync.
- Some packages depend on deasync (i.e. xdm, mdx...) - same thing.
- And yes, it's always fragile because some tools start to cache things in node_modules/.cache (i.e. older turbo versions, eslint, prettier...) or depend too much on it.
The idea is: if the full cache is enabled at some point -> avoid those edge cases from the start (because they will happen).
But I understand your points. So, based on the above, what do you think of proceeding in steps:
- Merge this first iteration (safe)
- I'll prepare iteration 2 (which will probably change the lock file and bring new light to the whole picture)
- I'll make a 3rd iteration - a cleanup to make this action as simple as possible based on the strapi usage
Is that something that would answer your concerns?
Thank you for that explanation. It starts to make sense to me. I will leave this thread open for the other engineers to read, because I'd like more opinions before we merge this :)
Can someone approve the workflows? Thx to #16581 (comment). And it will bring a small speed-up too.
@belgattitude Mind adding the .github/workflows to Lines 1 to 14 in e94f5ee?
This will ensure tests will be run properly on tooling changes.
I didn't get what you asked for :)
Don't worry about it, I should have been clearer. Also, I am so happy with this PR - faster CI means less time waiting to see whether tests failed, and this should also help massively with your GitHub worker congestion problems.
@belgattitude Just to confirm - you now get what I asked here, right?
Still don't get it. I have this file in this branch and it's the same as the one in main and on the other branch. Am I missing something - do I have to do something on this branch?
The thing that I don't get is: what does adding the workflow do?
Co-authored-by: Gustav Hansen <gu@stav.dev>
@belgattitude
In order for them to be run (which is what Boegie19 is saying), you need to add the workflows to this file on both frontend and backend: Lines 1 to 14 in 13502af
Got it, I'll do it. Thx
Done in 3df8e70. BTW, very nice way to regroup rules - congrats!
Hey guys, my time is quite limited but I'd like to give you the picture, because it starts to make sense to me.
In parallel I'm exploring ways to reduce even more, but I'll have to patch yarn at some point. I'm trying to find a safe way to bypass the resolution/link stages (months away, I guess). Quite exciting: in my tests with a comparable setup (a bit larger), I can get a warm-cache install down to 20s. I mean, I wish to get there :)
@@ -1,11 +1,15 @@
backend:
- '.github/actions/yarn-nm-install/*.yml'
This shouldn't exactly be part of this PR... but it was asked for in #16581 (comment).
Added the yarn-nm-install folder too. Why? I'll need it for the second iteration. But don't worry, it's part of a plan: #16581 (comment). tldr: the idea is to bring all of this into @setup/node by default. I'm trying to coordinate that in parallel.
Code looks good. I'm wondering if we should add some information to our contributor docs about how all this works, in case we need to make changes in the future - there's a lot of valuable information scattered around this PR at the moment that we could use.
@joshuaellis I thought something similar - should we add some conclusions from this PR as code comments to the action / workflow files, to keep them co-located with the code? For the contributor docs we could maybe take a more high-level view and describe the CI pipeline and its overall idea in general.
Reference benchmarks
What does it do?
Tries to improve CI install time.
To make it work, the .yarn/cache/*.zip files should be cached/restored as well. There's no one-size-fits-all solution for every use case, so I've created a separate install action based on experience.
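Following the pattern of the reference gist, the separate install action would typically be consumed from workflows roughly like this - a sketch, with the action path and input names as assumptions:

```yaml
# Sketch - the action path matches the folder added in this PR, but the
# `cwd` input is an illustrative assumption from the reference gist.
- name: 📥 Monorepo install (yarn)
  uses: ./.github/actions/yarn-nm-install
  with:
    cwd: '.'
```

The point of a local composite action is that every workflow shares one install/cache configuration, so cache keys only need tuning in a single place.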
Links:
To get a fast install: caching .yarn/cache + install_state + **/node_modules/ is ideal. See (from https://github.com/belgattitude/httpx/actions/runs/4850630708/jobs/8643715240)
.yarn/cache/*.zip (reusable even on lock changes). As caching the node_modules folder(s) is heavy in terms of compression/decompression time (+ invalidation on lock changes / architectures / libc...), sometimes it's preferable not to enable it - especially when using YARN_COMPRESSION_LEVEL=0.
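For reference, a sketch of how the compression setting can be pinned - either in .yarnrc.yml or via the equivalent YARN_COMPRESSION_LEVEL environment variable mentioned above:

```yaml
# .yarnrc.yml - sketch. Storing the zip archives uncompressed trades cache
# size for speed: restoring and linking skip the decompression step, which
# is usually the win on CI runners with fast cache storage.
compressionLevel: 0
```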
But looking at your workflows, it might make sense to save everything (so it can be reused between steps). It's heavier in terms of cache size and compression/decompression... you might need to try both options to get measurements (install time, but also time spent storing/restoring the cache).
I'll comment directly in the code
Why is it needed?
Not really needed, but can be interesting to try out. Nice for 🌳 ♻️
How to test it?
CI should pass, install cache should bring some speed up.
Task list:
Create and use a separate/custom action for install, based on https://gist.github.com/belgattitude/042f9caf10d029badbde6cf9d43e400a
Add a cache cleanup workflow, see https://gist.github.com/belgattitude/042f9caf10d029badbde6cf9d43e400a#cleanup-caches