CI: speed up CI time by improving yarn install and caches (iteration 1 / >30%) #16581
Conversation
Working with measurements will require running the CI. Open to discuss.

- name: 📥 Install dependencies
  shell: bash
  run: yarn install --immutable --inline-builds
Added --inline-builds on CI because otherwise some logs (node-gyp builds, for example) are stored in a file...
Also note that yarn 4.0.0-rc.43 will give a small perf boost (I found that the latest 4.0.0 RCs are really stable). Open to try that as well - especially the supportedArchitectures option, which seems to fit here. Can be done in a separate PR if wanted (the action/cache keys would need tuning).
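For reference, a sketch of what the supportedArchitectures option looks like in .yarnrc.yml - the os/cpu/libc values below are illustrative assumptions and would need to match the CI runners actually in use:

```yaml
# .yarnrc.yml - sketch only; the exact os/cpu/libc values are assumptions.
# Restricting architectures avoids fetching native packages for platforms
# the CI never runs on, which shrinks the cache and speeds up the fetch step.
supportedArchitectures:
  os: [linux]
  cpu: [x64]
  libc: [glibc]
```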
Just to keep a reference: the test workflow - https://github.com/strapi/strapi/actions/runs/4852553521. I've just merged main into this PR. @gu-stav can you approve the workflows again? I'd like to see what a (full) warm-cache situation gives and tune based on what I find out.
This looks very promising. Thanks for all the work you put in already! Two smaller questions.
Overall I don't see any problems and I think this is good to merge, but I'd like different opinions from the team.
uses: actions/cache@v3
with:
  path: '**/node_modules'
  key: yarn-nm-cache-${{ runner.os }}-${{ steps.yarn-config.outputs.CURRENT_NODE_VERSION }}-${{ steps.yarn-config.outputs.CURRENT_BRANCH }}-${{ hashFiles('yarn.lock', '.yarnrc.yml') }}
Just wondering if that is a bit over-optimized. Can we assume that CURRENT_NODE_VERSION and hashFiles are enough? Why do you think CURRENT_BRANCH should be part of the key?
Note that this is based on the strapi repo: the node_modules folder isn't cached unless requested otherwise - it's false by default. In other words it's not enabled; we just cache the yarn zip archives.
This is based on a previous benchmark made in this branch.
The reason the node version is added is to be future-proof. Imagine someone adds a library that runs a node-gyp postinstall build to create a binary... for example sharp when vips-dev isn't installed. There are other examples in the wild; I've chosen sharp because you use it. It saves the compiled binary in the node_modules folder and... 💥
But the most interesting reason to keep this: staying in line with the original gist, which comes with documentation.
https://gist.github.com/belgattitude/042f9caf10d029badbde6cf9d43e400a
In a next iteration I plan to add more speed-ups. I prefer to restart from a situation I know well.
Gu-stav is right @belgattitude. I think that CURRENT_BRANCH is not needed, since the node version and packages.yml already solve this, which would mean fewer cold starts.
What @Boegie19 said :) CURRENT_NODE_VERSION looks useful to me - just confused about CURRENT_BRANCH :)
Ah yes, haven't explained this...
The idea is that yarn/pnpm works in multiple phases:
- Resolution - seems affected by the presence of install_state
- Fetch - only affected by zip cache presence
- Link - seems affected by the presence of node_modules / install_state

So why is it there?
Over time in a repo, it's always nice to re-assess the cost/benefit perf between the two possible strategies:
- Just cache *.zip
- Cache *.zip + install-state + node_modules.
And that depends mostly on the packages installed in the repo (i.e. if you install esbuild, vite, vitest at some point, the balance is different).
In the strapi repo:
- The strategy that makes sense is to just save *.zip
- Whereas in one of my repos, it's better to save everything
See https://github.com/belgattitude/nextjs-monorepo-example/actions/runs/4888683373/jobs/8726574932 (brings a 1-minute install with strategy 1 down to a 25-second install) - not totally precise, but that's the idea...
The drawback: node_modules / install-state cannot leak between branches (long to explain). The best option for now is to add the branch to the key.
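To make the two strategies concrete, a minimal sketch of the corresponding cache steps - the key prefixes are illustrative, and the steps.yarn-config outputs mirror the ones used elsewhere in this PR:

```yaml
# Sketch - key names are illustrative assumptions, not the exact keys in this PR.

# Strategy 1: cache only the yarn zip archives. Safe to share across branches,
# since the archives are content-addressed by the lock file.
- uses: actions/cache@v3
  with:
    path: .yarn/cache
    key: yarn-dl-cache-${{ runner.os }}-${{ hashFiles('yarn.lock', '.yarnrc.yml') }}

# Strategy 2: additionally cache node_modules + install state. Faster warm
# installs, but the key must include the branch so state never leaks between
# branches.
- uses: actions/cache@v3
  with:
    path: |
      **/node_modules
      .yarn/install-state.gz
    key: yarn-nm-cache-${{ runner.os }}-${{ steps.yarn-config.outputs.CURRENT_BRANCH }}-${{ hashFiles('yarn.lock', '.yarnrc.yml') }}
```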
Yes, it's over-optimization in the current strapi repo... and it might look hard to read :) With another repo this optimization is valid (and it might make sense here in the future).
But the main idea is to not diverge too much from the reference gist: https://gist.github.com/belgattitude/042f9caf10d029badbde6cf9d43e400a
This is optimization, after all - better to have a reference doc.
I see your point. What do you think would be the danger in sharing a cache across branches? We don't bundle/inline any secrets, so I don't see a problem there - but maybe there are caches (thinking e.g. of eslint / prettier) that might be shared and shouldn't be?
Not a security concern. The link step and whatever will happen there + tons of things... but too long to explain today. As it's not enabled, let's keep it as close to the reference gist as possible, if you agree.
Wdyt? Or do you want more details?
Take your time - I'd like to know before we merge what the reason for that is, because otherwise we won't be able to get rid of it again.
I see... It seems to cause a lot of questioning. Let's say it's for ensuring stability / correctness / reproducibility.
I've added it based on experience and my current understanding. The thing is, without it you can end up in really hard-to-debug situations.
When saving node_modules + install state, yarn (will/might/can) totally skip the link step (the postinstall won't be run - in all situations). I wrote (will/might/can) because it depends on other factors.
Some examples:
- By default, prisma will generate your db types on postinstall based on the defined schema. The lock file does not change, the cache is valid, you change the schema... then the postinstall is not run and the generated types are out of sync.
- Some packages depend on deasync (i.e. xdm, mdx...) - same thing.
- And yes, it's always fragile because some tools start to cache things in node_modules/.cache (i.e. older turbo versions, eslint, prettier...) or depend too much on it.
The idea is: if the full cache is enabled at some point -> avoid those edge cases from the start (because they will happen).
But I understand your points. So, based on the above, what do you think of proceeding in steps:
- Merge this first iteration (safe)
- I'll prepare iteration 2 (which will probably change the lock file and bring new light to the whole picture)
- I'll make a 3rd iteration - a cleanup to make this action as simple as possible based on the strapi usage
Is that something that would answer your concerns?
Thank you for that explanation. It starts to make sense to me. I will leave this thread open for the other engineers to read, because I'd like more opinions before we merge this :)
Can someone approve the workflows? Thx to #16581 (comment). And it will bring a small speed-up too.
@belgattitude Mind adding the .github/workflows to Lines 1 to 14 in e94f5ee?
This will ensure tests will be run properly on tooling changes.
I didn't get what you asked for :)
Don't worry about it, I should have been clearer. Also, I am so happy with this PR - faster CI means less time waiting to see whether tests failed, and this should also help massively with your GitHub worker congestion problems.
@belgattitude Just to confirm - you now get what I asked here, right?
Still don't get it. I have this file in this branch and it's the same as the one in main and on the other branch. Am I missing something - do I have to do something on this branch?
The thing that I don't get is: what does adding the workflow do?
Co-authored-by: Gustav Hansen <gu@stav.dev>
@belgattitude
In order for them to be run (which is what Boegie19 is saying), you need to add the workflows to this file on both frontend and backend: Lines 1 to 14 in 13502af
Got it, I'll do it. Thx
Done in 3df8e70. BTW, very nice way to regroup rules - congrats!
Hey guys, my time is quite limited but I'd like to give you the picture, because it starts to make sense to me.
In parallel I'm exploring ways to reduce even more, but I'll have to patch yarn at some point. I'm trying to find a safe way to bypass the resolution/link stages (months away, I guess). Quite exciting: in my tests with a comparable setup (a bit larger), I can get a warm-cache install down to 20s. I mean, I wish to get there :)
@@ -1,11 +1,15 @@
backend:
- '.github/actions/yarn-nm-install/*.yml'
This shouldn't exactly be part of this PR... but it was asked for in #16581 (comment).
Added the yarn-nm-install folder too. Why? I'll need it for the second iteration. But don't worry, it's part of a plan: #16581 (comment). tldr: the idea is to bring all of this into @setup/node by default. I'm trying to coordinate that in parallel.
Code looks good. I'm wondering if we should add some information to our contributor docs about how all this works, in case we need to make changes in the future - there's a lot of valuable information scattered around this PR at the moment that we could use.
@joshuaellis I thought something similar - should we add some conclusions from this PR as code comments to the action / workflow files, to keep them co-located with the code? For the contributor docs we could maybe take a more high-level view and describe the CI pipeline and its overall idea in general.
Reference benchmarks
What does it do?
Tries to improve CI install time.
To make it work, the .yarn/cache/*.zip files should be cached/restored as well. There's no one-size-fits-all solution for every use case, so I've created a separate install action based on experience.
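Following the pattern of the reference gist, the separate install action would typically be consumed from workflows roughly like this - a sketch, with the action path and input names as assumptions:

```yaml
# Sketch - the action path matches the folder added in this PR, but the
# `cwd` input is an illustrative assumption from the reference gist.
- name: 📥 Monorepo install (yarn)
  uses: ./.github/actions/yarn-nm-install
  with:
    cwd: '.'
```

The point of a local composite action is that every workflow shares one install/cache configuration, so cache keys only need tuning in a single place.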
Links:
To get a fast install: caching .yarn/cache + install_state + **/node_modules/ is ideal. See (from https://github.com/belgattitude/httpx/actions/runs/4850630708/jobs/8643715240)
.yarn/cache/*.zip (reusable even on lock changes). As caching the node_modules folder(s) is heavy in terms of compression/decompression time (+ invalidation on lock changes / architectures / libc...), sometimes it's preferable not to enable it - especially when using YARN_COMPRESSION_LEVEL=0.
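For reference, a sketch of how the compression setting can be pinned - either in .yarnrc.yml or via the equivalent YARN_COMPRESSION_LEVEL environment variable mentioned above:

```yaml
# .yarnrc.yml - sketch. Storing the zip archives uncompressed trades cache
# size for speed: restoring and linking skip the decompression step, which
# is usually the win on CI runners with fast cache storage.
compressionLevel: 0
```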
But looking at your workflows, it might make sense to save everything (so it can be reused between steps). It's heavier in terms of cache size and compression/decompression... you might need to try both options to get measurements (install time, but also time spent storing/restoring the cache).
I'll comment directly in the code
Why is it needed?
Not really needed, but can be interesting to try out. Nice for 🌳 ♻️
How to test it?
CI should pass, install cache should bring some speed up.
Task list:
Create and use a separate/custom action for install, based on https://gist.github.com/belgattitude/042f9caf10d029badbde6cf9d43e400a
Add a cache cleanup workflow, see https://gist.github.com/belgattitude/042f9caf10d029badbde6cf9d43e400a#cleanup-caches