Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI GitHub workflows use #7614

Closed
3 tasks done
oscard0m opened this issue Oct 12, 2021 · 13 comments
Closed
3 tasks done

CI GitHub workflows use #7614

oscard0m opened this issue Oct 12, 2021 · 13 comments
Labels
type:ci CI related issue

Comments

@oscard0m
Copy link

New Feature / Enhancement Checklist

Current Limitation

GitHub CI workflows using setup-node not getting benefit from cache property, which should improve execution times.

Feature / Enhancement Description

Start getting benefit from this cache prop.

Example Use Case

See PR

Alternatives / Workarounds

3rd Party References

https://github.blog/changelog/2021-07-02-github-actions-setup-node-now-supports-dependency-caching/

@parse-github-assistant
Copy link

Thanks for opening this issue!

  • 🎉 We are excited about your ideas for improvement!

@mtrezza mtrezza added the type:ci CI related issue label Oct 12, 2021
@mtrezza mtrezza linked a pull request Oct 12, 2021 that will close this issue
2 tasks
@mtrezza
Copy link
Member

mtrezza commented Oct 12, 2021

GitHub CI workflows using setup-node not getting benefit from cache property, which should improve execution times.

  • How would that improve execution times vs. actions/cache?
  • How does the cache parameter interplay with matrix execution? - Currently we are are creating a cache based on matrix

@oscard0m
Copy link
Author

oscard0m commented Oct 12, 2021

GitHub CI workflows using setup-node not getting benefit from cache property, which should improve execution times.

  • How would that improve execution times vs. actions/cache?

If the workflow has a by-default usage I guess is an equivalent benefit. It also depend in the number of dependencies the project have (details here).

Since it seems the workflow does not re-use the cache between node versions, it seems there could be room for improvement here? (not sure if you have a use case where you explicitly need to NOT re-use the cache between node versions in your CI). Here's the ADR where it is explained on detail.

  • How does the cache parameter interplay with matrix execution? - Currently we are are creating a cache based on matrix

My understanding is that it re-uses the cache between versions:

The action follows actions/cache guidelines, and caches global cache on the machine instead of node_modules, so cache can be reused between different Node.js versions.

source: https://github.com/actions/setup-node/blob/main/docs/advanced-usage.md#caching-packages-dependencies

@mtrezza

@mtrezza
Copy link
Member

mtrezza commented Oct 12, 2021

From the docs:

GitHub will remove any cache entries that have not been accessed in over 7 days. There is no limit on the number of caches you can store, but the total size of all caches in a repository is limited to 5 GB. If you exceed this limit, GitHub will save your cache but will begin evicting caches until the total size is less than 5 GB.

It seems that it doesn't matter whether we have separate cache entries per matrix permutation. The cache is available, whether it's 1 cache file or N cache files, so I would not expect any performance improvement.

I am suspicious of having a single cache for all matrix permutations. Matrix caching is explicitly described in an example, using actions/cache. I do not think that matrix caching is done automatically "behind the scenes" using the shortcut actions/setup-node:cache. In fact I think the parameter serves as a shortcut for simple caching scenarios, but for complex ones, one has to explicitly use actions/cache. Just my guess - it would be good if you could find any hint on this.

@oscard0m
Copy link
Author

From the docs:

GitHub will remove any cache entries that have not been accessed in over 7 days. There is no limit on the number of caches you can store, but the total size of all caches in a repository is limited to 5 GB. If you exceed this limit, GitHub will save your cache but will begin evicting caches until the total size is less than 5 GB.

It seems that it doesn't matter whether we have separate cache entries per matrix permutation. The cache is available, whether it's 1 cache file or N cache files, so I would not expect any performance improvement.

Nice find! I didn't know this 👍🏽 . As you pointed, if the cache expires every 7 days + has a capacity of 5Gb probably there is not much performance improvement going to happen. I guess it depends in the number of dependencies of the project and the frequency the CI runs.

I am suspicious of having a single cache for all matrix permutations. Matrix caching is explicitly described in an example, using actions/cache. I do not think that matrix caching is done automatically "behind the scenes" using the shortcut actions/setup-node:cache. In fact I think the parameter serves as a shortcut for simple caching scenarios, but for complex ones, one has to explicitly use actions/cache. Just my guess - it would be good if you could find any hint on this.

Yep, from what I read, seems that it is not caching per version as you do with actions/cache. As you point, this cache option is thought for covering basic scenarios:

We don't pursue the goal to provide wide customization of caching in scope of actions/setup-node action. The purpose of this integration is covering ~90% of basic use-cases. If user needs flexible customization, we should advice them to use actions/cache directly. (source)


For my understanding and learning, is there a particular use case I'm missing on why you need different cache folder per node version?

@mtrezza
Copy link
Member

mtrezza commented Oct 12, 2021

Maybe a different node version produces a different cache? Again, no reference, one would probably have to research npm caching compatibility across versions. actions/cache is the suggested way for matrix caching by GitHub. This is the status quo that works, so we don't need an argument to justify the status quo. To change this, we would need an argument against the status quo.

@oscard0m
Copy link
Author

Maybe a different node version produces a different cache? Again, no reference, one would probably have to research npm caching compatibility across versions.

I found this: actions/setup-node#304 (comment)
My understanding is that the cache is the same one (for download). @maxim-lobanov @dominikg if you have some free time to give us some thoughts on this, we would LOVE it!

actions/cache is the suggested way for matrix caching by GitHub.

The example I see is with Operating System matrix for pip, not sure if that would apply to setup-node. These are different caches we are talking about in the end, right?


Just posted a question In GitHub's Community to get some help: https://github.community/t/actions-setup-node-cache-with-matrix-of-node-versions/206940

@dominikg
Copy link

Maxim's response in the issue you linked is pretty clear I think. As far as I understand it only the content of pnpm global store is cached which is not affected by node version and safe to share between different node versions.

@mtrezza
Copy link
Member

mtrezza commented Oct 16, 2021

@oscard0m thanks for investigating.

The suggested change of removing the node version won't bring any benefit as I understand. The benefit of keeping it however is that we have a clear cache separation between node versions. Even if that is not needed now, it seems to be a more sustainable approach because we cannot tell whether the Node.ja team will keep cache compatibility across node versions, at least I haven't found any reference.

If we find an argument in favor of removing the separation by node version, let's reevaluate.

@oscard0m
Copy link
Author

@oscard0m thanks for investigating.

Thanks for appreciating it and also thanks for all the time and care you are taking in this discussion. I'm learning a lot 💪🏽

The suggested change of removing the node version won't bring any benefit as I understand.

On my understanding there would be a benefit. To not download dependencies already downloaded by other executions of that same step in the pipeline with another node version.

For example, taking this matrix as example, the benefit would be for all the succeeding executions of the pipeline using node v14.18.0, but also node v12.22.6 and node v15.14.0 too. It should safe execution time as far as I understand.

I tried to check the execution times for the opened PR but it requires an approval to get checks running so... no real information to compare with.

The benefit of keeping it however is that we have a clear cache separation between node versions. Even if that is not needed now, it seems to be a more sustainable approach because we cannot tell whether the Node.ja team will keep cache compatibility across node versions, at least I haven't found any reference.

If we find an argument in favor of removing the separation by node version, let's reevaluate.

I don't know how npm and its cacache works internally and I don't know the plans for upcoming versions. So... no strong arguments on this.

If you don't see any benefit and you think is not safe to rely on the same npm downloaded dependencies, we can close this issue an revisit it in the future if more strong arguments come up on this topic.

@mtrezza
Copy link
Member

mtrezza commented Oct 17, 2021

On my understanding there would be a benefit. To not download dependencies already downloaded by other executions of that same step in the pipeline with another node version.

The dependencies won't be downloaded, they are cached already. They are just not cached in 1 single cache but N individual caches by node version. The only difference would be that the different caches wouldn't have to be loaded, but I expect them to be loaded fast, because that's the purpose of a cache as they are loaded from GitHub's internal storage.

My concern is that bugs due to caching can be difficult to analyze, so if node requires caching per version in the future, we have the effort of investigating and reverting this workflow change across multiple repos. Because for consistency we would not make this proposed change only in this repo but a few other repos too, so there would be some effort involved.

However, let me make a test run using a single cache and compare the execution time. If the time is significantly better, we would have a pro-argument.

@mtrezza
Copy link
Member

mtrezza commented Oct 17, 2021

Below is a performance comparison between the CI run times of this PR vs. status quo. As we can see, there is no noticeable performance improvement. The tests usually run +/- 1 min, so the conclusion would be that the two run at the same speed.

I'm closing this PR, as the original issue was incorrect (we were already using cache), and the suggestion to not separate the cache per node version does not yield any improvement, but potentially opens up issues if node caches become incompatible across node versions in the future.

@oscard0m Thanks for opening this issue and your investigations. If you come across a pro-argument for using a single cache, please let us know and we will gladly reconsider.


This PR:

Screen Shot 2021-10-17 at 20 41 23
Screen Shot 2021-10-17 at 21 02 37

Status quo:
Screen Shot 2021-10-17 at 21 02 04
Screen Shot 2021-10-17 at 21 02 23

@mtrezza mtrezza closed this as completed Oct 17, 2021
@oscard0m
Copy link
Author

Below is a performance comparison between the CI run times of this PR vs. status quo. As we can see, there is no noticeable performance improvement. The tests usually run +/- 1 min, so the conclusion would be that the two run at the speed.

I'm closing this PR, as the original issue was incorrect (we were already using cache), and the suggestion to not separate the cache per node version does not yield any improvement, but potentially opens up issues if node caches become incompatible across node versions in the future.

@oscard0m Thanks for opening this issue and your investigations. If you come across a pro-argument for using a single cache, please let us know and we will gladly reconsider.


This PR:

Screen Shot 2021-10-17 at 20 41 23

Screen Shot 2021-10-17 at 21 02 37

Status quo:

Screen Shot 2021-10-17 at 21 02 04

Screen Shot 2021-10-17 at 21 02 23

Thanks for the research and experimentation on this. I learned a lot with you @mtrezza !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:ci CI related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants