TRAP caching causes GitHub Actions Cache to churn #2030

Open
sayhiben opened this issue Dec 12, 2023 · 6 comments

Comments

@sayhiben

I'd like to discuss alternative storage solutions for TRAP caching, as the current implementation causes our monorepo to exhaust its available cache. Each run appears to consume about 100 MB of space, and at our current merge-to-default velocity we run out of cache in about a day.

I am aware that TRAP caching can be disabled through the trap-caching input of the init action; however, I would like to continue using TRAP caching at some point in the future.

I have successfully leveraged GitHub Packages (i.e., GHCR) for caching Docker Images and other build artifacts in the past. Would that be another possible path forward for this useful feature?

@sayhiben sayhiben mentioned this issue Dec 12, 2023
3 tasks
@aeisenberg
Contributor

Thanks for raising this issue and for the suggestion. We are considering changing the TRAP caching feature so that it only runs on the default branch. This would ensure that each repo caches no more than one set of TRAP files. TRAP caching only really makes sense when a repo runs CodeQL analysis on a schedule. Most repos only use scheduled runs on the default branch, so this should be workable for most repos.

For your monorepo, is the issue that you are running analyses in PR branches when the code to be analyzed hasn't changed? If so, perhaps you can use path expressions to avoid running CodeQL in this case.
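In a workflow file, this kind of filtering is normally expressed with the paths / paths-ignore filters on the pull_request trigger. The decision those filters make can be sketched roughly as below. This is a simplified illustration that matches on directory prefixes rather than GitHub's glob syntax, and the function name and directory names are hypothetical.

```python
def should_run_codeql(changed_files, analyzed_dirs):
    """Return True if any changed file lives under a directory that CodeQL analyzes.

    Simplified stand-in for workflow path filters: plain directory-prefix
    matching, not GitHub's glob patterns.
    """
    # Normalize "services/api" and "services/api/" to the same prefix
    prefixes = tuple(d.rstrip("/") + "/" for d in analyzed_dirs)
    return any(path.startswith(prefixes) for path in changed_files)


# Hypothetical monorepo layout
print(should_run_codeql(["docs/readme.md"], ["services/api"]))        # prints False
print(should_run_codeql(["services/api/main.py"], ["services/api"]))  # prints True
```

A PR that only touches files outside the analyzed directories would then skip the analysis and produce no new cache entry.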

@sayhiben
Author

sayhiben commented Dec 12, 2023

Almost all of our TRAP cache comes from our default branch, which we merge to more than 60 times a day on average. Even if TRAP caching only ran on the default branch, it would continue to exhaust the available space in the GHA cache, because every run saves its entry under a different cache key.

If the desire is to continue using the GHA cache, and there's only a point in caching the most recent run, we'll have to choose an arbitrary cache key and manually invalidate the old cache when we want to update it. Otherwise, each CodeQL run against our default branch will add another 100 MB to our usage. (Edit: I'd expect race conditions to emerge if multiple CodeQL workflow runs attempt to manage the cache simultaneously, however.)

Example screenshot from our default branch attached
image
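To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes the 10 GB per-repository limit that GitHub documents for Actions caches, about 100 MB of TRAP cache per run, and 60 default-branch runs per day. The function name and the one-entry-per-run model are illustrative assumptions, not measurements from the repository.

```python
def days_until_cache_full(limit_mb: float, mb_per_run: float, runs_per_day: float) -> float:
    """Days until per-run cache entries fill the quota, if nothing else is cached or evicted."""
    if mb_per_run <= 0 or runs_per_day <= 0:
        raise ValueError("mb_per_run and runs_per_day must be positive")
    return limit_mb / (mb_per_run * runs_per_day)


daily_mb = 100 * 60  # MB of new TRAP cache per day
days = days_until_cache_full(10 * 1024, 100, 60)
print(f"{daily_mb} MB/day, quota reached after about {days:.1f} days")
# prints: 6000 MB/day, quota reached after about 1.7 days
```

More than 60 merges a day, or other caches sharing the same quota, would shorten that further, which is in line with running out in roughly a day.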

@sayhiben
Author

Another possible approach might be to save the TRAP cache as an artifact, then pull the most recent artifact from the workflow run on the default branch when restoring the cache. I seem to recall, though, that access to arbitrary artifacts might need a PAT, which would be something of a dealbreaker...
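The selection step of that idea could look something like the sketch below. It assumes artifact metadata in the shape returned by the REST API's repository artifact listing (name, expired, created_at, workflow_run.head_branch). The artifact name codeql-trap-cache and the sample data are hypothetical, and the sketch leaves the authentication question open.

```python
from datetime import datetime
from typing import Optional


def latest_trap_artifact(artifacts: list, name: str, branch: str) -> Optional[dict]:
    """Return the newest unexpired artifact with this name from runs on `branch`."""
    candidates = [
        a for a in artifacts
        if a["name"] == name
        and not a["expired"]
        and a["workflow_run"]["head_branch"] == branch
    ]
    if not candidates:
        return None  # nothing to restore; the run proceeds without a TRAP cache
    # created_at is an ISO 8601 timestamp such as "2023-12-12T10:00:00Z"
    return max(
        candidates,
        key=lambda a: datetime.fromisoformat(a["created_at"].replace("Z", "+00:00")),
    )


# Hypothetical sample data in the shape of the artifact listing
artifacts = [
    {"id": 1, "name": "codeql-trap-cache", "expired": False,
     "created_at": "2023-12-12T09:00:00Z", "workflow_run": {"head_branch": "main"}},
    {"id": 2, "name": "codeql-trap-cache", "expired": False,
     "created_at": "2023-12-12T10:00:00Z", "workflow_run": {"head_branch": "main"}},
    {"id": 3, "name": "codeql-trap-cache", "expired": False,
     "created_at": "2023-12-12T11:00:00Z", "workflow_run": {"head_branch": "feature"}},
]
print(latest_trap_artifact(artifacts, "codeql-trap-cache", "main")["id"])  # prints 2
```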

@aeisenberg
Contributor

Our hope is that there will only ever be one cached item and it would contain the most recent TRAP on the default branch. Before uploading the new TRAP cache, the old TRAP cache would be removed. (There would be a few moments when there is no cache since the old one was deleted and the new one isn't uploaded yet, but hopefully that window will be small.)
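The delete-then-upload sequence, including the short window with no cache, can be illustrated with a small in-memory model. This is only a sketch: SingleEntryCache and replace_trap_cache are hypothetical names, not part of the CodeQL Action. The one behavior borrowed from the real service is that an Actions cache entry cannot be overwritten under an existing key.

```python
from typing import Callable, Optional


class SingleEntryCache:
    """In-memory stand-in for a cache that refuses to overwrite an existing key."""

    def __init__(self) -> None:
        self._entries = {}

    def restore(self, key: str) -> Optional[bytes]:
        return self._entries.get(key)

    def delete(self, key: str) -> None:
        self._entries.pop(key, None)

    def save(self, key: str, data: bytes) -> None:
        if key in self._entries:
            raise KeyError(f"cache key already exists: {key}")
        self._entries[key] = data


def replace_trap_cache(cache: SingleEntryCache, key: str, new_data: bytes,
                       observe: Optional[Callable] = None) -> None:
    """Delete the old entry, then save the new one under the same key."""
    cache.delete(key)
    if observe is not None:
        # What a concurrent run would see between the delete and the upload
        observe(cache.restore(key))
    cache.save(key, new_data)


KEY = "codeql-trap-default-branch"  # hypothetical stable key
cache = SingleEntryCache()
cache.save(KEY, b"old TRAP")

seen_in_gap = []
replace_trap_cache(cache, KEY, b"new TRAP", observe=seen_in_gap.append)
print(seen_in_gap)         # prints [None]
print(cache.restore(KEY))  # prints b'new TRAP'
```

A run that tries to restore during the gap simply gets a cache miss, so the cost of the window is a slower analysis rather than a failure.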

@sayhiben
Author

Ah, then the idea is to keep the cache key consistent across runs and use the API to remove the old cache so that a new one can be saved under the same key? That should work.

@aeisenberg
Contributor

Yes. That's right.
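For reference, the removal step could be done with the REST API's "delete caches by key" endpoint (DELETE /repos/{owner}/{repo}/actions/caches?key=...). The sketch below only builds the request and does not send it. The owner, repo, key, and token values are placeholders, and this is not necessarily how the CodeQL Action would implement it.

```python
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def build_delete_cache_request(owner: str, repo: str, key: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that deletes the cache entries stored under `key`."""
    query = urllib.parse.urlencode({"key": key})
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/caches?{query}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder values for illustration only
req = build_delete_cache_request("octo-org", "monorepo", "codeql-trap-default-branch", "<token>")
print(req.get_method(), req.full_url)
```

Sending the request would be a matter of passing it to urllib.request.urlopen with a token that is allowed to write to Actions caches in the repository.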
