Prevent Local gradle build cache collisions #2778

Merged
rnorth merged 2 commits into master from prevent-local-gradle-cache-collisions on May 22, 2020

rnorth commented May 21, 2020

This PR prevents local Gradle build cache collisions by ensuring that the find_gradle_jobs and check jobs always use different cache keys.

#1874 surfaced a glitch in the GitHub Actions caching of our CI jobs. This manifested as PR tests being skipped.

For background/interest:

  1. We had two tests failing on a PR due to merge conflicts
  2. I pushed a commit that fixed one of them (I should have noticed there were two failures, but instead believed there was just one)
2020-05-21T15:11:02.8202572Z :postgresql:test (Thread[Daemon worker,5,main]) started.
2020-05-21T15:11:03.6192039Z Gradle Test Executor 1 started executing tests.
2020-05-21T15:11:04.8192531Z 
2020-05-21T15:11:04.8221625Z > Task :postgresql:test
2020-05-21T15:11:04.8223267Z Build cache key for task ':postgresql:test' is cbf31443f435cc983e30334fb9fbf5df
2020-05-21T15:11:04.8223770Z Task ':postgresql:test' is not up-to-date because:
2020-05-21T15:11:04.8223916Z   No history is available.
2020-05-21T15:11:04.8224363Z Did not find cache item 'cache/cbf31443f435cc983e30334fb9fbf5df' in S3 bucket
  3. That commit got built, but the test task was skipped entirely due to caching:
2020-05-21T15:27:30.6515151Z > Task :postgresql:test FROM-CACHE
2020-05-21T15:27:30.6515923Z Build cache key for task ':postgresql:test' is c8ae3ed5e2e3e487194cf26b06391391
2020-05-21T15:27:30.6516556Z Task ':postgresql:test' is not up-to-date because:
2020-05-21T15:27:30.6516931Z   No history is available.
2020-05-21T15:27:30.6517546Z Loaded cache entry for task ':postgresql:test' with cache key c8ae3ed5e2e3e487194cf26b06391391
2020-05-21T15:27:30.6518183Z :postgresql:test (Thread[Execution worker for ':',5,main]) completed. Took 0.073 secs.
2020-05-21T15:27:30.6518575Z :postgresql:check (Thread[Daemon worker,5,main]) started.
  4. So, PR check state was all green and I merged it
  5. Then master branch rebuilt and the failing test re-emerged

We suspected that this was related to the find_gradle_jobs CI job, which disables the Gradle test executor so that it can quickly simulate a full check and find out which Gradle tasks need to be executed.

We initially believed that we could be accidentally pushing the cached results of these no-op test tasks to the remote Gradle cache in S3, but the configuration already guards against this.

We subsequently realised that the leakage actually occurs via GitHub Actions caching: the final restore key for the check job would match the key saved by the find_gradle_jobs job, so the local Gradle cache could end up being shared between the two jobs.
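To illustrate the matching behaviour described here, the following is a minimal Python sketch of how a restore-key lookup falls back from an exact key to progressively broader prefixes. It is a simulation, not the GitHub Actions implementation, and the key names (`linux-gradle-home-...`) are hypothetical; the real keys used by this repository's workflow are not shown in this PR description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheEntry:
    key: str
    created: int   # save order; larger means more recent
    saved_by: str  # job that saved the entry (for illustration only)


def restore(entries: List[CacheEntry], key: str,
            restore_keys: List[str]) -> Optional[CacheEntry]:
    """Try an exact match on `key`, then each restore key as a prefix.

    Among entries matching a prefix, the most recently saved one wins.
    """
    for entry in entries:
        if entry.key == key:
            return entry
    for prefix in restore_keys:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda e: e.created)
    return None


# Hypothetical state: only find_gradle_jobs has saved a cache so far.
store = [
    CacheEntry("linux-gradle-home-find_gradle_jobs-1", created=1,
               saved_by="find_gradle_jobs"),
]

# Before the fix (assumed layout): the final restore key is a prefix
# shared by both jobs, so the check job picks up the other job's cache.
before = restore(
    store,
    key="linux-gradle-home-check-postgresql-2",
    restore_keys=["linux-gradle-home-check-postgresql-", "linux-gradle-home-"],
)

# After the fix (assumed layout): every restore key is specific to check.
after = restore(
    store,
    key="linux-gradle-home-check-postgresql-2",
    restore_keys=["linux-gradle-home-check-postgresql-", "linux-gradle-home-check-"],
)

print(before.saved_by if before else None)  # find_gradle_jobs
print(after.saved_by if after else None)    # None
```

With job-specific prefixes, a cache miss in the check job simply means a cold local cache rather than one populated by find_gradle_jobs.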

Quite simply, the result of the no-op test task was being put into the local Gradle cache, and some of the time it was being used as a signal that the tests had already been executed.
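The effect can be sketched with a toy model of a local build cache, written in Python. This is not Gradle's implementation: the class and function names are hypothetical, and the sketch assumes that disabling the test executor does not change the task's cache key, which is what the behaviour described above implies.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple


class LocalBuildCache:
    """Toy stand-in for a local build cache: cache key -> task output."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def load(self, key: str) -> Optional[str]:
        return self._entries.get(key)

    def store(self, key: str, output: str) -> None:
        self._entries[key] = output


def cache_key(task_path: str, inputs: Dict[str, str]) -> str:
    # Assumption: the key depends only on the task and its inputs,
    # not on whether the test executor was disabled.
    digest = hashlib.md5(task_path.encode())
    for name in sorted(inputs):
        digest.update(name.encode())
        digest.update(inputs[name].encode())
    return digest.hexdigest()


def run_test_task(cache: LocalBuildCache, task_path: str,
                  inputs: Dict[str, str],
                  executor: Callable[[], str]) -> Tuple[str, str]:
    key = cache_key(task_path, inputs)
    cached = cache.load(key)
    if cached is not None:
        return "FROM-CACHE", cached  # executor is never invoked
    output = executor()
    cache.store(key, output)
    return "EXECUTED", output


def noop_executor() -> str:
    return "no tests run"      # what a disabled executor would produce


def real_executor() -> str:
    return "2 tests failed"    # what the real tests would have reported


inputs = {"src": "commit-with-one-fix"}

# Shared local cache: the no-op result masks the real test run.
shared = LocalBuildCache()
run_test_task(shared, ":postgresql:test", inputs, noop_executor)
shared_result = run_test_task(shared, ":postgresql:test", inputs, real_executor)

# Separate local caches: the check job actually runs the tests.
isolated_result = run_test_task(LocalBuildCache(), ":postgresql:test",
                                inputs, real_executor)

print(shared_result)    # ('FROM-CACHE', 'no tests run')
print(isolated_result)  # ('EXECUTED', '2 tests failed')
```

In this model, the only thing separating a correct run from a skipped one is whether the two jobs restore the same local cache directory.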

rnorth merged commit 74a88ad into master May 22, 2020
rnorth deleted the prevent-local-gradle-cache-collisions branch May 22, 2020 08:36
quincy pushed a commit to quincy/testcontainers-java that referenced this pull request May 28, 2020