Cache compression - cross OS support #984

Closed
Phantsure opened this issue Nov 14, 2022 · 21 comments

@Phantsure
Contributor

Phantsure commented Nov 14, 2022

Problems

There have been multiple reported issues related to cache compression. The issue we are looking to solve:

  • Cross-OS compatibility: On Windows, the cache currently uses a different compression algorithm (gzip) than on Linux and macOS (zstd). This leads to different cache versions for caches created on different platforms, so a cache created on Windows might not be restorable on Linux or macOS. For more details on cache version see this.
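
To illustrate why a different compression algorithm leads to a different cache version, here is a minimal sketch. It assumes the version is a hash over the cached paths plus the compression method; `cache_version` is a hypothetical helper for illustration, not the action's actual code.

```python
import hashlib
from typing import Sequence


def cache_version(paths: Sequence[str], compression_method: str) -> str:
    """Hypothetical helper: hash the cached paths together with the compression method."""
    components = list(paths) + [compression_method]
    return hashlib.sha256("|".join(components).encode("utf-8")).hexdigest()


paths = ["node_modules"]
windows_version = cache_version(paths, "gzip")  # what a Windows runner would compute
linux_version = cache_version(paths, "zstd")    # what a Linux or macOS runner would compute
print(windows_version == linux_version)  # False
```

Under this assumption, the same key and paths still miss across platforms, because the lookup also has to match the version.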

Proposal

We propose to solve this as follows:

  • Change the default tar used on Windows runners to GNU tar. This is already suggested as a workaround to people hitting these problems. Using the same tooling ensures that a cache can be reused across all three OSes.
  • Fall back to BSD tar with zstd on Windows. BSD tar is already present on Windows runners by default, but it does not use zstd because compression hangs with large caches. In our testing, we found that performing archiving and compression as separate processes (instead of calling tar --use-compress-program) avoids the hang for caches of up to 2 GB.
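
The difference between the two invocation styles can be sketched as below. This is only an illustration under stated assumptions: the function names, the intermediate `cache.tar` file name, and the exact zstd options are hypothetical and not taken from the action's code.

```python
import subprocess
from typing import List


def single_process_command(archive: str, source_dir: str) -> List[str]:
    """One tar process that spawns zstd itself (the variant reported to hang)."""
    return ["tar", "--use-compress-program", "zstd -T0",
            "-cf", archive, "-C", source_dir, "."]


def two_step_commands(archive: str, source_dir: str,
                      tar_file: str = "cache.tar") -> List[List[str]]:
    """Archive first, then compress the finished tar in a separate process."""
    return [
        ["tar", "-cf", tar_file, "-C", source_dir, "."],
        ["zstd", "-T0", "--force", "-o", archive, tar_file],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command to completion before starting the next one."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Only print the commands here; call run_all(...) to actually execute them.
for argv in two_step_commands("cache.tzst", "node_modules"):
    print(" ".join(argv))
```

The two-step form needs temporary disk space for the uncompressed tar, which is the price of keeping tar and zstd from talking to each other through a pipe.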

Reasons for choosing GNU tar over BSD tar as the default

BSD tar has some implementation problems, which is why our action stopped using it on macOS. For more details see actions/toolkit#552.

Related issues which should get fixed with this proposal

We have consolidated issues related to the above problems here. Feel free to provide feedback regarding these in this issue itself.

@mattjohnsonpint

mattjohnsonpint commented Nov 16, 2022

I'm glad this is being addressed, but I'm not sure the stated proposal is the right path forward. Or rather, it may not be the right path for all users.

I don't believe that cross-OS compatibility of a single cache archive should be the primary driver, as it's quite common to put the runner.os in the cache key, as seen in the basic example in the docs. In my case, we need a different set of packages cached and restored for each OS, so we wouldn't use one common cache anyway.
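
A minimal sketch of such an OS-scoped key, for illustration only. In a workflow the key would normally be built from runner.os and a file hash; `cache_key` and the `deps` prefix here are hypothetical.

```python
import hashlib


def cache_key(runner_os: str, lockfile_bytes: bytes, prefix: str = "deps") -> str:
    """Hypothetical key builder: OS name, a fixed prefix, and a hash of the lockfile."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return f"{runner_os}-{prefix}-{digest}"


lockfile = b"example lockfile contents"
# Same lockfile, different OS: the keys differ, so the caches never mix.
print(cache_key("Windows", lockfile) == cache_key("Linux", lockfile))  # False
```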

Also, it sounds as if the only plan for speeding things up on Windows is to switch to GNU tar. As I stated previously, and as others reported here and here, this has not resolved the problem. Cache restore is still very slow, even with GNU tar.

I think the part that is missing from the proposal is that I believe there should be an option to choose the archive format, and that should be exposed all the way up to the cache action. Personally, I would like to use .7z as the format on Windows. Or perhaps just .zip, but still using 7-Zip to do the work because it's much faster than anything that ships with Windows itself, and it's already pre-installed on the GitHub Actions runner images.

@mattjohnsonpint

BTW - 7-Zip does support tar and tar.gz formats. And like I said, it's pre-installed on both windows-2019 and windows-2022 runner images. I'm not certain if it would be faster than BSD or GNU tar or not. Perhaps you could experiment and see?

@lvpx
Contributor

lvpx commented Nov 18, 2022

@mattjohnsonpint we appreciate your feedback and suggestions on this. The stated proposal mainly addresses customer pain points around reducing build time by leveraging the cache across their runners. It should also provide some performance improvement on Windows, as the compression algorithm in use will now be zstd instead of gzip. We are also exploring other solutions for Windows runner performance, including 7-Zip, to provide better performance on Windows when cross-OS caching is not a concern.

@Phantsure changed the title from "Cache compression - cross OS support and better Windows performance" to "Cache compression - cross OS support" on Nov 28, 2022
@Phantsure
Contributor Author

Phantsure commented Dec 5, 2022

New beta release v3.1.0-beta

We have released a beta version of cache under actions/cache@v3.1.0-beta for users to test for functionality. It should cache using GNU tar with zstd compression, with BSD tar plus zstd as the fallback on Windows. Functionality on Linux and macOS should remain the same.

Expected behaviour

All the issues mentioned in the description above should be resolved with this release. Performance might vary for different files on Windows. Cache performance on macOS and Linux should not change.

Call outs

  • Old caches on Windows won't be restored, as the beta uses a different compression algorithm.
  • Don't expect a major performance improvement, if any.

colinrotherham added a commit to alphagov/govuk-frontend that referenced this issue Dec 7, 2022
We’re seeing “small file” `tar.exe` performance issues actions/cache#984
@Phantsure
Contributor Author

👋🏼 Created a discussion for any feedback: #1019

@Safihre

Safihre commented Dec 7, 2022

You closed all the performance-related issues and linked them to this one, yet you say the patch will not resolve performance problems.
So either fix the performance as well, or have a separate tracker for the performance problems.

Personally, I've seen many more complaints about performance than requests for cross-OS caches. So, just like @mattjohnsonpint, I am surprised about the priorities here.

@bishal-pdMSFT
Contributor

@Safihre that's correct. We started out intending to tackle the cross-OS and performance problems together, but realized that performance needs a more in-depth, targeted approach and more time. That's why we will ship the cross-OS support first. We have a separate internal tracker for the performance problem and will create a public tracker as well once we have more details to share.

@Phantsure
Contributor Author

Update

Tag v3.1.0-beta has been updated to account for old caches saved with gzip compression on Windows: when restoring, it falls back to gzip for decompression. So old caches should be recoverable, and new caches will be compressed using zstd.
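
The fallback order described above can be pictured roughly like this. It is a sketch only: `restore_with_fallback` and the dictionary standing in for the cache service are hypothetical, and the real lookup is more involved.

```python
from typing import Callable, Optional, Sequence, Tuple


def restore_with_fallback(
    lookup: Callable[[str], Optional[str]],
    methods: Sequence[str] = ("zstd", "gzip"),
) -> Optional[Tuple[str, str]]:
    """Try each compression method in order and return the first hit."""
    for method in methods:
        archive = lookup(method)
        if archive is not None:
            return method, archive
    return None  # cache miss for every method


# A store that only holds an old gzip cache saved before the change.
old_store = {"gzip": "cache.tgz"}
print(restore_with_fallback(old_store.get))  # ('gzip', 'cache.tgz')
```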

Call out

  • While BSD tar and GNU tar are mostly compatible, there might be some one-time corruption, because old caches were saved using BSD tar and are restored using GNU tar.

@Phantsure
Contributor Author

Update

We have released all the changes to v3. The cache can now work cross-OS. See the release notes for details on what is new: https://github.com/actions/cache/blob/main/RELEASES.md#321
Keeping this issue open for some time for feedback.

@Phantsure
Contributor Author

Update

We have reverted the changes due to multiple complaints about symlinks not working. If a cache containing a symlink is saved on Linux or macOS and restored on Windows, GNU tar is not able to recover that symlink. We are actively working to resolve this issue; until then, the changes stay reverted.

Related issue: #1043

@Phantsure
Contributor Author

Phantsure commented Jan 5, 2023

Update

We are currently fixing a problem with symlink creation on Windows when extracting a tarball that was created on Linux or macOS. A common example is caching a node_modules folder. This is the original issue that was reported and led to reverting the original changes.

Solution:

We found that symlink creation is handled differently on different platforms and filesystems. We came across two ways to handle this:

  • Use --dereference with tar at creation time, so that symlinks are replaced by the files they point to:
    • Pros: Symlinks are removed entirely at creation, so there is nothing to handle on restore.
    • Cons: The tar gets larger, as this creates copies of the same file.
  • Add the environment variable MSYS=winsymlinks:nativestrict to handle symlinks as native Windows symlinks: Blog. This works for any tar based on the msys2 platform, which includes our current tar from git-for-windows.
    • Pros: Creates a Windows symlink for every Unix symlink, so things should work as usual.
    • Cons: Might break if the created symlink is still not handled by code. So users should still understand that symlinks in cross-OS caches might break.
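
The trade-off of the first option can be seen in a small experiment. The sketch below uses Python's tarfile module and its dereference parameter as a stand-in for tar --dereference; the file names are made up, and it assumes a platform where symlinks can be created.

```python
import os
import tarfile
import tempfile
from typing import Dict


def member_types(dereference: bool) -> Dict[str, str]:
    """Archive a folder holding one file and one symlink, then report member types."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src")
        os.mkdir(src)
        with open(os.path.join(src, "real.txt"), "w") as handle:
            handle.write("payload")
        os.symlink("real.txt", os.path.join(src, "link.txt"))

        archive = os.path.join(root, "cache.tar")
        with tarfile.open(archive, "w", dereference=dereference) as tar:
            tar.add(src, arcname="src")

        with tarfile.open(archive) as tar:
            return {
                member.name: "symlink" if member.issym()
                else "file" if member.isfile() else "dir"
                for member in tar.getmembers()
            }


print(member_types(dereference=False)["src/link.txt"])  # symlink
print(member_types(dereference=True)["src/link.txt"])   # file
```

With dereferencing on, link.txt is stored as a second full copy of real.txt, which is where the size increase comes from.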

We have chosen to go with the second solution and would love to hear feedback from the community.
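
For anyone who wants to try the environment-variable approach outside the action, a minimal sketch follows. It assumes an msys2-based tar is first on PATH; `extraction_env` and `extract` are hypothetical helpers, not part of the action.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def extraction_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and ask msys2 tools to create native Windows symlinks."""
    env = dict(os.environ if base is None else base)
    env["MSYS"] = "winsymlinks:nativestrict"
    return env


def extract(archive: str, destination: str) -> None:
    """Extract the archive with the MSYS setting applied to the tar process only."""
    subprocess.run(
        ["tar", "-xf", archive, "-C", destination],
        env=extraction_env(),
        check=True,
    )


print(extraction_env({"PATH": "/usr/bin"})["MSYS"])  # winsymlinks:nativestrict
```

Passing the variable through env keeps the setting scoped to the tar child process instead of changing it for the whole job.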

@rfay

rfay commented Jan 5, 2023

I believe that MSYS=winsymlinks:nativestrict is a great way to go, assuming git-bash is in use, of course. Older Windows systems also need "Developer mode" enabled for this to work; I'm not sure whether that is still a problem.

@bishal-pdMSFT
Contributor

Hosted runners do run with administrative privileges, so this should be fine.

@Phantsure
Contributor Author

Update

We have released new changes that fix the symlink issue and make cross-OS caching an opt-in feature: set the enableCrossOsArchive input to true on the cache action.
Note: All previous workflows should work as usual, with a cache miss the first time on Windows due to the version changes. This was done to keep the Windows caches of previous workflows separate from cross-OS caches.
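
One way to picture how the opt-in keeps the two kinds of cache apart is sketched below. It assumes a Windows-only marker is mixed into the version inputs unless cross-OS archiving is enabled; `version_components` and the marker string are illustrative assumptions rather than a description of the shipped code.

```python
from typing import List, Sequence


def version_components(
    paths: Sequence[str],
    compression_method: str,
    is_windows: bool,
    enable_cross_os_archive: bool = False,
) -> List[str]:
    """Hypothetical: list the inputs that would feed the cache version hash."""
    components = list(paths) + [compression_method]
    if is_windows and not enable_cross_os_archive:
        components.append("windows-only")  # assumed marker, keeps Windows caches separate
    return components


linux = version_components(["node_modules"], "zstd", is_windows=False)
windows_default = version_components(["node_modules"], "zstd", is_windows=True)
windows_opt_in = version_components(
    ["node_modules"], "zstd", is_windows=True, enable_cross_os_archive=True
)
print(windows_default == linux)  # False
print(windows_opt_in == linux)   # True
```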

@mkht

mkht commented Jan 10, 2023

I encountered a problem when restoring a cache saved by windows-latest with a self-hosted runner in Windows Server 2016.
After some research I found this issue and was able to solve the problem by installing GNU tar and zstd on the runner.

It would be helpful if it were documented that these installations are required (or recommended?) when using the cache action on a self-hosted runner.
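
A quick pre-flight check for a self-hosted runner could look like the sketch below. The helper names are hypothetical, and the sample version strings are only examples of what `tar --version` typically prints.

```python
import shutil
from typing import List, Sequence


def classify_tar(version_output: str) -> str:
    """Guess the tar flavor from the text printed by `tar --version`."""
    text = version_output.lower()
    if "gnu tar" in text:
        return "gnu"
    if "bsdtar" in text:
        return "bsd"
    return "unknown"


def missing_tools(tools: Sequence[str] = ("tar", "zstd")) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print(classify_tar("tar (GNU tar) 1.34"))               # gnu
print(classify_tar("bsdtar 3.5.2 - libarchive 3.5.2"))  # bsd
```

Running `missing_tools()` on the runner before the first cached job gives an early signal that GNU tar or zstd still needs to be installed.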

@lvpx
Contributor

lvpx commented Jan 10, 2023

Hi @mkht, thank you for this feedback. We had these documented previously in the workarounds section; we'll add them to the Pre-requisites section going forward for better visibility.

@maybeec
Contributor

maybeec commented Jan 12, 2023

@mkht I think GNU tar already comes with Git installation: #576 (comment)

At least for my workflows this works fine: https://github.com/devonfw-actions/java-maven-setup/blob/main/action.yml#L53

@mkht

mkht commented Jan 12, 2023

Hi @maybeec , thanks for the advice.

As odd as it may seem, Git is not installed on my self-hosted runner.
To keep the runner easy to maintain, I install only the minimum set of applications that are necessary.

As far as I know, there is no mention of Git being required for the self-hosted runner to work. The lack of Git has not caused any fatal problems in the past.
https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners#requirements-for-self-hosted-runner-machines

@lvpx
Contributor

lvpx commented Jan 12, 2023

Hi @mkht, so we have added gnutar and zstd as pre-requisites to the README. Closing this issue. Thank you everyone for the feedback and suggestions.

@alena-bot

Problem with hht/404

@alena-bot

Can anyone help? Please

10 participants