Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: pip and conda caching in all workflows #16518

Merged
merged 7 commits into from Jul 4, 2022
Merged

Conversation

tupui
Copy link
Member

@tupui tupui commented Jul 1, 2022

actions/setup-python@v4 has a caching feature, which uses under the hood actions/cache.

Before this PR, only one workflow was using actions/cache.

[skip azp] [skip circle]
@tupui tupui added the CI Items related to the CI tools such as CircleCI, GitHub Actions or Azure label Jul 1, 2022
@tupui tupui requested a review from rgommers July 4, 2022 12:40
Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage of cache-dependency-path: 'environment.yml' doesn't make sense to me, since that .yml file has little to do with the pip installs. Reading https://github.com/actions/setup-python#caching-packages-dependencies doesn't help me either. Shouldn't you just remove those lines @tupui?

@tupui
Copy link
Member Author

tupui commented Jul 4, 2022

The usage of cache-dependency-path: 'environment.yml' doesn't make sense to me, since that .yml file has little to do with the pip installs. Reading actions/setup-python#caching-packages-dependencies doesn't help me either. Shouldn't you just remove those lines @tupui?

It does not work as the caching mechanism needs a file to look at. By default it looks for requirements.txt and as we don't have one, it fails. See previous runs which failed.

So I thought that the best compromise was to use environment.yml since we use it's content for one job. It should be representative of what we need to have in an env. And in the end it's just for caching.

@rgommers
Copy link
Member

rgommers commented Jul 4, 2022

But isn't that just worse than what we had? The ideal caching mechanism would always keep the installed wheels between jobs, and those would be specific to (a) platform, and (b) Python version. All those files are in ~/.cache/pip. Now it's keyed instead on something that is not specific to platform or Python version, so does it all end up in one large cache?

That the cache gets invalidated when we touch environment.yml is less important, but also not ideal.

id: cache
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm: this did work, right? Or are you trying to change this because it didn't?

Copy link
Member Author

@tupui tupui Jul 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But isn't that just worse than what we had? The ideal caching mechanism would always keep the installed wheels between jobs, and those would be specific to (a) platform, and (b) Python version. All those files are in ~/.cache/pip. Now it's keyed instead on something that is not specific to platform or Python version, so does it all end up in one large cache?

If I am reading the PR description that added this correctly (actions/setup-python#266), I think it does exactly what we had: ${{ runner.os }}-pip and adds the hash of the file.

id: cache
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip
Copy link
Member Author

@tupui tupui Jul 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rgommers
Copy link
Member

rgommers commented Jul 4, 2022

If I am reading the PR description that added this correctly (actions/setup-python#266), I think it does exactly what we had: ${{ runner.os }}-pip and adds the hash of the file.

Ah that is helpful, better than the docs. I agree, it does the right thing.

Looks like cache reuse isn't visible yet. Cache is saved in some jobs (e.g., https://github.com/scipy/scipy/runs/7178309878?check_suite_focus=true) but not all - and not restored/used yet. Can you push another commit and check that this actually works then?

[skip azp] [skip circle]
@tupui
Copy link
Member Author

tupui commented Jul 4, 2022

I think it works now. All jobs fetch some cache except the dbg but it's because it's not using the Python action.

@tupui
Copy link
Member Author

tupui commented Jul 4, 2022

The thing I am seeing is for conda. We update the env even if we have a cache hit (this is where we lose some minutes now). In the doc they invalidate the cache from time to time to mitigate this. I would bump every week or day. What do you think?

https://github.com/conda-incubator/setup-miniconda#caching-environments

@rgommers
Copy link
Member

rgommers commented Jul 4, 2022

We update the env even if we have a cache hit (this is where we lose some minutes now). In the doc they invalidate the cache from time to time to mitigate this. I would bump every week or day. What do you think?

Yes, daily seems good. The two missing bits seem to be ${{ steps.get-date.outputs.today }} and if: steps.cache.outputs.cache-hit != 'true'

[skip azp] [skip circle]
@tupui
Copy link
Member Author

tupui commented Jul 4, 2022

Yes, done.

@rgommers
Copy link
Member

rgommers commented Jul 4, 2022

That didn't quite work:

Warning: Path Validation Error: Path(s) specified in the action for caching do(es) not exist, hence no cache is being saved.

We can merge this PR minus the last commit, and do the conda caching separately if you prefer.

[skip azp] [skip circle]
@tupui
Copy link
Member Author

tupui commented Jul 4, 2022

Indeed, this should do it. The path changed apparently as can be seen in the log. I used what's in the doc of the action and it matches the env variables.

env:
[8](https://github.com/scipy/scipy/runs/7183919673?check_suite_focus=true#step:8:8)
    CONDA: /Users/runner/miniconda3

  Prefix: /Users/runner/miniconda3/envs/scipy-dev

[skip azp] [skip circle]
@tupui
Copy link
Member Author

tupui commented Jul 4, 2022

Seems to work fine. With the new commit we correctly skip the env update.
Screenshot 2022-07-04 at 20 36 41

@rgommers rgommers changed the title CI: pip caching in all workflows CI: pip and conda caching in all workflows Jul 4, 2022
Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, LGTM now, thanks @tupui!

@rgommers rgommers merged commit fe66218 into scipy:main Jul 4, 2022
@rgommers rgommers added this to the 1.10.0 milestone Jul 4, 2022
@tupui tupui deleted the gh_pip_caching branch July 4, 2022 19:27
@tupui
Copy link
Member Author

tupui commented Jul 4, 2022

Thanks Ralf! Small drop saved in our ocean of CI, but still 😅

@rgommers
Copy link
Member

rgommers commented Jul 4, 2022

Small drop saved in our ocean of CI, but still 😅

Yes indeed! I'm looking forward to our big bucket of drops saved when we can remove all the setup.py-based jobs:)

@tupui
Copy link
Member Author

tupui commented Jul 4, 2022

If you want I can give it a stab for the GH jobs on linux/macOS. Or are we waiting for some missing pieces?

@rgommers
Copy link
Member

rgommers commented Jul 4, 2022

Yes, there's a number of missing pieces - I'm hoping 3 months after the 1.9.0 release we can do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Items related to the CI tools such as CircleCI, GitHub Actions or Azure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants