Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate CI to GH Actions #2964

Merged
merged 155 commits into from Feb 10, 2021
Merged

Migrate CI to GH Actions #2964

merged 155 commits into from Feb 10, 2021

Conversation

jaimergp
Copy link
Contributor

@jaimergp jaimergp commented Dec 28, 2020

Phew, this has been fun 😄! Hopefully it provides a solid start to end up with a CI that is useful, comprehensive and extensible to future needs. Sorry about the insane amount of commits, but it took a bit to get all the pieces in place!

This will close #2919

Summary

This PR provides CI for the following OS / architectures:

  • Linux x64
  • MacOS
  • Windows
  • Linux PPC (emulated with QEMU/Docker)
  • Linux ARM (emulated with QEMU/Docker)

The matrix gets expanded with:

  • CUDA versions from 10.0 and above are provided.
  • Python 3.6 to 3.9, when applicable (only CPU builds for now, to speed things up).
  • Docs only job that does not run tests.
  • Compilers, when applicable. I made sure to use system compilers whenever possible, but there are entries that cover conda-forge compilers too, since we want to guarantee those pass the full suite.

Other features:

  • CCache should be enabled in all runs
  • Steps have been isolated and/or grouped as much as possible to easily identify the failing parts.

Cron jobs run nightly, by default on all matrix entries. This can be changed if needed.

Comments

  • Secrets:
    • CACHE_VERSION: needed in case cache needs to be cleared. Do not set to something like 0 because GHA will mask it believing it's sensitive material, making the output very difficult to read. Better use something like plkghj; short, meaningless and hard to find in a useful message. Cache expires after a week anyway, so no need to keep a correlative scheme here.
    • S3 login details for docs deployment must be defined as Github secrets. I haven't been able to test this part due to lack of access.
  • Docs:
    • This PR also fixes PDF builds for the documentation (check changes in developer.rst).
    • There are some 404s and dead links that need to be updated or removed.
  • PowerPC:
    • Several critical failures with GCC 9 (conda-forge). All OK with GCC7.
    • Some pytest warnings with division by zero.

Before merge

  • Update README badges
  • Tests targeting conda-forge should use conda-forge-pinning version restraints
  • Provide needed secrets (see above)
  • Check cache is working (can also be done / fixed in separate PR if needed)
  • Check results are consistent with Azure / Travis
  • Remove Azure and Travis from repo
  • Some tests fail with recent Gromacs, which is why it is pinned to 2018.*. This should be addressed in a different PR so we can unpin it.

@jaimergp
Copy link
Contributor Author

jaimergp commented Feb 1, 2021

This one is actually broken: http://nihroadmap.nih.gov/bioinformatics

@swails
Copy link
Contributor

swails commented Feb 1, 2021

Any idea why the docs build is still failing? It's reporting the following as a broken link:

Per @jaimergp, I've observed websites that have blacklisted IP ranges for CI services like Travis. The PDB is actually one of them, I think.

@peastman
Copy link
Member

peastman commented Feb 1, 2021

I'm getting back to the OpenCL tests. I think the error in TestOpenCLNonbondedForce is just an overly strict tolerance. The accuracy of energy accumulation depends on the number of threads. For CPU OpenCL, it only uses a very small number of threads. When that's combined with single precision, the accuracy comes out less than what the test is expecting.

So that just leaves TestOpenCLDrudeSCFIntegrator to figure out.

@peastman
Copy link
Member

peastman commented Feb 2, 2021

Figured it out. The error didn't happen on computers with more than two cores, which is what made it hard to reproduce. The fixes for both are in #3003.

@peastman
Copy link
Member

peastman commented Feb 2, 2021

The fix is merged.

@jaimergp
Copy link
Contributor Author

jaimergp commented Feb 2, 2021

This one is actually broken: http://nihroadmap.nih.gov/bioinformatics

I think this is now https://commonfund.nih.gov/bioinformatics, as hinted by archive.org.

@jaimergp
Copy link
Contributor Author

jaimergp commented Feb 3, 2021

Ohhh it's all green! 😍

@peastman
Copy link
Member

peastman commented Feb 3, 2021

Yay! Shall I merge this?

@jaimergp
Copy link
Contributor Author

jaimergp commented Feb 3, 2021

Well, depends on whether you want to deal with this now or in the future:

  • Docs deployment (this block). We haven't checked if that works because I set it up so it only runs when master has received a push. I don't know if you deploy to S3 as PRs are merged, or only on releases.
  • Gromacs is pinned to v2018 because 2019 and 2020 fail. I guess we can deal with that in a different PR.
  • I didn't implement any automatic way to track conda-forge pinnings in our environment file, but this is not as critical now that I figured out how to build packages that run the full CI, so we can always ensure everything is ok on the conda-forge side during the RC phase.

@jaimergp
Copy link
Contributor Author

jaimergp commented Feb 3, 2021

Oh, and I guess we should also delete Azure and Travis from here, or do you want to run them side by side for a while?

@peastman
Copy link
Member

peastman commented Feb 3, 2021

I don't know if you deploy to S3 as PRs are merged, or only on releases.

They should get pushed to the development folder on every merged PR, and then we copy them to latest when we do a release.

Gromacs is pinned to v2018 because 2019 and 2020 fail. I guess we can deal with that in a different PR.

That seems fine.

I didn't implement any automatic way to track conda-forge pinnings in our environment file, but this is not as critical now that I figured out how to build packages that run the full CI, so we can always ensure everything is ok on the conda-forge side during the RC phase.

Up to you what you think is best.

@jchodera
Copy link
Member

jchodera commented Feb 3, 2021

🎉🎉🎉

@peastman
Copy link
Member

peastman commented Feb 3, 2021

I don't think we need to run Travis and Azure in parallel for long, but let's wait until this is merged and working before removing them.

@jaimergp
Copy link
Contributor Author

jaimergp commented Feb 3, 2021

They should get pushed to the development folder on every merged PR, and then we copy them to latest when we do a release.

Ok, that's the behavior I implemented, perfect! I'll need you to add these GitHub secrets (S3 authentication) for this to work. They are added here as plain text, then GitHub will take care of the rest.

Up to you what you think is best.

I'll give it a thought, but for sure this is not blocking now. I'll open an issue so I can have it assigned and I'll revisit it in the future. Same for Gromacs.

@peastman
Copy link
Member

peastman commented Feb 9, 2021

Where exactly do I get those values from? I only have limited experience with AWS. I think AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are supposed to be from a security credential created for the openmm.org user? I can see the access key ID, but it says there's no way to retrieve the secret key if you don't already have it. @jchodera do you have it? If not we can delete the existing credential and create a new one, but I don't know what I'll break by doing that.

Is AWS_S3_BUCKET just the name of the bucket containing the docs? I don't think that's secret, is it?

@jchodera
Copy link
Member

@peastman : I'll work with @jaimergp to handle this for you tomorrow.

@jchodera
Copy link
Member

@peastman @jaimergp : I've just populated AWS_S3_BUCKET, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY as repository secrets. You should be good to go!

@peastman
Copy link
Member

Thanks! I'll go ahead and merge this, and we can see what happens on the next PR.

@peastman peastman merged commit 6f8534d into openmm:master Feb 10, 2021
@jaimergp
Copy link
Contributor Author

Yay! 🎉

@jaimergp
Copy link
Contributor Author

jaimergp commented Feb 10, 2021

Looks like S3 deployed without errors, but there are two link checks that are rejected:

    [0] https://docs.conda.io/
    [0] https://docs.continuum.io/anaconda/install

They should be added to the ignore list. The Continuum one might need to be replaced with by https://docs.anaconda.org/anaconda/install/.

@peastman
Copy link
Member

It looks like this worked correctly on #3018, including pushing the docs to AWS when it was merged. Shall I go ahead and turn off the Travis and Azure builds?

@jaimergp
Copy link
Contributor Author

I think so! We can always roll back from history if needed.

@jaimergp
Copy link
Contributor Author

I guess that the [0] link errors were only temporary due to some network shenanigans.

@peastman
Copy link
Member

Done! For the moment I haven't changed any files. I just disabled the Travis webhook and the Azure app. If we need to reenable them, it's just a few clicks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Changes in Travis CI pricing model
4 participants