Migrate CI to GH Actions (#2964)
* Add draft GHA-based CI for Linux

* Fix events

* Dup key

* Fix path

* Update env vars

* More fixes

* Fix env vars again

* add doxygen

* add sudo

* Typo

* upgrade CUDA

* escape newlines

* more backslashes

* fix CUDA_APT

* more env vars fixes

* fix missing file

* accumulate env_vars

* build python wrappers only if requested

* add pytest

* use $GITHUB_ENV

https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-commands-for-github-actions#environment-files

* use ${GITHUB_WORKSPACE}

* Pin older gromacs

* add windows

* Provide default values for unbound vars

* fixes

* this is powershell not cmd

* update envs

* win fix

* make windows use cmd

* Change install prefix

* fixes

* better env files

* Increase timeout threshold

* set xcode to 10.x

* specify action version

* fix sdk in macos

* split jobs a bit more

* build wrappers even if tests failed for core

* revert & resort build/test

* fix run logic?

* install envs in D:\

* add more cudas

* allow longer test times

* sudo that

* fix cuda ver checks

* another little fix

* one more

* missing package

* missing dev packages

* missing backslash

* add nvprof

* factor scripts out

* export CUDA_PATH

* no quotes in env var

* add more cudas

* fix ci name

* fix flags

* typo

* missing parenthesis

* add cuda 11.2 urls

* add retry loops for online installations

* add library existence tests

* verbose

* fix sets

* CPU and PME are not built in GPU variants; do not test for those

* quote?

* fix windows checks

* add macos opencl

* disable opencl tests on macos (but build anyway)

* add docs

* cd into build for docs!

* install then cd

* pin sphinxcontrib-bibtex

* we need tex in the system

* split docs into a separate job

* simplify retrying

* simplify retrying 2

* do install

* fix tlmgr installation

* more tex packages

* one more

* one more

* add textcomp to docs

* usepackage[utf8]{inputenc}

* switch to xelatex?

* add xetex

* more fonts

* do not use xindy

* tables can't contain blocks and use tabularcolumns at the same time

https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html?highlight=tabular#directive-tabularcolumns

* build libs in docs too

* update docs deps; bring in pdfs

* kill server after 404 checks

* change 404 checker

* explicit locations needed

* cumulative exit codes

* override set -e

* update README badges

* add ppc / arm

* missing vbar

* not it

* one too many extensions

* do not test gromacs

* manage workspace

* source conda.sh

* add python version to matrix vars

* source first thing

* disable unbound checks

* add compilers

* some more cuda stuff

* add CPU only ppc64le

* link through stubs

* One more way to obtain HOST_USER_ID?

* let cmake find cf's opencl

* Try with GCC7

* add timeouts for docker based runs

* add several attempts for stochastic failures

* add tests with conda forge compilers

* do not parallelize pytest in docker runs

* exclude some known slow tests

* enable ccache

* forgot conda shell

* No need for macos sdk retries

* \  -> ^

* fix timestamp in windows

* export env vars for current step too

* unneeded quotes windows?

* disable compression on windows?

* add ccache in docker too

* group commands

* fix syntax error

* fix version spec

* Increase timeout in Docker runs

* heh, it's double colon

* fix cache timestamp in windows

* escape % with %%

* ccache in docs too

* don't use wrapper package in windows; call vcvarsall directly

* More docker variants

* Handle ccache env vars in yaml

* Re-enable parallel pytest in Docker runs

* Delete unwanted azure ymls that I didn't end up using

* add some comments

* Update badge URL

* is that path messing with windows ccache?

* add CI-README

* Reduce CI matrix

* increase timeouts

* Add GCC7 on PPC QEMU again (temporarily, just for debugging)

* add docker instructions for local debugging

* fix start_docker_locally

* skip some url checks in docs

* Use new package name

* update link
jaimergp committed Feb 10, 2021
1 parent 3aa4bb8 commit 6f8534d
Showing 19 changed files with 1,257 additions and 6 deletions.
615 changes: 615 additions & 0 deletions .github/workflows/CI.yml

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion README.md
@@ -1,4 +1,5 @@
[![Build Status](https://travis-ci.org/openmm/openmm.svg?branch=master)](https://travis-ci.org/openmm/openmm?branch=master)
[![GH Actions Status](https://github.com/openmm/openmm/workflows/CI/badge.svg)](https://github.com/openmm/openmm/actions?query=branch%3Amaster+workflow%3ACI)
[![Conda](https://img.shields.io/conda/v/conda-forge/openmm.svg)](https://anaconda.org/conda-forge/openmm)
[![Anaconda Cloud Badge](https://anaconda.org/conda-forge/openmm/badges/downloads.svg)](https://anaconda.org/conda-forge/openmm)

## OpenMM: A High Performance Molecular Dynamics Library
139 changes: 139 additions & 0 deletions devtools/CI-README.md
@@ -0,0 +1,139 @@
<!-- Authored by Jaime Rodríguez-Guerra, Chodera Lab. December 2020 -->

# Our Continuous Integration setup

OpenMM can be described as a C++ library with wrappers available in different programming languages (Python, C, Fortran). The heavy lifting is performed by the backend platforms, which can be based on CPU, CUDA and/or OpenCL (and possibly more in the future). All of this is supported for different operating systems and architectures. As a result, the CI setup can get a bit involved, but this document will try to clarify how it works and what we support.

## Implementation overview

OpenMM's CI runs mainly on GitHub Actions, with one separate Jenkins box running the GPU tests (generously provided by Jason Swails).

The build matrix covers:

- Operating systems and architectures:
  - Linux x64
  - MacOS Intel
  - Windows
  - Linux ppc64le (PowerPC)
  - Linux aarch64 (ARM)
- Python versions:
  - CPython 3.6, 3.7, 3.8, 3.9
- CUDA versions:
  - 10.0 and above (Linux x64, Linux ppc64le, Windows)
- OpenCL implementations:
  - Nvidia (tested alongside CUDA)
  - AMD 3.0
- Sysroots and C++ compilers:
  - Linux: System's GCC 7 and whatever conda-forge is pinning (GCC 9 as of this writing)
  - MacOS: System's, targeting the 10.9 SDK
  - Windows: VS2019

Before I describe the pipelines, I will clarify some concepts and idiosyncrasies in GitHub Actions:

- The configuration file lives at `.github/workflows/CI.yml`. This directory can host more than one YAML _workflow_, each describing the set of events that will trigger a run.
- The workflow specifies a set of triggers (key `on`) and a list of `jobs` to run. We run the `CI` workflow for:
  - Pushes to `master`
  - Pull requests targeting `master`
  - Nightlies
- Currently, the workflow contains four jobs: `unix`, `windows`, `docker`, `docs`. Each job can run several times, depending on the configuration of `jobs.*.strategy.matrix`. All those job replicas run in parallel, independently of each other. The [`Actions > Summary`](https://github.com/openmm/openmm/actions/runs/451301350) overview can help visualize this.
- Within each job, you find `steps`. A step can either run a script on a `shell` or use a GitHub `action` to perform a task.
  - For example, cloning the repo or setting up Miniconda are both independent GitHub _actions_. You will recognize these because they contain the keyword `uses:`.
  - Running CMake is a shell step, which uses `run:`.
  - Note 1: Each step runs in a new shell session. Environment variables won't survive across steps unless you add them to the `$GITHUB_ENV` file: `echo "VARIABLE=VALUE" >> ${GITHUB_ENV}`. You can also use step `outputs`, but that's more involved and rarely needed. See the sketch after this list.
  - Note 2: Due to the design of `conda-incubator/setup-miniconda`, all subsequent steps that rely on a conda environment require us to specify an OS-dependent custom shell. Do remember this if you need to add more steps to the job!
- Steps can be run or skipped based on conditions expressed inside an `if:` key. This is how we control whether we need to install CUDA or not, for example. Jobs can have `if:` checks too, if needed.
- Steps can define environment variables in their `env:` key, but those will only be available in that step. A `job` can do so too, and those variables will be available to all its steps.
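
For illustration, here is a minimal, hypothetical workflow fragment (job and step names invented, not taken from `CI.yml`) exercising the three mechanisms above:

```yaml
jobs:
  example:
    runs-on: ubuntu-latest
    env:
      GLOBAL_FLAG: "1"  # job-level: visible to every step below
    steps:
      - name: Export a variable for later steps
        run: echo "CUDA_VERSION=10.2" >> "${GITHUB_ENV}"
      - name: Use it in a later, conditional step
        if: env.CUDA_VERSION != ''
        env:
          STEP_ONLY_VAR: "yes"  # step-level: visible in this step only
        run: echo "Would install CUDA ${CUDA_VERSION} (GLOBAL_FLAG=${GLOBAL_FLAG})"
```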

## Details per operating system

The different implementations are very similar to what we do on Linux x64, so I will explain this one in detail; for the rest, I will only comment on the relevant differences.

### Linux x64

- Part of the `unix` pipeline.
- Runs on `ubuntu-latest`, as provided by GitHub Actions.
- Uses `conda-incubator/setup-miniconda` to set up the bundled Miniconda and create a conda environment that provides the build and test dependencies (CMake, SWIG, the adequate Python version, etc.). The environment files are located under `devtools/ci/gh-actions/conda-envs`, one per operating system.
- Depending on the matrix configuration, we also install CUDA and/or AMD's OpenCL. These conditional steps are evaluated using GHA's builtin `if` mechanism. Ideally we would install these within the conda environment, but sometimes they are not available (licensing issues, etc.), so we delegate that to the system packages or vendor installers.
  - For CUDA, we check whether `cuda-version` is not empty, and pass it to `devtools/ci/gh-actions/scripts/install_cuda.sh` as an environment variable.
  - For OpenCL, we check whether `OPENCL` is `true` and run `devtools/ci/gh-actions/scripts/install_amd_opencl.sh`. This relies on an installer located in an S3 bucket. This could be refactored to install different OpenCL implementations (ROCm, Intel, etc.).
- Some matrix entries require us to install the conda-forge compilers, which are used instead of the system's if present.
- Now we need to configure and download the CCache contents. The keys are built off the matrix name and a `YYYYDDMM-HHMMSS` timestamp. A secret `CACHE_VERSION` is also included, so one can bump the cache by modifying this secret in the repository settings. The configuration is done through environment variables defined at the beginning of the job (key `jobs.unix.env`).
- CMake is finally invoked, targeting the conda environment as the destination (`CONDA_PREFIX`). Additional flags are passed from the matrix configuration. This is how we enable or disable features per matrix entry.
- CCache performance is assessed.
- Then we build the C++ libraries and Python wrappers, but separately. This way we can more easily see which part failed. Tests are also run separately for the same reason. Whether Python is built and/or tested is checked through the contents of `CMAKE_FLAGS`. A rough sketch of these steps follows this list.
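
A minimal sketch of those steps, assuming standard CMake targets (OpenMM provides a `PythonInstall` target); the exact flags and test invocations live in `CI.yml` and the matrix entries, and may differ:

```bash
# Configure, targeting the active conda environment; extra flags come from the matrix
cmake -S . -B build -DCMAKE_INSTALL_PREFIX="${CONDA_PREFIX}" ${CMAKE_FLAGS}

# Build and install the C++ libraries first...
cmake --build build -j2
cmake --build build --target install

# ...and the Python wrappers afterwards, only if the matrix requested them
if [[ "${CMAKE_FLAGS}" == *BUILD_PYTHON_WRAPPERS=ON* ]]; then
    cmake --build build --target PythonInstall
fi

# Test the two parts separately as well, so failures are easy to attribute
(cd build && ctest --output-on-failure -j2)
python -m pytest -v --timeout=600 wrappers/python/tests  # path illustrative
```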

### MacOS Intel

- Part of the `unix` pipeline.
- Runs on `macos-latest`.
- Uses `conda-incubator/setup-miniconda`, pointing to the relevant environment file.
- Neither CUDA nor OpenCL installation scripts are run. Instead, we download and install the 10.9 SDK using `devtools/ci/gh-actions/scripts/install_macos_sdk.sh`. This is done so we can mimic what Conda Forge does in their feedstocks. Check the script's comments for more info.
- Everything else is the same.

### Windows

- Sole member of the `windows` pipeline.
- Runs on `windows-latest`.
- Uses `conda-incubator/setup-miniconda`, pointing to the relevant environment file.
- Installs CUDA with the Nvidia installers using `devtools/ci/gh-actions/scripts/install_cuda.bat`, which requires an environment variable `CUDA_VERSION`, exported from the corresponding matrix entry. Again, this only runs if `matrix.cuda-version` is not empty; see the illustrative sketch after this list.
- Everything else is the same.
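
An illustrative (not verbatim) shape of that conditional step, with an invented step name:

```yaml
- name: Install CUDA (Windows)
  if: startsWith(matrix.os, 'windows') && matrix.cuda-version != ''
  shell: cmd
  env:
    CUDA_VERSION: ${{ matrix.cuda-version }}
  run: call devtools\ci\gh-actions\scripts\install_cuda.bat
```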

### PowerPC & ARM

- Part of the `docker` pipeline.
- These run in a Docker image on top of `ubuntu-latest`. The Docker image itself depends on the architecture chosen (ppc64le, aarch64) and which CUDA version we want. The images are provided by Conda Forge, so they have `conda` preinstalled and ready to go.
- Since it's a different architecture, we need to configure QEMU first. This is done automatically with a Docker image, mimicking what Conda Forge does.
- We start the Docker image. The working directory (`$GITHUB_WORKSPACE`) is mounted with read/write permissions on `/home/conda/workspace`, so we can communicate back with the host using files, and also use CCache.
- The Docker image will run `devtools/ci/gh-actions/scripts/run_steps_inside_docker_image.sh`. This script mostly does what you saw for Linux x64, with some differences:
- We don't need to install CUDA or setup Miniconda, because they are preinstalled in the Docker image.
- We patch some dependencies from the environment file because they are not available for this architecture. To save one conda environment solve, we also patch the Python version in the environment file.
- These images don't come with a system compiler, so we specify one in the matrix configuration:
  - If `compilers` contains a value that starts with `devtoolset-`, we understand we want a CentOS devtoolset. So far, we only specify `devtoolset-7`.
  - If `compilers` is anything else, we treat it as a (space-separated) list of conda packages. Since Conda Forge provides a metapackage named `compilers` that installs all of them for the current platform, we use that one. That's why some entries have a `compilers: compilers` entry. A sketch of this logic follows this list.
- Everything else runs as usual.
- Do note that the whole Docker run is a single GitHub Actions step, so it's not as visually appealing. I tried my best to group the commands with the `::group::` syntax so it's easier to follow, but it's not the same.
- If the script runs successfully, it will create an empty file. We test for its existence after the Docker run to make sure everything finished.
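
A hedged sketch of that compiler-selection logic, including the `::group::` markers mentioned above (variable names and install commands assumed for illustration, not copied from the script):

```bash
echo "::group::Install compilers"
if [[ "${COMPILERS}" == devtoolset-* ]]; then
    # CentOS software-collections toolchain (e.g. devtoolset-7)
    yum install -y centos-release-scl
    yum install -y "${COMPILERS}"
    source "/opt/rh/${COMPILERS}/enable"
else
    # Otherwise: one or more conda packages, e.g. the conda-forge
    # `compilers` metapackage
    conda install -y ${COMPILERS}
fi
echo "::endgroup::"
```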

> Note: Since these use software emulation, they are really slow. Still, they can run successfully within the 6h limit GHA provides. If GHA upgrades to better CI machines with hardware-based virtualization, they might be able to run with close-to-native performance.

### Docs

This is a Linux-x64 pipeline optimized for building the documentation only. It's provided as a separate entry because I didn't want to overcomplicate the `if:` logic in the `unix` pipeline. It's essentially the same, but:

- It uses a different environment file in `setup-miniconda`.
- It only builds the docs and their dependencies; no tests, for example.
- It contains a deployment step, which will copy the contents to the S3 bucket _only_ when run on `master`, ignoring cron jobs (an illustrative sketch of the step follows this list). The required secrets must be defined in the repository settings with the following exact key names. Just copy-paste the values there; GitHub will encrypt and mask them.
  - `AWS_S3_BUCKET`
  - `AWS_ACCESS_KEY_ID`
  - `AWS_SECRET_ACCESS_KEY`
- It will also check for dead links using a Node package. This is run _after_ deployment so it won't prevent that, but it will still signal the job as failed if the docs contain broken links.
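
For orientation only, the deployment step could look roughly like this (step name, local path and the `aws s3 sync` invocation are assumptions; the real step is in `CI.yml`):

```yaml
- name: Deploy docs to S3
  if: github.ref == 'refs/heads/master' && github.event_name != 'schedule'
  env:
    AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  run: aws s3 sync build/docs/ "s3://${AWS_S3_BUCKET}/" --delete  # paths illustrative
```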

## Shortcomings

There are some limitations compared to other CI services, but I guess this list will get shorter over time:

- Cache cannot be invalidated directly. Instead, I included a secret `CACHE_VERSION` that is part of the cache key (sketched below). If you change the value of this secret, it will functionally prevent access to the previous cache. Caches also expire every 7 days. Note that since this trick uses a secret, the value of `CACHE_VERSION` will be masked in the log output. As a result, make sure to use something short but meaningless and difficult to find in the wild (e.g. `pqgbhl` instead of `0`).
- There's no `ci skip` functionality (yet).
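
A hedged sketch of how such a cache key can be assembled with `actions/cache` (key layout and variable names illustrative; the real configuration lives in `CI.yml`):

```yaml
- name: Restore ccache
  uses: actions/cache@v2
  with:
    path: ~/.ccache
    key: ccache-${{ secrets.CACHE_VERSION }}-${{ matrix.name }}-${{ env.TIMESTAMP }}
    restore-keys: |
      ccache-${{ secrets.CACHE_VERSION }}-${{ matrix.name }}-
```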

## Extra content

### How to debug PowerPC / ARM locally

From the root of the repository, run the following script. There are
some variables you might want to edit (PPC vs ARM, Python version, etc.).
Take a look at the script first in that case.

```bash
bash devtools/ci/gh-actions/start_docker_locally.sh
```

You will be inside the Docker image after a few moments. The repo root has
been mounted to `/home/conda/workspace`.

Run this other script to reproduce the CI steps exactly. Do NOT `source` the
scripts: since they use `set -e`, a failure would exit the Docker session
altogether. Always launch them in new `bash` processes so you don't have to
start from scratch.

```bash
bash /home/conda/workspace/devtools/ci/gh-actions/scripts/run_steps_inside_docker_image.sh
```
20 changes: 20 additions & 0 deletions devtools/ci/gh-actions/conda-envs/build-macos-latest.yml
@@ -0,0 +1,20 @@
name: build
channels:
  - conda-forge
  - bioconda
dependencies:
  # build
  - cmake
  - ccache
  # host
  - python
  - cython
  - swig
  - fftw
  - numpy
  - doxygen 1.8.14
  # test
  - pytest
  - pytest-xdist
  - pytest-timeout
  - gromacs 2018.*
22 changes: 22 additions & 0 deletions devtools/ci/gh-actions/conda-envs/build-ubuntu-latest.yml
@@ -0,0 +1,22 @@
name: build
channels:
  - conda-forge
  - bioconda
dependencies:
  # build
  - cmake
  - make
  - ccache
  # host
  - python
  - cython
  - swig
  - fftw
  - numpy
  - ocl-icd-system
  - doxygen 1.8.14
  # test
  - pytest
  - pytest-xdist
  - pytest-timeout
  - gromacs 2018.*
22 changes: 22 additions & 0 deletions devtools/ci/gh-actions/conda-envs/build-windows-latest.yml
@@ -0,0 +1,22 @@
name: build
channels:
  - conda-forge
  - defaults
dependencies:
  # build
  - jom
  - cmake
  - ccache
  - m2-coreutils
  # host
  - python
  - cython
  - swig
  - fftw
  - numpy
  - doxygen 1.8.14
  - khronos-opencl-icd-loader
  # test
  - pytest
  - pytest-xdist
  - pytest-timeout
19 changes: 19 additions & 0 deletions devtools/ci/gh-actions/conda-envs/docs.yml
@@ -0,0 +1,19 @@
name: build
channels:
  - conda-forge
dependencies:
  # build
  - cmake
  - ccache
  # host
  - python
  - pip
  - numpy
  - cython
  - swig
  - doxygen 1.8.14
  - pip:
    - sphinx==2.3.1
    - sphinxcontrib-bibtex<2.0.0
    - sphinxcontrib-lunrsearch
    - sphinxcontrib-autodoc_doxygen
24 changes: 24 additions & 0 deletions devtools/ci/gh-actions/scripts/install_amd_opencl.sh
@@ -0,0 +1,24 @@
# This script installs AMD's SDK 3.0 to provide their OpenCL implementation
# * Installation path will be ${GITHUB_WORKSPACE}/AMDAPPSDK

set -euxo pipefail

# Download the SDK tarball from the omnia-ci S3 bucket, retrying on flaky connections
wget -q --retry-connrefused --waitretry=1 --read-timeout=20 --timeout=15 --tries 5 \
    http://s3.amazonaws.com/omnia-ci/AMD-APP-SDKInstaller-v3.0.130.135-GA-linux64.tar.bz2
tar -xjf AMD-APP-SDK*.tar.bz2

AMDAPPSDK=${GITHUB_WORKSPACE}/AMDAPPSDK
export OPENCL_VENDOR_PATH=${AMDAPPSDK}/etc/OpenCL/vendors

# Extract the self-contained installer and register the ICD
mkdir -p ${OPENCL_VENDOR_PATH}
sh AMD-APP-SDK*.sh --tar -xf -C ${AMDAPPSDK}
echo libamdocl64.so > ${OPENCL_VENDOR_PATH}/amdocl64.icd

# Sanity-check the installation with clinfo
export LD_LIBRARY_PATH=${AMDAPPSDK}/lib/x86_64:${LD_LIBRARY_PATH:-}
chmod +x ${AMDAPPSDK}/bin/x86_64/clinfo
${AMDAPPSDK}/bin/x86_64/clinfo
sudo apt-get install -y libgl1-mesa-dev

# Export the variables for subsequent workflow steps
echo "OPENCL_VENDOR_PATH=${OPENCL_VENDOR_PATH}" >> ${GITHUB_ENV}
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" >> ${GITHUB_ENV}
