
[v2]: CI Overhaul (and make v2 actually _pass_ the CI) #714

Merged · 18 commits · Apr 1, 2024
4 changes: 2 additions & 2 deletions .flake8
Original file line number Diff line number Diff line change
@@ -4,5 +4,5 @@ max-line-length = 100
max-complexity = 18
per-file-ignores =
__init__.py: F401
chemprop/v2/nn/predictors.py: F405
chemprop/v2/metrics.py: F405
chemprop/nn/predictors.py: F405
chemprop/nn/metrics.py: F405
114 changes: 114 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,114 @@
# ci.yml
#
# Continuous Integration for Chemprop - checks build, code formatting, and runs tests for all
# proposed changes and on a regular schedule
#
# Note: this file contains extensive inline documentation to aid with knowledge transfer.

name: Continuous Integration

on:
# run on pushes/pull requests to/against main or v2/dev
push:
branches: [main]
pull_request:
branches: [main, v2/dev]
# run this in the morning on weekdays to catch dependency issues
schedule:
- cron: "0 8 * * 1-5"
# allow manual runs
workflow_dispatch:

# cancel previously running tests if new commits are made
# https://docs.github.com/en/actions/examples/using-concurrency-expressions-and-a-test-matrix
concurrency:
group: actions-id-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true

jobs:
build:
name: Check Build
runs-on: ubuntu-latest
steps:
# clone the repo, attempt to build
- uses: actions/checkout@v4
- run: python -m pip install build
- run: python -m build .
Contributor:
I appreciate all the inline documentation because I'm not familiar with most of this. What is the difference between building the repo and running tests? I would think that you need to build the repo before you can run tests, but I don't see a build step in "Execute Tests".

Member Author:
The 'build' step refers to the literal python build command. If it fails we couldn't build the Python package at the end or install Chemprop to run the tests.

The test running doesn't use build but just installs the package. We could just go straight to running the tests, but it's kinda nice to have one expected point of failure for each step in the CI (like, the build only fails in one place, the tests only fail in one place, etc).


lint:
name: Check Formatting
needs: build
runs-on: ubuntu-latest
steps:
# clone the repo, run black and flake8 on it
- uses: actions/checkout@v4
- run: python -m pip install black==23.* flake8
- run: black --check .
- run: flake8 .

test:
name: Execute Tests
needs: lint
runs-on: ${{ matrix.os }}
defaults:
run:
# run with a login shell (so that the conda environment is activated)
# and echo the commands we run as we do them (for debugging purposes)
shell: bash -el {0}
strategy:
# if one platform/python version fails, continue testing the others
fail-fast: false
matrix:
# test on all platforms with both supported versions of Python
os: [ubuntu-latest, macos-latest, windows-latest]
python-version: [3.11, 3.12]
steps:
- uses: actions/checkout@v4
# use a version of the conda virtual environment manager to set up an
# isolated environment with the Python version we want
- uses: mamba-org/setup-micromamba@main
with:
environment-name: temp
condarc: |
channels:
- defaults
- conda-forge
channel_priority: flexible
create-args: |
python=${{ matrix.python-version }}
- name: Install dependencies
shell: bash -l {0}
run: |
python -m pip install nbmake
python -m pip install ".[dev,docs,test]"
- name: Test with pytest
shell: bash -l {0}
run: |
pytest -v tests
- name: Test notebooks
shell: bash -l {0}
run: |
pytest --nbmake examples/training.ipynb
pytest --nbmake examples/predicting.ipynb
pytest --nbmake examples/convert_v1_to_v2.ipynb
pytest --nbmake examples/training_regression_multicomponent.ipynb
pytest --nbmake examples/predicting_regression_multicomponent.ipynb

# GitHub allows you to set checks as required before PRs can be merged, which is annoying to do
# for matrix jobs like the one above which have six actual jobs. Instead we have a summary job
# like this, which we mark as required.
# Copied in part from:
# https://github.com/orgs/community/discussions/26822#discussioncomment-3305794
ci-report-status:
name: report CI status
needs: test
runs-on: ubuntu-latest
steps:
- run: |
result="${{ needs.test.result }}"
if test $result == "success"; then
exit 0
else
exit 1
fi

62 changes: 0 additions & 62 deletions .github/workflows/tests.yml

This file was deleted.

5 changes: 0 additions & 5 deletions .readthedocs.yml
@@ -11,11 +11,6 @@ build:
tools:
python: "3.11"
jobs:
# We need to install torch and torch-scatter separately as a pre-install step
# due to an issue with torch-scatter
pre_install:
- python -m pip install --upgrade --no-cache-dir torch
- python -m pip install --upgrade --no-cache-dir torch-scatter
post_install:
- python -m pip install --upgrade --upgrade-strategy only-if-needed --no-cache-dir ".[docs]"

6 changes: 2 additions & 4 deletions Dockerfile
@@ -36,13 +36,11 @@ RUN conda create --name chemprop_env python=3.11* && \
# in a Dockerfile build script)
SHELL ["conda", "run", "--no-capture-output", "-n", "chemprop_env", "/bin/bash", "-c"]

# Follow the installation instructions (but with fixed versions, for stability of this image) then clear the cache
# Follow the installation instructions then clear the cache
ADD chemprop chemprop
ENV PYTHONPATH /opt/chemprop
ADD LICENSE.txt pyproject.toml README.md .
RUN python -m pip install torch==2.2.0 --index-url https://download.pytorch.org/whl/cpu && \
python -m pip install torch-scatter -f https://data.pyg.org/whl/torch-2.2.0+cpu.html && \
python -m pip install . && \
RUN python -m pip install . && \
python -m pip cache purge

# when running this image, open an interactive bash terminal inside the conda environment
2 changes: 2 additions & 0 deletions chemprop/__init__.py
@@ -1,3 +1,5 @@
from . import data, featurizers, models, nn, utils, conf, exceptions, schedulers

__all__ = ["data", "featurizers", "models", "nn", "utils", "conf", "exceptions", "schedulers"]

__version__ = "2.0.0-rc.1"
5 changes: 3 additions & 2 deletions chemprop/cli/common.py
@@ -34,8 +34,9 @@ def add_common_args(parser: ArgumentParser) -> ArgumentParser:
"-n",
"--num-workers",
type=int,
default=8,
help="Number of workers for the parallel data loading (0 means sequential).",
default=0,
help="""Number of workers for parallel data loading (0 means sequential).
Warning: setting num_workers>0 can cause hangs on Windows and MacOS.""",
)
dataloader_args.add_argument("-b", "--batch-size", type=int, default=64, help="Batch size.")

1 change: 1 addition & 0 deletions chemprop/cli/main.py
@@ -49,6 +49,7 @@ def construct_parser():

return parser


def main():
parser = construct_parser()
args = parser.parse_args()
3 changes: 1 addition & 2 deletions chemprop/cli/train.py
@@ -5,7 +5,6 @@
import json
from copy import deepcopy
import pandas as pd
from rdkit import Chem

from lightning import pytorch as pl
from lightning.pytorch.loggers import TensorBoardLogger, CSVLogger
@@ -439,7 +438,7 @@ def add_train_args(parser: ArgumentParser) -> ArgumentParser:
default=0,
help="Random seed to use when splitting data into train/val/test sets. When :code:`num_folds > 1`, the first fold uses this seed and all subsequent folds add 1 to the seed. Also used for shuffling data in :code:`MolGraphDataLoader` when :code:`shuffle` is True.",
)

parser.add_argument(
"--pytorch-seed",
type=int,
21 changes: 20 additions & 1 deletion chemprop/cli/utils/__init__.py
@@ -2,4 +2,23 @@
from .actions import LookupAction
from .command import Subcommand
from .parsing import build_data_from_files, make_datapoints, make_dataset, get_column_names
from .utils import *
from .utils import pop_attr, _pop_attr, _pop_attr_d, validate_loss_function

__all__ = [
"bounded",
"LookupAction",
"Subcommand",
"build_data_from_files",
"make_datapoints",
"make_dataset",
"get_column_names",
"actions",
"args",
"command",
"parsing",
"utils",
"pop_attr",
"_pop_attr",
"_pop_attr_d",
"validate_loss_function",
]
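This commit replaces the wildcard `from .utils import *` with explicit imports plus an `__all__` list. As a quick standalone illustration of why that matters (the module and function names here are hypothetical, not Chemprop's), `__all__` controls exactly which names a star-import exposes, so downstream `import *` users can no longer pick up private helpers by accident:

```python
# Sketch: build a throwaway module in memory and show that `import *`
# only pulls in the names listed in __all__ (hypothetical names).
import sys
import types

mod = types.ModuleType("demo_utils")
exec(
    "def pop_attr(o, name): return o.__dict__.pop(name)\n"
    "def helper(): pass\n"
    "__all__ = ['pop_attr']\n",
    mod.__dict__,
)
sys.modules["demo_utils"] = mod

ns = {}
exec("from demo_utils import *", ns)
# helper() was defined in demo_utils, but __all__ excludes it:
print(sorted(k for k in ns if not k.startswith("__")))  # ['pop_attr']
```

Without `__all__`, a star-import would also have picked up `helper` (and anything else not underscore-prefixed), which is exactly the leakage the explicit lists in this PR prevent.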
19 changes: 19 additions & 0 deletions chemprop/data/__init__.py
@@ -10,3 +10,22 @@
)
from .samplers import ClassBalanceSampler, SeededSampler
from .splitting import split_component, SplitType

__all__ = [
"BatchMolGraph",
"TrainingBatch",
"collate_batch",
"collate_multicomponent",
"MolGraphDataLoader",
"MoleculeDatapoint",
"ReactionDatapoint",
"MoleculeDataset",
"ReactionDataset",
"Datum",
"MulticomponentDataset",
"MolGraphDataset",
"ClassBalanceSampler",
"SeededSampler",
"split_component",
"SplitType",
]
2 changes: 1 addition & 1 deletion chemprop/data/datapoints.py
@@ -73,7 +73,7 @@ class MoleculeDatapoint(_DatapointMixin, _MoleculeDatapointMixin):
concatenated to bond-level features *before* message passing"""
V_d: np.ndarray | None = None
"""A numpy array of shape ``V x d_vd``, where ``V`` is the number of atoms in the molecule, and
``d_vd`` is the number of additional descriptors that will be concatenated to atom-level
``d_vd`` is the number of additional descriptors that will be concatenated to atom-level
descriptors *after* message passing"""

def __post_init__(self, mfs: list[MoleculeFeaturizer] | None):
8 changes: 4 additions & 4 deletions chemprop/data/datasets.py
@@ -250,13 +250,13 @@ def normalize_inputs(

match key:
case "X_d":
X = None if np.all(self.X_d == None) else self.X_d
X = None if all(i is None for i in self.X_d) else self.X_d
case "V_f":
X = None if np.all(self.V_fs == None) else np.concatenate(self.V_fs, axis=0)
X = None if all(i is None for i in self.V_fs) else np.concatenate(self.V_fs, axis=0)
case "E_f":
X = None if np.all(self.E_fs == None) else np.concatenate(self.E_fs, axis=0)
X = None if all(i is None for i in self.E_fs) else np.concatenate(self.E_fs, axis=0)
case "V_d":
X = None if np.all(self.V_ds == None) else np.concatenate(self.V_ds, axis=0)
X = None if all(i is None for i in self.V_ds) else np.concatenate(self.V_ds, axis=0)
case None:
return [self.normalize_inputs(k, scaler) for k in VALID_KEYS - {None}]
case _:
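The change above swaps `np.all(self.X_d == None)` for an explicit `all(i is None for i in ...)`. A minimal standalone sketch of the difference (illustrative variable names, not Chemprop's actual data): once the list holds NumPy arrays, `== None` triggers elementwise comparisons rather than identity checks, while the generator form stays unambiguous:

```python
import numpy as np

# Mimic the dataset fields: per-datapoint extra descriptors, possibly absent.
X_d_absent = [None, None, None]
X_d_present = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]

def none_only(values):
    # Identity check per element. Safe for lists mixing arrays and None,
    # unlike np.all(values == None), which broadcasts `== None` over any
    # array elements instead of asking "is this entry missing?".
    return all(v is None for v in values)

print(none_only(X_d_absent))   # True
print(none_only(X_d_present))  # False
```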
1 change: 0 additions & 1 deletion chemprop/data/splitting.py
@@ -3,7 +3,6 @@
import logging
from enum import auto
from typing import Sequence
from random import Random
import numpy as np
from astartes import train_test_split, train_val_test_split
from astartes.molecules import train_test_split_molecules, train_val_test_split_molecules
41 changes: 39 additions & 2 deletions chemprop/featurizers/__init__.py
@@ -1,4 +1,41 @@
from .atom import MultiHotAtomFeaturizer, AtomFeaturizer
from .bond import MultiHotBondFeaturizer, BondFeaturizer
from .molgraph import *
from .molecule import *
from .molgraph import (
MolGraph,
MoleculeMolGraphFeaturizer,
SimpleMoleculeMolGraphFeaturizer,
RxnMolGraphFeaturizer,
CondensedGraphOfReactionFeaturizer,
CGRFeaturizer,
RxnMode,
)
from .molecule import (
MoleculeFeaturizer,
MorganFeaturizerMixin,
BinaryFeaturizerMixin,
CountFeaturizerMixin,
MorganBinaryFeaturizer,
MorganCountFeaturizer,
MoleculeFeaturizerRegistry,
)

__all__ = [
"MultiHotAtomFeaturizer",
"AtomFeaturizer",
"MultiHotBondFeaturizer",
"BondFeaturizer",
"MolGraph",
"MoleculeMolGraphFeaturizer",
"SimpleMoleculeMolGraphFeaturizer",
"RxnMolGraphFeaturizer",
"CondensedGraphOfReactionFeaturizer",
"CGRFeaturizer",
"RxnMode",
"MoleculeFeaturizer",
"MoleculeFeaturizerRegistry",
"MorganFeaturizerMixin",
"BinaryFeaturizerMixin",
"CountFeaturizerMixin",
"MorganBinaryFeaturizer",
"MorganCountFeaturizer",
]
2 changes: 1 addition & 1 deletion chemprop/featurizers/molecule.py
@@ -47,7 +47,7 @@ def __call__(self, mol: Chem.Mol) -> np.ndarray:


@MoleculeFeaturizerRegistry("morgan_binary")
class MorganBinaryFeaturzer(MorganFeaturizerMixin, BinaryFeaturizerMixin, MoleculeFeaturizer):
class MorganBinaryFeaturizer(MorganFeaturizerMixin, BinaryFeaturizerMixin, MoleculeFeaturizer):
pass


Expand Down