Benchmarking overhaul and pin Flake8 <6 (#220)
* Better benchmarking infrastructure - see SciTools/iris#4571, SciTools/iris#4583.
* Minor improvements to benchmark data generation messages.
* Better benchmark imports.
* Better strategy for data realisation and ASV.
* Introduced on_demand_benchmark decorator - see SciTools/iris#4621.
* Simplify benchmark structure following 878b7a3.
* Added a benchmarks README mirroring SciTools/iris.
* CHANGELOG entry.
* Flake8 fixes.
* Bump Nox cache.
* Cirrus benchmarks pass in CIRRUS_BASE_SHA.
* Benchmark README Conda package cache tips.
* Reset Nox cache.
* New Nox cache.
* Remove licence header from asv_delegated_conda.py.
* Always re-create Nox benchmark environment (to avoid CI problems).
* Pin Flake8 <6.
* Always re-create Nox benchmark environment (to avoid CI problems).
1 parent
10cb843
commit 88d2c98
Showing 17 changed files with 599 additions and 364 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
# iris-esmf-regrid Performance Benchmarking | ||
|
||
iris-esmf-regrid uses an | ||
[Airspeed Velocity](https://github.com/airspeed-velocity/asv) | ||
(ASV) setup to benchmark performance. This is primarily designed to check for | ||
performance shifts between commits using statistical analysis, but can also | ||
be easily repurposed for manual comparative and scalability analyses. | ||
|
||
The benchmarks are run as part of the CI (the `benchmark_task` in | ||
[`.cirrus.yml`](../.cirrus.yml)), with any notable shifts in performance | ||
raising a ❌ failure. | ||
|
||
## Running benchmarks | ||
|
||
`asv ...` commands must be run from this directory. You will need to have ASV | ||
installed, as well as Nox (see | ||
[Benchmark environments](#benchmark-environments)). | ||
|
||
[iris-esmf-regrid's noxfile](../noxfile.py) includes a `benchmarks` session | ||
that provides conveniences for setting up before benchmarking, and can also | ||
replicate the CI run locally. See the session docstring for detail. | ||
|
||
### Environment variables | ||
|
||
* `DATA_GEN_PYTHON` - required - path to a Python executable that can be | ||
used to generate benchmark test objects/files; see | ||
[Data generation](#data-generation). The Nox session sets this automatically, | ||
but will defer to any value already set in the shell. | ||
* `BENCHMARK_DATA` - optional - path to a directory for benchmark synthetic | ||
test data, which the benchmark scripts will create if it doesn't already | ||
exist. Defaults to `<root>/benchmarks/.data/` if not set. Note that some of | ||
the generated files, especially in the 'SPerf' suite, are many GB in size so | ||
plan accordingly. | ||
* `ON_DEMAND_BENCHMARKS` - optional - when set (to any value): benchmarks | ||
decorated with `@on_demand_benchmark` are included in the ASV run. Usually | ||
coupled with the ASV `--bench` argument to only run the benchmark(s) of | ||
interest. It is set during the Nox `sperf` session. | ||
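
As a rough illustration of how the variables above could be consumed, the sketch below reads them from a mapping such as `os.environ`. The helper name `read_benchmark_settings` and its `root` argument are hypothetical; the real handling lives in the benchmark scripts and the Nox session and may differ.

```python
import os
from pathlib import Path


def read_benchmark_settings(root: Path, env=None) -> dict:
    """Hypothetical helper: interpret the benchmark environment variables."""
    env = os.environ if env is None else env
    # DATA_GEN_PYTHON is required, so fail early if it is missing.
    try:
        data_gen_python = Path(env["DATA_GEN_PYTHON"])
    except KeyError:
        raise RuntimeError("DATA_GEN_PYTHON must be set.") from None
    # BENCHMARK_DATA is optional, defaulting to <root>/benchmarks/.data/ .
    data_dir = Path(env.get("BENCHMARK_DATA", root / "benchmarks" / ".data"))
    # ON_DEMAND_BENCHMARKS only needs to be present; its value is ignored.
    on_demand = "ON_DEMAND_BENCHMARKS" in env
    return {
        "data_gen_python": data_gen_python,
        "data_dir": data_dir,
        "on_demand": on_demand,
    }
```

Passing a plain `dict` as `env` makes the behaviour easy to try out without touching the real shell environment.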
|
||
### Reducing run time | ||
|
||
Before benchmarks are run on a commit, the benchmark environment is | ||
automatically aligned with the lock-file for that commit. You can significantly | ||
speed up any environment updates by co-locating the benchmark environment and your | ||
[Conda package cache](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#specify-package-directories-pkgs-dirs) | ||
on the same [file system](https://en.wikipedia.org/wiki/File_system). This can | ||
be done in several ways: | ||
|
||
* Move your iris-esmf-regrid checkout, this being the default location for the | ||
benchmark environment. | ||
* Move your package cache by editing | ||
[`pkgs_dirs` in Conda config](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#specify-package-directories-pkgs-dirs). | ||
* Move the benchmark environment by **locally** editing the environment path of | ||
`delegated_env_commands` and `delegated_env_parent` in | ||
[asv.conf.json](asv.conf.json). | ||
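
To check whether two locations already share a file system, one option is to compare the device IDs reported by `os.stat`. This is a minimal sketch; the function name `on_same_file_system` is hypothetical and both paths must already exist.

```python
import os


def on_same_file_system(path_a, path_b) -> bool:
    """Return True if both existing paths are on the same device."""
    # st_dev identifies the device (file system) that contains each path.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For example, `on_same_file_system(".asv/env", "/path/to/conda/pkgs")` could be run from this directory, substituting your own package cache location for the placeholder path.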
|
||
## Writing benchmarks | ||
|
||
[See the ASV docs](https://asv.readthedocs.io/) for full detail. | ||
|
||
### Data generation | ||
**Important:** be sure not to use the benchmarking environment to generate any | ||
test objects/files, as this environment changes with each commit being | ||
benchmarked, creating inconsistent benchmark 'conditions'. The | ||
[generate_data](./benchmarks/generate_data.py) module offers a | ||
solution; read more detail there. | ||
|
||
### ASV re-run behaviour | ||
|
||
Note that ASV re-runs a benchmark multiple times between calls to its `setup()` routine. | ||
This is a problem for benchmarking certain Iris operations such as data | ||
realisation, since the data will no longer be lazy after the first run. | ||
Consider writing extra steps to restore objects' original state _within_ the | ||
benchmark itself. | ||
|
||
If adding steps to the benchmark will skew the result too much then re-running | ||
can be disabled by setting an attribute on the benchmark: `number = 1`. To | ||
maintain result accuracy this should be accompanied by increasing the number of | ||
repeats _between_ `setup()` calls using the `repeat` attribute. | ||
`warmup_time = 0` is also advisable since ASV performs independent re-runs to | ||
estimate run-time, and these will still be subject to the original problem. A | ||
decorator is available for this - `@disable_repeat_between_setup` in | ||
[benchmarks init](./benchmarks/__init__.py). | ||
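
The attribute settings described above can be bundled into a class decorator. The following is only a sketch of the idea, not a copy of the decorator in benchmarks init; in particular the `repeat` value of 10 is an illustrative assumption, and `RealiseData` is a hypothetical benchmark.

```python
def disable_repeat_between_setup(benchmark_object):
    """Sketch of a decorator: run the benchmark once per setup() call."""
    # One execution per setup() call, so already-realised data is never re-used.
    benchmark_object.number = 1
    # More repeats (each preceded by setup()) to keep the statistics useful.
    # The value 10 is an illustrative assumption.
    benchmark_object.repeat = 10
    # Skip ASV's warm-up runs, which would also realise the data.
    benchmark_object.warmup_time = 0.0
    return benchmark_object


@disable_repeat_between_setup
class RealiseData:
    """Hypothetical benchmark using the decorator."""

    def setup(self):
        self.data = None  # stand-in for a lazy object

    def time_realise(self):
        self.data = list(range(1000))  # stand-in for realising the data
```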
|
||
### Scaling / non-Scaling Performance Differences | ||
|
||
When comparing performance between commits, file types, etc. it can be helpful | ||
to know if the differences exist in scaling or non-scaling parts of the Iris | ||
functionality in question. This can be done using a size parameter, setting | ||
one value to be as small as possible (e.g. a scalar `Cube`), and the other to | ||
be significantly larger (e.g. a 1000x1000 `Cube`). Performance differences | ||
might only be seen for the larger value, or the smaller, or both, getting you | ||
closer to the root cause. | ||
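
A size parameter of this kind can be expressed with ASV's `params` and `param_names` attributes. The sketch below is hypothetical: it uses plain nested lists instead of an Iris `Cube` so that it stays self-contained, and the class and method names are illustrative only.

```python
class ScalingComparison:
    """Sketch of an ASV benchmark parameterised by problem size."""

    # ASV runs each time_* method once per parameter value.
    # 1 stands in for a scalar Cube, 1000 for a 1000x1000 Cube.
    params = [1, 1000]
    param_names = ["side length"]

    def setup(self, side):
        # Build a side x side grid of floats as a stand-in for Cube data.
        self.grid = [[float(i + j) for j in range(side)] for i in range(side)]

    def time_total(self, side):
        # Placeholder for the operation under investigation.
        sum(sum(row) for row in self.grid)
```

If a difference between two commits appears only for the larger parameter value, the scaling part of the operation is the likelier cause; if it appears for both, fixed overheads are worth checking first.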
|
||
### On-demand benchmarks | ||
|
||
Some benchmarks provide useful insight but are inappropriate to be included in | ||
a benchmark run by default, e.g. those with long run-times or requiring a local | ||
file. These benchmarks should be decorated with `@on_demand_benchmark` | ||
(see [benchmarks init](./benchmarks/__init__.py)), which | ||
sets the benchmark to only be included in a run when the `ON_DEMAND_BENCHMARKS` | ||
environment variable is set. Examples include the SPerf benchmark | ||
suite for the UK Met Office NG-VAT project. | ||
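
The behaviour described above can be sketched as a decorator that only hands the benchmark back when the environment variable is present. This is an illustration under that assumption; the actual decorator in benchmarks init may be written differently.

```python
from os import environ


def on_demand_benchmark(benchmark_object):
    """Sketch: only expose the benchmark when ON_DEMAND_BENCHMARKS is set."""
    if "ON_DEMAND_BENCHMARKS" in environ:
        return benchmark_object
    # Returning None rebinds the decorated name to None, leaving nothing
    # for ASV to collect under that name.
    return None
```

With such a decorator in place, a run along the lines of `ON_DEMAND_BENCHMARKS=1 asv run --bench <pattern>` would include the decorated benchmarks that match the pattern.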
|
||
## Benchmark environments | ||
|
||
We have disabled ASV's standard environment management, instead using an | ||
environment built using the same Nox scripts as Iris' test environments. This | ||
is done using ASV's plugin architecture - see | ||
[asv_delegated_conda.py](asv_delegated_conda.py) and the extra config items in | ||
[asv.conf.json](asv.conf.json). | ||
|
||
(ASV is written to control the environment(s) that benchmarks are run in - | ||
minimising external factors and also allowing it to compare between a matrix | ||
of dependencies (each in a separate environment). We have chosen to sacrifice | ||
these features in favour of testing each commit with its intended dependencies, | ||
controlled by Nox + lock-files). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,15 +1,27 @@ | ||
 { | ||
     "version": 1, | ||
     "project": "esmf_regrid", | ||
-    "repo": "..", | ||
-    "environment_type": "nox-conda", | ||
-    "pythons": [], | ||
-    "branches": ["main"], | ||
-    "benchmark_dir": "benchmarks", | ||
-    "env_dir": ".asv-env", | ||
-    "results_dir": ".asv-results", | ||
-    "html_dir": ".asv-html", | ||
     "project_url": "https://github.com/SciTools-incubator/iris-esmf-regrid", | ||
+    "repo": "..", | ||
+    "environment_type": "conda-delegated", | ||
     "show_commit_url": "https://github.com/SciTools-incubator/iris-esmf-regrid/commit/", | ||
-    "plugins": [".nox_asv_plugin"], | ||
+    "branches": ["upstream/main"], | ||
|
||
+    "benchmark_dir": "./benchmarks", | ||
+    "env_dir": ".asv/env", | ||
+    "results_dir": ".asv/results", | ||
+    "html_dir": ".asv/html", | ||
+    "plugins": [".asv_delegated_conda"], | ||
|
||
+    // The command(s) that create/update an environment correctly for the | ||
+    // checked-out commit. | ||
+    // Interpreted the same as build_command, with following exceptions: | ||
+    // * No build-time environment variables. | ||
+    // * Is run in the same environment as the ASV install itself. | ||
+    "delegated_env_commands": [ | ||
+        "PY_VER=3.10 nox --envdir={conf_dir}/.asv/env/nox01 --session=tests --install-only --no-error-on-external-run --verbose" | ||
+    ], | ||
+    // The parent directory of the above environment. | ||
+    // The most recently modified environment in the directory will be used. | ||
+    "delegated_env_parent": "{conf_dir}/.asv/env/nox01" | ||
 } |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,193 @@ | ||
""" | ||
ASV plug-in providing an alternative :class:`asv.plugins.conda.Conda` | ||
subclass that manages the Conda environment via custom user scripts. | ||
""" | ||
|
||
from os import environ | ||
from os.path import getmtime | ||
from pathlib import Path | ||
from shutil import copy2, copytree, rmtree | ||
from tempfile import TemporaryDirectory | ||
|
||
from asv import util as asv_util | ||
from asv.config import Config | ||
from asv.console import log | ||
from asv.plugins.conda import Conda | ||
from asv.repo import Repo | ||
|
||
|
||
class CondaDelegated(Conda): | ||
    """ | ||
    Manage a Conda environment using custom user scripts, run at each commit. | ||
    Ignores user input variations - ``matrix`` / ``pythons`` / | ||
    ``conda_environment_file``, since environment is being managed outside ASV. | ||
    Original environment creation behaviour is inherited, but upon checking out | ||
    a commit the custom script(s) are run and the original environment is | ||
    replaced with a symlink to the custom environment. This arrangement is then | ||
    re-used in subsequent runs. | ||
    """ | ||
|
||
    tool_name = "conda-delegated" | ||
|
||
    def __init__( | ||
        self, | ||
        conf: Config, | ||
        python: str, | ||
        requirements: dict, | ||
        tagged_env_vars: dict, | ||
    ) -> None: | ||
        """ | ||
        Parameters | ||
        ---------- | ||
        conf : Config instance | ||
        python : str | ||
            Version of Python. Must be of the form "MAJOR.MINOR". | ||
        requirements : dict | ||
            Dictionary mapping a PyPI package name to a version | ||
            identifier string. | ||
        tagged_env_vars : dict | ||
            Environment variables, tagged for build vs. non-build | ||
        """ | ||
        ignored = ["`python`"] | ||
        if requirements: | ||
            ignored.append("`requirements`") | ||
        if tagged_env_vars: | ||
            ignored.append("`tagged_env_vars`") | ||
        if conf.conda_environment_file: | ||
            ignored.append("`conda_environment_file`") | ||
        message = ( | ||
            f"Ignoring ASV setting(s): {', '.join(ignored)}. Benchmark " | ||
            "environment management is delegated to third party script(s)." | ||
        ) | ||
        log.warning(message) | ||
        requirements = {} | ||
        tagged_env_vars = {} | ||
        conf.conda_environment_file = None | ||
|
||
        super().__init__(conf, python, requirements, tagged_env_vars) | ||
        self._update_info() | ||
|
||
        self._env_commands = self._interpolate_commands(conf.delegated_env_commands) | ||
        # Again using _interpolate_commands to get env parent path - allows use | ||
        # of the same ASV env variables. | ||
        env_parent_interpolated = self._interpolate_commands(conf.delegated_env_parent) | ||
        # Returns list of tuples, we just want the first. | ||
        env_parent_first = env_parent_interpolated[0] | ||
        # The 'command' is the first item in the returned tuple. | ||
        env_parent_string = " ".join(env_parent_first[0]) | ||
        self._delegated_env_parent = Path(env_parent_string).resolve() | ||
|
||
    @property | ||
    def name(self): | ||
        """Get a name to uniquely identify this environment.""" | ||
        return asv_util.sanitize_filename(self.tool_name) | ||
|
||
    def _update_info(self) -> None: | ||
        """Make sure class properties reflect the actual environment being used.""" | ||
        # Follow symlink if it has been created. | ||
        actual_path = Path(self._path).resolve() | ||
        self._path = str(actual_path) | ||
|
||
        # Get custom environment's Python version if it exists yet. | ||
        try: | ||
            get_version = ( | ||
                "from sys import version_info; " | ||
                "print(f'{version_info.major}.{version_info.minor}')" | ||
            ) | ||
            actual_python = self.run(["-c", get_version]) | ||
            self._python = actual_python | ||
        except OSError: | ||
            pass | ||
|
||
    def _prep_env(self) -> None: | ||
        """Run the custom environment script(s) and switch to using that environment.""" | ||
        message = f"Running delegated environment management for: {self.name}" | ||
        log.info(message) | ||
        env_path = Path(self._path) | ||
|
||
        def copy_asv_files(src_parent: Path, dst_parent: Path) -> None: | ||
            """For copying between self._path and a temporary cache.""" | ||
            asv_files = list(src_parent.glob("asv*")) | ||
            # build_root_path.name usually == "project" . | ||
            asv_files += [src_parent / Path(self._build_root).name] | ||
            for src_path in asv_files: | ||
                dst_path = dst_parent / src_path.name | ||
                if not dst_path.exists(): | ||
                    # Only caching in case the environment has been rebuilt. | ||
                    # If the dst_path already exists: rebuilding hasn't | ||
                    # happened. Also a non-issue when copying in the reverse | ||
                    # direction because the cache dir is temporary. | ||
                    if src_path.is_dir(): | ||
                        func = copytree | ||
                    else: | ||
                        func = copy2 | ||
                    func(src_path, dst_path) | ||
|
||
        with TemporaryDirectory(prefix="delegated_asv_cache_") as asv_cache: | ||
            asv_cache_path = Path(asv_cache) | ||
            # Cache all of ASV's files as delegated command may remove and | ||
            # re-build the environment. | ||
            copy_asv_files(env_path.resolve(), asv_cache_path) | ||
|
||
            # Adapt the build_dir to the cache location. | ||
            build_root_path = Path(self._build_root) | ||
            build_dir_original = build_root_path / self._repo_subdir | ||
            build_dir_subpath = build_dir_original.relative_to(build_root_path.parent) | ||
            build_dir = asv_cache_path / build_dir_subpath | ||
|
||
            # Run the script(s) for delegated environment creation/updating. | ||
            # (An adaptation of self._interpolate_and_run_commands). | ||
            for command, env, return_codes, cwd in self._env_commands: | ||
                local_envs = dict(environ) | ||
                local_envs.update(env) | ||
                if cwd is None: | ||
                    cwd = str(build_dir) | ||
                _ = asv_util.check_output( | ||
                    command, | ||
                    timeout=self._install_timeout, | ||
                    cwd=cwd, | ||
                    env=local_envs, | ||
                    valid_return_codes=return_codes, | ||
                ) | ||
|
||
            # Replace the env that ASV created with a symlink to the env | ||
            # created/updated by the custom script. | ||
            delegated_env_path = sorted( | ||
                self._delegated_env_parent.glob("*"), | ||
                key=getmtime, | ||
                reverse=True, | ||
            )[0] | ||
            if env_path.resolve() != delegated_env_path: | ||
                try: | ||
                    env_path.unlink(missing_ok=True) | ||
                except IsADirectoryError: | ||
                    rmtree(env_path) | ||
                env_path.symlink_to(delegated_env_path, target_is_directory=True) | ||
|
||
            # Check that environment exists. | ||
            try: | ||
                env_path.resolve(strict=True) | ||
            except FileNotFoundError: | ||
                message = f"Path does not resolve to environment: {env_path}" | ||
                log.error(message) | ||
                raise RuntimeError(message) | ||
|
||
            # Restore ASV's files from the cache (if necessary). | ||
            copy_asv_files(asv_cache_path, env_path.resolve()) | ||
|
||
            # Record new environment information in properties. | ||
            self._update_info() | ||
|
||
    def checkout_project(self, repo: Repo, commit_hash: str) -> None: | ||
        """Check out the working tree of the project at given commit hash.""" | ||
        super().checkout_project(repo, commit_hash) | ||
        self._prep_env() | ||
        log.info(f"Environment {self.name} updated to spec at {commit_hash[:8]}") |
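
The "most recently modified environment" selection and the symlink swap used by the plug-in can be tried in isolation. The sketch below is a simplified, hypothetical stand-in (`link_to_newest` is not part of the plug-in): it only replaces an existing symlink or file, whereas the plug-in also removes a real directory with `rmtree`, and it assumes a platform where creating symlinks is permitted.

```python
import os
from os.path import getmtime
from pathlib import Path
from tempfile import TemporaryDirectory


def link_to_newest(env_parent: Path, link_path: Path) -> Path:
    """Point link_path at the most recently modified entry in env_parent."""
    newest = sorted(env_parent.glob("*"), key=getmtime, reverse=True)[0]
    # Simplification: only an existing symlink or file is replaced here.
    if link_path.is_symlink() or link_path.exists():
        link_path.unlink()
    link_path.symlink_to(newest, target_is_directory=True)
    return newest


with TemporaryDirectory() as tmp:
    parent = Path(tmp) / "envs"
    old, new = parent / "old", parent / "new"
    old.mkdir(parents=True)
    new.mkdir()
    # Set explicit modification times so the ordering is unambiguous.
    os.utime(old, (1_000, 1_000))
    os.utime(new, (2_000, 2_000))
    link = Path(tmp) / "asv_env"
    chosen = link_to_newest(parent, link)
    resolved_ok = link.resolve() == new.resolve()
```

Sorting on modification time means that whichever environment the delegated command touched last is the one that gets linked, which is why the config comment above refers to the most recently modified environment in the parent directory.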