
Test Policy


Code quality

Like the main module, optuna/optuna, the test code under optuna/tests is carefully maintained to meet certain quality requirements. In particular:

  • the test code is based on common principles of software design such as SOLID and DRY;
  • the test code is readable; for example, the name of a test method sufficiently describes its purpose;
  • the test code is non-redundant; for example, multiple test cases should not exist for a single equivalence class.

In addition, the test code is designed to avoid the Fragile Test problem. Below are typical anti-patterns that make tests fragile:

  • a test case has side effects on other test cases;
  • a test case invokes private methods or accesses private variables encapsulated in another class;
  • a test case depends on unstable APIs or libraries;
  • etc.

See also xUnit Test Patterns for tips on improving the quality of test code.

Conventions of file name and structure

Under optuna/tests, the filenames and directory structure mirror the main optuna/optuna module, and each test case is placed in the location that corresponds to the module it covers. For example, the test cases of optuna.logging are written in optuna/tests/test_logging.py, while test cases specific to optuna.storages._rdb.models do not belong in optuna/tests/storages_tests/test_storages.py.

If multiple classes/functions share duplicated test patterns, the tests are merged, placed in the module at their common ancestor in the directory tree, and parametrized with pytest.mark.parametrize. For example, the common tests of sampler classes are placed in optuna/tests/samplers_tests/test_samplers.py. A common, parametrized test case should avoid conditional logic that special-cases particular classes/functions, as the sketch below illustrates.
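
For illustration, a minimal sketch of such a parametrized common test; the test name and the sampler list are illustrative assumptions, not the actual suite:

```python
import pytest

import optuna


# Hedged sketch: the parametrized sampler list is illustrative.
# Note that the test body contains no sampler-specific branching.
@pytest.mark.parametrize(
    "sampler_class",
    [optuna.samplers.RandomSampler, optuna.samplers.TPESampler],
)
def test_sampler_optimizes_simple_objective(sampler_class):
    study = optuna.create_study(sampler=sampler_class())
    study.optimize(lambda trial: trial.suggest_float("x", -1.0, 1.0) ** 2, n_trials=5)
    assert len(study.trials) == 5
```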

Analysis of input values and edge cases

With sufficient boundary value analysis and equivalence partitioning, test cases are written for each equivalence class. Not only happy paths but also edge cases and error conditions are carefully analyzed (one Optuna-specific case is sketched after the list below). Below are typical edge cases in Optuna tests:

  • typical edge cases with Python:
    • None, empty list, empty string, etc.
  • typical edge cases with numerical calculation:
    • NaN, inf, -inf, negative values, etc.
  • Optuna specific edge cases:
    • a study contains no trial;
    • should_prune() is invoked without any reported intermediate value;
    • sample_independent() is invoked when BaseDistribution.single() is True;
    • etc.
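
As an example, a hedged sketch of a test for the no-trial edge case; the test name is hypothetical, and the actual suite may cover this differently:

```python
import pytest

import optuna


def test_best_value_raises_on_empty_study():
    # Edge case: a study that contains no trial has no best value.
    study = optuna.create_study()
    with pytest.raises(ValueError):
        study.best_value
```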

Private APIs

In Optuna, private methods and functions are also tested. Note that unreasonable testing of private APIs may cause the Fragile Test problem, especially when an API is still unstable. If an API is unstable or hard to test, it should be refactored toward a more testable design and architecture instead of being covered by unreasonable, fragile test cases.
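
As a self-contained sketch (the class below is a hypothetical stand-in, not an Optuna API), a private method is exercised directly by its test:

```python
class _MovingAverage:
    # Stand-in for a class under test; in the real suite the class would
    # be imported from the Optuna module that defines it.

    def __init__(self):
        self._values = []

    def _mean(self):
        # Private helper, exercised directly by the test below.
        return sum(self._values) / len(self._values)


def test_private_mean():
    ma = _MovingAverage()
    ma._values = [1.0, 2.0, 3.0]
    assert ma._mean() == 2.0
```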

Testing utilities

Copying and pasting test code directly harms code quality. To keep the code DRY, duplicated logic is abstracted and placed in the optuna.testing module.
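
A minimal sketch of the idea; create_study_with_trials is a hypothetical helper used here for illustration, not necessarily part of the actual optuna.testing API:

```python
import optuna


def create_study_with_trials(n_trials):
    # Hypothetical shared helper; in Optuna, utilities like this live in
    # the optuna.testing module so that individual tests stay DRY.
    study = optuna.create_study()
    study.optimize(
        lambda trial: trial.suggest_float("x", 0.0, 1.0), n_trials=n_trials
    )
    return study


def test_trials_are_recorded():
    study = create_study_with_trials(3)
    assert len(study.trials) == 3
```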

Reproducibility

A seed argument is provided whenever a sampler, pruner, or any other class/function involves randomness, and its reproducibility is verified in unit tests. In principle, reproducibility tests focus only on single-worker scenarios; multi-worker scenarios need not be tested because Optuna does not guarantee reproducibility of parallel optimization. In addition, the reproducibility of optimization performance is covered by the performance benchmarks rather than by unit tests.
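
For example, a minimal sketch of such a reproducibility test, assuming TPESampler as the seeded component under test:

```python
import optuna


def test_seeded_sampler_is_reproducible():
    # Two identically seeded single-worker runs should suggest the same
    # parameter values.
    def objective(trial):
        return trial.suggest_float("x", -10.0, 10.0) ** 2

    suggested = []
    for _ in range(2):
        sampler = optuna.samplers.TPESampler(seed=42)
        study = optuna.create_study(sampler=sampler)
        study.optimize(objective, n_trials=10)
        suggested.append([t.params["x"] for t in study.trials])

    assert suggested[0] == suggested[1]
```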

Coverage

A coverage report is uploaded to Codecov for each pull request. Although we do not have a strict acceptance criterion for coverage scores, the scores should be checked carefully, and the test cases and related software design reconsidered, especially in the following cases:

  • the overall coverage score drastically decreases;
  • a class/module has extremely low coverage;
  • test cases do not cover specific branches of an if-statement.

Performance Testing

The optuna/benchmarks directory contains scripts to evaluate the optimization performance of sampling/pruning algorithms. Whenever a pull request affects existing Optuna algorithms, we use those scripts to check that no performance degradation occurs. In addition, when a new algorithm is implemented, we compare its performance with that of existing algorithms and carefully discuss the advantages of merging it.

Visualization Modules

In Optuna, the optuna.visualization module is tested with both unit testing and visual regression testing. Unit testing focuses on verifying a small unit of source code, such as a method or a function. Visual regression testing is better suited to verifying the end-to-end behavior, including the underlying visualization libraries such as plotly and matplotlib.
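
For illustration, a minimal unit-test sketch for one plotting function; the assertion is deliberately shallow, since appearance is verified by visual regression tests:

```python
import optuna
from optuna.visualization import plot_optimization_history


def test_plot_optimization_history_returns_figure():
    # Unit-level check only: the function returns a figure object for a
    # study with completed trials. Pixel-level appearance is left to
    # visual regression testing.
    study = optuna.create_study()
    study.optimize(lambda trial: trial.suggest_float("x", 0.0, 1.0), n_trials=3)
    figure = plot_optimization_history(study)
    assert figure is not None
```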