tillahoffmann/doit_interface

🎯 doit interface

This package provides a functional interface for reducing boilerplate in the dodo.py of the pydoit build system. In short, all tasks are created and managed using a Manager. Most features are exposed as Python context managers, e.g., for grouping tasks.

Basic usage

>>> import doit_interface as di

>>> # Get a default manager (or create your own to use as a context manager).
>>> manager = di.Manager.get_instance()

>>> # Create a single task.
>>> manager(basename="create_foo", actions=["touch foo"], targets=["foo"])
{'basename': 'create_foo', 'actions': ['touch foo'], 'targets': ['foo'], ...}

>>> # Group multiple tasks.
>>> with di.group_tasks("my_group") as my_group:
...     member = manager(basename="member")
>>> my_group
<doit_interface.contexts.group_tasks object at 0x...> named my_group with 1 task

Note

The default manager obtained by calling Manager.get_instance has a number of default contexts enabled:

  1. SubprocessAction.use_as_default to use SubprocessAction by default for string actions.
  2. create_target_dirs to create target directories if they are missing.
  3. normalize_dependencies such that task objects can be used as file and task dependencies.

It also injects a default DOIT_CONFIG configuration variable if the filename is dodo.py.

If you want to override this default behavior, you can create a dedicated manager and call Manager.set_default_instance, or modify the Manager.context_stack of the default manager.

Features

Traceback for failed tasks

The DoitInterfaceReporter provides more verbose progress reports and points you to the location where a failing task was defined. A DOIT_CONFIG like the one below is used by default if you obtain your Manager using Manager.get_instance.

>>> DOIT_CONFIG = {"reporter": DoitInterfaceReporter}
>>> manager(basename="false", actions=["false"])
{'basename': 'false', 'actions': ['false'], 'meta': {'filename': '...', 'lineno': 1}}

$ doit
EXECUTE: false
FAILED: false (declared at ...:1)
...
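
The reporter can name the declaration site because each task carries a meta entry with a filename and line number, as the task dictionary above shows. The following is a minimal sketch of how such metadata could be captured with the standard inspect module. The helper declare_task is hypothetical and is not the package's implementation.

```python
import inspect


def declare_task(**task):
    """Hypothetical helper: return a task dict annotated with its call site."""
    # f_back is the frame of the code that called declare_task.
    caller = inspect.currentframe().f_back
    task["meta"] = {
        "filename": caller.f_code.co_filename,
        "lineno": caller.f_lineno,
    }
    return task


task = declare_task(basename="false", actions=["false"])
meta = task["meta"]
print(f"{task['basename']} (declared at {meta['filename']}:{meta['lineno']})")
```

Recording the location when the task is declared, rather than when it fails, keeps the report accurate even though the action runs much later.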

Group tasks

Use group_tasks to group tasks so that all of them can be executed together. Tasks can be added to a group using a context manager (as shown below) or by calling the group to add an existing task. Groups can be nested arbitrarily.

>>> with group_tasks("vgg16") as vgg16:
...     train = manager(basename="train", actions=[...])
...     validate = manager(basename="validate", actions=[...])
>>> vgg16
<doit_interface.contexts.group_tasks object at 0x...> named vgg16 with 2 tasks
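
To illustrate the idea behind grouping, here is a standard-library sketch of a context manager that collects the tasks declared inside it and exposes them as a pydoit group task (a task with no actions and a task_dep list). TaskGroup and make_task are hypothetical names, and registering a task with every open group is an assumption about nesting, not documented behavior of the package.

```python
class TaskGroup:
    """Hypothetical stand-in for a grouping context."""

    active = []  # stack of currently open groups, innermost last

    def __init__(self, name):
        self.name = name
        self.members = []

    def __enter__(self):
        TaskGroup.active.append(self)
        return self

    def __exit__(self, *exc_info):
        TaskGroup.active.pop()

    def as_task(self):
        # A pydoit group task has no actions and only depends on its members.
        return {"basename": self.name, "actions": None, "task_dep": list(self.members)}


def make_task(**task):
    """Hypothetical task factory that registers the task with every open group."""
    for group in TaskGroup.active:
        group.members.append(task["basename"])
    return task


with TaskGroup("vgg16") as vgg16:
    make_task(basename="train", actions=["echo train"])
    make_task(basename="validate", actions=["echo validate"])

print(vgg16.as_task())
```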

Automatically create target directories

Use create_target_dirs to automatically create directories for each of your targets. This can be particularly useful if you generate nested data structures, e.g., for machine learning results based on different architectures, seeds, optimizers, learning rates, etc.

>>> with create_target_dirs():
...     task = manager(basename="bar", targets=["foo/bar"], actions=[...])
>>> task["actions"]
[(<function create_folder at 0x...>, ['foo']), ...]
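
The output above shows a folder-creating action placed ahead of the task's own actions. A rough sketch of that transformation, using only the standard library, might look as follows. The helper with_target_dirs is hypothetical, and os.makedirs stands in for the create_folder function seen in the output.

```python
import os


def with_target_dirs(task):
    """Hypothetical helper: prepend actions that create each target's parent directory."""
    # Collect distinct parent directories; targets in the working directory need none.
    parents = sorted({os.path.dirname(target) for target in task.get("targets", [])} - {""})
    # Each entry follows the (callable, args, kwargs) shape of a Python action.
    mkdir_actions = [(os.makedirs, [parent], {"exist_ok": True}) for parent in parents]
    return {**task, "actions": mkdir_actions + list(task.get("actions", []))}


task = with_target_dirs(
    {"basename": "bar", "targets": ["foo/bar"], "actions": ["touch foo/bar"]}
)
print(task["actions"])
```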

Share default values across tasks

Use the defaults context to share default values, such as file_dep, across tasks.

>>> with defaults(file_dep=["data.pt"]):
...     train = manager(basename="train", actions=[...])
...     validate = manager(basename="validate", actions=[...])
>>> train["file_dep"]
['data.pt']
>>> validate["file_dep"]
['data.pt']

Use tasks as file_dep or task_dep

normalize_dependencies normalizes file and task dependencies such that task objects can be used as dependencies (in addition to file and task names).

>>> with normalize_dependencies():
...     base_task = manager(basename="base", name="output", targets=["output.txt"])
...     file_dep_task = manager(basename="file_dep_task", file_dep=[base_task])
...     task_dep_task = manager(basename="task_dep_task", task_dep=[base_task])
>>> file_dep_task["file_dep"]
['output.txt']
>>> task_dep_task["task_dep"]
['base:output']
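
The example output suggests two rules: a task used as a file dependency is replaced by its targets, and a task used as a task dependency is replaced by its name in basename:name form. The sketch below reproduces just those two rules on plain dictionaries. The helpers full_name and normalize are hypothetical, and the package may do more than this.

```python
def full_name(task):
    """Return 'basename:name' for sub-tasks and 'basename' otherwise."""
    name = task.get("name")
    return f"{task['basename']}:{name}" if name else task["basename"]


def normalize(task):
    """Hypothetical helper: replace task dicts in file_dep and task_dep with names."""
    normalized = dict(task)
    if "file_dep" in task:
        file_dep = []
        for dep in task["file_dep"]:
            # A task stands for all of its targets; strings pass through unchanged.
            file_dep.extend(dep["targets"] if isinstance(dep, dict) else [dep])
        normalized["file_dep"] = file_dep
    if "task_dep" in task:
        normalized["task_dep"] = [
            full_name(dep) if isinstance(dep, dict) else dep for dep in task["task_dep"]
        ]
    return normalized


base = {"basename": "base", "name": "output", "targets": ["output.txt"]}
file_dep_task = normalize({"basename": "file_dep_task", "file_dep": [base, "raw.csv"]})
task_dep_task = normalize({"basename": "task_dep_task", "task_dep": [base, "setup"]})
print(file_dep_task["file_dep"], task_dep_task["task_dep"])
```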

Add prefixes to paths or other attributes

Path prefixes can be added using the path_prefix context if file dependencies or targets share common directories. General prefixes are also available using the prefix context.

>>> with path_prefix(targets="outputs", file_dep="inputs"):
...     manager(basename="task", targets=["out.txt"], file_dep=["in1.txt", "in2.txt"])
{'basename': 'task', 'targets': ['outputs/out.txt'], 'file_dep': ['inputs/in1.txt', 'inputs/in2.txt'], ...}
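
The effect on the task dictionary can be reproduced with a few lines of standard-library code. In this sketch, add_path_prefix is a hypothetical helper that joins a directory onto every path of the named attributes; it is not the package's implementation.

```python
import posixpath


def add_path_prefix(task, **prefixes):
    """Hypothetical helper: prefix every path in the named task attributes."""
    prefixed = dict(task)
    for key, prefix in prefixes.items():
        if key in task:
            # posixpath keeps forward slashes regardless of the operating system.
            prefixed[key] = [posixpath.join(prefix, path) for path in task[key]]
    return prefixed


task = add_path_prefix(
    {"basename": "task", "targets": ["out.txt"], "file_dep": ["in1.txt", "in2.txt"]},
    targets="outputs",
    file_dep="inputs",
)
print(task)
```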

Subprocess action

The SubprocessAction lets you spawn subprocesses akin to doit.action.CmdAction, with a few small differences. First, it does not capture the output of the subprocess, which is helpful for development but may add too much noise for deployment. Second, it supports Makefile-style variable substitutions and f-string substitutions for any attribute of the parent task. Third, it allows global environment variables to be set that are shared across all subprocess actions, e.g., to limit the number of OpenMP threads. You can use it by default for string actions using the SubprocessAction.use_as_default context.
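
The three differences can be sketched with the standard library alone. Everything in the block is an assumption made for illustration: render, build_env, run, and GLOBAL_ENV are hypothetical names, the automatic variables follow GNU Make's meaning ($@ for the target, $< for the first dependency, $^ for all dependencies), and str.format stands in for the f-string substitution. Which variables the package actually supports is not stated here.

```python
import os
import subprocess

GLOBAL_ENV = {"OMP_NUM_THREADS": "1"}  # hypothetical environment shared by all actions


def render(command, task):
    """Expand Make-style automatic variables, then format fields from the task."""
    targets = task.get("targets", [])
    file_dep = task.get("file_dep", [])
    replacements = {
        "$@": targets[0] if targets else "",
        "$<": file_dep[0] if file_dep else "",
        "$^": " ".join(file_dep),
    }
    for variable, value in replacements.items():
        command = command.replace(variable, value)
    # Any task attribute can be referenced as {attribute}.
    return command.format(**task)


def build_env():
    """Merge the shared variables over the current process environment."""
    return {**os.environ, **GLOBAL_ENV}


def run(command, task):
    """Run the rendered command; output is not captured and goes to the terminal."""
    return subprocess.run(render(command, task), shell=True, env=build_env(), check=True)


task = {"basename": "merge", "targets": ["out.txt"], "file_dep": ["a.txt", "b.txt"]}
print(render("cat $^ > $@ && echo {basename}", task))
```

Calling run("cat $^ > $@", task) would then execute the rendered command in a shell with OMP_NUM_THREADS set for the child process.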
