
Kedro new project creation ‐ how it works

Ahdra Merali edited this page Jan 23, 2024 · 5 revisions

Kedro Project Creation - The Developer Docs

The kedro new command allows users to create a new project. This project can be customised to suit the user's needs; they can provide their specifications through several different paths:

| Argument | Through interactive flow | Through CLI flag | Through config file |
|---|---|---|---|
| Project name | Yes | Yes; if not provided, interactive flow will be triggered | Yes; if not provided, an error is thrown |
| Tools | Yes | Yes; if not provided, interactive flow will be triggered | Yes; if not provided, default value of none will be used |
| Example pipeline | Yes | Yes; if not provided, interactive flow will be triggered | Yes; if not provided, default value of no will be used |
| Starter | No | Yes; cannot be used with tools or example | No |
| Checkout | No | Yes; cannot be used without starter, project version used if not provided | |
| Directory | No | Yes; cannot be used without a starter, cannot be used with Kedro starter alias | |
| Config | No | Yes | No |

Invoking the command will trigger the following execution path:

[Image: execution path diagram (link to the Miro board)]

Let's explore this in a little more detail.

Validate CLI flags

As noted in the table above, some CLI flags cannot be used in combination with each other. At this stage in the execution, we check for the presence of any of the following invalid CLI flag combinations:

  • --checkout AND NO --starter
  • --directory AND NO --starter
  • --starter AND (--tools OR --example)
  • --directory AND --starter IF starter provided is one of Kedro starters
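The four checks above can be sketched as a single function. This is only an illustration, not Kedro's actual implementation: the function name, the parameter names, and the set of starter aliases are all hypothetical placeholders.

```python
from typing import Optional

# Placeholder aliases for illustration only; the real list is defined by Kedro.
KEDRO_STARTER_ALIASES = {"example-starter-a", "example-starter-b"}


def validate_flag_combinations(
    starter: Optional[str] = None,
    checkout: Optional[str] = None,
    directory: Optional[str] = None,
    tools: Optional[str] = None,
    example: Optional[str] = None,
) -> None:
    """Raise ValueError for any of the invalid flag combinations."""
    if checkout and not starter:
        raise ValueError("--checkout cannot be used without --starter")
    if directory and not starter:
        raise ValueError("--directory cannot be used without --starter")
    if starter and (tools or example):
        raise ValueError("--starter cannot be combined with --tools or --example")
    if directory and starter in KEDRO_STARTER_ALIASES:
        raise ValueError("--directory cannot be used with a Kedro starter alias")
```

A custom starter (for example a Git URL) combined with --checkout and --directory passes all four checks, while the same flags without --starter are rejected.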

After this validation, the directory and the path to the project template are updated according to the inputs, bringing us to the next step:

Setup cookiecutter

First, we fetch the path to a cookiecutter template project directory. In this template project, we look at any prompts.yml in the template and collect the prompts required for the project. If the user's desired project name, tools selection, or example code selection has already been provided through the command flags, we validate them and delete the respective prompts from the collection.
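As a rough sketch of this pruning step, assume the prompts have been loaded into a dictionary keyed by prompt name, and that each entry stores its validation pattern under a "regex" key. Both the layout and the function name are assumptions made for illustration; they are not taken from Kedro's code.

```python
import re
from typing import Dict, Optional


def prune_prompts(
    prompts: Dict[str, dict], cli_values: Dict[str, Optional[str]]
) -> Dict[str, dict]:
    """Return the prompts that still need to be asked interactively.

    Values already supplied on the command line are validated against the
    prompt's pattern (hypothetical "regex" key) and then dropped.
    """
    remaining = dict(prompts)  # shallow copy; the caller's dict is untouched
    for name, value in cli_values.items():
        if value is None or name not in remaining:
            continue
        pattern = remaining[name].get("regex")
        if pattern and not re.fullmatch(pattern, value):
            raise ValueError(f"Invalid value for {name!r}: {value!r}")
        del remaining[name]
    return remaining
```

Only the prompts left in the returned dictionary would then be shown to the user.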

With the collection of necessary prompts, the execution proceeds to the next step.

Get the cookiecutter context

To proceed, we must first check if a config file is included. If one is included, we don't need to launch the interactive flow.

If a config file is provided

  1. Validate that the file can be loaded
  2. Validate that no tools or example_pipeline selection is included in the config if a starter was provided
  3. Validate that all necessary prompt values are provided in the config file
  4. Validate that the output directory is valid, if specified
  5. Validate that the provided project name matches the expected format
  6. Validate that the example pipeline selection matches the expected format, and parse it to either "True" or "False"
  7. Validate that the selected tools are all valid tools, and that none or all, if selected, is not combined with any other tool
  8. Parse the validated selection to full readable names
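Steps 6 to 8 can be illustrated with the sketch below. It assumes the config stores tools as a comma-separated string of short names and the example pipeline choice as a yes/no string; the short names, the readable names, and both function names are placeholders rather than Kedro's real values.

```python
from typing import List

# Placeholder short-name -> readable-name mapping, for illustration only.
TOOL_NAMES = {"lint": "Linting", "test": "Testing", "docs": "Documentation"}


def parse_config_tools(selection: str) -> List[str]:
    """Validate a comma-separated tools string and return readable names."""
    chosen = [item.strip().lower() for item in selection.split(",") if item.strip()]
    unknown = [c for c in chosen if c not in TOOL_NAMES and c not in ("none", "all")]
    if unknown:
        raise ValueError(f"Unknown tools: {unknown}")
    # "none" and "all" must stand alone.
    if len(chosen) > 1 and ("none" in chosen or "all" in chosen):
        raise ValueError("'none' and 'all' cannot be combined with other tools")
    if chosen == ["none"]:
        return ["None"]
    if chosen == ["all"]:
        return list(TOOL_NAMES.values())
    return [TOOL_NAMES[c] for c in chosen]


def parse_example_pipeline(value: str) -> str:
    """Map a yes/no answer (assumed format) to the string "True" or "False"."""
    normalised = value.strip().lower()
    if normalised in ("y", "yes"):
        return "True"
    if normalised in ("n", "no"):
        return "False"
    raise ValueError(f"Expected yes or no, got {value!r}")
```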

If a config file isn't provided

  1. For each prompt, get the user's input. Each input is validated against the relevant regex specified in prompts.yml
  2. If tools are provided, parse any ranges into a list of numbers, validating that any ranges are correctly specified (smaller to larger number), and that the end of the range isn't outside the range of available tools
  3. Convert the list of numbers to tool names
  4. Parse any example pipeline selection to either "True" or "False"
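Steps 2 and 3 might look roughly like the following. The function names and the tool list are hypothetical, and the sketch assumes the input has already passed the regex check from step 1, so it only contains digits, commas, hyphens and spaces.

```python
from typing import List

# Placeholder tool list; position in the list corresponds to the menu number.
AVAILABLE_TOOLS = ["Tool A", "Tool B", "Tool C", "Tool D", "Tool E"]


def parse_tool_numbers(selection: str, n_tools: int) -> List[int]:
    """Expand a selection such as "1-3,5" into a sorted list of numbers."""
    numbers = set()
    for part in selection.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Range {part!r} must go from smaller to larger")
            if end > n_tools:
                raise ValueError(f"Range {part!r} ends beyond tool {n_tools}")
            numbers.update(range(start, end + 1))
        else:
            numbers.add(int(part))
    # Extra guard (an addition in this sketch): every number must be a valid index.
    if any(n < 1 or n > n_tools for n in numbers):
        raise ValueError(f"Selections must be between 1 and {n_tools}")
    return sorted(numbers)


def numbers_to_tool_names(numbers: List[int]) -> List[str]:
    """Convert 1-based menu numbers into tool names."""
    return [AVAILABLE_TOOLS[n - 1] for n in numbers]
```

For instance, with five tools available, "1-3,5" expands to the numbers 1, 2, 3 and 5, while "3-1" and "4-9" are both rejected.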

Update cookiecutter's extra_context with CLI values

Currently, any value provided through a CLI flag overwrites the corresponding value from the config file (remember that the interactive prompts are skipped for any value already provided through the CLI). Tools provided via the CLI are parsed into a list of the full tool names.
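The precedence rule can be pictured as a plain dictionary merge in which CLI values win. The function name and the keys used here are illustrative assumptions only.

```python
from typing import Dict, Optional


def apply_cli_overrides(
    extra_context: Dict[str, str], cli_values: Dict[str, Optional[str]]
) -> Dict[str, str]:
    """Return a copy of extra_context where CLI-provided values take priority.

    Flags that were not passed (value None) leave the existing entry untouched.
    """
    merged = dict(extra_context)
    merged.update({key: value for key, value in cli_values.items() if value is not None})
    return merged
```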

Set default for required fields

Though not required by cookiecutter for our project creation, we require some values to be populated in the new project's pyproject.toml for telemetry purposes. These are the project's Kedro version, the tools selection, and the example pipeline selection. As the user has no way to specify the first, and is not always required to specify the other two, we set default values to be used instead.

Tip

When changing which values are expected in pyproject.toml, make sure to update the expected values in ProjectMetadata() accordingly.

Note

The default value for tools, str(["None"]), may strike you as odd, and similarly, the values passed as the tools selection to cookiecutter are all string-wrapped lists. This is done because cookiecutter treats lists as possible options, only populating the placeholders in pyproject.toml with one item from the list. Instead, to pass the whole list through, we wrap it in a string, and unwrap it when it's populated in the placeholder.
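A small sketch of the defaults and of the wrap/unwrap round trip is shown below. The context keys and the function name are assumptions, and ast.literal_eval is used here simply as one safe way to turn the string back into a list; the template itself may unwrap it differently.

```python
import ast
from typing import Dict


def set_required_defaults(extra_context: Dict[str, str], kedro_version: str) -> Dict[str, str]:
    """Fill in the values needed in pyproject.toml when the user gave none."""
    context = dict(extra_context)
    context["kedro_version"] = kedro_version        # never user-specified
    context.setdefault("tools", str(["None"]))      # string-wrapped list
    context.setdefault("example_pipeline", "False")
    return context


# Wrapping: the whole list travels through cookiecutter as a single string.
wrapped = str(["Linting", "Testing"])
# Unwrapping: recover the original list from its string form.
unwrapped = ast.literal_eval(wrapped)
```

Because the wrapped value is a string rather than a list, cookiecutter does not interpret it as a set of options to choose from.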

Collect cookiecutter arguments and create project

After collecting all the project specifications, we check whether a starter was selected. If so, any specified directory and checkout values are passed to cookiecutter so that the correct project template is used. Otherwise, the tools and example pipeline selection determine which template is used. We collect the path to the correct template project and the specified arguments for cookiecutter, and call cookiecutter() to create the project.
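The argument collection could be sketched as follows. The helper name is hypothetical, and the sketch only builds the keyword arguments rather than calling cookiecutter, so it runs without the library installed. The keys output_dir, no_input, extra_context, checkout and directory are keyword arguments accepted by cookiecutter.main.cookiecutter.

```python
from typing import Any, Dict, Optional


def build_cookiecutter_args(
    extra_context: Dict[str, str],
    output_dir: str,
    starter: Optional[str] = None,
    checkout: Optional[str] = None,
    directory: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for a later cookiecutter(template, **args) call."""
    args: Dict[str, Any] = {
        "output_dir": output_dir,
        "no_input": True,  # all values were already gathered, so no prompting
        "extra_context": extra_context,
    }
    # checkout and directory only make sense when a starter is used.
    if starter:
        if checkout:
            args["checkout"] = checkout
        if directory:
            args["directory"] = directory
    return args


# With cookiecutter installed, the project would then be created with:
#   from cookiecutter.main import cookiecutter
#   cookiecutter(template_path, **args)
```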

Post-project creation hook

With cookiecutter, you can specify hooks to run before or after its project creation execution. We make use of the post-project generation hook to make changes to our generated project. The template project includes all files and requirements necessary for all the tools we provide, so before completing the project generation we must ensure it is modified in line with what the user requires.

  1. We go through every tools option and check if they are included in the user's selection. If they are not included, we remove the related setup for that tool in the generated project
  2. We sort the requirements in the generated project to be in alphabetical order
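The two hook steps can be approximated like this. The mapping from tools to files, the function names, and the assumption that requirements live in a single requirements.txt file are all illustrative placeholders, not Kedro's actual layout.

```python
import shutil
from pathlib import Path
from typing import List

# Hypothetical mapping from a tool to the paths it contributes to the template.
TOOL_PATHS = {"Documentation": ["docs"], "Testing": ["tests"]}


def remove_unselected_tools(project_dir: Path, selected: List[str]) -> None:
    """Delete the setup of every tool the user did not select."""
    for tool, relative_paths in TOOL_PATHS.items():
        if tool in selected:
            continue
        for relative in relative_paths:
            target = project_dir / relative
            if target.is_dir():
                shutil.rmtree(target)
            elif target.exists():
                target.unlink()


def sort_requirements(requirements_file: Path) -> None:
    """Rewrite the requirements file with its lines in alphabetical order."""
    lines = [
        line.strip()
        for line in requirements_file.read_text().splitlines()
        if line.strip()
    ]
    requirements_file.write_text("\n".join(sorted(lines, key=str.lower)) + "\n")
```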

Note

The requirements sorting step was created when the first iteration of this hook injected the necessary requirements into the project. Now that we opt for removal instead, is this step still necessary?

Print success messages

Finally, our generated project is now ready and suited to the user's specifications. We print a success message. If no starter was used, we also print the selections for tools and example pipeline. The process then finishes here.
