Dependencies

Both pip install kedro and conda install -c conda-forge kedro install the core Kedro module, which includes the CLI tool, project template, pipeline abstraction, framework, and support for configuration.

When you create a project, you then introduce additional dependencies for the tasks it performs.

Project-specific dependencies

You can specify a project's exact dependencies in the pyproject.toml file, and any development dependencies in requirements.txt. Doing so makes it easier for you and others to run your project in the future, and avoids version conflicts downstream. pip-tools can help you maintain these files.

To install pip-tools in your virtual environment, run the following command:

pip install pip-tools

To add or remove dependencies in a project, edit the requirements.txt file, then run the following:

pip-compile <project_root>/requirements.txt --output-file=<project_root>/requirements.lock

This compiles the requirements listed in the requirements.txt file into a requirements.lock file that lists pinned project dependencies (those with a strict version). You can also pass additional CLI arguments to this command, such as --generate-hashes to use pip's Hash Checking Mode, or --upgrade-package to update specific packages to the latest or a specific version. Check out the pip-tools documentation for more information.
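The options mentioned above can be combined in one call. The following Python sketch only assembles the argument list, so the flags can be inspected before anything runs. The helper name pip_compile_command and the file names in the usage lines are hypothetical. The flags themselves (--output-file, --generate-hashes, --upgrade-package) are the pip-tools options named in this section.

```python
import subprocess


def pip_compile_command(source, output, generate_hashes=False, upgrade_packages=()):
    """Build the pip-compile argument list for the options described above."""
    # The source requirements file is passed as a positional argument.
    command = ["pip-compile", source, "--output-file", output]
    if generate_hashes:
        command.append("--generate-hashes")
    for package in upgrade_packages:
        # Each package gets its own flag, e.g. "pandas" or "pandas==2.0.3".
        command += ["--upgrade-package", package]
    return command


if __name__ == "__main__":
    command = pip_compile_command(
        "requirements.txt", "requirements.lock", upgrade_packages=["pandas"]
    )
    print(" ".join(command))
    # Uncomment to run it; this needs pip-tools installed and network access.
    # subprocess.run(command, check=True)
```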

The `requirements.txt` and `pyproject.toml` files contain "source" requirements, while `requirements.lock` contains the compiled version of those and requires no manual updates.

To further update the project requirements, modify the requirements.txt file (not requirements.lock) and re-run the pip-compile command above.
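To illustrate what "pinned" means in practice, the Python sketch below reads name==version pins from text in the style of a compiled requirements.lock and reports any that the current environment does not satisfy. It is not part of Kedro or pip-tools. The function names parse_pins and find_mismatches and the sample lock text are hypothetical, and the parser is deliberately simple: it skips comments and --hash continuation lines, and drops extras and environment markers.

```python
from importlib import metadata


def parse_pins(lock_text):
    """Return {package_name: version} for every `name==version` line.

    Assumption: the text looks like pip-compile output, where each pin is
    on its own line and may be followed by indented `# via` comments or
    `--hash=...` continuation lines.
    """
    pins = {}
    for raw_line in lock_text.splitlines():
        line = raw_line.strip().rstrip("\\").strip()
        # Skip blanks, comments, and option lines such as --hash=sha256:...
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        # Drop environment markers (after ';') and inline comments.
        requirement = line.split(";")[0].split(" #")[0].strip()
        if "==" not in requirement:
            continue
        name, version = requirement.split("==", 1)
        # Drop extras such as "[pandas]" and normalise the name for lookup.
        name = name.split("[")[0].strip().lower().replace("_", "-")
        pins[name] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed)} for pins the environment misses.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Hypothetical lock content, used only to demonstrate the parser.
SAMPLE_LOCK = """\
# This file is autogenerated by pip-compile
numpy==1.24.4
    # via pandas
pandas==2.0.3 \\
    --hash=sha256:0000
"""

if __name__ == "__main__":
    mismatches = find_mismatches(parse_pins(SAMPLE_LOCK))
    for name, (pinned, installed) in mismatches.items():
        print(f"{name}: pinned {pinned}, installed {installed}")
```

In a real project you would pass the contents of requirements.lock to parse_pins instead of the sample text.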

Install project-specific dependencies

To install the project-specific dependencies, navigate to the root directory of the project and run:

pip install -r requirements.txt

Workflow dependencies

To install all the dependencies recorded in Kedro's setup.py, run:

pip install "kedro[all]"

Install dependencies related to the Data Catalog

The Data Catalog is your way of interacting with different data types in Kedro. The modular dependencies in this category include pandas, numpy, pyspark, matplotlib, pillow, dask, and more.

Install dependencies at a group-level

Data types are broken into groups, e.g. pandas, spark and pickle. Each group has a collection of data types, e.g. pandas.CSVDataSet, pandas.ParquetDataSet and more. You can install the dependencies for an entire group as follows:

pip install "kedro-datasets[<group>]"

This installs kedro-datasets and the dependencies related to the data type group. For example, if a workflow depends on the data types in pandas, run pip install "kedro-datasets[pandas]" to install the dependencies for the data types in the pandas group.

Install dependencies at a type-level

To limit installation to dependencies specific to a data type:

pip install "kedro-datasets[<group>.<dataset>]"

For example, if your workflow uses pandas.ExcelDataSet, install its dependencies by running pip install "kedro-datasets[pandas.ExcelDataSet]".
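The group-level and type-level forms differ only in the text between the square brackets. The Python sketch below builds both forms, and also lists the extras that an installed distribution declares. It is an illustration under stated assumptions, not a Kedro API: extras_requirement and list_extras are hypothetical helper names, the lookup relies on the standard-library importlib.metadata module and the Provides-Extra metadata field, and the exact spelling of the extras it returns depends on the installed kedro-datasets version.

```python
from importlib import metadata


def extras_requirement(dist_name, group, dataset=None):
    """Build the pip install argument at group level or type level."""
    extra = group if dataset is None else f"{group}.{dataset}"
    return f"{dist_name}[{extra}]"


def list_extras(dist_name):
    """Return the sorted extras a distribution declares, or [] if absent.

    Reads the `Provides-Extra` entries from the installed package metadata.
    """
    try:
        dist_metadata = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return []
    return sorted(dist_metadata.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    print(extras_requirement("kedro-datasets", "pandas"))
    print(extras_requirement("kedro-datasets", "pandas", "ExcelDataSet"))
    # Prints nothing unless kedro-datasets is installed in this environment.
    for extra in list_extras("kedro-datasets"):
        print(extra)
```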