Both `pip install kedro` and `conda install -c conda-forge kedro` install the core Kedro module, which includes the CLI tool, project template, pipeline abstraction, framework, and configuration support.
When you create a project, you then introduce additional dependencies for the tasks it performs.
You can specify a project's exact dependencies in the `pyproject.toml` file, and any development dependencies in `requirements.txt`, to make it easier for you and others to run your project in the future and to avoid version conflicts downstream. This can be achieved with the help of `pip-tools`.
To install `pip-tools` in your virtual environment, run the following command:

```bash
pip install pip-tools
```
To add or remove dependencies in a project, edit the `requirements.txt` file, then run the following:

```bash
pip-compile <project_root>/requirements.txt --output-file <project_root>/requirements.lock
```
This compiles the requirements listed in the `requirements.txt` file into a `requirements.lock` file that specifies a list of pinned project dependencies (those with a strict version). You can also use this command with additional CLI arguments, such as `--generate-hashes` to use `pip`'s Hash-Checking Mode, or `--upgrade-package` to update specific packages to the latest or specific versions. Check out the `pip-tools` documentation for more information.
The `requirements.txt` and `pyproject.toml` files contain "source" requirements, while `requirements.lock` contains the compiled version of those and requires no manual updates.
To further update the project requirements, modify the `requirements.txt` file (not `requirements.lock`) and re-run the `pip-compile` command above.
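As a sketch of the difference between the two files, a source `requirements.txt` can keep version ranges loose, while the compiled `requirements.lock` pins exact versions (the package names and versions below are illustrative, not taken from a real project):

```text
# requirements.txt (source — loose ranges, edited by hand)
kedro~=0.18
pandas>=1.4

# requirements.lock (compiled by pip-compile — pinned, not edited by hand)
kedro==0.18.14
pandas==1.5.3
```

Keeping the loose ranges in the source file and the pins in the lock file lets you upgrade deliberately: relax or change a range, re-compile, and review the diff of the lock file.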
To install the project-specific dependencies, navigate to the root directory of the project and run:

```bash
pip install -r requirements.txt
```
To install all the dependencies recorded in Kedro's `setup.py`, run:

```bash
pip install "kedro[all]"
```
The Data Catalog is your way of interacting with different data types in Kedro. The modular dependencies in this category include `pandas`, `numpy`, `pyspark`, `matplotlib`, `pillow`, `dask`, and more.
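Because each dataset's extra dependencies are optional, it can be useful to check that they are importable before a pipeline runs. A minimal sketch using the standard library (the module names checked, and the helper name `has_deps`, are illustrative):

```python
import importlib.util


def has_deps(*modules: str) -> bool:
    """Return True if every named module can be imported."""
    return all(importlib.util.find_spec(m) is not None for m in modules)


# e.g. pandas.CSVDataSet needs pandas; pandas.ExcelDataSet also needs openpyxl
if not has_deps("pandas", "openpyxl"):
    print("Missing Excel extras; see the kedro-datasets install commands below.")
```

`importlib.util.find_spec` returns `None` for a missing top-level module instead of raising, which makes it suitable for this kind of pre-flight check.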
Data types are broken into groups, e.g. `pandas`, `spark` and `pickle`. Each group has a collection of data types, e.g. `pandas.CSVDataSet`, `pandas.ParquetDataSet` and more. You can install the dependencies for an entire group as follows:

```bash
pip install "kedro-datasets[<group>]"
```
This installs Kedro and the dependencies related to that data type group. For example, if your workflow depends on the data types in the `pandas` group, run `pip install "kedro-datasets[pandas]"` to install Kedro and the dependencies for the data types in that group.
To limit installation to dependencies specific to a data type:

```bash
pip install "kedro-datasets[<group>.<dataset>]"
```
For example, your workflow might require use of the `pandas.ExcelDataSet`, so to install its dependencies, run `pip install "kedro-datasets[pandas.ExcelDataSet]"`.
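Once the extras are installed, the dataset can be declared in the project's Data Catalog, typically `conf/base/catalog.yml`. A sketch (the dataset name `shuttles` and the file path are assumptions for illustration):

```yaml
shuttles:
  type: pandas.ExcelDataSet
  filepath: data/01_raw/shuttles.xlsx
  load_args:
    engine: openpyxl
```

If the group's dependencies are missing, loading this entry fails at catalog creation, which is why installing the matching `kedro-datasets` extra first matters.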