GitOps for Kubeflow Pipelines

This repo demonstrates how GitOps can be used with Kubeflow Pipelines from deployKF.

NOTE:

  • This repo is about using GitOps to manage pipeline definitions and pipeline schedules, NOT the platform itself.
  • This repo only supports Kubeflow Pipelines compiled in V1 mode.

Steps

This repository is logically grouped into four steps:

  1. Render Pipelines: demonstrates how to render pipelines
  2. Run Pipelines: demonstrates how to run the rendered pipelines
  3. Schedule Pipelines: demonstrates how to schedule the rendered pipelines
  4. Automatic Reconciliation: demonstrates how to automatically reconcile the schedule configs

Real-World Usage

Unlike this demo, in the real world you typically store pipeline definitions and schedules in separate repositories.

For example, you may have the following repositories:

| Repository    | Purpose                                   | Demo Steps Used                                                    |
|---------------|-------------------------------------------|--------------------------------------------------------------------|
| ml-project-1  | pipeline definitions for "ml project 1"   | "Step 1: Render Pipelines", "Step 2: Run Pipelines"                 |
| ml-project-2  | pipeline definitions for "ml project 2"   | "Step 1: Render Pipelines", "Step 2: Run Pipelines"                 |
| ml-project-3  | pipeline definitions for "ml project 3"   | "Step 1: Render Pipelines", "Step 2: Run Pipelines"                 |
| kfp-schedules | schedules for all pipelines               | "Step 3: Schedule Pipelines", "Step 4: Automatic Reconciliation"    |

Repository Contents

This repository contains the following content:

| Directory                    | Description                                         |
|------------------------------|-----------------------------------------------------|
| /.github/workflows/          | reference GitHub Actions workflows                  |
| /common_python/              | shared Python code                                  |
| /common_scripts/             | shared Bash scripts                                 |
| /step-1--render-pipelines/   | examples/scripts for rendering pipelines            |
| /step-2--run-pipelines/      | examples/scripts for running rendered pipelines     |
| /step-3--schedule-pipelines/ | examples/scripts for scheduling rendered pipelines  |

Step 1: Render Pipelines

The Kubeflow Pipelines SDK is a Python DSL that compiles down to Argo Workflows resources; the Kubeflow Pipelines backend can then execute these compiled pipelines on a Kubernetes cluster, optionally on a schedule.

To manage pipeline definitions/schedules with GitOps, we need a reliable way to render the pipelines from their "dynamic Python representation" into their "static YAML representation".

Example

You will find the following items under /step-1--render-pipelines/example_pipeline_1/:

File/Directory Description
./pipeline.py
  • A Python script containing a pipeline definition.
  • This script exposes an argument named --output-folder, which specifies where the rendered pipeline should be saved.
./render_pipeline.sh
  • A Bash script which invokes pipeline.py in a reproducible way, with static arguments.
  • This script uses shared code from /common_python/ and /common_scripts/ to ensure the rendered pipeline is only updated if the pipeline definition actually changes (rendered pipelines contain their build time).
./RENDERED_PIPELINE/
  • A directory containing the output of render_pipeline.sh (the statically rendered pipeline YAML).
./example_component.yaml
  • A YAML file containing the definition of a reusable kubeflow component.
  • This component is used by pipeline.py to define a step in the pipeline.
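
For orientation, a minimal pipeline.py might look something like the sketch below. This is a hedged illustration, not the repo's actual file: the pipeline and file names are hypothetical, and the real pipeline.py loads its step from ./example_component.yaml rather than defining it inline.

```python
# pipeline.py -- a minimal sketch of a V1-mode pipeline definition
# (hypothetical names; the real pipeline.py in this repo differs)
import argparse

import kfp
from kfp import dsl


@dsl.pipeline(name="example-pipeline-1")
def example_pipeline():
    # the repo's example uses a reusable component instead, e.g.:
    #   kfp.components.load_component_from_file("example_component.yaml")
    dsl.ContainerOp(
        name="say-hello",
        image="alpine:3.19",
        command=["echo", "hello"],
    )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # --output-folder is the argument described above
    parser.add_argument("--output-folder", required=True)
    args = parser.parse_args()

    # the kfp 1.x SDK compiles the pipeline down to Argo Workflows YAML
    kfp.compiler.Compiler().compile(
        example_pipeline,
        package_path=f"{args.output_folder}/example_pipeline_1.yaml",
    )
```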

WARNING:

It is NOT recommended to run pipeline.py directly, but rather to use scripts like render_pipeline.sh that ensure the rendered pipeline is only updated if the pipeline definition actually changes.
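
To illustrate the idea only (the repo's actual implementation lives in /common_python/ and /common_scripts/), a render script can compile into a temporary folder and only overwrite the committed RENDERED_PIPELINE/ when something other than the embedded build time has changed:

```python
# sketch of the "only update when it actually changes" check; hypothetical
# file names -- the real logic is in /common_python/ and /common_scripts/
import pathlib
import re
import shutil
import subprocess
import tempfile


def normalize(text: str) -> str:
    # ignore the compilation-time annotation that kfp embeds on every build
    return re.sub(
        r".*pipelines\.kubeflow\.org/pipeline_compilation_time.*\n", "", text
    )


with tempfile.TemporaryDirectory() as tmp:
    subprocess.run(["python", "pipeline.py", "--output-folder", tmp], check=True)
    new = pathlib.Path(tmp) / "example_pipeline_1.yaml"
    old = pathlib.Path("RENDERED_PIPELINE") / "example_pipeline_1.yaml"
    if not old.exists() or normalize(old.read_text()) != normalize(new.read_text()):
        old.parent.mkdir(exist_ok=True)
        shutil.copy(new, old)  # only touch the committed file on real changes
```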

TIP:

If each run of render_pipeline.sh results in a different rendered pipeline, your pipeline definition is not deterministic; for example, it might be using datetime.now() in the definition itself rather than within a step.

If a step in your pipeline requires the current date/time, you may use the Argo Workflows "variables" feature to set a step's inputs:

  • {{workflow.creationTimestamp.RFC3339}} becomes the run-time of the workflow ("2030-01-01T00:00:00Z")
  • {{workflow.creationTimestamp.<STRFTIME_CHAR>}} becomes the run-time formatted by a single strftime character
    • TIP: custom time formats can be created using multiple variables, {{workflow.creationTimestamp.Y}}-{{workflow.creationTimestamp.m}}-{{workflow.creationTimestamp.d}} becomes "2030-01-01"
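
As a sketch, here is how a step input could defer the date to run time via such a variable, keeping the rendered YAML deterministic (the pipeline and step names are hypothetical):

```python
# sketch: deferring the date to run time via Argo Workflows variables
from kfp import dsl


@dsl.pipeline(name="deterministic-date-example")
def pipeline():
    dsl.ContainerOp(
        name="print-run-date",
        image="alpine:3.19",
        command=["echo"],
        # left as a literal placeholder in the rendered YAML;
        # Argo resolves it when the workflow actually runs
        arguments=["{{workflow.creationTimestamp.RFC3339}}"],
    )
```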

TIP:

Additional arguments may be added to pipeline.py so that the same pipeline definition can render multiple variants:

  • If you do this, you will need to create a separate render_pipeline.sh script for each variant, for example, render_pipeline_dev.sh, render_pipeline_test.sh, render_pipeline_prod.sh.
  • These scripts should be configured to render the pipeline into a separate directory, for example, RENDERED_PIPELINE_dev/, RENDERED_PIPELINE_test/, RENDERED_PIPELINE_prod/.
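
For example, pipeline.py could take a hypothetical --environment argument and bake it into the variant's name and steps; each render_pipeline_<env>.sh would then pass its own value along with its own --output-folder:

```python
# sketch: one definition, multiple rendered variants (the --environment
# argument is hypothetical; each render_pipeline_<env>.sh would pass its
# own value and its own --output-folder, e.g. RENDERED_PIPELINE_dev/)
import argparse

import kfp
from kfp import dsl


def make_pipeline(environment: str):
    @dsl.pipeline(name=f"example-pipeline-1-{environment}")
    def pipeline():
        dsl.ContainerOp(
            name="train",
            image="alpine:3.19",
            command=["echo", f"running against the {environment} environment"],
        )

    return pipeline


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--output-folder", required=True)
    parser.add_argument("--environment", choices=["dev", "test", "prod"], required=True)
    args = parser.parse_args()
    kfp.compiler.Compiler().compile(
        make_pipeline(args.environment),
        package_path=f"{args.output_folder}/example_pipeline_1.yaml",
    )
```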

GitHub Actions

We provide the following GitHub Actions as reusable workflow templates under /.github/workflows/:

Workflow Template Description
./_check-pipelines-are-rendered.yaml
  • Takes a list named pipeline_render_scripts containing paths to scripts like render_pipeline.sh, and runs each one to verify the committed rendered pipelines are up to date, blocking PRs whose authors forgot to re-render.
  • See ./check-pipelines-are-rendered.yaml for an example of calling this workflow.

Step 2: Run Pipelines

Before scheduling a pipeline, developers will likely want to run it manually to ensure it works as expected.

As we have already rendered the pipeline in "step 1", we now need a way to run it.

Example

You will find the following items under /step-2--run-pipelines/example_pipeline_1/:

File/Directory Description
./run_pipeline.sh
  • A Bash script which runs the rendered pipeline from "Step 1: Render Pipelines".
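
Under the hood, running a rendered pipeline amounts to submitting the compiled YAML to the Kubeflow Pipelines API. A minimal sketch with the kfp 1.x client follows; the host URL, namespace, and file path are assumptions, and run_pipeline.sh wraps the real logic:

```python
# sketch: submitting a rendered pipeline with the kfp 1.x client
# (hypothetical host/namespace/paths)
import kfp

client = kfp.Client(host="https://kubeflow.example.com/pipeline")
client.create_run_from_pipeline_package(
    pipeline_file="RENDERED_PIPELINE/example_pipeline_1.yaml",
    arguments={},
    run_name="example-pipeline-1 (manual test run)",
    namespace="team-1",
)
```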

Step 3: Schedule Pipelines

To manage the pipeline schedules with GitOps, we need a system with the following features:

  • Declarative Configs: The system should have a single set of configs which completely define the desired state of the scheduled pipelines.
  • Reconciliation: The system should be able to read the declarative configs, determine if the current state matches the configs, and if not, make the required changes to bring the current state into alignment with the configs.
  • Version Control: The system should store the declarative configs in a version control system, so that changes to the configs can be reviewed, and so that the history of changes can be viewed.

Example

You will find the following items under /step-3--schedule-pipelines/:

File/Directory Description
./team-1/
  • The folder containing the declarative configs for the team-1 profile/namespace.
./team-1/experiments.yaml
  • The declarative configs for KFP "Experiments" in the team-1 profile/namespace.
./team-1/recurring_runs.yaml
  • The declarative configs for KFP "Recurring Runs" in the team-1 profile/namespace.
./reconcile_team-1.sh
  • A Bash script which triggers a one-time reconciliation of the configs under ./team-1/ to the team-1 profile/namespace.
  • This script makes use of the shared /common_python/reconcile_kfp.py script.
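
For a sense of shape only, a declarative recurring-run config might look like the sketch below. Note that only keep_history, job.enabled, and job.max_concurrency are mentioned in this README; every other field name here is an illustrative guess, not the repo's actual schema (see ./team-1/recurring_runs.yaml for the real format):

```yaml
# sketch of a declarative recurring-run config; `keep_history`,
# `job.enabled`, and `job.max_concurrency` appear in this README,
# all other field names are illustrative guesses
recurring_runs:
  - name: example-pipeline-1    # hypothetical field
    pipeline_package: ../step-1--render-pipelines/example_pipeline_1/RENDERED_PIPELINE/example_pipeline_1.yaml  # hypothetical field
    keep_history: 5
    job:
      enabled: true
      max_concurrency: 1
      cron_schedule: "0 0 * * *"    # hypothetical field
```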

WARNING:

Because Kubeflow Pipelines is NOT able to update existing recurring runs (kubeflow/pipelines#3789), the reconciliation script uses the following process:

  1. creates a paused recurring run with the new definition
  2. pauses the existing recurring run
    • NOTE: in-progress runs will continue to run until completion
  3. unpauses the new recurring run
  4. deletes old versions of the recurring run until there are only keep_history versions remaining
    • WARNING: in-progress runs for the deleted versions will be immediately terminated

WARNING:

The only way to ensure a recurring run never has more than one active instance is to do ONE of the following:

  • set keep_history to 0 and job.max_concurrency to 1 (if your pipeline can safely be terminated at any time)
  • create a step at the beginning of your pipeline which checks if there is already a run in progress, and if so, exits (see the sketch after this list)
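
A guard step along these lines could use the kfp 1.x client to look for other active runs; the pipeline name, namespace, and run-naming scheme below are assumptions you would adapt:

```python
# sketch of a "bail out if another instance is running" guard step
# (hypothetical names; adapt the filter to your own run-naming scheme)
import sys

import kfp

client = kfp.Client()  # assumes in-cluster auth is already configured
runs = client.list_runs(namespace="team-1", page_size=100).runs or []
active = [
    r for r in runs
    if r.status in ("Pending", "Running") and r.name.startswith("example-pipeline-1")
]
if len(active) > 1:  # this guard's own run counts as one
    print("another instance is already in progress, exiting early")
    sys.exit(0)
```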

WARNING:

Removing a recurring run from the recurring_runs.yaml file will NOT pause or delete recurring runs that already exist in the cluster; to fully delete a recurring run, use the following steps:

  1. update the job.enabled flag to false for the recurring run (in the recurring_runs.yaml file)
  2. run the reconciliation script
  3. delete the recurring run from the recurring_runs.yaml file
  4. run the reconciliation script
  5. (optional) delete the remaining paused recurring runs using the KFP Web UI

GitHub Actions

We provide the following GitHub Actions as reusable workflow templates under /.github/workflows/:

Workflow Template Description
./_check-reconciliation-configs.yaml
  • Takes a list named reconciliation_config_folders containing paths of folders with reconciliation configs, and checks them for errors before PRs are merged.
  • See ./check-reconciliation-configs.yaml for an example of calling this workflow.

Step 4: Automatic Reconciliation

For true GitOps, we need to ensure the state of the cluster is ALWAYS in sync with the configs in this repo.

Generally speaking, there are two approaches to achieve automatic reconciliation:

  1. PUSH-Based (GitHub Actions): whenever a change is pushed to GitHub, a job is triggered to reconcile the configs.
  2. PULL-Based (Kubernetes Deployment): a Kubernetes Deployment in the cluster periodically reconciles the configs.

PUSH-Based (GitHub Actions)

NOTE:

  • This approach requires GitHub Actions to have access to your Kubeflow Pipelines API, either by it being public, or by connecting it to your private network.
  • Drift is possible when the cluster state is changed outside the GitOps repo; changes are only reverted when the next push occurs.

TBA

PULL-Based (Kubernetes Deployment)

TBA
