GitOps for Kubeflow Pipelines

This repo demonstrates how GitOps can be used with Kubeflow Pipelines from deployKF.

NOTE:

  • This repo is about using GitOps to manage pipeline definitions and pipeline schedules, NOT the platform itself.
  • This repo only supports Kubeflow Pipelines compiled in V1 mode.

Steps

This repository is logically grouped into four steps:

  1. Render Pipelines: demonstrates how to render pipelines
  2. Run Pipelines: demonstrates how to run the rendered pipelines
  3. Schedule Pipelines: demonstrates how to schedule the rendered pipelines
  4. Automatic Reconciliation: demonstrates how to automatically reconcile the schedule configs

Real-World Usage

Unlike this demo, in the real world you typically store pipeline definitions and schedules in separate repositories.

For example, you may have the following repositories:

| Repository    | Purpose                                   | Demo Steps Used                                                    |
|---------------|-------------------------------------------|--------------------------------------------------------------------|
| ml-project-1  | pipeline definitions for "ml project 1"   | "Step 1: Render Pipelines", "Step 2: Run Pipelines"                 |
| ml-project-2  | pipeline definitions for "ml project 2"   | "Step 1: Render Pipelines", "Step 2: Run Pipelines"                 |
| ml-project-3  | pipeline definitions for "ml project 3"   | "Step 1: Render Pipelines", "Step 2: Run Pipelines"                 |
| kfp-schedules | schedules for all pipelines               | "Step 3: Schedule Pipelines", "Step 4: Automatic Reconciliation"    |

Repository Contents

This repository contains the following content:

| Directory                    | Description                                         |
|------------------------------|-----------------------------------------------------|
| /.github/workflows/          | reference GitHub Actions workflows                  |
| /common_python/              | shared Python code                                  |
| /common_scripts/             | shared Bash scripts                                 |
| /step-1--render-pipelines/   | examples/scripts for rendering pipelines            |
| /step-2--run-pipelines/      | examples/scripts for running rendered pipelines     |
| /step-3--schedule-pipelines/ | examples/scripts for scheduling rendered pipelines  |

Step 1: Render Pipelines

The Kubeflow Pipelines SDK is a Python DSL that compiles down to Argo Workflows resources; the Kubeflow Pipelines backend can then execute these compiled pipelines on a Kubernetes cluster, optionally on a schedule.

To manage pipeline definitions/schedules with GitOps, we need a reliable way to render the pipelines from their "dynamic Python representation" into their "static YAML representation".

Example

You will find the following items under /step-1--render-pipelines/example_pipeline_1/:

File/Directory Description
./pipeline.py
  • A Python script containing a pipeline definition.
  • This script exposes an argument named --output-folder, which specifies where the rendered pipeline should be saved.
./render_pipeline.sh
  • A Bash script which invokes pipeline.py in a reproducible way, with static arguments.
  • This script uses shared code from /common_python/ and /common_scripts/ to ensure the rendered pipeline is only updated if the pipeline definition actually changes (rendered pipelines contain their build time).
./RENDERED_PIPELINE/
  • A directory containing the output of render_pipeline.sh (the statically rendered pipeline YAML).
./example_component.yaml
  • A YAML file containing the definition of a reusable kubeflow component.
  • This component is used by pipeline.py to define a step in the pipeline.
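
For orientation, a minimal pipeline.py might look something like the sketch below. This is a hedged illustration, not the repo's actual file: the pipeline and file names are hypothetical, and the real pipeline.py loads its step from ./example_component.yaml rather than defining it inline.

```python
# pipeline.py -- a minimal sketch of a V1-mode pipeline definition
# (hypothetical names; the real pipeline.py in this repo differs)
import argparse

import kfp
from kfp import dsl


@dsl.pipeline(name="example-pipeline-1")
def example_pipeline():
    # the repo's example uses a reusable component instead, e.g.:
    #   kfp.components.load_component_from_file("example_component.yaml")
    dsl.ContainerOp(
        name="say-hello",
        image="alpine:3.19",
        command=["echo", "hello"],
    )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # --output-folder is the argument described above
    parser.add_argument("--output-folder", required=True)
    args = parser.parse_args()

    # the kfp 1.x SDK compiles the pipeline down to Argo Workflows YAML
    kfp.compiler.Compiler().compile(
        example_pipeline,
        package_path=f"{args.output_folder}/example_pipeline_1.yaml",
    )
```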

WARNING:

It is NOT recommended to run pipeline.py directly, but rather to use scripts like render_pipeline.sh that ensure the rendered pipeline is only updated if the pipeline definition actually changes.
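
To illustrate the idea only (the repo's actual implementation lives in /common_python/ and /common_scripts/), a render script can compile into a temporary folder and only overwrite the committed RENDERED_PIPELINE/ when something other than the embedded build time has changed:

```python
# sketch of the "only update when it actually changes" check; hypothetical
# file names -- the real logic is in /common_python/ and /common_scripts/
import pathlib
import re
import shutil
import subprocess
import tempfile


def normalize(text: str) -> str:
    # ignore the compilation-time annotation that kfp embeds on every build
    return re.sub(
        r".*pipelines\.kubeflow\.org/pipeline_compilation_time.*\n", "", text
    )


with tempfile.TemporaryDirectory() as tmp:
    subprocess.run(["python", "pipeline.py", "--output-folder", tmp], check=True)
    new = pathlib.Path(tmp) / "example_pipeline_1.yaml"
    old = pathlib.Path("RENDERED_PIPELINE") / "example_pipeline_1.yaml"
    if not old.exists() or normalize(old.read_text()) != normalize(new.read_text()):
        old.parent.mkdir(exist_ok=True)
        shutil.copy(new, old)  # only touch the committed file on real changes
```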

TIP:

If each run of render_pipeline.sh results in a different rendered pipeline, your pipeline definition is not deterministic; for example, it might be using datetime.now() in the definition itself rather than within a step.

If a step in your pipeline requires the current date/time, you may use the Argo Workflows "variables" feature to set a step's inputs:

  • {{workflow.creationTimestamp.RFC3339}} becomes the run-time of the workflow ("2030-01-01T00:00:00Z")
  • {{workflow.creationTimestamp.<STRFTIME_CHAR>}} becomes the run-time formatted by a single strftime character
    • TIP: custom time formats can be created using multiple variables, {{workflow.creationTimestamp.Y}}-{{workflow.creationTimestamp.m}}-{{workflow.creationTimestamp.d}} becomes "2030-01-01"
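
As a sketch, here is how a step input could defer the date to run time via such a variable, keeping the rendered YAML deterministic (the pipeline and step names are hypothetical):

```python
# sketch: deferring the date to run time via Argo Workflows variables
from kfp import dsl


@dsl.pipeline(name="deterministic-date-example")
def pipeline():
    dsl.ContainerOp(
        name="print-run-date",
        image="alpine:3.19",
        command=["echo"],
        # left as a literal placeholder in the rendered YAML;
        # Argo resolves it when the workflow actually runs
        arguments=["{{workflow.creationTimestamp.RFC3339}}"],
    )
```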

TIP:

Additional arguments may be added to pipeline.py so that the same pipeline definition can render multiple variants:

  • If you do this, you will need to create a separate render_pipeline.sh script for each variant, for example, render_pipeline_dev.sh, render_pipeline_test.sh, render_pipeline_prod.sh.
  • These scripts should be configured to render the pipeline into a separate directory, for example, RENDERED_PIPELINE_dev/, RENDERED_PIPELINE_test/, RENDERED_PIPELINE_prod/.
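
For example, pipeline.py could take a hypothetical --environment argument and bake it into the variant's name and steps; each render_pipeline_<env>.sh would then pass its own value along with its own --output-folder:

```python
# sketch: one definition, multiple rendered variants (the --environment
# argument is hypothetical; each render_pipeline_<env>.sh would pass its
# own value and its own --output-folder, e.g. RENDERED_PIPELINE_dev/)
import argparse

import kfp
from kfp import dsl


def make_pipeline(environment: str):
    @dsl.pipeline(name=f"example-pipeline-1-{environment}")
    def pipeline():
        dsl.ContainerOp(
            name="train",
            image="alpine:3.19",
            command=["echo", f"running against the {environment} environment"],
        )

    return pipeline


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--output-folder", required=True)
    parser.add_argument("--environment", choices=["dev", "test", "prod"], required=True)
    args = parser.parse_args()
    kfp.compiler.Compiler().compile(
        make_pipeline(args.environment),
        package_path=f"{args.output_folder}/example_pipeline_1.yaml",
    )
```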

GitHub Actions

We provide the following GitHub Actions as reusable workflow templates under /.github/workflows/:

Workflow Template Description
./_check-pipelines-are-rendered.yaml
  • Takes a list named pipeline_render_scripts containing paths to scripts like render_pipeline.sh, and runs each one to verify the committed rendered pipelines are up to date, blocking PRs whose authors forgot to re-render.
  • See ./check-pipelines-are-rendered.yaml for an example of calling this workflow.

Step 2: Run Pipelines

Before scheduling a pipeline, developers will likely want to run it manually to ensure it works as expected.

As we have already rendered the pipeline in "step 1", we now need a way to run it.

Example

You will find the following items under /step-2--run-pipelines/example_pipeline_1/:

File/Directory Description
./run_pipeline.sh
  • A Bash script which runs the rendered pipeline from "Step 1: Render Pipelines".
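
Under the hood, running a rendered pipeline amounts to submitting the compiled YAML to the Kubeflow Pipelines API. A minimal sketch with the kfp 1.x client follows; the host URL, namespace, and file path are assumptions, and run_pipeline.sh wraps the real logic:

```python
# sketch: submitting a rendered pipeline with the kfp 1.x client
# (hypothetical host/namespace/paths)
import kfp

client = kfp.Client(host="https://kubeflow.example.com/pipeline")
client.create_run_from_pipeline_package(
    pipeline_file="RENDERED_PIPELINE/example_pipeline_1.yaml",
    arguments={},
    run_name="example-pipeline-1 (manual test run)",
    namespace="team-1",
)
```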

Step 3: Schedule Pipelines

To manage the pipeline schedules with GitOps, we need a system with the following features:

  • Declarative Configs: The system should have a single set of configs which completely define the desired state of the scheduled pipelines.
  • Reconciliation: The system should be able to read the declarative configs, determine if the current state matches the configs, and if not, make the required changes to bring the current state into alignment with the configs.
  • Version Control: The system should store the declarative configs in a version control system, so that changes to the configs can be reviewed, and so that the history of changes can be viewed.

Example

You will find the following items under /step-3--schedule-pipelines/:

File/Directory Description
./team-1/
  • The folder containing the declarative configs for the team-1 profile/namespace.
./team-1/experiments.yaml
  • The declarative configs for KFP "Experiments" in the team-1 profile/namespace.
./team-1/recurring_runs.yaml
  • The declarative configs for KFP "Recurring Runs" in the team-1 profile/namespace.
./reconcile_team-1.sh
  • A Bash script which triggers a one-time reconciliation of the configs under ./team-1/ to the team-1 profile/namespace.
  • This script makes use of the shared /common_python/reconcile_kfp.py script.
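
For a sense of shape only, a declarative recurring-run config might look like the sketch below. Note that only keep_history, job.enabled, and job.max_concurrency are mentioned in this README; every other field name here is an illustrative guess, not the repo's actual schema (see ./team-1/recurring_runs.yaml for the real format):

```yaml
# sketch of a declarative recurring-run config; `keep_history`,
# `job.enabled`, and `job.max_concurrency` appear in this README,
# all other field names are illustrative guesses
recurring_runs:
  - name: example-pipeline-1    # hypothetical field
    pipeline_package: ../step-1--render-pipelines/example_pipeline_1/RENDERED_PIPELINE/example_pipeline_1.yaml  # hypothetical field
    keep_history: 5
    job:
      enabled: true
      max_concurrency: 1
      cron_schedule: "0 0 * * *"    # hypothetical field
```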

WARNING:

Because Kubeflow Pipelines is NOT able to update existing recurring runs (kubeflow/pipelines#3789), the reconciliation script uses the following process:

  1. creates a paused recurring run with the new definition
  2. pauses the existing recurring run
    • NOTE: in-progress runs will continue to run until completion
  3. unpauses the new recurring run
  4. deletes old versions of the recurring run until there are only keep_history versions remaining
    • WARNING: in-progress runs for the deleted versions will be immediately terminated

WARNING:

The only way to ensure a recurring run never has more than one active instance is to do ONE of the following:

  • set keep_history to 0 and job.max_concurrency to 1 (if your pipeline can safely be terminated at any time)
  • create a step at the beginning of your pipeline which checks if there is already a run in progress, and if so, exits (see the sketch after this list)
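
A guard step along these lines could use the kfp 1.x client to look for other active runs; the pipeline name, namespace, and run-naming scheme below are assumptions you would adapt:

```python
# sketch of a "bail out if another instance is running" guard step
# (hypothetical names; adapt the filter to your own run-naming scheme)
import sys

import kfp

client = kfp.Client()  # assumes in-cluster auth is already configured
runs = client.list_runs(namespace="team-1", page_size=100).runs or []
active = [
    r for r in runs
    if r.status in ("Pending", "Running") and r.name.startswith("example-pipeline-1")
]
if len(active) > 1:  # this guard's own run counts as one
    print("another instance is already in progress, exiting early")
    sys.exit(0)
```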

WARNING:

Removing a recurring run from the recurring_runs.yaml file will NOT pause or delete recurring runs that already exist in the cluster; to fully delete a recurring run, use the following steps:

  1. update the job.enabled flag to false for the recurring run (in the recurring_runs.yaml file)
  2. run the reconciliation script
  3. delete the recurring run from the recurring_runs.yaml file
  4. run the reconciliation script
  5. (optional) delete the remaining paused recurring runs using the KFP Web UI

GitHub Actions

We provide the following GitHub Actions as reusable workflow templates under /.github/workflows/:

Workflow Template Description
./_check-reconciliation-configs.yaml
  • Takes a list named reconciliation_config_folders containing paths of folders with reconciliation configs, and checks them for errors before PRs are merged.
  • See ./check-reconciliation-configs.yaml for an example of calling this workflow.

Step 4: Automatic Reconciliation

For true GitOps, we need to ensure the state of the cluster is ALWAYS in sync with the configs in this repo.

Generally speaking, there are two approaches to achieve automatic reconciliation:

  1. PUSH-Based (GitHub Actions): whenever a change is pushed to GitHub, a job is triggered to reconcile the configs.
  2. PULL-Based (Kubernetes Deployment): a Kubernetes Deployment in the cluster periodically reconciles the configs.

PUSH-Based (GitHub Actions)

NOTE:

  • This approach requires GitHub Actions to have access to your Kubeflow Pipelines API, either by it being public, or by connecting it to your private network.
  • Drift is possible when the cluster state is changed outside the GitOps repo; changes are only reverted when the next push occurs.

TBA

PULL-Based (Kubernetes Deployment)

TBA
