Releases: sky-uk/kfp-operator

v0.6.0

20 May 10:40
c009d4a

Namespace isolation of Vertex AI resources

To preserve isolation between multiple namespaces submitting to a single Vertex AI project, resources (Schedules, Pipeline Runs, the Pipeline definition storage location, and the Artefact Storage Location) are now prefixed with their origin namespace.
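As a rough illustration of the idea, the sketch below prefixes a resource name with its origin namespace so that identically named resources from different namespaces no longer collide. The function name, the `-` separator, and the example names are assumptions made for this sketch; the release notes only state that resources are prefixed with the namespace, not the exact format.

```python
def namespaced_name(namespace: str, name: str, separator: str = "-") -> str:
    """Return `name` prefixed with its origin namespace.

    The separator is an assumption for illustration; the actual format
    used by the Vertex AI provider may differ.
    """
    if not namespace or not name:
        raise ValueError("namespace and name must be non-empty")
    return f"{namespace}{separator}{name}"


# Two namespaces can now use the same schedule name in one Vertex AI project.
team_a_schedule = namespaced_name("team-a", "daily-training")
team_b_schedule = namespaced_name("team-b", "daily-training")
```

With this scheme the two schedules resolve to distinct names even though both teams chose `daily-training`.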


What's Changed

  • Reduce NATS connectionBackoff in eventsource by @grahamia in #323
  • Change to CONTAINER_REPOSITORIES for artifact repository location by @grahamia in #324
  • VAI Provider - Namespaced custom resources for pipelines (scheduled and one off) by @grahamia in #326

Full Changelog: v0.5.0...v0.6.0


Migration

Migrating from v0.5.0 to v0.6.0 requires recompiling all pipeline definitions and re-applying all run-schedules. A
script is available to help: it triggers existing pipelines and removes existing run-schedules, which are then recreated automatically.
Note: The script requires jq version >1.7 and bash >4.

The CONTAINER_REPOSITORIES environment variable replaces the existing CONTAINER_REGISTRY_HOSTS.

v0.5.0

14 Mar 17:26
fcfc03d

Native Vertex AI Scheduler API

Due to Vertex AI's lack of support for scheduled pipeline runs, the KFP-Operator had to create Google Cloud Scheduler objects as well as PubSub subscriptions for managing enqueued and ongoing runs. This setup has now been superseded by Vertex AI's scheduler API as well as Vertex AI's task-level event logs.

This release now only uses the native scheduler within Vertex AI.

What's Changed

  • Remove VAI provider cloud scheduler support by @grahamia in #322

Migration

If you already have any running schedules in Google Cloud Scheduler, first upgrade to v0.4.1, then ensure all schedules have been migrated from Google Cloud Scheduler into Vertex AI Scheduler. This can be done by deleting all RunSchedule resources:
> kubectl delete mlrs --all
The KFP-Operator will then automatically migrate all the schedules across.
Finally, upgrade to this version.

Full Changelog: v0.4.1...v0.5.0

v0.4.1

12 Mar 21:06
8a214b1

Native Vertex AI Scheduler API

Due to Vertex AI's lack of support for scheduled pipeline runs, the KFP-Operator had to create Google Cloud Scheduler objects as well as PubSub subscriptions for managing enqueued and ongoing runs. This setup has now been superseded by Vertex AI's scheduler API as well as Vertex AI's task-level event logs.

This release will migrate existing Cloud Scheduler scheduled runs on any update to the schedule and set up the updated schedule within Vertex AI Scheduler. Currently scheduled jobs will carry on working as normal.

The suggested process for migration is to delete all RunSchedule resources. The KFP-Operator will then reconcile the corresponding RunConfiguration resources by recreating schedules in Vertex AI.

The next release will remove this migration process along with all legacy GCP Cloud Scheduler code and setup.

Improvements

  • Add metadata to status-updater sensor metadata #289
  • Complete RCs with one-off runs #291
  • Succeed RC when dependencies are not met #292
  • Refactor argo-events gRPC API dependency #296
  • Make topic in public eventbus configurable #301
  • Handle fullstops in pipeline version #306
  • Provide provider name in RunCompletionEvents #310
  • Support TFX 1.14 #297
  • Update quickstart image #314
  • Log-based events #311
  • Vertex AI Provider - migrate to using Vertex AI provided scheduler #319

Bug fixes

  • Fix intermittent decoupled test failures #312

Full Changelog: v0.4.0...v0.4.1

v0.4.0

04 Sep 15:22
8facb00

Training-Time Model Ensembling

We have introduced support for declaring dependencies between training pipelines at training time through produced and consumed artifacts.

Improvements

  • Add new RunConfigurations triggers
    • on changes to the referenced pipeline
    • on changes to the definition of the corresponding run
    • on completion of another run configuration
  • Expose artifacts to be consumed by a dependent run configuration

See https://sky-uk.github.io/kfp-operator/docs/getting-started/example/ for an in-depth example of training-time model ensembling.

Bug fixes

  • Store and propagate provider in RunConfigurations #232
  • Filter RunSchedules marked for deletion #235
  • Allow valid docker tags in pipeline identifier #281
  • Initialise ServingModelArtifacts in run completion events #282

Deprecation notes

  • All versions other than v1alpha5 are deprecated, and all resources should be upgraded to the latest schema
  • servingModelArtifacts in run completion events has been deprecated in favour of the more generic artifacts

Migration

After upgrading to this version, perform the following steps to ensure optimal behaviour:

  • Force re-upload of RunConfigurations by deleting all RunSchedules and triggering re-creation

v0.3.0

22 Mar 16:02
beb726b

Vertex AI Support

We have introduced support for managing machine learning resources on Vertex AI declaratively.

Improvements

  • Vertex AI support
  • Support multiple providers in a single KFP-Operator instance #171
  • Provider workflows now run in a dedicated namespace instead of the user namespace #183
  • Introduce one-off pipeline run resource #64
  • The eventing system has been redesigned (see updated docs for details) #89

Bug fixes

  • RunConfiguration runtime parameters not created #175

v0.2.1

09 Sep 10:06
ca8264d

Python 3.9 support

We have introduced support for TFX pipelines built using Python 3.9. This means TFX is now supported up to its recent release, 1.9.1.

Improvements

  • #160 Changes the compiler so that the pipeline's Python version is detected and the respective compiler path is set.
  • #161 Changes the way CRD version conversion works. We have made the decision to never error in version conversions and to preserve incompatible fields in all versions. This allows the K8s API server and other components to keep requesting old versions, even if they are incompatible.
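
One way to picture lossless conversion is a round trip in which fields the older schema cannot express are stashed rather than rejected. The sketch below is only an illustration of that general pattern, not the operator's implementation: the annotation key, the helper names, and the `framework` field are all hypothetical.

```python
import json
from typing import Any, Dict, Set

# Hypothetical annotation key, chosen for this sketch only.
REMAINDER_ANNOTATION = "example.com/conversion-remainder"

Resource = Dict[str, Any]


def to_older_version(resource: Resource, supported_fields: Set[str]) -> Resource:
    """Convert to an older schema without failing on unknown fields."""
    spec = resource["spec"]
    kept = {k: v for k, v in spec.items() if k in supported_fields}
    remainder = {k: v for k, v in spec.items() if k not in supported_fields}
    annotations = dict(resource.get("annotations", {}))
    if remainder:
        # Stash fields the older schema cannot express instead of erroring.
        annotations[REMAINDER_ANNOTATION] = json.dumps(remainder, sort_keys=True)
    return {"annotations": annotations, "spec": kept}


def to_newer_version(resource: Resource) -> Resource:
    """Convert back, restoring any fields stashed during the downgrade."""
    annotations = dict(resource.get("annotations", {}))
    remainder = json.loads(annotations.pop(REMAINDER_ANNOTATION, "{}"))
    return {"annotations": annotations, "spec": {**resource["spec"], **remainder}}


# "framework" is a made-up field that only the newer schema knows about.
newer = {"annotations": {}, "spec": {"image": "pipeline:v1", "framework": "tfx"}}
older = to_older_version(newer, supported_fields={"image"})
restored = to_newer_version(older)
```

In this sketch a client that requests the older version sees only the fields it understands, and converting back yields the original resource unchanged.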

v0.2.0

02 Sep 14:43
06bdf9b

Workflow Templates, Named Lists and Schema Conversions

This release increases schema versions to v1alpha3.
Schema conversions from v1alpha2 onwards now support Kubernetes CRD conversions, allowing users to migrate resources in their own time.

Improvements

  • #90 Argo Workflows have been refactored to use workflow templates stored in the cluster, which will be beneficial for upcoming work supporting the Vertex AI backend.

  • #31 All map fields in CRDs have been restructured to follow the K8s convention:

apiVersion: pipelines.kubeflow.org/v1alpha1
kind: Pipeline
metadata:
  name: pipeline-sample
spec:
  env:
    ENV_ARG: example  
  beamArgs:
    experiments: an_experiment 

will now be

apiVersion: pipelines.kubeflow.org/v1alpha3
kind: Pipeline
metadata:
  name: pipeline-sample
spec:
  env:
  - name: ENV_ARG
    value: example
  beamArgs:
  - name: experiments
    value: an_experiment

Consequently, beamArgs may now contain duplicate names, which will be passed on respectively.
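
The restructuring described above can be sketched as a small transformation from a map to a list of name/value entries. This is an illustrative helper for migrating your own manifests by hand or in a script, with a hypothetical function name; it is not the operator's conversion code.

```python
from typing import Dict, List


def map_to_named_list(values: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn {"KEY": "value"} into [{"name": "KEY", "value": "value"}]."""
    return [{"name": name, "value": value} for name, value in values.items()]


# The spec from the v1alpha1 example above, as a plain dictionary.
old_spec = {
    "env": {"ENV_ARG": "example"},
    "beamArgs": {"experiments": "an_experiment"},
}

# Each map field becomes a list of name/value entries.
new_spec = {field: map_to_named_list(entries) for field, entries in old_spec.items()}
```

Because the new form is a list, a second entry with the same name can simply be appended, which a map could not represent.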

  • #67 RunConfigurations can now train pipelines at specified versions in addition to tracking the latest changes. pipelineName has therefore been renamed to pipeline to allow specifying a pipeline with and without a version:
apiVersion: pipelines.kubeflow.org/v1alpha3
kind: RunConfiguration
metadata:
  name: pipeline-sample
spec:
  pipeline: pipeline-sample:257c1e6-440251
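
To make the two accepted forms of the pipeline field concrete, here is a minimal parsing sketch. It assumes the name and version are separated by the first colon, as in the example above; the type and function names are hypothetical and the operator's own validation rules may be stricter.

```python
from typing import NamedTuple, Optional


class PipelineIdentifier(NamedTuple):
    name: str
    version: Optional[str]  # None means "track the latest version"


def parse_pipeline_identifier(identifier: str) -> PipelineIdentifier:
    """Split "name:version" or a bare "name" into its parts.

    Assumes the first colon separates name from version.
    """
    name, _, version = identifier.partition(":")
    if not name:
        raise ValueError(f"pipeline name missing in {identifier!r}")
    return PipelineIdentifier(name=name, version=version or None)


pinned = parse_pipeline_identifier("pipeline-sample:257c1e6-440251")
latest = parse_pipeline_identifier("pipeline-sample")
```

Here `pinned` refers to a specific version, while `latest` carries no version and would follow the most recent one.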

v0.1.1

01 Aug 10:21
e473366

CRD Version Downgrade

In previous versions, all CRDs were released as v1, which doesn't represent the state of the project correctly. This release downgrades all CRD versions to v1alpha1, allowing future releases to increase this version incrementally.

This is a breaking release and we recommend not using a version prior to this release. If you have installed a previous version and want to upgrade, you will have to manually migrate resources. Please reach out via https://github.com/sky-uk/kfp-operator/discussions if you need assistance.

v0.1.0

14 Jul 08:35
8d76b4d

Public Alpha

As part of this release, ownership labels in workflows have been renamed. This is a breaking change, and the upgrade requires manual migration detailed in the PR.

Improvements

  • #138 High Availability

Bug Fixes

None.

v0.0.3

20 Jun 13:43
d0fae7b
Pre-release

Experiment Resources

#1 Introduces the new Experiment Custom Resource Definition which allows the declarative definition of scheduled KFP pipeline Experiments.

apiVersion: pipelines.kubeflow.org/v1
kind: Experiment

Improvements

  • #75 Rename Model Update Event Source to Run Completion Event Source which also emits events for failed pipeline runs
  • #74 Provide Kubernetes Events for all resource kinds
  • #101 Introduce ObservedGeneration Operator best practice

Bug Fixes

  • #105 Undeleted succeeded workflows succeed all future operations
  • #93 Fall back to entrypoint as pipeline name for scheduled pipeline runs
  • #118 Handle resource updates if RunConfiguration doesn't exist in KFP
  • #91 Run Completion Event contains serving locations even when model was not pushed