feat(sagemaker): add Model L2 construct (#22549)
This is the first of three PRs to complete the implementation of RFC 431:
aws/aws-cdk-rfcs#431

fixes #2809

Co-authored-by: Matt McClean <mmcclean@amazon.com>
Co-authored-by: Long Yao <yl1984108@gmail.com>
Co-authored-by: Drew Jetter <60628154+jetterdj@users.noreply.github.com>
Co-authored-by: Murali Ganesh <59461079+foxpro24@users.noreply.github.com>
Co-authored-by: Abilash Rangoju <988529+rangoju@users.noreply.github.com>


----

### All Submissions:

* [x] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md)

### Adding new Unconventional Dependencies:

* [ ] This PR adds new unconventional dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md/#adding-new-unconventional-dependencies)

### New Features

* [x] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/main/INTEGRATION_TESTS.md)?
	* [x] Did you use `yarn integ` to deploy the infrastructure and generate the snapshot (i.e. `yarn integ` without `--dry-run`)?

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
petermeansrock committed Nov 9, 2022
1 parent 834fab4 commit 93915f1
Showing 30 changed files with 3,902 additions and 20 deletions.
143 changes: 131 additions & 12 deletions packages/@aws-cdk/aws-sagemaker/README.md
@@ -9,31 +9,150 @@
>
> [CFN Resources]: https://docs.aws.amazon.com/cdk/latest/guide/constructs.html#constructs_lib
![cdk-constructs: Experimental](https://img.shields.io/badge/cdk--constructs-experimental-important.svg?style=for-the-badge)

> The APIs of higher level constructs in this module are experimental and under active development.
> They are subject to non-backward compatible changes or removal in any future version. These are
> not subject to the [Semantic Versioning](https://semver.org/) model and breaking changes will be
> announced in the release notes. This means that while you may use them, you may need to update
> your source code when upgrading to a newer version of this package.
---

<!--END STABILITY BANNER-->

This module is part of the [AWS Cloud Development Kit](https://github.com/aws/aws-cdk) project.
Amazon SageMaker provides every developer and data scientist with the ability to build, train, and
deploy machine learning models quickly. Amazon SageMaker is a fully-managed service that covers the
entire machine learning workflow to label and prepare your data, choose an algorithm, train the
model, tune and optimize it for deployment, make predictions, and take action. Your models get to
production faster with much less effort and lower cost.

## Installation

Install the module:

```console
$ npm i @aws-cdk/aws-sagemaker
```

Import it into your code:

```typescript
import * as sagemaker from '@aws-cdk/aws-sagemaker';
```

## Model

To create a machine learning model with Amazon SageMaker, use the `Model` construct. This construct
includes properties that can be configured to define model components, including the model inference
code as a Docker image and an optional set of separate model data artifacts. See the [AWS
documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-marketplace-develop.html)
to learn more about SageMaker models.

### Single Container Model

In the event that a single container is sufficient for your inference use-case, you can define a
single-container model:

```typescript
import * as sagemaker from '@aws-cdk/aws-sagemaker';
import * as path from 'path';

const image = sagemaker.ContainerImage.fromAsset(path.join('path', 'to', 'Dockerfile', 'directory'));
const modelData = sagemaker.ModelData.fromAsset(path.join('path', 'to', 'artifact', 'file.tar.gz'));

const model = new sagemaker.Model(this, 'PrimaryContainerModel', {
  containers: [
    {
      image: image,
      modelData: modelData,
    }
  ]
});
```

### Inference Pipeline Model

An inference pipeline is an Amazon SageMaker model that is composed of a linear sequence of multiple
containers that process requests for inferences on data. See the [AWS
documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html) to learn
more about SageMaker inference pipelines. To define an inference pipeline, you can provide
additional containers for your model:

```typescript
import * as sagemaker from '@aws-cdk/aws-sagemaker';

declare const image1: sagemaker.ContainerImage;
declare const modelData1: sagemaker.ModelData;
declare const image2: sagemaker.ContainerImage;
declare const modelData2: sagemaker.ModelData;
declare const image3: sagemaker.ContainerImage;
declare const modelData3: sagemaker.ModelData;

const model = new sagemaker.Model(this, 'InferencePipelineModel', {
  containers: [
    { image: image1, modelData: modelData1 },
    { image: image2, modelData: modelData2 },
    { image: image3, modelData: modelData3 }
  ],
});
```
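
As a conceptual aid only, the "linear sequence" idea can be sketched in plain TypeScript without any CDK dependency. The `Stage` type and `runPipeline` function below are hypothetical names used for illustration; they are not part of `@aws-cdk/aws-sagemaker` and do not reflect how SageMaker routes requests internally. The sketch only shows that each container receives the output of the previous one, in the order listed in `containers`:

```typescript
// Hypothetical stand-in for one container in a pipeline: it takes a request
// payload and returns the payload handed to the next stage.
type Stage = (payload: string) => string;

// Run the stages in order, feeding each stage's output to the next one.
function runPipeline(stages: Stage[], request: string): string {
  return stages.reduce((payload, stage) => stage(payload), request);
}

// Three illustrative stages: preprocess, predict, postprocess.
const preprocess: Stage = (p) => p.trim().toLowerCase();
const predict: Stage = (p) => (p.includes('good') ? 'positive' : 'negative');
const postprocess: Stage = (label) => JSON.stringify({ label });

const pipelineResult = runPipeline([preprocess, predict, postprocess], '  A GOOD day ');
console.log(pipelineResult); // prints {"label":"positive"}
```

Because the order of the array is the order of execution, reordering the entries in `containers` changes what each container sees as input.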

### Container Images

Inference code can be stored in Amazon Elastic Container Registry (Amazon ECR). The image is
specified via `ContainerDefinition`'s `image` property, which accepts a class that extends the
`ContainerImage` abstract base class.

#### Asset Image

Reference a local directory containing a Dockerfile:

```typescript
import * as sagemaker from '@aws-cdk/aws-sagemaker';
import * as path from 'path';

const image = sagemaker.ContainerImage.fromAsset(path.join('path', 'to', 'Dockerfile', 'directory'));
```

#### ECR Image

Reference an image available within ECR:

```typescript
import * as ecr from '@aws-cdk/aws-ecr';
import * as sagemaker from '@aws-cdk/aws-sagemaker';

const repository = ecr.Repository.fromRepositoryName(this, 'Repository', 'repo');
const image = sagemaker.ContainerImage.fromEcrRepository(repository, 'tag');
```
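
The tag argument is optional; in this commit `fromEcrRepository` declares it as `tag: string = 'latest'`. To illustrate the `registry/repository:tag` image name format described in the `ContainerImageConfig` doc comment, here is a minimal plain-TypeScript sketch. The helper `ecrImageName` is hypothetical and not part of the library; the real construct delegates to `repository.repositoryUriForTag(...)`, which works with deploy-time tokens and partition-specific URL suffixes that this sketch ignores:

```typescript
// Hypothetical helper for illustration only; not part of @aws-cdk/aws-sagemaker.
// Builds an image name of the form registry/repository:tag, assuming the
// standard "amazonaws.com" suffix (other partitions use a different suffix).
function ecrImageName(
  account: string,
  region: string,
  repository: string,
  tag: string = 'latest', // mirrors the default used by fromEcrRepository
): string {
  return `${account}.dkr.ecr.${region}.amazonaws.com/${repository}:${tag}`;
}

console.log(ecrImageName('012345678910', 'us-east-1', 'repo'));
// prints 012345678910.dkr.ecr.us-east-1.amazonaws.com/repo:latest
console.log(ecrImageName('012345678910', 'us-east-1', 'repo', 'v2'));
// prints 012345678910.dkr.ecr.us-east-1.amazonaws.com/repo:v2
```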

### Model Artifacts

If you choose to decouple your model artifacts from your inference code (as is natural given
different rates of change between inference code and model artifacts), the artifacts can be
specified via the `modelData` property which accepts a class that extends the `ModelData` abstract
base class. The default is to have no model artifacts associated with a model.

#### Asset Model Data

Reference local model data:

```typescript
import * as sagemaker from '@aws-cdk/aws-sagemaker';
import * as path from 'path';

const modelData = sagemaker.ModelData.fromAsset(path.join('path', 'to', 'artifact', 'file.tar.gz'));
```
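
Note that the asset variant only accepts a gzipped tar archive: in this commit, `lib/model-data.ts` rejects any path that does not end in `.tar.gz`, compared case-insensitively. The following standalone sketch reproduces that check outside of CDK so its behavior is easy to see. The function name `assertSupportedArtifact` is an assumption made for illustration; in the library the check runs inside the private `AssetModelData` constructor:

```typescript
// The only extension accepted for local asset model data (from this commit).
const ARTIFACT_EXTENSION = '.tar.gz';

// Hypothetical standalone version of the constructor check: throws when the
// path does not end in .tar.gz, ignoring case.
function assertSupportedArtifact(path: string): void {
  if (!path.toLowerCase().endsWith(ARTIFACT_EXTENSION)) {
    throw new Error(`Asset must be a gzipped tar file with extension ${ARTIFACT_EXTENSION} (${path})`);
  }
}

assertSupportedArtifact('path/to/artifact/file.tar.gz'); // accepted
assertSupportedArtifact('path/to/artifact/FILE.TAR.GZ'); // accepted, check ignores case

try {
  assertSupportedArtifact('path/to/artifact/file.zip');
} catch (e) {
  console.log((e as Error).message);
}
```

Since the check runs at construction time, an unsupported file is reported when `ModelData.fromAsset` is called rather than later, when the model binds the data.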

#### S3 Model Data

Reference an S3 bucket and object key as the artifacts for a model:

```typescript
import * as s3 from '@aws-cdk/aws-s3';
import * as sagemaker from '@aws-cdk/aws-sagemaker';

const bucket = new s3.Bucket(this, 'MyBucket');
const modelData = sagemaker.ModelData.fromBucket(bucket, 'path/to/artifact/file.tar.gz');
```
83 changes: 83 additions & 0 deletions packages/@aws-cdk/aws-sagemaker/lib/container-image.ts
@@ -0,0 +1,83 @@
import * as ecr from '@aws-cdk/aws-ecr';
import * as assets from '@aws-cdk/aws-ecr-assets';
import { Construct } from 'constructs';
import { Model } from './model';
import { hashcode } from './private/util';

/**
 * The configuration for creating a container image.
 */
export interface ContainerImageConfig {
  /**
   * The image name. Images in Amazon ECR repositories can be specified by either using the full registry/repository:tag or
   * registry/repository@digest.
   *
   * For example, `012345678910.dkr.ecr.<region-name>.amazonaws.com/<repository-name>:latest` or
   * `012345678910.dkr.ecr.<region-name>.amazonaws.com/<repository-name>@sha256:94afd1f2e64d908bc90dbca0035a5b567EXAMPLE`.
   */
  readonly imageName: string;
}

/**
 * Constructs for types of container images
 */
export abstract class ContainerImage {
  /**
   * Reference an image in an ECR repository
   */
  public static fromEcrRepository(repository: ecr.IRepository, tag: string = 'latest'): ContainerImage {
    return new EcrImage(repository, tag);
  }

  /**
   * Reference an image that's constructed directly from sources on disk
   * @param directory The directory where the Dockerfile is stored
   * @param options The options to further configure the selected image
   */
  public static fromAsset(directory: string, options: assets.DockerImageAssetOptions = {}): ContainerImage {
    return new AssetImage(directory, options);
  }

  /**
   * Called when the image is used by a Model
   */
  public abstract bind(scope: Construct, model: Model): ContainerImageConfig;
}

class EcrImage extends ContainerImage {
  constructor(private readonly repository: ecr.IRepository, private readonly tag: string) {
    super();
  }

  public bind(_scope: Construct, model: Model): ContainerImageConfig {
    this.repository.grantPull(model);

    return {
      imageName: this.repository.repositoryUriForTag(this.tag),
    };
  }
}

class AssetImage extends ContainerImage {
  private asset?: assets.DockerImageAsset;

  constructor(private readonly directory: string, private readonly options: assets.DockerImageAssetOptions = {}) {
    super();
  }

  public bind(scope: Construct, model: Model): ContainerImageConfig {
    // Retain the first instantiation of this asset
    if (!this.asset) {
      this.asset = new assets.DockerImageAsset(scope, `ModelImage${hashcode(this.directory)}`, {
        directory: this.directory,
        ...this.options,
      });
    }

    this.asset.repository.grantPull(model);

    return {
      imageName: this.asset.imageUri,
    };
  }
}
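
The "retain the first instantiation" comment in `AssetImage.bind` describes a lazy-creation pattern: the underlying asset is created on the first `bind` call and reused afterwards, so the same image object can be bound more than once without defining a second child construct under the same id. A minimal sketch of the pattern follows, using stand-in classes rather than real CDK constructs. `FakeScope`, `FakeAsset`, and `CachedImage` are hypothetical names, and the id and URI formats are made up for illustration:

```typescript
// Stand-in for a construct scope; records the ids of the children it creates.
class FakeScope {
  public readonly children: string[] = [];
}

// Stand-in for an asset construct; registers itself with the scope on creation.
class FakeAsset {
  constructor(scope: FakeScope, id: string, public readonly uri: string) {
    scope.children.push(id);
  }
}

class CachedImage {
  private asset?: FakeAsset;

  constructor(private readonly directory: string) {}

  public bind(scope: FakeScope): { imageName: string } {
    // Create the asset only on the first call; later calls reuse it.
    if (!this.asset) {
      this.asset = new FakeAsset(scope, `ModelImage-${this.directory}`, `fake-registry/${this.directory}:latest`);
    }
    return { imageName: this.asset.uri };
  }
}

const demoScope = new FakeScope();
const cachedImage = new CachedImage('inference');
const firstBind = cachedImage.bind(demoScope);
const secondBind = cachedImage.bind(demoScope);
console.log(demoScope.children.length, firstBind.imageName === secondBind.imageName); // prints 1 true
```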
4 changes: 4 additions & 0 deletions packages/@aws-cdk/aws-sagemaker/lib/index.ts
@@ -1,2 +1,6 @@
export * from './container-image';
export * from './model';
export * from './model-data';

// AWS::SageMaker CloudFormation Resources:
export * from './sagemaker.generated';
93 changes: 93 additions & 0 deletions packages/@aws-cdk/aws-sagemaker/lib/model-data.ts
@@ -0,0 +1,93 @@
import * as s3 from '@aws-cdk/aws-s3';
import * as assets from '@aws-cdk/aws-s3-assets';
import { Construct } from 'constructs';
import { IModel } from './model';
import { hashcode } from './private/util';

// The only supported extension for local asset model data
// https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-sagemaker-model-containerdefinition.html#cfn-sagemaker-model-containerdefinition-modeldataurl
const ARTIFACT_EXTENSION = '.tar.gz';

/**
 * The configuration needed to reference model artifacts.
 */
export interface ModelDataConfig {
  /**
   * The S3 path where the model artifacts, which result from model training, are stored. This path
   * must point to a single gzip compressed tar archive (.tar.gz suffix).
   */
  readonly uri: string;
}

/**
 * Model data represents the source of model artifacts, which will ultimately be loaded from an S3
 * location.
 */
export abstract class ModelData {
  /**
   * Constructs model data which is already available within S3.
   * @param bucket The S3 bucket within which the model artifacts are stored
   * @param objectKey The S3 object key at which the model artifacts are stored
   */
  public static fromBucket(bucket: s3.IBucket, objectKey: string): ModelData {
    return new S3ModelData(bucket, objectKey);
  }

  /**
   * Constructs model data that will be uploaded to S3 as part of the CDK app deployment.
   * @param path The local path to a model artifact file as a gzipped tar file
   * @param options The options to further configure the selected asset
   */
  public static fromAsset(path: string, options: assets.AssetOptions = {}): ModelData {
    return new AssetModelData(path, options);
  }

  /**
   * This method is invoked by the SageMaker Model construct when it needs to resolve the model
   * data to a URI.
   * @param scope The scope within which the model data is resolved
   * @param model The Model construct performing the URI resolution
   */
  public abstract bind(scope: Construct, model: IModel): ModelDataConfig;
}

class S3ModelData extends ModelData {
  constructor(private readonly bucket: s3.IBucket, private readonly objectKey: string) {
    super();
  }

  public bind(_scope: Construct, model: IModel): ModelDataConfig {
    this.bucket.grantRead(model);

    return {
      uri: this.bucket.urlForObject(this.objectKey),
    };
  }
}

class AssetModelData extends ModelData {
  private asset?: assets.Asset;

  constructor(private readonly path: string, private readonly options: assets.AssetOptions) {
    super();
    if (!path.toLowerCase().endsWith(ARTIFACT_EXTENSION)) {
      throw new Error(`Asset must be a gzipped tar file with extension ${ARTIFACT_EXTENSION} (${this.path})`);
    }
  }

  public bind(scope: Construct, model: IModel): ModelDataConfig {
    // Retain the first instantiation of this asset
    if (!this.asset) {
      this.asset = new assets.Asset(scope, `ModelData${hashcode(this.path)}`, {
        path: this.path,
        ...this.options,
      });
    }

    this.asset.grantRead(model);

    return {
      uri: this.asset.httpUrl,
    };
  }
}
