SLSA for Models

This project shows how we can generate SLSA provenance for ML models on GitHub Actions and Google Cloud Platform.

SLSA was originally developed for traditional software to protect against tampering with builds, as happened in the SolarWinds attack. This project is a proof of concept that the same supply chain protections can be applied to ML.

When users download a given version of a model they can also check its provenance. This can be integrated in the model hub and/or model serving platforms: for example the model serving pipeline could validate provenance for all new models before serving them. However, the verification can also be done manually, on demand.
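As a rough illustration of the digest part of such a check, the sketch below compares a downloaded model file against the subjects listed in an in-toto statement. It assumes the provenance has already been decoded into a Python dict with the standard `subject[].digest.sha256` layout; the function names are hypothetical and not part of this project. It does not verify the signature on the provenance, which a real pipeline would also need to do with a dedicated tool such as slsa-verifier.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def model_matches_statement(model_path, statement):
    """Return True if any subject in the statement carries the model's SHA-256.

    `statement` is assumed to be a decoded in-toto statement (a dict).
    This checks the digest only, not who signed the statement.
    """
    actual = sha256_of_file(model_path)
    return any(
        subject.get("digest", {}).get("sha256") == actual
        for subject in statement.get("subject", [])
    )
```

A serving pipeline could call `model_matches_statement` on each new model and refuse to serve it when the result is `False`.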

As an additional benefit, having provenance for a model allows users to react to vulnerabilities in a training framework: they can quickly determine if a model needs to be retrained because it was created using a vulnerable version.
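A minimal sketch of that kind of triage follows. It assumes, purely for illustration, that the provenance is a SLSA v1 statement whose `predicate.buildDefinition.resolvedDependencies` entries record Python packages as package URLs of the form `pkg:pypi/<name>@<version>`. Whether the provenance generated by this project records the training framework this way is not stated here, and the function name is hypothetical.

```python
def uses_vulnerable_framework(statement, package, vulnerable_versions):
    """Return True if the provenance lists `package` at a vulnerable version.

    Assumption: dependencies appear under
    predicate.buildDefinition.resolvedDependencies with a purl-style `uri`
    such as "pkg:pypi/tensorflow@<version>" and no trailing qualifiers.
    """
    prefix = f"pkg:pypi/{package}@"
    dependencies = (
        statement.get("predicate", {})
        .get("buildDefinition", {})
        .get("resolvedDependencies", [])
    )
    for dependency in dependencies:
        uri = dependency.get("uri", "")
        # Exact version match only; no range or semantic-version comparison.
        if uri.startswith(prefix) and uri[len(prefix):] in vulnerable_versions:
            return True
    return False
```

Running this over the stored provenance of every published model would give a list of models to retrain, without having to inspect the models themselves.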

See the guides for GitHub Actions and Google Cloud Platform for details.

Models

We support both TensorFlow and PyTorch models. The example repo trains a model on the CIFAR10 dataset, saves it in one of the supported formats, and generates provenance for the output. The supported formats are:

| Workflow Argument | Training Framework | Model format |
| --- | --- | --- |
| tensorflow_model.keras | TensorFlow | Keras format (default) |
| tensorflow_hdf5_model.h5 | TensorFlow | Legacy HDF5 format |
| tensorflow_hdf5.weights.h5 | TensorFlow | Legacy HDF5 weights only format |
| pytorch_model.pth | PyTorch | PyTorch default format |
| pytorch_full_model.pth | PyTorch | PyTorch complete model format |
| pytorch_jitted_model.pt | PyTorch | PyTorch TorchScript format |

Most ML models are currently too expensive to train in this setting. Future work will cover training ML models that require access to accelerators (e.g., GPUs, TPUs) or that take multiple hours to train.

Future Work

Accelerators

Future work will cover training ML models that require access to accelerators (e.g., GPUs, TPUs).

Platforms

While our examples have targeted GitHub Actions and Tekton in GCP, we aim to bring support for other platforms (e.g., GCB and GitLab) and model training environments.

Directory Format

TensorFlow also supports saving models in the SavedModel format, a directory-based serialization format that we do not yet fully support. We can generate SLSA provenance for all the files in the directory, but there are caveats regarding verification. Furthermore, the hashes generated by provenance differ from the hash generated during model signing. We have therefore decided to add support for directory-based model formats at a future time, after standardizing a way to generate and verify provenance in SLSA (in general, not just for ML).
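To make the per-file approach concrete, here is a minimal sketch that hashes every file under a model directory and returns one in-toto style subject per file. It is not this project's implementation: the function name is hypothetical, and the choice to name subjects by relative POSIX path and to sort them is an assumption made so the output is deterministic.

```python
import hashlib
from pathlib import Path


def directory_subjects(model_dir, chunk_size=1 << 20):
    """Return one {"name", "digest"} subject per file under `model_dir`.

    Names are paths relative to `model_dir` (assumed convention), and the
    list is sorted so repeated runs produce the same order.
    """
    root = Path(model_dir)
    subjects = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in chunks so large weight files are not loaded at once.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        subjects.append(
            {
                "name": path.relative_to(root).as_posix(),
                "digest": {"sha256": digest.hexdigest()},
            }
        )
    return subjects
```

This yields many digests, one per file, whereas model signing produces a single hash for the whole model, which is the mismatch described above.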