SLSA for Models

This project shows how we can generate SLSA provenance for ML models on GitHub Actions and Google Cloud Platform.

SLSA was originally developed for traditional software to protect against tampering with builds, such as in the SolarWinds attack. This project is a proof of concept that the same supply chain protections can be applied to ML.

When users download a given version of a model, they can also check its provenance. This check can be integrated into a model hub and/or a model serving platform: for example, a serving pipeline could validate provenance for every new model before serving it. Verification can also be done manually, on demand.
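The manual check described above can be sketched as follows. This is a minimal illustration, assuming the provenance is available as a decoded in-toto statement in JSON form whose `subject` list carries a `sha256` digest per artifact. It only compares digests: it does not verify the signature over the provenance or the builder identity, which a dedicated verifier must do. The function names are hypothetical and not part of this project.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_statement(path: Path) -> dict:
    """Load a provenance statement stored as plain JSON (an assumption)."""
    return json.loads(path.read_text(encoding="utf-8"))


def model_matches_provenance(model_path: Path, statement: dict) -> bool:
    """Check that the model's digest appears among the statement's subjects.

    This is a digest comparison only; it does NOT check any signature.
    """
    actual = sha256_of_file(model_path)
    return any(
        subject.get("digest", {}).get("sha256") == actual
        for subject in statement.get("subject", [])
    )
```

A serving pipeline could call `model_matches_provenance` as one gate before loading a model, after the provenance itself has been authenticated by other means.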

As an additional benefit, having provenance for a model allows users to react to vulnerabilities in a training framework: they can quickly determine if a model needs to be retrained because it was created using a vulnerable version.
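As an illustration of that triage step, the sketch below flags models whose recorded framework version falls in a vulnerable range. It assumes each model's provenance has already been reduced to a simple record with hypothetical `framework` and `version` fields; how the framework version is actually recorded depends on the provenance format and is not specified here. Versions are assumed to be plain dotted integers, and the version numbers in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical summary extracted from a model's provenance."""
    model: str
    framework: str
    version: str


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.11.0' into (2, 11, 0); assumes dotted integers only."""
    return tuple(int(part) for part in version.split("."))


def needs_retraining(
    record: ModelRecord, framework: str, first_bad: str, first_fixed: str
) -> bool:
    """True if the record's framework version is in [first_bad, first_fixed)."""
    if record.framework != framework:
        return False
    version = parse_version(record.version)
    return parse_version(first_bad) <= version < parse_version(first_fixed)


# Made-up inventory and made-up vulnerable range, for illustration only.
records = [
    ModelRecord("tensorflow_model.keras", "tensorflow", "2.11.0"),
    ModelRecord("pytorch_model.pth", "pytorch", "2.11.0"),
    ModelRecord("tensorflow_hdf5_model.h5", "tensorflow", "2.12.1"),
]
affected = [
    r.model for r in records
    if needs_retraining(r, "tensorflow", "2.11.0", "2.12.0")
]
```

With provenance available for every model, this kind of query replaces guessing which models were trained with the affected version.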

See the guides for GitHub Actions and Google Cloud Platform for details.

Models

We support both TensorFlow and PyTorch models. The example repo trains a model on the CIFAR10 dataset, saves it in one of the supported formats, and generates provenance for the output. The supported formats are:

| Workflow Argument | Training Framework | Model format |
|---|---|---|
| `tensorflow_model.keras` | TensorFlow | Keras format (default) |
| `tensorflow_hdf5_model.h5` | TensorFlow | Legacy HDF5 format |
| `tensorflow_hdf5.weights.h5` | TensorFlow | Legacy HDF5 weights only format |
| `pytorch_model.pth` | PyTorch | PyTorch default format |
| `pytorch_full_model.pth` | PyTorch | PyTorch complete model format |
| `pytorch_jitted_model.pt` | PyTorch | PyTorch TorchScript format |

Most ML models are currently too expensive to train in this setup. Future work will cover models that require access to accelerators (e.g., GPUs, TPUs) or multiple hours of training.

Future Work

Accelerators

Future work will cover training ML models that require access to accelerators (e.g., GPUs, TPUs).

Platforms

Our examples currently target GitHub Actions and Tekton on GCP. We aim to add support for other platforms (e.g., Google Cloud Build and GitLab) and other model training environments.

Directory Format

TensorFlow also supports saving models in the SavedModel format. This is a directory-based serialization format, and we do not yet fully support it. We can generate SLSA provenance for all the files in the directory, but there are caveats regarding verification. In addition, the hashes recorded in the provenance differ from the hash generated during model signing. We have therefore decided to add support for such model formats at a future time, after a way to generate and verify this kind of provenance has been standardized in SLSA (in general, not just for ML).
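To make the per-file approach concrete, the sketch below walks a model directory and produces one entry per file, in the `name` plus `digest` shape used by in-toto statement subjects. It is an illustration only: the function name is hypothetical, and it does not address the verification caveats or the mismatch with the model-signing hash mentioned above.

```python
import hashlib
from pathlib import Path


def directory_subjects(model_dir: Path) -> list[dict]:
    """Return one {name, digest} entry per regular file under model_dir.

    Names are POSIX-style paths relative to model_dir. Paths are sorted so
    the output does not depend on filesystem traversal order.
    """
    subjects = []
    for path in sorted(p for p in model_dir.rglob("*") if p.is_file()):
        # read_bytes() loads the whole file; chunked reads would be
        # preferable for large model files.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        subjects.append({
            "name": path.relative_to(model_dir).as_posix(),
            "digest": {"sha256": digest},
        })
    return subjects
```

Because the result is a list of independent file digests rather than a single digest for the directory, a verifier has to decide how to treat missing, extra, or renamed files, which is one reason a standardized approach is needed.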
