
ScalingTL: a productionized transfer learning pipeline for UniRep

This project uses Metaflow on AWS Batch to scale training of UniRep in a reproducible, version-controlled fashion. The training platform is served to end users via a Dash frontend, and model metadata is tracked in a MySQL database.

How to use:

  1. Navigate to the website.
  2. Select a starting model to train from.
  3. Upload your data or select a preloaded dataset.
  4. Select your training parameters and initiate transfer learning!

When a job is submitted, the web server places it on the AWS Batch job queue. Batch spins up an EC2 instance if none is running in the compute environment, pulls the container with the model's dependencies from ECR, and layers the Metaflow code package with the ScalingTL code on top. The model trains until the stopping criteria are met. Finally, the model's weights are saved to S3 and a record of the model is appended to the database.
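The training step itself is an ordinary Metaflow flow. Below is a minimal sketch of what such a flow could look like; the class name TrainUniRepFlow, the parameters, the resource sizes, and the image tag are illustrative assumptions, not the repo's actual code.

```python
from metaflow import FlowSpec, Parameter, batch, step


class TrainUniRepFlow(FlowSpec):
    """Hypothetical sketch of a ScalingTL training flow."""

    dataset = Parameter("dataset", help="S3 URI of the training sequences")
    epochs = Parameter("epochs", default=10, help="Stopping criterion")

    @step
    def start(self):
        # Fetch the starting model's weights and the dataset (placeholder).
        self.next(self.train)

    # The @batch decorator ships this step to the AWS Batch compute
    # environment; the image tag here is a made-up example.
    @batch(cpu=4, memory=16000, image="unirep-training:latest")
    @step
    def train(self):
        # Fine-tune UniRep on the new data until the stopping criteria are
        # met, then keep the result as a Metaflow artifact (placeholder for
        # the real training loop).
        self.weights_s3_uri = "s3://..."  # written by the real flow
        self.next(self.end)

    @step
    def end(self):
        # Register the trained model in the MySQL database (placeholder).
        pass


if __name__ == "__main__":
    TrainUniRepFlow()
```

One plausible way the Dash frontend kicks this off is by invoking `python train_flow.py run --with batch` with the user's chosen parameters.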

Highlights:

  1. Simple: transfer learning on new data is started via a few button clicks on the website.
  2. Scalable: can scale to as many concurrent jobs as you want to pay for via AWS Batch.
  3. Reproducible: all steps of training are version controlled via Metaflow.
  4. Transparent: all trained models are available to everyone.

Use cases:

  1. Good for a large org that wants to empower its users or employees to easily build performant models on new data. Not good for a single user building models on their own; in that case it is better to use Metaflow's Python module directly than a website (see the sketch after this list).
  2. Good for models with large opportunities for transfer learning on new datasets that share the same data schema (e.g. UniRep, DeepLabCut, etc.). Not good for training a be-all-end-all model on a singular or evolving dataset, where the frontend and the database of all trained models would be unnecessary.
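For the single-user case in point 1, here is a hedged sketch of working with Metaflow directly and bypassing the frontend entirely; the flow and file names are the hypothetical ones from the sketch above.

```python
# A single user can run the flow from the command line:
#
#   python train_flow.py run --with batch --dataset s3://my-bucket/seqs.fasta
#
# and afterwards inspect results with Metaflow's client API:
from metaflow import Flow

run = Flow("TrainUniRepFlow").latest_successful_run
print(run.id, run.finished_at)

# Any artifact a step stored with `self.<name> = ...` is retrievable, e.g.:
# weights_uri = run.data.weights_s3_uri
```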

Architecture

[Architecture diagram]

Installing This Cloud Service for Your Own Use

  1. Clone this repository: git clone https://github.com/elyall/ScalingTL

  2. Build the Docker container by following docker/README.md.

  3. Set up the Batch compute environment by following metaflow/cloudformation/README.md.

  4. Install the database by following mysql/README.md. Place it on an EC2 instance in the Metaflow VPC so that Batch jobs can easily write to it (a sketch of the registry write appears after this list).

  5. Install the frontend by following dash/README.md. Place it on an EC2 instance in the database's VPC (i.e., the Metaflow VPC) so that it can read from and write to the database.

  6. Navigate to your website, load in a dataset, and perform transfer learning!
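For step 4, the registry write each Batch job performs at the end of training could look like the following sketch, using pymysql. The host, credentials, table, and column names are assumptions for illustration; the actual schema lives in mysql/README.md.

```python
import pymysql

# Connect to the registry database; host and credentials are placeholders
# for the EC2 instance set up in step 4.
conn = pymysql.connect(
    host="database-ec2-host",
    user="scalingtl",
    password="change-me",
    database="models",
)
with conn.cursor() as cur:
    # Append one row per trained model; schema is hypothetical.
    cur.execute(
        """
        INSERT INTO model_registry (flow_name, run_id, weights_s3_uri, created_at)
        VALUES (%s, %s, %s, NOW())
        """,
        ("TrainUniRepFlow", "1234", "s3://scalingtl/models/1234/weights.npy"),
    )
conn.commit()
conn.close()
```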

Future Roadmap

  1. Build out a prediction pipeline.
  2. Separate the UniRep model from the repo so that the starting model is plug-and-play.

About

Insight Data Engineering Project
