
INNDiE

INNDiE: An Integrated Neural Network Development Environment

A Computer Science and Robotics Engineering Major Qualifying Project submitted to the Faculty of Worcester Polytechnic Institute in partial fulfillment of the requirements for the degree of Bachelor of Science.

Core

The core project is responsible for understanding the operations the user wants to complete and executing them in the correct order. Typically, the output of the user's commands takes the form of code generation.

Code Generation

This project has a DSL which models the generated code. Project dsl-interface provides the interface for the components of the DSL. Project dsl provides the implementation. Other offshoot projects, such as tasks-yolov3, add model-specific implementation details that can be used with the DSL.

The flow of information through the DSL to the generated code is structured as follows:

  1. Information is input using the DSL via the ScriptGenerator. This forms a "program" configuration that completely describes the code to be generated.
  2. This configuration is checked for correctness by the ScriptGenerator inside ScriptGenerator.code.
  3. The configuration is parsed into a Code dependency graph using CodeGraph, which performs further correctness checks to verify that the dependency graph is a single DAG.
  4. That graph is then traversed and parsed into a program by appending Code.code segments in traversal order, predecessors first.

If at any point the ScriptGenerator or any of its dependencies determines that the configuration is invalid, an error type is returned.
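
The traversal in step 4 amounts to emitting each node's code only after all of its predecessors. Below is a minimal sketch of that idea in Python with hypothetical names (CodeNode, generate_program); the real DSL, ScriptGenerator, and CodeGraph are Kotlin components of this repository, and the DAG/cycle check from step 3 is omitted here.

# Hypothetical illustration of predecessors-first code emission; not INNDiE's actual API.
from dataclasses import dataclass

@dataclass(frozen=True)
class CodeNode:
    code: str
    dependencies: tuple = ()

def generate_program(target: CodeNode) -> str:
    """Emit each node's code after all of its predecessors (post-order DFS)."""
    emitted = set()
    lines = []

    def visit(node):
        if node in emitted:
            return
        for dep in node.dependencies:
            visit(dep)
        emitted.add(node)
        lines.append(node.code)

    visit(target)
    return "\n".join(lines)

load = CodeNode("model = tf.keras.models.load_model('model.h5')")
train = CodeNode("model.fit(x, y)", dependencies=(load,))
print(generate_program(train))  # prints the load line, then the fit line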

Modeling TensorFlow's Data

Adding a Layer

To add a layer,

  1. Add it in SealedLayer as a sealed subclass. Any parameter validation should be tested.
  2. Add test cases that use it in DefaultLayerToCodeTest and then add it as a case in DefaultLayerToCode.
  3. Use it in LoadLayersFromHDF5::parseLayer and amend any tests that now fail because they previously did not understand the new layer type. If no tests fail, add a test that loads a model containing the new layer type.

Adding an Initializer

To add an initializer,

  1. Add it in Initializer as a sealed subclass.
  2. Add a case for it in DefaultInitializerToCode. Add test cases in DefaultInitializerToCodeTest.
  3. Add a case for it in LoadLayersFromHDF5::initializer. Generate new models that use each variant of the new Initializer and add a test in LoadLayersWithInitializersIntegrationTest that loads each one.

Plugins

INNDiE uses a simple plugin system to generalize over many different datasets and models.

Dataset Plugins

You can use dataset plugins to control how datasets are processed before training. In the training script, after a dataset is loaded it is given to a dataset plugin for processing before being passed to the model during training. This is where any model-specific processing or formatting should be done. Dataset plugins must implement this function:

def process_dataset(x, y):
    # Do some data processing in here and return the results.
    return (processed_x, processed_y)

In that function, x is the input data and y is the target data.
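
As a hypothetical example, a dataset plugin might scale 8-bit image pixels into [0, 1] before training. The preprocessing below is an assumption for illustration; only the process_dataset(x, y) contract above comes from INNDiE.

import numpy as np

def process_dataset(x, y):
    # Illustrative preprocessing: scale 8-bit pixel values into [0, 1]
    # and leave the target data untouched.
    processed_x = np.asarray(x, dtype=np.float32) / 255.0
    return (processed_x, y)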

Test Data Loading Plugin

At the start of a test run, the test data must be loaded by a plugin. A test data loading plugin controls how the test data is loaded and processed before it is given to the model during inference. These plugins must implement this function:

def load_test_data(path):
    # Load the dataset from `path`, process it, and return it
    # along with `steps`.
    return (data, steps)

In that function, path is the file path to the test data that the user requested be loaded. In the return statement, data is the final dataset that will be given directly to the model and steps is the number of steps to give to TensorFlow when running inference; consult the TensorFlow documentation for how to use steps.
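
As a hypothetical example, a test data loading plugin might read a NumPy .npz archive and run inference one sample per step. The file layout, the "images" key, and the batch size of one are assumptions for illustration; only the load_test_data(path) contract above comes from INNDiE.

import numpy as np

def load_test_data(path):
    # Illustrative loading: read the images, apply the same scaling used
    # during training, and report one inference step per sample.
    images = np.load(path)["images"].astype(np.float32) / 255.0
    steps = len(images)
    return (images, steps)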

Test Output Processing Plugin

After the inference step of a test run has completed, the model's input and output are given to a test output processing plugin. This plugin is responsible for interpreting the output of the model and writing any test results to a folder in the current directory called output. Any files put into this directory will be presented to the user in INNDiE's test view UI. These plugins must implement this function:

def process_model_output(model_input, model_output):
    from pathlib import Path
    # Write test result files into the `output` directory.
    Path("output/file1.txt").touch()
    Path("output/file2.txt").touch()

In that function, model_input is the data given to the model for inference and model_output is the output directly from the model.
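
As a hypothetical example, a plugin for a classifier could record the predicted class of each test sample. The assumption that model_output holds per-class scores and the predictions.txt file name are illustrative; only the process_model_output(model_input, model_output) contract and the output directory convention above come from INNDiE.

from pathlib import Path

import numpy as np

def process_model_output(model_input, model_output):
    out_dir = Path("output")
    out_dir.mkdir(exist_ok=True)
    # Illustrative interpretation: take the highest-scoring class per sample
    # and write the results where INNDiE's test view will find them.
    predictions = np.argmax(np.asarray(model_output), axis=-1)
    with (out_dir / "predictions.txt").open("w") as f:
        for i, label in enumerate(predictions):
            f.write(f"sample {i}: class {label}\n")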

AWS Integration

S3 Directories Managed by INNDiE

Inside INNDiE's autogenerated S3 bucket (named with a prefix inndie-autogenerated- followed by some random alphanumeric characters for uniqueness), INNDiE manages these directories:

  • inndie-untrained-models
    • Contains "untrained" models from which the user can create a new Job
    • These models cannot be used for testing because they are assumed not to contain any weights (or at least not any meaningful ones)
  • inndie-training-results
    • Contains all results from running a training script
  • inndie-test-data
    • Contains test data files that can be used with the test view
  • inndie-datasets
    • Contains the user's custom datasets
  • inndie-training-scripts
    • Contains generated training scripts
  • inndie-training-progress
    • Contains training progress files that INNDiE polls to get training progress updates
  • inndie-plugins
    • Contains unofficial plugins
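
For illustration, these directories can also be inspected outside of INNDiE with boto3, locating the bucket by its name prefix. This sketch is an assumption about how you might browse the bucket manually; INNDiE manages it automatically.

import boto3

s3 = boto3.client("s3")

# Find the autogenerated bucket by its name prefix.
bucket = next(
    b["Name"]
    for b in s3.list_buckets()["Buckets"]
    if b["Name"].startswith("inndie-autogenerated-")
)

# List the untrained models available for creating a new Job.
response = s3.list_objects_v2(Bucket=bucket, Prefix="inndie-untrained-models/")
for obj in response.get("Contents", []):
    print(obj["Key"])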

AWS Configuration

  • Security Group for ECS named inndie-autogenerated-ecs-sg
  • Security Group for EC2 named inndie-autogenerated-ec2-sg
  • Security Group for RDS named inndie-autogenerated-rds-sg
  • Task role for ECS named inndie-autogenerated-ecs-task-role
  • IAM role for EC2 named inndie-autogenerated-ec2-role
  • Instance profile for EC2 named inndie-autogenerated-ec2-instance-profile
  • ECS Cluster named inndie-autogenerated-cluster
  • ECS Task Definition named inndie-autogenerated-task-family
