It is important that you make it easy for users with different levels of experience to use your algorithm in AWS Marketplace. To do this, we recommend including the following elements in your AWS Marketplace algorithm listing.

## Short Product Description section

| # | Section | Description | Mandatory/Highly Recommended | Sample Example |
| --- | --- | --- | --- | --- |
| SD1 | Short product description | List the most important product use case and the supported input content type (short description). | Mandatory | An AutoML algorithm that trains a multi-layer stack ensemble model to predict on regression/classification datasets directly from CSV data. |

## Product overview section

| # | Section | Description | Mandatory/Highly Recommended | Sample Example |
| --- | --- | --- | --- | --- |
| PO1 | Product overview | Describe the algorithm category (e.g., tree, neural net, ensemble). | Mandatory | An AutoML algorithm that trains a multi-layer stack ensemble model to predict on regression/classification datasets directly from CSV data. |
| PO2 | Product overview | Summarize how the algorithm works, including any feature engineering. | Mandatory | AutoGluon-Tabular can save time by automating time-consuming manual steps: handling missing data, manual feature transformations, data splitting, model selection, algorithm selection, hyperparameter selection and tuning, ensembling multiple models, and repeating the process when data changes. |
| PO3 | Product overview | List the core framework that the model/algorithm was built on. | Highly Recommended | Unlike existing AutoML frameworks that primarily focus on model/hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers. |
| PO4 | Product overview | Differentiated capabilities of the model/algorithm. | Mandatory | Dynamic factor models (DFMs) can be used to analyze and forecast large sets of time-series, such as measurements and indicators of national or multinational economies, prices of products or instruments constantly traded in markets, measurements and observations of natural or engineering processes, and trends in social media or sports tournaments.<br><br>The evolutions of these time-series are influenced by the evolutions of a number of unobserved factors commonly affecting all or many of the time-series. A long-memory DFM can estimate the influences of longer histories of factor evolutions. |
| PO5 | Product overview | List the most important use case(s) for this product. | Mandatory | The long-memory dynamic factor model (LMDFM) algorithm is developed to analyze and forecast large sets of time-series when the time-series are influenced by the evolution histories of a number of unobserved factors commonly affecting all or many of the time-series. By applying objective data-driven constraints, the LMDFM algorithm can estimate the influences of longer histories of common factors.<br><br>The algorithm accommodates wider ranges of values of model parameters, especially model learning parameters. The wider ranges can further enhance the power of machine learning.<br><br>The current version of the LMDFM algorithm estimates: (a) dynamic factor loading matrices, (b) vector autoregressive (VAR) coefficients of the factors, (c) time-series of factor scores, and (d) forecasts of the set of time-series. |

## Highlights section

| # | Section | Description | Mandatory/Highly Recommended | Sample Example |
| --- | --- | --- | --- | --- |
| H1 | Highlights | Summarize an algorithm performance metric. | Mandatory | In benchmarks from the AutoGluon-Tabular paper, AutoGluon outperformed many popular open-source/commercial AutoML platforms on 50 classification/regression datasets from Kaggle/OpenML. AutoGluon is faster, more robust, and much more accurate than other tools, even outperforming the best of five other AutoML platforms on most datasets. In two popular Kaggle competitions, AutoGluon beat 99% of the participating data scientists after just four hours of training. |
| H2 | Highlights | Summarize a model performance metric. | Must-have for an algorithm with a pre-trained model; highly recommended for others. | The model applies learning on top of an existing model trained on the Pascal VOC 2007-2012 dataset, extended with 200 annotated images from the XXXX dataset in addition to 4,000 augmented images representing blur and foggy conditions.<br><br>The algorithm accepts customer data and fine-tunes the base ML model further to achieve a higher mean average precision (mAP) in a short time. |
| H3 | Highlights | Specify an inference latency metric and/or transactions per second on the recommended Amazon SageMaker compute instance. | Mandatory | The average response time for a single-image, single-vehicle inference on the compute-optimized ml.c5.2xlarge instance with 8 vCPUs and 16 GB memory is approximately 3.25 seconds. |
| H4 | Highlights | Specify whether the algorithm is compatible with, for example, automatic model tuning, distributed training, or GPUs on Amazon SageMaker. | Mandatory | The model can be trained using the automatic model tuning capability, and you can specify multiple instances while running a model training job. |
| H5 | Highlights | Applicable research paper/repository related to the model/algorithm. | Highly Recommended | arXiv publication: AutoGluon-Tabular |
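To make an H4-style claim concrete for buyers, a listing can show the tuning-job configuration portion of a SageMaker `CreateHyperParameterTuningJob` request. The sketch below is illustrative only: the hyperparameter name `learning_rate` and the metric `validation:accuracy` are placeholders, and a real listing must use the tunable hyperparameters and metric definitions its algorithm actually declares.

```python
# Sketch of the tuning configuration for SageMaker's
# CreateHyperParameterTuningJob API. Names of the hyperparameter and
# objective metric below are placeholders for whatever the algorithm
# package declares as tunable.
tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:accuracy",  # placeholder metric name
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,
        "MaxParallelTrainingJobs": 2,  # runs training jobs in parallel
    },
    "ParameterRanges": {
        "ContinuousParameterRanges": [
            {"Name": "learning_rate", "MinValue": "0.001", "MaxValue": "0.1"}
        ]
    },
}
print(tuning_job_config["ResourceLimits"]["MaxParallelTrainingJobs"])
```

Stating in the listing which hyperparameters are tunable, as H4 requires, lets buyers assemble a request like this without reading the algorithm's source.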

## Usage Information section

| # | Section | Description | Mandatory/Highly Recommended | Sample Example |
| --- | --- | --- | --- | --- |
| US1 | Usage Instructions | Describe how to use the algorithm: data pre-processing guidelines, training data format (e.g., mandatory fields), minimum data row requirements, and guidelines to create a good model (e.g., identify important hyperparameters and values).<br><br>Also add details of the approximate duration of training based on data. Clarify any feature engineering (e.g., scaling, imputing of values) performed by the algorithm. | Mandatory | The algorithm is a X-pass algorithm, and training on an XXX.XXXX instance for a dataset of size XX MB takes XX minutes of compute. At least 50 rows of training data are recommended. For large datasets with > 100,000 rows or > 1,000 columns, use a larger instance type.<br><br>For best results, provide all your data to AutoGluon as train_data rather than splitting off a validation set yourself, and specify which eval_metric will be used to evaluate predictions.<br><br>The algorithm handles missing data, manual feature transformations, data splitting, model selection, algorithm selection, hyperparameter selection and tuning, ensembling multiple models, and repeating the process when data changes. |
| US2 | Usage Instructions | MIME type for input data. | Mandatory | Supported MIME content types: text/csv. |
| US3 | Usage Instructions | Input data limitations (text): for supervised learning algorithms, describe how labeled data is provided to the algorithm. | Mandatory | The first line of your CSV file should contain names for each column. Columns in your CSV file can be strings/text fields/numeric. |
| US4 | Usage Instructions | Format and description of inference input for the trained model. | Mandatory | AutoGluon-Tabular requires no manual data preprocessing as long as your data is a valid CSV table. Your data must contain the column that you identify as 'label' in your hyperparameter configuration. |
| US5 | Usage Instructions | MIME type for inference output. | Mandatory | Content type: text/plain. |
| US6 | Usage Instructions | Format and description of inference output (text). | Mandatory | For this license plate image, the ML model returned the following output. Sample output: KL40L5577.<br><br>If your output is complex, here is a sample description of output for your reference.<br><br>The model returns a JSON object, detections, that includes an array with individual elements for each face detected. Each element has two attributes:<br>1) box_points: the bounding box pixels of the detected face. The first value represents XX, the second value represents XX, the third value XX, and the fourth value XX.<br>2) classes: no_mask represents the probability score that the bounding box does not include a mask. When multiple faces are detected in the image, multiple inferences are returned as part of the array... |
| US7 | Usage Instructions | Provide an example to pre-process data (text). | Highly Recommended | |
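US7 asks for a pre-processing example but provides no sample. One minimal sketch a listing could include, assuming a training CSV of the shape US3 describes (header row, label column; all column names below are placeholders), is a snippet that drops unlabeled rows before training:

```python
import csv
import io

def drop_unlabeled_rows(raw_csv: str, label_column: str) -> str:
    """Remove rows whose label cell is empty so the training CSV is valid.

    Assumes the first line of the CSV holds the column names.
    """
    reader = csv.DictReader(io.StringIO(raw_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if (row.get(label_column) or "").strip():  # keep labeled rows only
            writer.writerow(row)
    return out.getvalue()

# Hypothetical columns; the second data row has no label and is dropped.
raw = "feature_1,feature_2,label\n1.0,2.0,yes\n3.0,4.0,\n5.0,6.0,no\n"
print(drop_unlabeled_rows(raw, "label"))
```

Even a short snippet like this helps buyers verify that their data matches the format the algorithm expects before paying for a training job.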

## Additional Resources section

| # | Section | Description | Mandatory/Highly Recommended | Sample Example |
| --- | --- | --- | --- | --- |
| AR1 | Additional Resources | Provide a validated notebook, data, and other resources in GitHub. Note: this notebook and sample data will also be verified by MCO.<br><br>Prepare a notebook using this template: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/aws_marketplace/curating_aws_marketplace_listing_and_sample_notebook/Algorithm | Mandatory | https://github.com/awslabs/amazon-sagemaker-examples/tree/master/aws_marketplace/using_algorithms/autogluon |
| AR2 | Additional Resources | Link to sample training input data. | Mandatory | http://timeseriesclassification.com/Downloads/ECG200.zip |
| AR3 | Additional Resources | Links for additional resources such as an architecture diagram and related listings that integrate the model with other applications and services. | Highly Recommended | A blog post or a link such as this, which explains the architecture as well as the process for using the model in a real-world application: VITech Lab Healthcare introduces Automated PPE compliance control on Amazon Web Services. |
| AR4 | Additional Resources | Sample inference input data for real-time invocation (text or link on GitHub). | Mandatory | https://gitlab.qdatalabs.com/quantiphi-sagemaker-marketplace-examples/vehicle-license-plate-recognition/tree/master/data/output/batch |
| AR5 | Additional Resources | Sample inference input data for batch invocation (link on GitHub). | Mandatory | https://gitlab.qdatalabs.com/quantiphi-sagemaker-marketplace-examples/vehicle-license-plate-recognition/tree/master/data/output/batch |
| AR6 | Additional Resources | Sample inference output for real-time invocation for the input sample provided (text or links on GitHub). | Mandatory | https://gitlab.qdatalabs.com/quantiphi-sagemaker-marketplace-examples/vehicle-license-plate-recognition/tree/master/data/output/batch |
| AR7 | Additional Resources | Sample inference output for batch invocation corresponding to the batch input samples (text or links on GitHub). | Mandatory | https://gitlab.qdatalabs.com/quantiphi-sagemaker-marketplace-examples/vehicle-license-plate-recognition/tree/master/data/output/batch |
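Alongside the real-time samples AR4 and AR6 ask for, a listing can show buyers how a request body is assembled. The following is a minimal, hypothetical sketch for a text/csv payload of the kind US2 and US3 describe; the column names and endpoint name are placeholders, not values from any real listing:

```python
import csv
import io

def build_csv_payload(columns, rows) -> str:
    """Serialize feature rows into a text/csv request body; the first
    line carries the column names, matching the training-data format."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()

payload = build_csv_payload(["feature_1", "feature_2"], [[1.0, 2.0], [3.0, 4.0]])

# With the AWS SDK for Python, this payload would be sent roughly as:
#   boto3.client("sagemaker-runtime").invoke_endpoint(
#       EndpointName="my-endpoint",   # placeholder endpoint name
#       ContentType="text/csv",
#       Body=payload)
print(payload)
```

Pairing a snippet like this with the sample output files (AR6, AR7) lets buyers trace a complete request/response round trip before subscribing.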