Olive-ai 0.2.0
Examples
The following examples have been added:
- ResNet Optimization with Vitis-AI Quantization for CPU
- SqueezeNet Optimization with DirectML for GPU
- Stable Diffusion Optimization with DirectML for GPU
- MobileNet Optimization with QDQ Quantization for Qualcomm NPU
- Whisper Optimization for CPU
- BERT Optimization with Intel® Neural Compressor PTQ for CPU
General
- Simplify the data loading experience by adding transformers data config support. For transformer models, users can use hf_config.dataset to leverage datasets hosted on the Hugging Face Hub.
- Ease environment setup: users can run olive.workflows.run --config config.json --setup to install the packages required by the configured passes.
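As an illustration, a transformers data config might look like the following workflow-config fragment (the model name, dataset, and exact field names here are illustrative assumptions, not taken from the release notes):

```json
{
  "input_model": {
    "type": "PyTorchModel",
    "config": {
      "hf_config": {
        "model_name": "bert-base-uncased",
        "task": "text-classification",
        "dataset": {
          "data_name": "glue",
          "subset": "mrpc",
          "split": "validation",
          "input_cols": ["sentence1", "sentence2"],
          "label_cols": ["label"],
          "batch_size": 1
        }
      }
    }
  }
}
```

With a dataset specified under hf_config, Olive can pull the data directly from the Hugging Face Hub instead of requiring a user-written dataloader.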
Passes (optimization techniques)
- Integrate Intel® Neural Compressor into Olive: introduce new passes IncStaticQuantization, IncDynamicQuantization, and IncQuantization.
- Integrate Vitis-AI into Olive: introduce new pass VitisAIQuantization.
- Introduce OnnxFloatToFloat16: converts a model to float16. It is based on onnxconverter-common.convert_float_to_float16.
- Introduce OrtMixedPrecision: converts a model to mixed precision to retain a certain level of accuracy.
- Introduce AppendPrePostProcessingOps: adds pre/post-processing nodes to the input model.
- Introduce InsertBeamSearch: chains two model components (for example, encoder and decoder) together by inserting a beam search op between them.
- Support external data for all ONNX passes.
- Enable transformer optimization fusion options in workflow file.
- Expose extra_options in ONNX quantization passes.
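To illustrate how these passes surface in a workflow file, here is a sketch combining float16 conversion with a dynamic quantization pass that sets extra_options (the quantization pass name and option key are assumptions for illustration):

```json
{
  "passes": {
    "to_fp16": {
      "type": "OnnxFloatToFloat16"
    },
    "quantize": {
      "type": "OnnxDynamicQuantization",
      "config": {
        "extra_options": {
          "WeightSymmetric": true
        }
      }
    }
  }
}
```

Keys under extra_options are forwarded to the underlying ONNX Runtime quantization tooling, so available options depend on the installed onnxruntime version.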
Models
- Introduce DistributedOnnxModel to support distributed inferencing.
- Introduce CompositeOnnxModel to represent models with encoder and decoder subcomponents as individual OnnxModels.
- Add io_config to PyTorchModel, including input_names, input_shapes, output_names, and dynamic_axes.
- Add an MLflow model loader.
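A model io_config might be declared as in the following sketch (the model path, tensor names, and shapes are illustrative assumptions):

```json
{
  "input_model": {
    "type": "PyTorchModel",
    "config": {
      "model_path": "model.pt",
      "io_config": {
        "input_names": ["input_ids", "attention_mask"],
        "input_shapes": [[1, 128], [1, 128]],
        "output_names": ["logits"],
        "dynamic_axes": {
          "input_ids": {"0": "batch_size"},
          "attention_mask": {"0": "batch_size"}
        }
      }
    }
  }
}
```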
Systems
- Introduce PythonEnvironmentSystem: a Python environment on the host machine. This system allows users to evaluate models using onnxruntime or packages installed in a different Python environment.
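A system of this kind might be declared and referenced as in the sketch below (the system type string, config key, and path are assumptions for illustration):

```json
{
  "systems": {
    "python_system": {
      "type": "PythonEnvironment",
      "config": {
        "python_environment_path": "/path/to/venv/bin"
      }
    }
  },
  "engine": {
    "target": "python_system"
  }
}
```

Pointing the engine's target at the named system lets Olive evaluate candidate models inside that separate environment.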
Evaluator
- Remove target from the evaluator config.
- Introduce a dummy dataloader for latency evaluation.
Metrics
- Introduce priority_rank: when multiple metrics are configured, users need to specify "priority_rank": rank_num for each metric. Olive uses the priority ranks of the metrics to determine the best model.
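For example, a metrics list that ranks accuracy above latency could look like this sketch (metric names and types are illustrative; only the "priority_rank" key comes from the note above):

```json
{
  "metrics": [
    {
      "name": "accuracy",
      "type": "accuracy",
      "priority_rank": 1
    },
    {
      "name": "latency",
      "type": "latency",
      "priority_rank": 2
    }
  ]
}
```

Here a model that wins on accuracy is preferred, with latency used as the lower-priority criterion.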
Engine
- Introduce Olive Footprint: generates report JSON files, including footprints.json and the Pareto frontier footprints, and dumps the frontier to HTML/image.
- Introduce packaging of Olive artifacts: packages CandidateModels, SampleCode, and ONNXRuntimePackages in the output_dir folder if packaging is configured in the engine configuration.
- Introduce log_severity_level to control logging verbosity.
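An engine configuration using these options might look like the following sketch (the packaging type string, package name, and severity value are assumptions for illustration):

```json
{
  "engine": {
    "packaging_config": {
      "type": "Zipfile",
      "name": "OutputModels"
    },
    "log_severity_level": 1,
    "output_dir": "outputs"
  }
}
```

With packaging_config set, the packaged artifacts are written under output_dir alongside the footprint reports.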