
Releases: microsoft/Olive

Olive-ai 0.6.0

15 May 11:13

Examples

The following examples are added:

Olive CLI updates

  • Previous commands python -m olive.workflows.run and python -m olive.platform_sdk.qualcomm.configure are deprecated. Use olive run or python -m olive instead. #1129

Passes (optimization techniques)

  • Pytorch
    • AutoAWQQuantizer: Enable AutoAWQ quantization in Olive and provide the capability for ONNX conversion (see the config sketch after this list). #1080
    • SliceGPT: Add support for generic data sets to SliceGPT pass #1145
  • ONNXRuntime
    • ExtractAdapters: The pass now supports int4 quantized models and exposes the external data config options to users. #1083
    • ModelBuilder: Converts a Huggingface/AML generative PyTorch model to an ONNX model using ONNX Runtime Generative AI >= 0.2.0. #1089 #1073 #1110 #1112 #1118 #1130 #1131 #1141 #1146 #1147 #1154
    • OnnxFloatToFloat16: Use the onnxruntime float16 converter. #1132
    • NVModelOptQuantization: Quantize ONNX models with Nvidia ModelOpt. #1135
    • OnnxIOFloat16ToFloat32: Converts float16 model inputs/outputs to float32. #1149
    • [Vitis AI] Make Vitis AI techniques compatible with ORT 1.18 #1140
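
As a rough illustration of how the new passes above are enabled in a workflow config, a "passes" section could look like the sketch below. The option names shown (for example save_as_external_data) are illustrative assumptions based on the notes above, not a complete reference:

"passes": {
    "awq": {
        "type": "AutoAWQQuantizer"
    },
    "extract": {
        "type": "ExtractAdapters",
        "config": {
            "save_as_external_data": true,
            "all_tensors_to_one_file": true
        }
    }
}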

Data Config

  • Remove name ambiguity in dataset configuration #1111
  • Remove HfConfig::dataset references in examples and tests #1113

Engine

  • Add AML deployment packaging. #1090

System

  • Make the accelerator EP optional in Olive systems for non-ONNX passes. #1072

Data

  • Add AML resource support for data configs.
  • Add audio classification data preprocess function.

Model

  • Provide a built-in kv_cache_config for generative models' io_config (see the sketch after this list). #1121
  • Convert MLFlow transformers models to the HuggingFace format so they can be consumed by passes that require that format. #1150
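
A minimal sketch of an input model io_config that opts into the built-in kv_cache handling might look like the following; the exact field names (kv_cache as a boolean or a nested kv_cache_config object) are assumptions for illustration:

"input_model": {
    "type": "PyTorchModel",
    "config": {
        "hf_config": {
            "model_name": "<generative model>",
            "task": "text-generation"
        },
        "io_config": {
            "input_names": ["input_ids", "attention_mask"],
            "output_names": ["logits"],
            "kv_cache": true
        }
    }
}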

Metrics

Dependencies:

  • Support onnxruntime 1.17.3

Issues

  1. Fix code scanning issues. #1078 #1081 #1084 #1085 #1091 #1094 #1103 #1104 #1107 #1126 #1124 #1128

Olive-ai 0.5.2

11 Apr 05:56

Examples

The following examples are added

  • Phi2 SliceGPT example #1052
  • Phi2 Genai example. #1061
  • Llama ExtractAdapters example. #1064

Passes (optimization techniques)

  • SliceGPT: SliceGPT is a post-training sparsification scheme that makes transformer networks smaller by applying orthogonal transformations to each transformer layer and slicing off the least-significant rows and columns of the weight matrices. This results in speedups and a reduced memory footprint.
  • ExtractAdapters: Extracts the LoRA adapter weights (float or statically quantized) and saves them in a separate file.

Engine

  • Simplify the engine config

Fix

  • GenAIModelExporter: On Windows, the cache_dir path of the GenAI model exporter could exceed the 260-character path limit.

Olive-ai 0.5.1

07 Apr 08:18

Examples

The following examples are added

  • Mistral FP16. #980
  • Phi2 Fine tuning example. #1030

Passes (optimization techniques)

  • QNNPreprocess: Add the configs that were introduced in the onnxruntime nightly package.
  • GptqQuantizer: PTQ quantization using Hugging Face Optimum; exports the model with onnxruntime optimized kernels.
  • OnnxMatMul4Quantizer: Add MatMul RTN/HQQ/GPTQ quant configs (see the sketch after this list).
  • Move all passes that need to create an inference session to run on the target:
    • IncQuantization
    • OptimumMerging
    • OrtTransformersOptimization
    • VitisAIQuantization
    • OrtPerfTuning
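
For example, a hypothetical OnnxMatMul4Quantizer pass entry selecting one of the new algorithms could look like the sketch below; the option names (algorithm, block_size) and values are assumptions for illustration:

"matmul4": {
    "type": "OnnxMatMul4Quantizer",
    "config": {
        "algorithm": "GPTQ",
        "block_size": 32
    }
}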

Engine

  • Support packing AzureML output.
  • Remove execution_providers from the engine config; a typical config now looks like:
"systems": {
    "local_system": {
        "type": "LocalSystem",
        "config": {
            "accelerators": [
                {
                    "device": "gpu",
                    "execution_providers": [
                        "CUDAExecutionProvider"
                    ]
                }
            ]
        }
    }
},
"engine": {
      "host": "local_system",
      "target": "local_system",
}

Workflows

  • Delay python pass module loading and provide the --package-config option to let advanced users write their own pass modules and corresponding dependencies.

Fix

  • MLFlow models could not be loaded because from_pretrained_args was missing.
  • LoRA: Provide save_embedding_layers=False when saving the peft model. Otherwise, it defaults to "auto", which checks whether the vocab size changed.
  • Update the model_rank file for the zipfile packaging type. The model path is now relative to the output zip file.
  • Fix shutil.which returning None on Windows when a full python path is passed.

Olive-ai 0.5.0

07 Mar 23:48

Examples

The following examples are added:

Passes (optimization techniques)

New Passes

  • PyTorch
    • Introduce GenAIModelExporter pass to export a PyTorch model using GenAI exporter.
    • Introduce LoftQ pass which performs model fine-tuning using the LoftQ initialization proposed in https://arxiv.org/abs/2310.08659.
  • ONNXRuntime
    • Introduce DynamicToFixedShape pass to convert dynamic shape to fixed shape for ONNX model.
    • Introduce OnnxOpVersionConversion pass to convert an existing ONNX model to another target opset.
    • [QNN-EP] Add the prepare_qnn_config:bool option for quantization under QNN-EP, where int16/uint16 are supported for both weights and activations (see the sketch after this list).
    • [QNN-EP] Introduce QNNPreprocess pass to preprocess the model before quantization.
  • QNN
    • Introduce QNNConversion pass to convert models to QNN C++ model.
    • Introduce QNNContextBinaryGenerator pass to generate the context binary from a compiled model library using a specific backend.
    • Introduce QNNModelLibGenerator pass to compile the C++ model into a model library for the desired target.
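
A sketch of the QNN-EP quantization flow described above, assuming a static quantization pass preceded by QNNPreprocess; the pass and option names shown here follow the notes above but should be treated as illustrative placeholders:

"passes": {
    "qnn_preprocess": {
        "type": "QNNPreprocess"
    },
    "quantization": {
        "type": "OnnxStaticQuantization",
        "config": {
            "prepare_qnn_config": true,
            "activation_type": "QUInt16",
            "weight_type": "QUInt8"
        }
    }
}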

Updates

  • OnnxConversion
    • Support both past_key_values.index.key/value and past_key_value.index.
  • OptimumConversion
    • Provide the components parameter if the user wants to export only some components, such as decoder_model and decoder_with_past_model.
    • Uses the default exporter args and behavior of the underlying optimum version. For versions 1.14.0+, this means legacy=False and no_post_process=False. Users must provide them using extra_args if legacy behavior is desired.
  • OpenVINO
    • Upgrade OpenVINO API to 2023.2.0.
  • OrtPerfTuning
    • Add tunable_op_enable and tunable_op_tuning_enable for the ROCm EP to speed up performance (see the sketch after this list).
  • LoRA/QLoRA
    • Support bfloat16 with ort-training.
    • Support resuming training from checkpoint by
      • resume_from_checkpoint option.
      • overwrite_output_dir option.
  • MoEExpertsDistributor
    • Add option to configure number of parallel jobs.
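
As a sketch, the ROCm tunable-op options could be enabled on the perf tuning pass roughly as follows; the pass entry name and values are placeholders:

"perf_tuning": {
    "type": "OrtPerfTuning",
    "config": {
        "tunable_op_enable": true,
        "tunable_op_tuning_enable": true
    }
}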

Engine

  • For Zipfile packaging, add a models rank json file. This file ranks all output models from different EPs and includes model_config and metrics (see the sketch after this list).
  • Add Auto Optimizer, a tool that can be used to automatically search for combinations of Olive passes.
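
The notes above only name the fields in the rank file; a plausible shape, shown purely for illustration and not as the actual schema, is:

[
    {
        "rank": 1,
        "model_config": {
            "type": "ONNXModel",
            "config": {
                "model_path": "<path inside the output zip>"
            }
        },
        "metrics": {
            "latency-avg": 10.5
        }
    }
]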

System

  • Add hf_token support for Olive systems.
  • AzureMLSystem
    • The Olive config file is uploaded to AML jobs under the codes folder.
    • Support adding tags to the AML jobs.
    • Support using existing AML workspace Environment for AzureMLSystem.
  • DockerSystem
    • Support running Olive Pass.
  • PythonEnvironmentSystem requires Olive to be installed in the environment. It can run passes and evaluate models.
  • New IsolatedORTSystem introduced that only supports evaluation of ONNX models. It requires onnxruntime to be installed in the environment. It can be used for packages like onnxruntime-qnn, which can only be run in a Windows ARM64 python environment.

Data

  • Add AML resource support for data configs.
  • Add audio classification data preprocess function.

Model

  • Rename model_loading_args to from_pretrained_args in hf_config.

Metrics

  • Add throughput metric support (see the sketch below).
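
A minimal metric entry for throughput, assuming it follows the same shape as the existing latency/accuracy metrics; the type and sub_type names are assumptions for illustration:

{
    "name": "throughput",
    "type": "throughput",
    "sub_types": [
        {
            "name": "avg"
        }
    ]
}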

Dependencies:

  • Support onnxruntime 1.17.1.

Olive-ai 0.4.0

14 Nov 03:05

Examples

The following examples are added

Passes (optimization techniques)

  • OrtPerfTuning
    • Raises known failure exceptions to immediately stop tuning.
    • Default values for device and providers_list are based on the accelerator spec.
  • OrtTransformersOptimization
    • Checks that model_type is provided in the pass configs or available in the model attributes. None is invalid.
    • fp16 related arguments are better documented.
  • Introduce LoRA pass for fine-tuning pytorch models with Low-Rank Adaptation.
  • Introduce OnnxMatMul4Quantizer pass to quantize onnx models to 4-bit integers.
  • Introduce OnnxBnb4Quantization pass to quantize onnx models to 4-bit data types from bitsandbytes (FP4, NF4).
  • Onnx external data configuration supports size_threshold and convert_attribute parameters.
  • LlamaPyTorchTensorParallel pass to split Llama model into a tensor parallel distributed pytorch model.
  • OnnxConversion
    • Support DistributedPyTorchModel.
    • use_device and torch_dtype options to specify the device ("cpu", "cuda") and data type ("float16", "float32") for the model before conversion (see the sketch after this list).
  • DeviceSpecificOnnxConversion removed in favor of the OnnxConversion pass with the use_device option.
  • LoRA/QLoRA
    • Support training using ONNX Runtime Training.
    • Mixed-precision training when torch_dtype=float16 for numerical stability.
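
For example, converting on GPU in half precision could be configured roughly like this sketch; target_opset is shown only as a placeholder value:

"conversion": {
    "type": "OnnxConversion",
    "config": {
        "use_device": "cuda",
        "torch_dtype": "float16",
        "target_opset": 16
    }
}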

Engine

  • Make the engine/evaluator config optional in the olive run config. With these defaults, users can run optimization without search and evaluation using the simplest pass config.
  • evaluate_input_model is optional in the engine config in no-search mode. It is forced to False when no evaluator is provided.
  • ort_py_log_severity_level option to control logging level for onnxruntime python logs.
  • CLI option --tempdir to use a custom directory as the root directory for tempfile.
  • IO-Binding:
    • New method to efficiently bind inputs and outputs to the session using either the CPU or GPU depending on the device.
    • shared_kv_buffer option to enable key value buffer sharing between input (past key values) and output (present key values)

Model

  • DistributedOnnxModel file structure updated to use resource paths. Can be saved from cache to destination directory.
  • Introduce DistributedPyTorchModel that is analogous to DistributedOnnxModel for pytorch model.
  • trust_remote_code added to HFConfig model_loading_args.

Metrics

  • Option to provide kwargs to user_script functions through func_kwargs

Dependencies:

  • Support onnxruntime 1.16.2

Olive-ai 0.3.3

18 Oct 12:50

Quick fix for v0.3.2

  • Vitis AI quantization supports ORT 1.16.1
  • Add an optional attention mask for the text-generation task

Olive-ai 0.3.2

18 Oct 10:29

Examples

The following examples are added

Passes (optimization techniques)

  • QLoRA pass for torch model fine-tuning
  • Intel® Neural Compressor 4-bits weight-only quantization
  • OnnxModelOptimizer
    • Insert a Cast operation for cases where the ArgMax input isn't supported on the device
    • Fuse consecutive Reshape operations when the latter results in flattening

Engine

  • Summarize pass run history in a table (install tabulate for a better preview)
  • Support tuning and evaluating models across different execution providers managed by Olive-ai.

Model

  • Add model_loading_args, load_model and load_model_config to HFConfig.
  • Add adapter_path to PyTorchModel
  • Introduce model_attributes, which can be used to simplify the user's input for transformer_optimization
  • Add AML curated model support

Dataset

  • Auto-insertion of the input model's data config (if it's a pytorch model with hf_config.dataset) in pass configs is removed. Use "input_model_data_config" if the user wants to use the input model's data config.
  • Support a second type of dataset for text-generation tasks called pair
  • Support converting an Olive dataset to a huggingface datasets.Dataset

Known Issues

  • #571 Whisper gpu does not consume gpu resources
  • #573 Distinguish pass instance with name not cls name

Dependencies:

  • Support onnxruntime 1.16.1
  • Drop Python 3.7 support. Python >= 3.8 is now required to run Olive-ai optimization.

Olive-ai 0.3.1

21 Aug 07:55

Examples

The following examples are added

Passes (optimization techniques)

  • Introduce TorchTRTConversion
  • Introduce SparseGPT pass for one-shot model pruning on large GPT like models using the algorithm proposed in https://arxiv.org/abs/2301.00774.

Systems

  • Add AzureML sku support for AMLSystem

Evaluator

  • Add metric_func config to custom metrics. Olive will run inference for the custom eval function, so users don't need to do inference themselves.
  • Add RawDataContainer:
    SNPE evaluation and quantization now accept generic dataloaders such as the torch dataloader

Metrics

  • Add Perplexity metric for text-generation task

Engine

  • Provide an interface to let users set multiple pass flows to run in the same Olive workflow

Olive-ai 0.2.1

23 May 07:14

Examples

The following examples are added

General

  • Enable hardware accelerator support for Olive. This introduces a new accelerators config in systems (for example, cpu, gpu) and execution_providers in the engine config (for example, CPUExecutionProvider, CUDAExecutionProvider). A sketch of this config follows below.
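
A rough sketch of how this looked in the 0.2.1 config format (execution_providers later moved into the accelerator entries, as shown in the 0.5.1 notes above); the values are placeholders:

"systems": {
    "local_system": {
        "type": "LocalSystem",
        "config": {
            "accelerators": ["gpu"]
        }
    }
},
"engine": {
    "host": "local_system",
    "target": "local_system",
    "execution_providers": ["CUDAExecutionProvider"]
}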

Evaluator

  • Support for evaluating distributed ONNX models

Metrics

  • Extend metrics' sub_type to accept list input so results can be gathered in one evaluation job where possible, and add sub_type_for_rank for the sort/search strategy.

Olive-ai 0.2.0

17 May 12:15

Examples

The following examples are added

General

  • Simplify the data load experience by adding transformers data config support. For transformer models, users can use hf_config.dataset to leverage online huggingface datasets (see the sketch after this list).
  • Ease the process of setting up the environment: users can run python -m olive.workflows.run --config config.json --setup to install the necessary packages required by passes.
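
A sketch of an input model using hf_config.dataset to pull an online huggingface dataset; the dataset field names (data_name, subset, split, input_cols, label_cols) and values are illustrative assumptions:

"input_model": {
    "type": "PyTorchModel",
    "config": {
        "hf_config": {
            "model_name": "bert-base-uncased",
            "task": "text-classification",
            "dataset": {
                "data_name": "glue",
                "subset": "mrpc",
                "split": "validation",
                "input_cols": ["sentence1", "sentence2"],
                "label_cols": ["label"],
                "batch_size": 1
            }
        }
    }
}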

Passes (optimization techniques)

  • Integrate Intel® Neural Compressor into Olive: introduce new passes IncStaticQuantization, IncDynamicQuantization, and IncQuantization.
  • Integrate Vitis-AI into Olive: introduce new pass VitisAIQuantization.
  • Introduce OnnxFloatToFloat16: converts a model to float16. It is based on onnxconverter-common.convert_float_to_float16.
  • Introduce OrtMixedPrecision: converts model to mixed precision to retain a certain level of accuracy.
  • Introduce AppendPrePostProcessingOps: adds pre/post processing nodes to the input model.
  • Introduce InsertBeamSearch: chains two model components (for example, encoder and decoder) together by inserting a beam search op between them.
  • Support external data for all ONNX passes.
  • Enable transformer optimization fusion options in workflow file.
  • Expose extra_options in ONNX quantization passes.

Models

  • Introduce DistributedOnnxModel to support distributed inferencing
  • Introduce CompositeOnnxModel to represent models with encoder and decoder subcomponents as individual OnnxModels.
  • Add io_config to PytorchModel, including input_names, input_shapes, output_names and dynamic_axes
  • Add MLFlow model loader

Systems

  • Introduce PythonEnvironmentSystem: a python environment on the host machine. This system allows users to evaluate models using onnxruntime or packages installed in a different python environment.

Evaluator

  • Remove target from the evaluator config.
  • Introduce dummy dataloader for latency evaluation.

Metrics

  • Introduce priority_rank: users need to specify "priority_rank": rank_num for each metric when multiple metrics are provided. Olive will use the priority_ranks of the metrics to determine the best model (see the sketch below).
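
For example, with two metrics the one with priority_rank 1 decides the best model; the metric and sub_type names here are placeholders for illustration:

"metrics": [
    {
        "name": "accuracy",
        "type": "accuracy",
        "sub_type": "accuracy_score",
        "priority_rank": 1
    },
    {
        "name": "latency",
        "type": "latency",
        "sub_type": "avg",
        "priority_rank": 2
    }
]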

Engine

  • Introduce Olive Footprint: generates report json files, including footprints.json and Pareto frontier footprints, and dumps the frontier to html/image.
  • Introduce Packaging Olive artifacts: packages CandidateModels, SampleCode and ONNXRuntimePackages in the output_dir folder if it is configured in the Engine configuration.
  • Introduce log_severity_level.