
Releases: microsoft/Olive

Olive-ai 0.6.0

15 May 11:13

Examples

The following examples are added:

Olive CLI updates

  • Previous commands python -m olive.workflows.run and python -m olive.platform_sdk.qualcomm.configure are deprecated. Use olive run or python -m olive instead. #1129

Passes (optimization techniques)

  • Pytorch
    • AutoAWQQuantizer: Enable AutoAWQ quantization in Olive and provide the capability for ONNX conversion (see the config sketch after this list). #1080
    • SliceGPT: Add support for generic data sets to SliceGPT pass #1145
  • ONNXRuntime
    • ExtractAdapters: The pass now supports int4 quantized models and exposes the external data config options to users. #1083
    • ModelBuilder: Converts a Huggingface/AML generative PyTorch model to an ONNX model using ONNX Runtime Generative AI >= 0.2.0. #1089 #1073 #1110 #1112 #1118 #1130 #1131 #1141 #1146 #1147 #1154
    • OnnxFloatToFloat16: Use the onnxruntime float16 converter. #1132
    • NVModelOptQuantization: Quantize ONNX models with Nvidia ModelOpt. #1135
    • OnnxIOFloat16ToFloat32: Converts float16 model inputs/outputs to float32. #1149
    • [Vitis AI] Make Vitis AI techniques compatible with ORT 1.18 #1140
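
As a rough illustration of how the new passes above are enabled in a workflow config, a "passes" section could look like the sketch below. The option names shown (for example save_as_external_data) are illustrative assumptions based on the notes above, not a complete reference:

"passes": {
    "awq": {
        "type": "AutoAWQQuantizer"
    },
    "extract": {
        "type": "ExtractAdapters",
        "config": {
            "save_as_external_data": true,
            "all_tensors_to_one_file": true
        }
    }
}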

Data Config

  • Remove name ambiguity in dataset configuration #1111
  • Remove HfConfig::dataset references in examples and tests #1113

Engine

  • Add AML deployment packaging. #1090

System

  • Make the accelerator EP optional in Olive systems for non-ONNX passes. #1072

Data

  • Add AML resource support for data configs.
  • Add audio classification data preprocess function.

Model

  • Provide a built-in kv_cache_config for generative models' io_config (see the sketch after this list). #1121
  • Convert MLFlow transformers models to the HuggingFace format so they can be consumed by passes that require that format. #1150
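
A minimal sketch of an input model io_config that opts into the built-in kv_cache handling might look like the following; the exact field names (kv_cache as a boolean or a nested kv_cache_config object) are assumptions for illustration:

"input_model": {
    "type": "PyTorchModel",
    "config": {
        "hf_config": {
            "model_name": "<generative model>",
            "task": "text-generation"
        },
        "io_config": {
            "input_names": ["input_ids", "attention_mask"],
            "output_names": ["logits"],
            "kv_cache": true
        }
    }
}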

Metrics

Dependencies:

  • Support onnxruntime 1.17.3

Issues

  1. Fix code scanning issues. #1078 #1081 #1084 #1085 #1091 #1094 #1103 #1104 #1107 #1126 #1124 #1128

Olive-ai 0.5.2

11 Apr 05:56

Examples

The following examples are added

  • Phi2 SliceGPT example #1052
  • Phi2 Genai example. #1061
  • Llama ExtractAdapters example. #1064

Passes (optimization techniques)

  • SliceGPT: SliceGPT is a post-training sparsification scheme that makes transformer networks smaller by applying orthogonal transformations to each transformer layer and slicing off the least-significant rows and columns of the weight matrices. This results in speedups and a reduced memory footprint.
  • ExtractAdapters: Extracts the LoRA adapter weights (float or statically quantized) and saves them in a separate file.

Engine

  • Simplify the engine config

Fix

  • GenAIModelExporter: On Windows, the cache_dir path of the GenAI model exporter could exceed the 260-character path limit.

Olive-ai 0.5.1

07 Apr 08:18

Examples

The following examples are added

  • Mistral FP16. #980
  • Phi2 Fine tuning example. #1030

Passes (optimization techniques)

  • QNNPreprocess: Add the configs that were introduced in the onnxruntime nightly package.
  • GptqQuantizer: PTQ quantization using Hugging Face Optimum; exports the model with onnxruntime optimized kernels.
  • OnnxMatMul4Quantizer: Add MatMul RTN/HQQ/GPTQ quant configs (see the sketch after this list).
  • Move all passes that need to create an inference session to run on the target:
    • IncQuantization
    • OptimumMerging
    • OrtTransformersOptimization
    • VitisAIQuantization
    • OrtPerfTuning
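
For example, a hypothetical OnnxMatMul4Quantizer pass entry selecting one of the new algorithms could look like the sketch below; the option names (algorithm, block_size) and values are assumptions for illustration:

"matmul4": {
    "type": "OnnxMatMul4Quantizer",
    "config": {
        "algorithm": "GPTQ",
        "block_size": 32
    }
}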

Engine

  • Support packing AzureML output.
  • Remove execution_providers from the engine config; a typical config now looks like:
"systems": {
    "local_system": {
        "type": "LocalSystem",
        "config": {
            "accelerators": [
                {
                    "device": "gpu",
                    "execution_providers": [
                        "CUDAExecutionProvider"
                    ]
                }
            ]
        }
    }
},
"engine": {
      "host": "local_system",
      "target": "local_system",
}

Workflows

  • Delay python pass module loading and provide the --package-config option to let advanced users write their own pass modules and corresponding dependencies.

Fix

  • MLFlow models could not be loaded because from_pretrained_args was missing.
  • LoRA: Provide save_embedding_layers=False when saving the peft model. Otherwise, it defaults to "auto", which checks whether the vocab size changed.
  • Update the model_rank file for the zipfile packaging type. The model path is now relative to the output zip file.
  • Fix shutil.which returning None on Windows when a full python path is passed.

Olive-ai 0.5.0

07 Mar 23:48

Examples

The following examples are added:

Passes (optimization techniques)

New Passes

  • PyTorch
    • Introduce GenAIModelExporter pass to export a PyTorch model using GenAI exporter.
    • Introduce LoftQ pass which performs model fine-tuning using the LoftQ initialization proposed in https://arxiv.org/abs/2310.08659.
  • ONNXRuntime
    • Introduce DynamicToFixedShape pass to convert dynamic shape to fixed shape for ONNX model.
    • Introduce OnnxOpVersionConversion pass to convert an existing ONNX model to another target opset.
    • [QNN-EP] Add the prepare_qnn_config:bool option for quantization under QNN-EP, where int16/uint16 are supported for both weights and activations (see the sketch after this list).
    • [QNN-EP] Introduce QNNPreprocess pass to preprocess the model before quantization.
  • QNN
    • Introduce QNNConversion pass to convert models to QNN C++ model.
    • Introduce QNNContextBinaryGenerator pass to generate the context binary from a compiled model library using a specific backend.
    • Introduce QNNModelLibGenerator pass to compile the C++ model into a model library for the desired target.
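
A sketch of the QNN-EP quantization flow described above, assuming a static quantization pass preceded by QNNPreprocess; the pass and option names shown here follow the notes above but should be treated as illustrative placeholders:

"passes": {
    "qnn_preprocess": {
        "type": "QNNPreprocess"
    },
    "quantization": {
        "type": "OnnxStaticQuantization",
        "config": {
            "prepare_qnn_config": true,
            "activation_type": "QUInt16",
            "weight_type": "QUInt8"
        }
    }
}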

Updates

  • OnnxConversion
    • Support both past_key_values.index.key/value and past_key_value.index.
  • OptimumConversion
    • Provide the components parameter if the user wants to export only some components, such as decoder_model and decoder_with_past_model.
    • Uses the default exporter args and behavior of the underlying optimum version. For versions 1.14.0+, this means legacy=False and no_post_process=False. Users must provide them using extra_args if legacy behavior is desired.
  • OpenVINO
    • Upgrade OpenVINO API to 2023.2.0.
  • OrtPerfTuning
    • Add tunable_op_enable and tunable_op_tuning_enable for the ROCm EP to speed up performance (see the sketch after this list).
  • LoRA/QLoRA
    • Support bfloat16 with ort-training.
    • Support resuming training from checkpoint by
      • resume_from_checkpoint option.
      • overwrite_output_dir option.
  • MoEExpertsDistributor
    • Add option to configure number of parallel jobs.
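
As a sketch, the ROCm tunable-op options could be enabled on the perf tuning pass roughly as follows; the pass entry name and values are placeholders:

"perf_tuning": {
    "type": "OrtPerfTuning",
    "config": {
        "tunable_op_enable": true,
        "tunable_op_tuning_enable": true
    }
}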

Engine

  • For Zipfile packaging, add a models rank json file. This file ranks all output models from different EPs and includes model_config and metrics (see the sketch after this list).
  • Add Auto Optimizer, a tool that can be used to automatically search for combinations of Olive passes.
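
The notes above only name the fields in the rank file; a plausible shape, shown purely for illustration and not as the actual schema, is:

[
    {
        "rank": 1,
        "model_config": {
            "type": "ONNXModel",
            "config": {
                "model_path": "<path inside the output zip>"
            }
        },
        "metrics": {
            "latency-avg": 10.5
        }
    }
]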

System

  • Add hf_token support for Olive systems.
  • AzureMLSystem
    • The Olive config file is uploaded to AML jobs under the codes folder.
    • Support adding tags to the AML jobs.
    • Support using existing AML workspace Environment for AzureMLSystem.
  • DockerSystem
    • Support running Olive Pass.
  • PythonEnvironmentSystem requires Olive to be installed in the environment. It can run passes and evaluate models.
  • New IsolatedORTSystem introduced that only supports evaluation of ONNX models. It requires onnxruntime to be installed in the environment. It can be used for packages like onnxruntime-qnn, which can only be run in a Windows ARM64 python environment.

Data

  • Add AML resource support for data configs.
  • Add audio classification data preprocess function.

Model

  • Rename model_loading_args to from_pretrained_args in hf_config.

Metrics

  • Add throughput metric support (see the sketch below).
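
A minimal metric entry for throughput, assuming it follows the same shape as the existing latency/accuracy metrics; the type and sub_type names are assumptions for illustration:

{
    "name": "throughput",
    "type": "throughput",
    "sub_types": [
        {
            "name": "avg"
        }
    ]
}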

Dependencies:

  • Support onnxruntime 1.17.1.

Olive-ai 0.4.0

14 Nov 03:05

Examples

The following examples are added

Passes (optimization techniques)

  • OrtPerfTuning
    • Raises known failure exceptions to immediately stop tuning.
    • Default values for device and providers_list are based on the accelerator spec.
  • OrtTransformersOptimization
    • Checks that model_type is provided in the pass configs or available in the model attributes. None is invalid.
    • fp16 related arguments are better documented.
  • Introduce LoRA pass for fine-tuning pytorch models with Low-Rank Adaptation.
  • Introduce OnnxMatMul4Quantizer pass to quantize onnx models to 4-bit integers.
  • Introduce OnnxBnb4Quantization pass to quantize onnx models to 4-bit data types from bitsandbytes (FP4, NF4).
  • Onnx external data configuration supports size_threshold and convert_attribute parameters.
  • LlamaPyTorchTensorParallel pass to split Llama model into a tensor parallel distributed pytorch model.
  • OnnxConversion
    • Support DistributedPyTorchModel.
    • use_device and torch_dtype options to specify the device ("cpu", "cuda") and data type ("float16", "float32") for the model before conversion (see the sketch after this list).
  • DeviceSpecificOnnxConversion removed in favor of the OnnxConversion pass with the use_device option.
  • LoRA/QLoRA
    • Support training using ONNX Runtime Training.
    • Mixed-precision training when torch_dtype=float16 for numerical stability.
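
For example, converting on GPU in half precision could be configured roughly like this sketch; target_opset is shown only as a placeholder value:

"conversion": {
    "type": "OnnxConversion",
    "config": {
        "use_device": "cuda",
        "torch_dtype": "float16",
        "target_opset": 16
    }
}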

Engine

  • Make the engine/evaluator config optional in the olive run config. With these defaults, users can run optimization without search and evaluation using the simplest pass config.
  • evaluate_input_model is optional in the engine config in no-search mode. It is forced to False when no evaluator is provided.
  • ort_py_log_severity_level option to control logging level for onnxruntime python logs.
  • CLI option --tempdir to use a custom directory as the root directory for tempfile.
  • IO-Binding:
    • New method to efficiently bind inputs and outputs to the session using either the CPU or GPU depending on the device.
    • shared_kv_buffer option to enable key value buffer sharing between input (past key values) and output (present key values)

Model

  • DistributedOnnxModel file structure updated to use resource paths. Can be saved from cache to destination directory.
  • Introduce DistributedPyTorchModel that is analogous to DistributedOnnxModel for pytorch model.
  • trust_remote_code added to HFConfig model_loading_args.

Metrics

  • Option to provide kwargs to user_script functions through func_kwargs

Dependencies:

  • Support onnxruntime 1.16.2

Olive-ai 0.3.3

18 Oct 12:50

Quick fix for v0.3.2

  • Vitis AI quantization supports ORT 1.16.1
  • Add an optional attention mask for the text-generation task

Olive-ai 0.3.2

18 Oct 10:29

Examples

The following examples are added

Passes (optimization techniques)

  • QLoRA pass for torch model fine-tuning
  • Intel® Neural Compressor 4-bits weight-only quantization
  • OnnxModelOptimizer
    • Insert a Cast operation for cases where the ArgMax input isn't supported on the device
    • Fuse consecutive Reshape operations when the latter results in flattening

Engine

  • Summarize pass run history in a table (install tabulate for a better preview)
  • Support tuning and evaluating models across different execution providers managed by Olive-ai.

Model

  • Add model_loading_args, load_model and load_model_config to HFConfig.
  • Add adapter_path to PyTorchModel
  • Introduce model_attributes, which can be used to simplify the user's input for transformer_optimization
  • Add AML curated model support

Dataset

  • Auto-insertion of the input model's data config (if it's a pytorch model with hf_config.dataset) in pass configs is removed. Use "input_model_data_config" if the user wants to use the input model's data config.
  • Support a second type of dataset for text-generation tasks called pair
  • Support converting an Olive dataset to a huggingface datasets.Dataset

Known Issues

  • #571 Whisper gpu does not consume gpu resources
  • #573 Distinguish pass instance with name not cls name

Dependencies:

  • Support onnxruntime 1.16.1
  • Drop Python 3.7 support. Python >= 3.8 is now required to run Olive-ai optimization.

Olive-ai 0.3.1

21 Aug 07:55

Examples

The following examples are added

Passes (optimization techniques)

  • Introduce TorchTRTConversion
  • Introduce SparseGPT pass for one-shot model pruning on large GPT like models using the algorithm proposed in https://arxiv.org/abs/2301.00774.

Systems

  • Add AzureML sku support for AMLSystem

Evaluator

  • Add metric_func config to custom metrics. Olive will run inference for the custom eval function, so users don't need to do inference themselves.
  • Add RawDataContainer:
    SNPE evaluation and quantization now accept generic dataloaders such as the torch dataloader

Metrics

  • Add Perplexity metric for text-generation task

Engine

  • Provide an interface to let users set multiple pass flows to run in the same Olive workflow

Olive-ai 0.2.1

23 May 07:14

Examples

The following examples are added

General

  • Enable hardware accelerator support for Olive. This introduces a new accelerators config in systems (for example, cpu, gpu) and execution_providers in the engine config (for example, CPUExecutionProvider, CUDAExecutionProvider). A sketch of this config follows below.
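
A rough sketch of how this looked in the 0.2.1 config format (execution_providers later moved into the accelerator entries, as shown in the 0.5.1 notes above); the values are placeholders:

"systems": {
    "local_system": {
        "type": "LocalSystem",
        "config": {
            "accelerators": ["gpu"]
        }
    }
},
"engine": {
    "host": "local_system",
    "target": "local_system",
    "execution_providers": ["CUDAExecutionProvider"]
}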

Evaluator

  • Support for evaluating distributed ONNX models

Metrics

  • Extend metrics' sub_type to accept list input so results can be gathered in one evaluation job where possible, and add sub_type_for_rank for the sort/search strategy.

Olive-ai 0.2.0

17 May 12:15

Examples

The following examples are added

General

  • Simplify the data load experience by adding transformers data config support. For transformer models, users can use hf_config.dataset to leverage online huggingface datasets (see the sketch after this list).
  • Ease the process of setting up the environment: users can run python -m olive.workflows.run --config config.json --setup to install the necessary packages required by passes.
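
A sketch of an input model using hf_config.dataset to pull an online huggingface dataset; the dataset field names (data_name, subset, split, input_cols, label_cols) and values are illustrative assumptions:

"input_model": {
    "type": "PyTorchModel",
    "config": {
        "hf_config": {
            "model_name": "bert-base-uncased",
            "task": "text-classification",
            "dataset": {
                "data_name": "glue",
                "subset": "mrpc",
                "split": "validation",
                "input_cols": ["sentence1", "sentence2"],
                "label_cols": ["label"],
                "batch_size": 1
            }
        }
    }
}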

Passes (optimization techniques)

  • Integrate Intel® Neural Compressor into Olive: introduce new passes IncStaticQuantization, IncDynamicQuantization, and IncQuantization.
  • Integrate Vitis-AI into Olive: introduce new pass VitisAIQuantization.
  • Introduce OnnxFloatToFloat16: converts a model to float16. It is based on onnxconverter-common.convert_float_to_float16.
  • Introduce OrtMixedPrecision: converts model to mixed precision to retain a certain level of accuracy.
  • Introduce AppendPrePostProcessingOps: adds pre/post processing nodes to the input model.
  • Introduce InsertBeamSearch: chains two model components (for example, encoder and decoder) together by inserting a beam search op between them.
  • Support external data for all ONNX passes.
  • Enable transformer optimization fusion options in workflow file.
  • Expose extra_options in ONNX quantization passes.

Models

  • Introduce DistributedOnnxModel to support distributed inferencing
  • Introduce CompositeOnnxModel to represent models with encoder and decoder subcomponents as individual OnnxModels.
  • Add io_config to PytorchModel, including input_names, input_shapes, output_names and dynamic_axes
  • Add MLFlow model loader

Systems

  • Introduce PythonEnvironmentSystem: a python environment on the host machine. This system allows users to evaluate models using onnxruntime or packages installed in a different python environment.

Evaluator

  • Remove target from the evaluator config.
  • Introduce dummy dataloader for latency evaluation.

Metrics

  • Introduce priority_rank: users need to specify "priority_rank": rank_num for each metric when multiple metrics are provided. Olive will use the priority_ranks of the metrics to determine the best model (see the sketch below).
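
For example, with two metrics the one with priority_rank 1 decides the best model; the metric and sub_type names here are placeholders for illustration:

"metrics": [
    {
        "name": "accuracy",
        "type": "accuracy",
        "sub_type": "accuracy_score",
        "priority_rank": 1
    },
    {
        "name": "latency",
        "type": "latency",
        "sub_type": "avg",
        "priority_rank": 2
    }
]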

Engine

  • Introduce Olive Footprint: generates report json files, including footprints.json and Pareto frontier footprints, and dumps the frontier to html/image.
  • Introduce Packaging Olive artifacts: packages CandidateModels, SampleCode and ONNXRuntimePackages in the output_dir folder if it is configured in the Engine configuration.
  • Introduce log_severity_level.