
Olive-ai 0.4.0

@jambayk released this 14 Nov 03:05

Examples

The following examples have been added:

Passes (optimization techniques)

  • OrtPerfTuning
    • Raises known failure exceptions to immediately stop tuning.
    • Default values for device and providers_list are now based on the accelerator spec.
  • OrtTransformersOptimization
    • Checks that model_type is provided in the pass config or available in the model attributes. None is invalid.
    • fp16-related arguments are now better documented.
  • Introduce LoRA pass for fine-tuning PyTorch models with Low-Rank Adaptation.
  • Introduce OnnxMatMul4Quantizer pass to quantize ONNX models to 4-bit integers.
  • Introduce OnnxBnb4Quantization pass to quantize ONNX models to 4-bit data types from bitsandbytes (FP4, NF4).
  • ONNX external data configuration now supports the size_threshold and convert_attribute parameters.
  • Introduce LlamaPyTorchTensorParallel pass to split a Llama model into a tensor-parallel distributed PyTorch model.
  • OnnxConversion
    • Support DistributedPyTorchModel.
    • New use_device and torch_dtype options specify the device ("cpu", "cuda") and data type ("float16", "float32") for the model before conversion (see the sketch after this list).
  • DeviceSpecificOnnxConversion removed in favor of the OnnxConversion pass with the use_device option.
  • LoRA/QLoRA
    • Support training using ONNX Runtime Training.
    • Mixed-precision training is used when torch_dtype is float16 for numerical stability.
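
As a rough illustration of the new conversion and quantization passes, here is a minimal sketch of a pass config in Olive's dict form. The model name is an illustrative assumption, not a tested recipe; the pass types and the use_device/torch_dtype options are the ones listed above.

```python
# Minimal sketch, not a tested recipe: the model name is an illustrative
# assumption; pass types and options are those listed in this release.
config = {
    "input_model": {
        "type": "PyTorchModel",
        "config": {
            "hf_config": {
                "model_name": "meta-llama/Llama-2-7b-hf",  # assumed example model
                "task": "text-generation",
            }
        },
    },
    "passes": {
        "conversion": {
            "type": "OnnxConversion",
            "config": {
                "use_device": "cuda",      # new: device to place the model on before export
                "torch_dtype": "float16",  # new: dtype to cast the model to before export
            },
        },
        "matmul4": {
            # new: quantize MatMul weights to 4-bit integers
            "type": "OnnxMatMul4Quantizer",
        },
    },
}
```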

Engine

  • The engine and evaluator configs are now optional in the olive run config. With these defaults, users can run optimization without search or evaluation using the simplest possible pass config (see the sketch after this list).
  • evaluate_input_model is optional in the engine config in no-search mode. It is forced to False when no evaluator is provided.
  • New ort_py_log_severity_level option to control the logging level for onnxruntime Python logs.
  • New CLI option --tempdir to use a custom directory as the root directory for tempfile.
  • IO-Binding:
    • New method to efficiently bind inputs and outputs to the session using either the CPU or GPU depending on the device.
    • New shared_kv_buffer option to enable key/value buffer sharing between inputs (past key values) and outputs (present key values).
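
To show how small a config can now be, here is a sketch of a no-search run that omits the engine and evaluator sections entirely; the model and pass choices are illustrative assumptions.

```python
from olive.workflows import run as olive_run

# Sketch of the simplest no-search run: no engine or evaluator config, so
# Olive just applies the passes in order. Model and pass are assumptions.
config = {
    "input_model": {
        "type": "PyTorchModel",
        "config": {"hf_config": {"model_name": "gpt2", "task": "text-generation"}},
    },
    "passes": {
        "conversion": {"type": "OnnxConversion"},
    },
}

olive_run(config)

# Equivalent CLI invocation, with the new --tempdir flag to relocate
# tempfile's root directory:
#   python -m olive.workflows.run --config config.json --tempdir /path/to/scratch
```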

Model

  • DistributedOnnxModel file structure updated to use resource paths; the model can now be saved from the cache to a destination directory.
  • Introduce DistributedPyTorchModel, the PyTorch analogue of DistributedOnnxModel.
  • trust_remote_code added to HFConfig model_loading_args (see the sketch below).
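
A sketch of the new trust_remote_code support: the flag sits in HFConfig's model_loading_args and is passed through to transformers' from_pretrained. The model name below is an illustrative assumption.

```python
# Sketch: input model config for a repository that ships custom modeling code.
# The model name is an illustrative assumption.
input_model = {
    "type": "PyTorchModel",
    "config": {
        "hf_config": {
            "model_name": "microsoft/phi-1_5",
            "task": "text-generation",
            "model_loading_args": {"trust_remote_code": True},  # new in 0.4.0
        }
    },
}
```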

Metrics

  • New func_kwargs option to provide kwargs to user_script functions (see the sketch below).
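
A sketch of func_kwargs, assuming it maps each user_script function's config key to the extra kwargs that function should receive; the metric layout follows Olive's config convention, and create_dataloader is a hypothetical function defined in user_script.py.

```python
# Sketch: forward extra kwargs to a user_script function. The keying of
# func_kwargs by the "dataloader_func" field is an assumption about the
# layout; create_dataloader is a hypothetical function in user_script.py.
metric = {
    "name": "accuracy",
    "type": "accuracy",
    "sub_types": [{"name": "accuracy_score"}],
    "user_config": {
        "user_script": "user_script.py",
        "dataloader_func": "create_dataloader",
        "func_kwargs": {
            "dataloader_func": {"batch_size": 8, "max_samples": 100},
        },
    },
}
```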

Dependencies

  • Support onnxruntime 1.16.2