Optimize SentenceTransformers models with Optimum for faster inference using model.encode


Optimized Sentence Transformers

This package simplifies optimization of SentenceTransformer models using ONNX/Optimum while keeping the familiar inference API of SentenceTransformer's model.encode. Optimization can reduce inference latency on CPU by up to 40%.

If your production code uses SentenceTransformer's model.encode, this package enables easy integration of optimized models with minimal code changes.

Installation

Requires Python 3.8+

Install with pip

pip install optim_sentence_transformers

Install from source

git clone https://github.com/sidhantls/optim-sentence-transformers
cd optim-sentence-transformers
pip install -e .

Performance
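To measure the latency gain on your own hardware, a minimal timing helper like the sketch below can be used (the `mean_latency_ms` helper and the usage comments are illustrative, assuming the models from the Usage section below):

```python
import time

def mean_latency_ms(encode_fn, sentences, n_runs=20):
    """Average wall-clock latency of encode_fn over n_runs calls, in milliseconds."""
    start = time.perf_counter()
    for _ in range(n_runs):
        encode_fn(sentences)
    return (time.perf_counter() - start) / n_runs * 1000.0

# Usage sketch, assuming a baseline and an optimized model are loaded:
#   base = SentenceTransformer('sentence-transformers/all-distilroberta-v1')
#   optim = SentenceTransformerOptim('onnx')
#   print(mean_latency_ms(base.encode, ['some text']))
#   print(mean_latency_ms(optim.encode, ['some text']))
```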

Usage

Supported optimize_mode values: "onnx" and "graph_optim" (ONNX export plus graph optimization).

from sentence_transformers import SentenceTransformer
from optim_sentence_transformers import SentenceTransformerOptim, optimize_model

model = SentenceTransformer('sentence-transformers/all-distilroberta-v1')

# train if required and save
model.save('trained_model') 

model_name_or_path = 'trained_model'
# model_name_or_path = 'sentence-transformers/all-distilroberta-v1' # to optimize default model 

# optimize model
save_dir = 'onnx'
optimize_model(model_name_or_path=model_name_or_path,
               pooling_model=None,
               save_dir=save_dir,
               optimize_mode='onnx')

# load optimized model 
optim_model = SentenceTransformerOptim(save_dir)
optim_model.encode(['text'], normalize_embeddings=True)

In some cases, model.encode in sentence-transformers always returns normalized vectors because a normalization layer is added when the model is initialized. With this package, if normalized vectors are required, set normalize_embeddings=True explicitly.
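What normalize_embeddings=True does is scale each embedding to unit L2 norm. A stdlib-only sketch of that operation (the `l2_normalize` helper is illustrative, not part of this package):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm, as normalize_embeddings=True does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

v = l2_normalize([3.0, 4.0])
# v == [0.6, 0.8]; its L2 norm is 1.0
```

Normalized embeddings make cosine similarity a plain dot product, which is why downstream retrieval code often expects them.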

Contributions

Contributions are welcome

