Does anybody know if loading a quantized model is possible with `sentence_transformers`? I am currently looking at embedding models, and some of them, like Qwen 7B, seem to be `SentenceTransformer` models: https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct
However, the `SentenceTransformer` loading code only accepts the model name and doesn't expose the underlying transformers loading logic. For this reason, I can't find a way to apply quantization (for example using `quanto`) before loading: https://huggingface.co/docs/transformers/en/quantization
I can't see why this wouldn't be compatible, since SentenceTransformers uses Transformers anyway under the hood.
Any hints appreciated!
We're currently investigating the best approach to add this support in #2578. In particular, that PR will expose some parameters (`model_kwargs`, `config_kwargs`, `tokenizer_kwargs`) on the `SentenceTransformer` class to allow e.g. easy quantization. Until then, the best solution is to load the `Transformer` module separately and use its existing `model_args`. However, I'm not 100% sure that will work, as you also end up passing the `quantization_config` to the `AutoConfig`.
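For anyone who wants to try the workaround described above, here is a minimal, untested sketch. It builds the `Transformer` module by hand, forwards a `quantization_config` through `model_args`, and wraps the result in a `SentenceTransformer`. The helper name `load_quantized_sentence_transformer` and the default mean pooling are my own assumptions for illustration; check the model card for the pooling the model actually expects. The caveat from above still applies: depending on the installed version, `model_args` may also be forwarded to `AutoConfig`, so this may not load cleanly.

```python
def load_quantized_sentence_transformer(model_name, quantization_config, pooling_mode="mean"):
    """Hypothetical helper: assemble a SentenceTransformer from a quantized Transformer module.

    `quantization_config` is any transformers quantization config object,
    for example `QuantoConfig(weights="int8")` or `BitsAndBytesConfig(load_in_4bit=True)`.
    """
    # Imported lazily so that defining the helper has no side effects.
    from sentence_transformers import SentenceTransformer, models

    # model_args is handed to the underlying transformers loading call.
    # Caveat from the thread: it may also reach AutoConfig, which is untested here.
    word_embedding_model = models.Transformer(
        model_name,
        model_args={"quantization_config": quantization_config},
    )

    # Pooling turns token embeddings into one sentence embedding.
    # "mean" is an assumption; some models expect a different pooling mode.
    pooling_model = models.Pooling(
        word_embedding_model.get_word_embedding_dimension(),
        pooling_mode=pooling_mode,
    )

    return SentenceTransformer(modules=[word_embedding_model, pooling_model])


# Example call (downloads the model, so it is left commented out):
# from transformers import QuantoConfig
# model = load_quantized_sentence_transformer(
#     "Alibaba-NLP/gte-Qwen1.5-7B-instruct",
#     QuantoConfig(weights="int8"),
# )
# embeddings = model.encode(["An example sentence."])
```

Building the modules manually skips the pooling and normalization settings stored in the model repository, so embeddings may differ from what the stock loader would produce unless those settings are replicated.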