Slow Embeddings With Ollama #21870
Labels
- 🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature)
- 🔌: chroma (Primarily related to ChromaDB integrations)
- Ɑ: embeddings (Related to text embedding models module)
- Ɑ: text splitters (Related to text splitters package)
Checked other resources
Example Code
Error Message and Stack Trace (if applicable)
n/a
Description
Calls to the Ollama embeddings API are very slow (1000 to 2000 ms). GPU utilization is very low, spiking to 30-100% only once every second or two. This happens whether I run main() or testOllamaSpeed() in the example code, which would suggest the problem is with Ollama. But if I run the following code, which does not use any langchain imports, each call completes in 200-300 ms and GPU utilization holds at a consistent 70-80%. The problem is even more pronounced with mxbai-embed-large: the example code takes 1000 to 2000 ms per call, while the code below takes ~50 ms per call. VRAM usage never goes above ~4 GB (~25% of my total VRAM).
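The langchain-free comparison script referenced above was also not captured. A hedged stand-in follows, assuming it posted directly to Ollama's `/api/embeddings` REST endpoint; the model name and prompt are placeholders, and only the standard library is used.

```python
# Hypothetical stand-in for the reporter's langchain-free comparison script.
# Talks to Ollama's REST API directly; assumes a server on localhost:11434.
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_request(model, prompt):
    """Build the JSON body Ollama's /api/embeddings endpoint expects."""
    return json.dumps({"model": model, "prompt": prompt}).encode("utf-8")


def embed(model, prompt):
    """POST one prompt to the embeddings endpoint and return the vector."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]


if __name__ == "__main__":
    try:
        start = time.perf_counter()
        vec = embed("nomic-embed-text", "hello world")  # assumed model/prompt
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        print(f"{len(vec)}-dim embedding in {elapsed_ms:.0f} ms")
    except Exception as exc:  # no live Ollama server available
        print(f"skipped live call: {exc}")
```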
For reference my environment is:
Windows 11
12th Gen i9-1250HX
128GB RAM
NVIDIA RTX A4500 Laptop
16GB VRAM
Ollama 0.1.38
System Info
langchain==0.2.0
langchain-chroma==0.1.1
langchain-community==0.2.0
langchain-core==0.2.0
langchain-text-splitters==0.2.0