Slow Embeddings With Ollama #21870
Labels
- 🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature)
- 🔌: chroma (Primarily related to ChromaDB integrations)
- Ɑ: embeddings (Related to text embedding models module)
- Ɑ: text splitters (Related to text splitters package)
Checked other resources
Example Code
Error Message and Stack Trace (if applicable)
n/a
Description
Calls to the Ollama embeddings API are very slow (1000 to 2000 ms). GPU utilization is very low, spiking to 30-100% only once every second or two. This happens whether I run main() or testOllamaSpeed() in the example code, which would suggest the problem is with Ollama. But if I run the following code, which does not use any langchain imports, each call completes in 200-300 ms and GPU utilization holds at a consistent 70-80%. The problem is even more pronounced with mxbai-embed-large: the example code takes 1000 to 2000 ms per call, while the code below takes ~50 ms per call. VRAM usage never goes above ~4 GB (~25% of my total VRAM).
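The langchain-free comparison script referenced above was also not captured. A hedged stand-in follows, assuming it posted directly to Ollama's `/api/embeddings` REST endpoint; the model name and prompt are placeholders, and only the standard library is used.

```python
# Hypothetical stand-in for the reporter's langchain-free comparison script.
# Talks to Ollama's REST API directly; assumes a server on localhost:11434.
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_request(model, prompt):
    """Build the JSON body Ollama's /api/embeddings endpoint expects."""
    return json.dumps({"model": model, "prompt": prompt}).encode("utf-8")


def embed(model, prompt):
    """POST one prompt to the embeddings endpoint and return the vector."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]


if __name__ == "__main__":
    try:
        start = time.perf_counter()
        vec = embed("nomic-embed-text", "hello world")  # assumed model/prompt
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        print(f"{len(vec)}-dim embedding in {elapsed_ms:.0f} ms")
    except Exception as exc:  # no live Ollama server available
        print(f"skipped live call: {exc}")
```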
For reference my environment is:
Windows 11
12th Gen i9-1250HX
128GB RAM
NVIDIA RTX A4500 Laptop
16GB VRAM
Ollama 0.1.38
System Info
langchain==0.2.0
langchain-chroma==0.1.1
langchain-community==0.2.0
langchain-core==0.2.0
langchain-text-splitters==0.2.0