Slow Embeddings With Ollama #21870

Open · jkablan opened this issue May 18, 2024 · 1 comment
Labels: 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature · 🔌: chroma Primarily related to ChromaDB integrations · Ɑ: embeddings Related to text embedding models module · Ɑ: text splitters Related to text splitters package

Comments

jkablan commented May 18, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import argparse

import tqdm
import ollama
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.embeddings.ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma

print('done loading imports')


def main(args):
    
    # Get the directory path from arguments
    directory_path = args.directory


    loader = PyPDFDirectoryLoader(directory_path)
    print('loading docs')
    docs = loader.load()

    splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=200)
    print('splitting docs')
    splits = splitter.split_documents(docs)

    embedAgent = OllamaEmbeddings(model='llama2', show_progress=True)
    print('generating embeddings')

    vectStore = Chroma.from_documents(documents=splits, embedding=embedAgent, persist_directory=directory_path)

 
def testOllamaSpeed(args):
    # Get the directory path from arguments
    directory_path = args.directory


    loader = PyPDFDirectoryLoader(directory_path)
    print('loading docs')
    docs = loader.load()

    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    print('splitting docs')
    splits = splitter.split_documents(docs)

    txts = []

    print('making txt')
    # Note: this loop embeds the full documents rather than the splits created above.
    for doc in tqdm.tqdm(docs):
        txts.append(str(doc))

    print('making embeddings')
    mbeds = []
    for txt in tqdm.tqdm(txts):
        mbeds.append(ollama.embeddings(model='llama2', prompt=txt))

if __name__ == '__main__':

    # Create the argument parser
    parser = argparse.ArgumentParser(description="Script to process a directory path")
    
    # Add the -d argument for directory path
    parser.add_argument('-d', '--directory', type=str, required=True, help='Path to the directory')
    
    # Parse the arguments
    args = parser.parse_args()

    #main(args)
    testOllamaSpeed(args)

Error Message and Stack Trace (if applicable)

n/a

Description

Calls to the Ollama embeddings API are very slow (1000 to 2000 ms per call). GPU utilization is very low, spiking to 30-100% only once every second or two. This happens whether I run main() or testOllamaSpeed() in the example code, which would suggest the problem is with Ollama itself. But if I run the following code, which does not use any LangChain imports, each call completes in 200-300 ms and GPU utilization holds steady at 70-80%. The gap is even more pronounced with mxbai-embed-large: the example code takes 1000 to 2000 ms per call, while the code below takes ~50 ms per call. VRAM usage never exceeds ~4 GB (~25% of my total VRAM).

For reference, my environment is:
Windows 11
12th Gen i9-1250HX
128GB RAM
NVIDIA RTX A4500 Laptop
16GB VRAM
Ollama 0.1.38

import ollama
import os
import PyPDF2
import tqdm
import argparse

def read_pdfs_from_directory(directory_path):
    pdf_texts = {}
    
    for filename in os.listdir(directory_path):
        if filename.endswith('.pdf'):
            file_path = os.path.join(directory_path, filename)
            pdf_texts[filename] = read_pdf(file_path)
    
    return pdf_texts

def read_pdf(file_path):
    pdf_text = ""
    
    with open(file_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        for page in pdf_reader.pages:
            pdf_text += page.extract_text()
    
    return pdf_text

def split_into_chunks(input_string, chunk_size):
    # Use list comprehension to create chunks of the specified size
    chunks = [input_string[i:i+chunk_size] for i in range(0, len(input_string), chunk_size)]
    return chunks

def main(args):

    directory_path = args.directory

    print('Reading pdfs')
    allFiles = read_pdfs_from_directory(directory_path)

    print('chunking')
    chunks = []
    for text in allFiles.values():
        chunks.extend(split_into_chunks(text, 1000))

    print('Generating embeddings')
    for chunk in tqdm.tqdm(chunks):
        ollama.embeddings(model='llama2',prompt=chunk)
        #ollama.embeddings(model='mxbai-embed-large',prompt=chunk)
    print('done')

if __name__ == '__main__':

    # Create the argument parser
    parser = argparse.ArgumentParser(description="Script to process a directory path")
    
    # Add the -d argument for directory path
    parser.add_argument('-d', '--directory', type=str, required=True, help='Path to the directory')
    
    # Parse the arguments
    args = parser.parse_args()

    main(args)
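
To put numbers on the per-call gap described above, each /api/embeddings call can be timed directly. This is a sketch, not code from the report; the helper name and sample texts are placeholders:

import time
import ollama

def time_embeddings(texts, model='llama2'):
    # Record the wall-clock latency of each /api/embeddings call, in milliseconds.
    latencies = []
    for t in texts:
        start = time.perf_counter()
        ollama.embeddings(model=model, prompt=t)
        latencies.append((time.perf_counter() - start) * 1000)
    print(f'{len(latencies)} calls, mean {sum(latencies) / len(latencies):.0f} ms')

time_embeddings(['hello world'] * 10)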

System Info

langchain==0.2.0
langchain-chroma==0.1.1
langchain-community==0.2.0
langchain-core==0.2.0
langchain-text-splitters==0.2.0

dosubot (bot) added the Ɑ: embeddings, Ɑ: text splitters, 🔌: chroma, and 🤖:bug labels on May 18, 2024
keenborder786 (Contributor) commented
Yes, I faced a similar situation, since Ollama does not support concurrent requests. To work around this, I started multiple Ollama containers and distributed the embedding requests across them in a round-robin manner.
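
A minimal sketch of that workaround, assuming several Ollama containers are already listening on separate ports (the host list, ports, and model name are illustrative, not from this thread):

import itertools
from concurrent.futures import ThreadPoolExecutor

import ollama

# Hypothetical endpoints: one per running Ollama container.
HOSTS = ['http://localhost:11434', 'http://localhost:11435', 'http://localhost:11436']

# One client per container; itertools.cycle hands them out round-robin.
clients = itertools.cycle([ollama.Client(host=h) for h in HOSTS])

def embed_round_robin(texts, model='llama2'):
    # Pair each text with the next client in the rotation, then issue the
    # requests concurrently so every container stays busy.
    jobs = [(next(clients), t) for t in texts]
    with ThreadPoolExecutor(max_workers=len(HOSTS)) as pool:
        return list(pool.map(
            lambda job: job[0].embeddings(model=model, prompt=job[1])['embedding'],
            jobs,
        ))

Each container keeps its own copy of the model in VRAM, so this trades memory for throughput; with roughly 25% of VRAM in use per instance, as reported above, three or four instances should fit on the same GPU.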
