Self-Hosted RAG Web Service with BentoML

This BentoML example project is a series of tutorials that build a complete self-hosted Retrieval-Augmented Generation (RAG) application step by step.

This project guides you through setting up a RAG service that uses vector-based search and large language models (LLMs) to answer queries, with your documents as the knowledge base. The end goal is a system that scales efficiently and handles complex queries with high performance.
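To make the retrieval step concrete, here is a minimal sketch in plain Python. It is not taken from the tutorials: it assumes a toy bag-of-words count vector in place of a real embedding model, and the names `embed`, `cosine`, `retrieve`, and `DOCS` are hypothetical. In a real RAG service, the retrieved text would then be passed to an LLM as context for the answer.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    # An empty vector has no direction, so treat its similarity as zero.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


# Hypothetical knowledge base, for illustration only.
DOCS = [
    "BentoML packages models as web services.",
    "Milvus is a vector database for similarity search.",
    "LlamaIndex builds indexes over documents.",
]

if __name__ == "__main__":
    print(retrieve("which vector database handles similarity search", DOCS))
```

Word counts only match exact terms, which is why the tutorials rely on learned embedding models instead.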

See here for a full list of BentoML example projects.

Project overview

This repository contains a series of five tutorials designed to progressively build a RAG system with custom embedding and language models as well as a vector database.

1. Building a Simple RAG System using LlamaIndex: Set up a basic RAG system that runs locally on your machine using LlamaIndex. This foundational step familiarizes you with the basic components of a RAG system.
2. Transforming a Local RAG into a BentoML Web Service: Convert the local script into a web service by setting up a basic API service using BentoML.
3. Integrating a Custom Embedding Service: Replace the default OpenAI embedding model used in the RAG system with a custom model.
4. Integrating a Custom LLM: Replace the default OpenAI model that handles question answering in the RAG system with a custom LLM.
5. Integrating Milvus Vector Database: Use Milvus to manage the document index for better scalability and performance.
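The progression above amounts to swapping one component at a time behind a stable interface. The sketch below illustrates that idea in plain Python. It is an assumption about the general design, not the tutorials' code: `VectorStore`, `InMemoryStore`, `RagPipeline`, `toy_embed`, and `echo_llm` are hypothetical names, and the toy functions stand in for a real embedding model, LLM, and vector database.

```python
from typing import Callable, Protocol


class VectorStore(Protocol):
    """Interface a vector store is assumed to satisfy in this sketch."""

    def add(self, text: str, vector: list[float]) -> None: ...
    def search(self, vector: list[float], top_k: int) -> list[str]: ...


class InMemoryStore:
    """Toy store ranking rows by dot product with the query vector."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._rows.append((text, vector))

    def search(self, vector: list[float], top_k: int) -> list[str]:
        def score(row: tuple[str, list[float]]) -> float:
            return sum(x * y for x, y in zip(vector, row[1]))

        ranked = sorted(self._rows, key=score, reverse=True)
        return [text for text, _ in ranked[:top_k]]


class RagPipeline:
    """Wires an embedding function, a store, and a generator together."""

    def __init__(
        self,
        embed: Callable[[str], list[float]],
        generate: Callable[[str], str],
        store: VectorStore,
    ) -> None:
        self.embed, self.generate, self.store = embed, generate, store

    def index(self, documents: list[str]) -> None:
        for doc in documents:
            self.store.add(doc, self.embed(doc))

    def query(self, question: str, top_k: int = 1) -> str:
        context = "\n".join(self.store.search(self.embed(question), top_k))
        return self.generate(f"Context:\n{context}\n\nQuestion: {question}")


KEYWORDS = ("bentoml", "milvus", "llamaindex")


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedding: one dimension per known keyword."""
    return [float(k in text.lower()) for k in KEYWORDS]


def echo_llm(prompt: str) -> str:
    """Hypothetical LLM: repeats the first retrieved context line."""
    return "Answer based on: " + prompt.splitlines()[1]


pipeline = RagPipeline(toy_embed, echo_llm, InMemoryStore())
pipeline.index(["Milvus stores vectors.", "BentoML serves models."])
```

Because `RagPipeline` depends only on the two callables and the store interface, replacing `toy_embed`, `echo_llm`, or `InMemoryStore` with a real implementation leaves `index` and `query` unchanged.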

Set up the environment

To begin, clone the repository:

git clone https://github.com/bentoml/rag-tutorials.git
cd rag-tutorials

Next, set up the Python environment required for running the tutorials:

python3 -m venv rag-bentoml && . rag-bentoml/bin/activate && pip install -r requirement.txt
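If a later step fails with missing packages, it may help to confirm that the interpreter in use belongs to the virtual environment. The following is a small sketch using only the standard library; the helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

With the environment activated, the printed prefix should end in `rag-bentoml`.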

Get started

Each tutorial is self-contained and includes instructions for setting up and running the components it covers. Start with the first tutorial and work through them in order, since each one builds on the previous steps. By the end of the series, you will understand how to build a RAG system with a custom embedding model, a custom LLM, and a vector database.

Have fun!
