The Reginald project consists of:

```
├── azure
│   └── scripts to set up Reginald infrastructure on Azure
├── data
│   └── directory to store llama-index data indexes and other public Turing data
├── docker
│   └── scripts for building Docker images for both the full Reginald app and the Slack-bot-only app
├── notebooks
│   ├── data processing notebooks
│   └── development notebooks for llama-index Reginald models
└── reginald
    ├── models: scripts for setting up query and chat engines
    ├── slack_bot: scripts for setting up the Slack bot
    └── scripts for setting up the end-to-end Slack bot with query engine
```
This is a simple Slack bot written in Python that listens for direct messages and @mentions in any channel it is in, and responds with a message and an emoji. The bot uses web sockets for communication. How the bot responds to messages is determined by the response engine that is set up - see the models README for more details of the models available. The main models we use are:

- `llama-index-llama-cpp`: a model which uses the `llama-index` library to query a data index and then uses a quantised LLM (implemented using `llama-cpp-python`) to generate a response
- `llama-index-hf`: a model which uses the `llama-index` library to query a data index and then uses an LLM from Huggingface to generate a response
- `llama-index-gpt-azure`: a model which uses the `llama-index` library to query a data index and then uses the Azure OpenAI API to query an LLM to generate a response
This project uses Poetry for dependency management. Make sure you have Poetry installed on your machine, then install the dependencies:

```bash
poetry install --all-extras
```
If you only want to install a subset of the available packages, use one of the following instead:

- for the LLM-only and Slack-bot-only setup: `--extras api_bot`
- for the Azure configuration: `--extras azure`
- for running the fine-tuning notebooks: `--extras ft_notebooks`
- for running the `llama-index` notebooks: `--extras llama_index_notebooks`

Without installing extras, you will have the packages required to run the full Reginald model on your machine.
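For example, to install with just the Azure extras (Poetry also lets you repeat the flag to combine several extras):

```bash
poetry install --extras azure
```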
To install the pre-commit hooks, run:

```bash
pre-commit install
```
To set up the Slack bot, you must set the Slack bot environment variables. To obtain them from Slack, follow the steps below:

- Set up the bot in Slack: Socket Mode Client.
- To connect to Slack, the bot requires an app token and a bot token. Put these into a `.env` file:

    ```bash
    echo "SLACK_BOT_TOKEN='your-bot-user-oauth-access-token'" >> .env
    echo "SLACK_APP_TOKEN='your-app-level-token'" >> .env
    ```

- Activate the virtual environment:

    ```bash
    poetry shell
    ```
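At this point, a quick way to check that the installation worked is to print the help for the main entry point (described in more detail below):

```bash
reginald_run --help
```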
We are currently using llama-hub GitHub readers for creating our data indexes and pulling issues and files from relevant repos.

As a prerequisite, you will need to generate a "classic" personal access token with the `repo` and `read:org` scopes - see here for instructions on creating and obtaining your personal access token.

Once you have done this, simply add it to your `.env` file:

```bash
echo "GITHUB_TOKEN='your-github-personal-access-token'" >> .env
```
In order to run the full Reginald app locally (i.e. setting up the full response engine along with the Slack bot), you can follow the steps below:

- Set environment variables (for more details on environment variables, see the environment variables README):

    ```bash
    source .env
    ```

- Run the bot using `reginald_run` - note that this actually runs `reginald/run.py`. To see the CLI arguments:

    ```bash
    reginald_run --help
    ```

For examples of running each of our different models, see the models README.
The `reginald_run` CLI takes in several arguments such as:

- `--model` (`-m`): to select the type of model to use (see the models README for the list of models available)
- `--model-name` (`-n`): to select the sub-model to use within the model selected
    - For `llama-index-llama-cpp` and `llama-index-hf` models, this specifies the LLM (or path to that model) which we would like to use
    - For `chat-completion-azure` and `llama-index-gpt-azure`, this refers to the deployment name on Azure
    - For `chat-completion-openai` and `llama-index-gpt-openai`, this refers to the model/engine name on OpenAI
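For instance, a hypothetical Azure-backed invocation might look like the following (the deployment name is a placeholder, not a real deployment):

```bash
reginald_run --model llama-index-gpt-azure --model-name "your-azure-deployment-name"
```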
There are some CLI arguments specific to only the `llama-index` models:

- `--mode`: to determine whether to use the 'query' or 'chat' engine
- `--data-dir` (`-d`): specify the data directory location
- `--which-index` (`-w`): specify the directory name for looking up/writing the data index
- `--force-new-index` (`-f`): whether or not to force creating a new data index
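As a sketch of how these combine with the model flags, forcing a rebuild of the `handbook` index while using the query engine might look like this (model URL as in the example further below):

```bash
reginald_run \
  --model llama-index-llama-cpp \
  --model-name https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf \
  --mode query \
  --data-dir data/ \
  --which-index handbook \
  --force-new-index
```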
There are some CLI arguments specific to only the `llama-index-llama-cpp` and `llama-index-hf` models:

- `--max-input-size` (`-max`): maximum input size of the LLM
There are some CLI arguments specific to only the `llama-index-llama-cpp` model:

- `--is-path` (`-p`): whether or not the model name passed is a path to the model
- `--n-gpu-layers` (`-ngl`): number of layers to offload to the GPU
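For example, if you have already downloaded a GGUF model to disk, something like the following should work (the local path is a placeholder, and we assume `--is-path` acts as a boolean switch):

```bash
reginald_run \
  --model llama-index-llama-cpp \
  --model-name ~/models/llama-2-7b-chat.Q4_K_M.gguf \
  --is-path \
  --n-gpu-layers 2
```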
There are some CLI arguments specific to only the `llama-index-hf` model:

- `--device` (`-dev`): device on which to host the Huggingface model
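As an illustrative sketch (both values are placeholders; the device is presumably a torch-style identifier such as `cpu` or `cuda`):

```bash
reginald_run \
  --model llama-index-hf \
  --model-name "distilgpt2" \
  --device cuda
```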
Note: specifying CLI arguments will override any environment variables set.
For example, to set up a `llama-index-llama-cpp` chat engine model running Llama-2-7b-Chat (quantised to 4-bit), you can run:

```bash
reginald_run \
  --model llama-index-llama-cpp \
  --model-name https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf \
  --mode chat \
  --data-dir data/ \
  --which-index handbook \
  --max-input-size 4096 \
  --n-gpu-layers 2
```
The bot will now listen for @mentions in the channels it's added to and respond with a simple message.
There are some cases where you'd want to run the response engine and the Slack bot separately. For instance, with the `llama-index-llama-cpp` and `llama-index-hf` models, you are hosting your own LLM, which you might want to run on a machine with GPUs, while the Slack bot runs on a separate (more cost-efficient) machine. Doing this allows you to change the model or the machine running the model without having to change the Slack bot. To do this, you can follow the steps below:
- On the machine where you want to run the response engine, run the following commands:

    - Set up environment variables for the response engine (for more details on environment variables, see the environment variables README):

        ```bash
        source .response_engine_env
        ```

    - Set up the response engine using `reginald_run_api_llm` - note that this actually runs `reginald/models/app.py`. To see the CLI arguments:

        ```bash
        reginald_run_api_llm --help
        ```

        This command uses many of the same CLI arguments as described above. For example, to set up a `llama-index-llama-cpp` chat engine model running Llama-2-7b-Chat (quantised to 4-bit), you can run:

        ```bash
        reginald_run_api_llm \
          --model llama-index-llama-cpp \
          --model-name https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf \
          --mode chat \
          --data-dir data/ \
          --which-index handbook \
          --max-input-size 4096 \
          --n-gpu-layers 2
        ```
- On the machine where you want to run the Slack bot, run the following commands:

    - Set up environment variables for the Slack bot (for more details on environment variables, see the environment variables README):

        ```bash
        source .slack_bot_env
        ```

    - Set up the Slack bot using `reginald_run_api_bot` - note that this actually runs `reginald/slack_bot/setup_bot.py`. To see the CLI arguments:

        ```bash
        reginald_run_api_bot --help
        ```

        This command takes an emoji to respond with. For example, to set up a Slack bot that responds with the `:llama:` emoji, you can run:

        ```bash
        reginald_run_api_bot --emoji llama
        ```
For full details of Docker setup, see the Docker README.
- Go to the `azure` directory
- Ensure that you have installed Pulumi and the Azure CLI
- Set up the Pulumi backend and deploy:

    ```bash
    ./setup.sh && AZURE_KEYVAULT_AUTH_VIA_CLI=true pulumi up -y
    ```
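If you'd like to review the changes before deploying, the standard Pulumi preview command should work with the same authentication setting (this is generic Pulumi usage rather than a script provided in this repo):

```bash
AZURE_KEYVAULT_AUTH_VIA_CLI=true pulumi preview
```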