GitHub - bolna-ai/bolna: End-to-end platform for building voice first multimodal agents

End-to-end open-source voice agents platform: quickly build voice-first conversational assistants through a JSON configuration.

Discord | Docs | Website

Introduction

Bolna is an end-to-end, open-source, production-ready framework for quickly building LLM-based, voice-driven conversational applications.

Demo

demo-create-agent-and-make-calls.mp4

Components

Bolna helps you create AI Voice Agents which can be instructed to do tasks beginning with:

Initiating a phone call using telephony providers like Twilio, Exotel, etc.
Transcribing the conversations using Deepgram, etc.
Using LLMs like OpenAI, Llama, Cohere, Mistral, etc. to handle conversations
Synthesizing LLM responses back to telephony using AWS Polly, XTTS, ElevenLabs, Deepgram, etc.
Instructing the Agent to perform tasks like sending emails, sending text messages, or booking calendar slots after the conversation has ended

Refer to the docs for a deep dive into all supported providers.

Local setup

A basic local setup uses Twilio for telephony. We have dockerized the setup in local_setup/. One will need to populate an environment .env file from .env.sample.

The setup consists of four containers:

Twilio web server: for initiating the calls. One will need to set up a Twilio account
Bolna server: for creating and handling agents
ngrok: for tunneling. One will need to add the authtoken to ngrok-config.yml
redis: for persisting agents & prompt data

Use Docker to build the images, with the .env file as the environment file, and run them locally:

docker-compose build --no-cache: rebuild the images
docker-compose up: run the built images

Once the docker containers are up, you can now start to create your agents and instruct them to initiate calls.

Creating your agent and invoking calls

Once the above Docker setup is up and running, you can create agents and initiate calls.

Use the below payload to create an Agent via http://localhost:5001/agent

Agent Payload

{
    "agent_config": {
        "agent_name": "Alfred",
        "agent_type": "other",
        "agent_welcome_message": "Welcome",
        "tasks": [
            {
                "task_type": "conversation",
                "toolchain": {
                    "execution": "parallel",
                    "pipelines": [
                        [
                            "transcriber",
                            "llm",
                            "synthesizer"
                        ]
                    ]
                },
                "tools_config": {
                    "input": {
                        "format": "pcm",
                        "provider": "twilio"
                    },
                    "llm_agent": {
                        "agent_flow_type": "streaming",
                        "family": "openai",
                        "request_json": true,
                        "model": "gpt-3.5-turbo-16k",
                        "use_fallback": true
                    },
                    "output": {
                        "format": "pcm",
                        "provider": "twilio"
                    },
                    "synthesizer": {
                        "audio_format": "wav",
                        "provider": "elevenlabs",
                        "stream": true,
                        "provider_config": {
                            "voice": "Meera - high quality, emotive",
                            "model": "eleven_multilingual_v2",
                            "voice_id": "TTa58Hl9lmhnQEvhp1WM"
                        },
                        "buffer_size": 100.0
                    },
                    "transcriber": {
                        "encoding": "linear16",
                        "language": "en",
                        "model": "deepgram",
                        "stream": true
                    }
                },
                "task_config": {
                    "hangup_after_silence": 30.0
                }
            }
        ]
    },
    "agent_prompts": {
        "task_1": {
            "system_prompt": "Ask if they are coming for party tonight"
        }
    }
}
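As a minimal sketch, the payload above could be sent from Python using only the standard library. The README gives the URL but not the HTTP method, so the POST with a JSON body is an assumption here, and build_agent_request and create_agent are hypothetical helper names used only for illustration.

```python
import json
import urllib.request

AGENT_URL = "http://localhost:5001/agent"  # URL taken from the README


def build_agent_request(payload: dict, url: str = AGENT_URL) -> urllib.request.Request:
    """Build (but do not send) a request carrying the agent payload.

    Assumption: the endpoint accepts a POST with a JSON body.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_agent(payload: dict) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_agent_request(payload)) as response:
        return json.load(response)
```

With the containers running, create_agent(payload) would be called with the Agent Payload loaded into a dict; building the request separately makes it easy to inspect before anything is sent.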

The response from the previous API call contains a UUID as the agent_id. Use this agent_id to initiate a call via the telephony server running on port 8001 at http://localhost:8001/call

Call Payload

{
    "agent_id": "4c19700b-227c-4c2d-8bgf-42dfe4b240fc",
    "recipient_phone_number": "+19876543210"
}
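A call request could be built in a similar way. This is a sketch under assumptions: the POST method is assumed (the README gives only the URL), build_call_request is a hypothetical helper name, and the phone-number check is an added precaution rather than a documented requirement of the telephony server.

```python
import json
import re
import urllib.request

CALL_URL = "http://localhost:8001/call"  # URL taken from the README

# Loose E.164-style check: "+", a non-zero digit, then 1 to 14 more digits.
# Assumption: the server expects numbers shaped like the README's example.
E164_PATTERN = re.compile(r"^\+[1-9]\d{1,14}$")


def build_call_request(agent_id: str, recipient_phone_number: str,
                       url: str = CALL_URL) -> urllib.request.Request:
    """Build (but do not send) the request that asks an agent to place a call."""
    if not agent_id:
        raise ValueError("agent_id must not be empty")
    if not E164_PATTERN.match(recipient_phone_number):
        raise ValueError(f"not an E.164-style number: {recipient_phone_number!r}")
    payload = {
        "agent_id": agent_id,
        "recipient_phone_number": recipient_phone_number,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. The agent_id is the value returned when the agent was created.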

Using your own providers

You can populate the .env file to use your own keys for providers.

ASR Providers

These are the currently supported ASR providers:

Provider	Environment variable to be added in `.env` file
Deepgram	`DEEPGRAM_AUTH_TOKEN`

LLM Providers

Bolna uses the LiteLLM package to support multiple LLM integrations.

These are the currently supported LLM provider families:

bolna/bolna/providers.py, lines 19 to 28 in c8a0d14:

    SUPPORTED_LLM_MODELS = {
        'openai': OpenAiLLM,
        'cohere': LiteLLM,
        'ollama': LiteLLM,
        'mistral': LiteLLM,
        'llama': LiteLLM,
        'zephyr': LiteLLM,
        'perplexity': LiteLLM,
        'vllm': OpenAiLLM
    }

For LiteLLM-based LLMs, add any of the following to the .env file, depending on your use case:

LITELLM_MODEL_API_KEY: API key of the LLM
LITELLM_MODEL_API_BASE: URL of the hosted LLM
LITELLM_MODEL_API_VERSION: API version, for LLMs like those hosted on Azure

For LLMs hosted via vLLM, add the following to the .env file:
VLLM_SERVER_BASE_URL: URL of the hosted LLM using vLLM
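To see which of these variables are set for a given LLM family, one could use a small check like the sketch below. The family names and variable names come from this README, while the helper names relevant_llm_vars and llm_env_status are hypothetical. No variable is listed in this section for the openai family, so the sketch reports nothing for it.

```python
import os

# Families mapped to LiteLLM in SUPPORTED_LLM_MODELS.
LITELLM_FAMILIES = {"cohere", "ollama", "mistral", "llama", "zephyr", "perplexity"}

LITELLM_VARS = (
    "LITELLM_MODEL_API_KEY",
    "LITELLM_MODEL_API_BASE",
    "LITELLM_MODEL_API_VERSION",
)
VLLM_VARS = ("VLLM_SERVER_BASE_URL",)


def relevant_llm_vars(family: str) -> tuple:
    """Return the .env variables this README lists for the given family."""
    if family in LITELLM_FAMILIES:
        return LITELLM_VARS
    if family == "vllm":
        return VLLM_VARS
    return ()  # e.g. 'openai': no variable is listed in this section


def llm_env_status(family: str, env=None) -> dict:
    """Map each relevant variable name to whether it has a non-empty value."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name)) for name in relevant_llm_vars(family)}
```

Since only some of the LiteLLM variables are needed depending on the use case, the function reports status instead of raising an error for unset ones.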

TTS Providers

These are the currently supported TTS providers:

bolna/bolna/providers.py, lines 7 to 14 in c8a0d14:

    SUPPORTED_SYNTHESIZER_MODELS = {
        'polly': PollySynthesizer,
        'xtts': XTTSSynthesizer,
        'elevenlabs': ElevenlabsSynthesizer,
        'openai': OPENAISynthesizer,
        'fourie': FourieSynthesizer,
        'deepgram': DeepgramSynthesizer
    }

Provider	Environment variable to be added in `.env` file
AWS Polly	Accessed from system wide credentials via ~/.aws
Elevenlabs	`ELEVENLABS_API_KEY`
OpenAI	`OPENAI_API_KEY`
Deepgram	`DEEPGRAM_AUTH_TOKEN`
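Before creating an agent, it can help to check that each task's synthesizer provider is supported and that its key is present. The following is a rough sketch, assuming the payload layout shown in the Agent Payload example. check_synthesizers is a hypothetical helper, and the task_N labels merely mirror the keys used in agent_prompts.

```python
import os

# Keys of SUPPORTED_SYNTHESIZER_MODELS as listed in this README.
SUPPORTED_SYNTHESIZERS = {"polly", "xtts", "elevenlabs", "openai", "fourie", "deepgram"}

# Variables from the table above. Polly reads ~/.aws instead, and the
# table lists no variable for xtts or fourie.
TTS_ENV_VARS = {
    "elevenlabs": "ELEVENLABS_API_KEY",
    "openai": "OPENAI_API_KEY",
    "deepgram": "DEEPGRAM_AUTH_TOKEN",
}


def check_synthesizers(payload: dict, env=None) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    env = os.environ if env is None else env
    problems = []
    tasks = payload.get("agent_config", {}).get("tasks", [])
    for index, task in enumerate(tasks, start=1):
        synthesizer = task.get("tools_config", {}).get("synthesizer")
        if synthesizer is None:
            continue  # this task does not configure a synthesizer
        provider = synthesizer.get("provider")
        if provider not in SUPPORTED_SYNTHESIZERS:
            problems.append(f"task_{index}: unsupported synthesizer provider {provider!r}")
            continue
        variable = TTS_ENV_VARS.get(provider)
        if variable and not env.get(variable):
            problems.append(f"task_{index}: {variable} is not set for {provider!r}")
    return problems
```

For the example payload, which uses elevenlabs, the check passes only when ELEVENLABS_API_KEY is present in the environment.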

Extending with other Telephony Providers

If you wish to add another telephony provider such as Vonage or Telnyx, follow the guidelines below:

Make sure the telephony provider supports bi-directional streaming
Add a telephony-specific input handler file in input_handlers/telephony_providers, writing custom functions that extend the class in telephony.py
1. This file mainly defines how the different types of event packets from the telephony provider are ingested
Add a telephony-specific output handler file in output_handlers/telephony_providers, writing custom functions that extend the class in telephony.py
1. This mainly concerns converting audio from the synthesizer class to a supported audio format and streaming it over the websocket provided by the telephony provider
Lastly, you'll have to write a dedicated server, like the example twilio_api_server.py provided in local_setup, to initiate calls over websockets.
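The input-handler step can be pictured with the sketch below. Everything in it is hypothetical: TelephonyInputHandler is a stand-in written for this example and not the actual class in telephony.py, and the JSON packet shape (an "event" field plus base64 audio under "payload") is an assumed format for a made-up provider. Check the real base class and your provider's streaming documentation before writing a handler.

```python
import base64
import json
from typing import Optional


class TelephonyInputHandler:
    """Stand-in base class (hypothetical interface, not Bolna's telephony.py)."""

    def parse_packet(self, raw: str) -> dict:
        raise NotImplementedError

    def handle(self, raw: str) -> Optional[bytes]:
        """Return audio bytes for media packets and None for all other packets."""
        packet = self.parse_packet(raw)
        if packet["kind"] == "media":
            return packet["audio"]
        return None


class ExampleProviderInputHandler(TelephonyInputHandler):
    """Handler for a made-up provider that sends JSON event packets."""

    def parse_packet(self, raw: str) -> dict:
        message = json.loads(raw)
        event = message.get("event")
        if event == "media":
            # Assumed packet shape: base64-encoded audio under "payload".
            audio = base64.b64decode(message["payload"])
            return {"kind": "media", "audio": audio}
        if event in ("start", "stop"):
            return {"kind": event, "audio": None}
        return {"kind": "unknown", "audio": None}
```

The provider-specific subclass only translates packets into a common shape, so the shared handle logic stays in one place. An output handler would do the reverse, encoding synthesizer audio into whatever format the provider's websocket expects.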

Open-source v/s Paid

Though the repository is completely open source, you can connect with us if interested in managed hosted offerings or more customized solutions.

Contributing

We love all types of contributions, big or small, that help improve this community resource.

There are a number of open issues which can be good ones to start with.
If you have suggestions for enhancements, wish to contribute a simple fix such as correcting a typo, or want to address an apparent bug, please feel free to open a new issue or submit a pull request.
If you're contemplating a larger change or addition to this repository, be it in terms of its structure or its features, kindly begin by opening a new issue that outlines your proposed changes. This will allow us to engage in a discussion before you dedicate a significant amount of time or effort. Your cooperation and understanding are appreciated.
