VOICE TO GPT WITH API

Voice-to-GPT is a web application that lets users interact with an AI assistant by voice. The application records the user's voice input, transcribes it, and sends the transcribed text to OpenAI's GPT-4 for processing. The assistant's answer is then converted back to speech and played to the user. This version is faster than https://github.com/mkdev-me/voice-to-gpt because it uses the Whisper API instead of the free Whisper code. The trade-off is cost: the API is not cheap.

[Demo video: Screen.Recording.2023-03-28.at.11.57.42.mp4]

Features

Voice input: Users can speak their questions or commands directly into their microphone.
Automatic speech recognition (ASR): The application transcribes users' voice input using Whisper ASR.
AI assistant: The transcribed text is sent to OpenAI's GPT-4, which processes the input and generates an appropriate response.
Text-to-speech (TTS): The AI assistant's response is converted back to speech and played to the user.
Please follow the instructions in the "Installation" section to set up and run the application.
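
The four stages listed above can be sketched as one function. This is a minimal illustration, not the code in server.py or audio_processing.py: the names `run_pipeline`, `transcribe`, `ask_gpt`, and `synthesize` are hypothetical, and each stage is passed in as a plain callable so the flow can be tried without an API key.

```python
from typing import Callable


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    ask_gpt: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Voice in, voice out: ASR -> GPT -> TTS."""
    text = transcribe(audio)      # speech -> text (Whisper in this project)
    if not text.strip():
        raise ValueError("no speech detected in the recording")
    answer = ask_gpt(text)        # text -> answer (GPT-4 in this project)
    return synthesize(answer)     # answer -> audio to play back


# Stand-in stages so the sketch runs offline (assumptions, not real API calls).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_ask_gpt(prompt: str) -> str:
    return f"You asked: {prompt}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


reply = run_pipeline(b"what is Docker?", fake_transcribe, fake_ask_gpt, fake_synthesize)
print(reply)
```

Keeping the stages as separate callables makes it easy to swap one of them (for example, a different transcription backend) without touching the rest.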

Installation

Remember to add your OpenAI API key to your environment first:

export OPENAI_API_KEY=......
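
A missing key usually surfaces only when the first request fails, so it can help to check for it at startup. The helper below is a hedged sketch; `require_api_key` is a hypothetical name and is not taken from this repository.

```python
import os


def require_api_key(env=os.environ) -> str:
    """Return the OpenAI key from the environment, or fail with a clear message."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before starting the app"
        )
    return key
```

Calling `require_api_key()` before the server starts turns a confusing authentication error into an immediate, readable one.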

Once the application is running, you only need to say aloud what you want to ask the GPT API.

To build the Docker image, run:

docker build -t audio-to-gpt .

To run the container:

docker run -p 5001:5000 -e OPENAI_API_KEY=$OPENAI_API_KEY audio-to-gpt

Then open your browser at:

https://127.0.0.1:5001

Enjoy!

Keep in mind that response speed depends on the resources of the environment you run it in: your own computer, Lambda, Cloud Run, and so on.
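
Because response time varies with the host, it can be useful to time each stage and see where the delay comes from. The sketch below is an assumption about how one might do that; the `timed` helper and the stage names are hypothetical, and the `time.sleep` calls stand in for the real transcription and GPT requests.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> elapsed seconds


@contextmanager
def timed(stage):
    """Record how long the wrapped block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


with timed("transcribe"):
    time.sleep(0.01)  # placeholder for the Whisper request
with timed("gpt"):
    time.sleep(0.02)  # placeholder for the GPT-4 request

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.3f}s")
```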

Usage

Open the application in a web browser.
Click the "Record" button and speak your question or command into the microphone.
Click the "Stop" button when you're done speaking.
The application will transcribe your speech, send the text to GPT-4, and play the AI assistant's response.

Dependencies

Flask

Flask-CORS

OpenAI

Whisper

Contributing

If you'd like to contribute to this project, please submit a pull request with your proposed changes. Be sure to provide a clear description of the changes and any relevant information.

License

This project is licensed under the MIT License. Please refer to the LICENSE file for more information.
