Implement Flush Feature #351

Open
dvonthenen opened this issue Mar 26, 2024 · 5 comments

@dvonthenen
Contributor

Proposed changes

Context

Possible Implementation

Other information

@dvonthenen dvonthenen added this to the Python SDK v.3.3 milestone Mar 26, 2024
@dvonthenen dvonthenen added the enhancement New feature or request label Mar 28, 2024
@dvonthenen dvonthenen self-assigned this May 13, 2024
@dvonthenen dvonthenen mentioned this issue May 13, 2024
@saleshwaram

Hi @dvonthenen,

I've noticed that the recent changes involving the flush feature for speech-to-text are not reflected in the SDK. I'm currently using deepgram-sdk 3.2.7 and have not seen the expected functionality (Finalize). Could you please provide some guidance on this or a timeline for when these changes might be integrated into the SDK?

Thank you!

@dvonthenen
Contributor Author

dvonthenen commented May 21, 2024

Hi @saleshwaram,

It's in the queue.

You don't need to wait for this to be implemented in the SDK; you can use it right now by passing the following message to the send() function:

{ "type": "Finalize" }
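
A minimal sketch of that call, assuming the live connection object exposes a send() method that accepts a text payload, as the snippets later in this thread do. The helper names `build_finalize_message` and `send_finalize`, and the `FakeConnection` stand-in, are hypothetical and not part of the SDK.

```python
import json


def build_finalize_message() -> str:
    """Return the Finalize control message serialized as JSON text."""
    return json.dumps({"type": "Finalize"})


def send_finalize(connection) -> None:
    """Send Finalize over any object that has a send(text) method.

    Assumption: `connection` behaves like the live connection used in this
    thread, i.e. send() accepts a JSON string as well as raw audio bytes.
    """
    connection.send(build_finalize_message())


class FakeConnection:
    """Hypothetical stand-in that records what would be sent over the socket."""

    def __init__(self):
        self.sent = []

    def send(self, payload):
        self.sent.append(payload)


if __name__ == "__main__":
    conn = FakeConnection()
    send_finalize(conn)
    print(conn.sent)
```

Keeping the message construction separate from the transport makes it easy to check the payload without opening a real connection.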

@saleshwaram

Hi @dvonthenen,

It seems there is some confusion about how the "Finalize" message works with send(): my implementation does not receive the expected final transcription after sending it. I am trying to handle an edge case where speech_final never comes back as true after I finish speaking. To cover it, I send a "Finalize" payload whenever no interim transcript has arrived for 2 seconds, expecting it to return a finalized transcript up to that point.

Below are the relevant code snippets, the output I'm receiving, and the output I expect. Could you please clarify how the flush feature should work in this context? Is there an implementation detail that is missing or needs adjusting in my code?

Thank you for your help!

Here's my code:

deepgramstt.py

import datetime
import threading
from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions
from dotenv import load_dotenv
import json
load_dotenv()

class DeepgramSTT:
    def __init__(self):
        self.full_transcription = ""
        self.final_transcription = ""
        self.other_text = ""
        self.transcript_ready = threading.Event()
        self.connection_status = False
        self.deepgram = DeepgramClient()
        self.connection = self.deepgram.listen.live.v("1")
        self.setup_events()
        self.timer = None

    def setup_events(self):
        self.connection.on(LiveTranscriptionEvents.Open, self.on_open)
        self.connection.on(LiveTranscriptionEvents.Close, self.on_close)
        self.connection.on(LiveTranscriptionEvents.Transcript, self.on_message)
        self.connection.on(LiveTranscriptionEvents.SpeechStarted, self.on_speech_started)
        self.connection.on(LiveTranscriptionEvents.Metadata, self.on_metadata)

    def on_open(self, *args, **kwargs):
        self.connection_status = True
        print("Connection opened")

    def on_speech_started(self, x, speech_started, **kwargs):
        print("Speech started")

    def on_metadata(self, x, metadata, **kwargs):
        print(f"\n\n{metadata}\n\n")

    def on_close(self, *args, **kwargs):
        self.connection_status = False
        print("Connection closed")

    def on_message(self, x, result, **kwargs):
        sentence = result.channel.alternatives[0].transcript
        # print(f"{datetime.datetime.now()}: {result.is_final}: {result.speech_final}  {sentence}")
        # Reset the timer whenever a new sentence is received
        if len(sentence) == 0:
            return
        if result.is_final and result.speech_final:
            self.final_transcription = self.full_transcription + sentence
            if self.final_transcription!="":
                self.transcript_ready.set()
                return
            else:
                print("final")
                return
        elif result.is_final and not result.speech_final:
            self.reset_timer() 
            self.full_transcription += sentence + " "
            return
        else:
            self.reset_timer() 
            self.other_text = sentence
            print("Interim sentence: ", sentence)

    def reset_timer(self):
        if self.timer and self.other_text!="":
            self.timer.cancel()
        self.timer = threading.Timer(2.0, self.send_finalize)
        self.timer.start()

    def send_finalize(self):
        self.connection.send(json.dumps({"type": "Finalize"}))
        print("Finalize sent due to 2 seconds of silence")

    def start_connection(self):
        options = LiveOptions(
            model="nova-2",
            language="en-US",
            punctuate=True,
            encoding="linear16",
            channels=1,
            sample_rate=16000,
            vad_events=True,
            endpointing=300,
            interim_results=True,
            utterance_end_ms="1000",
        )
        if not self.connection.start(options):
            print("Failed to start connection")
            return False
        return True

    def send_audio_data(self, data):
        self.connection.send(data)

    def finish(self):
        if self.timer:
            self.timer.cancel()
        self.connection.finish()
        print("Finished")
        self.print_final_transcript()

    def print_final_transcript(self):
        print("Complete final transcript:")
        print(self.full_transcription)

    def is_connection_active(self):
        return self.connection_status

test.py

from deepgramstt import DeepgramSTT
from datetime import datetime
import threading
import pyaudio

def main():
    # Audio stream configuration
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    SAMPLE_RATE = 16000
    FRAMES_PER_BUFFER = 3200

    # Initialize PyAudio
    p = pyaudio.PyAudio()
    try:
        stream = p.open(format=FORMAT, channels=CHANNELS, rate=SAMPLE_RATE, input=True, frames_per_buffer=FRAMES_PER_BUFFER)
    except IOError as e:
        print(f"Could not open audio stream: {e}")
        p.terminate()
        return

    # Initialize DeepgramSTT
    dg_connection = DeepgramSTT()
    if not dg_connection.start_connection():
        print("Failed to start Deepgram connection")
        stream.stop_stream()
        stream.close()
        p.terminate()
        return
    
    print("Connection started. Begin speaking now.")
    
    # Start the audio stream thread immediately
    exit_flag = False

    def audio_stream_thread():
        try:
            while not exit_flag and dg_connection.is_connection_active():
                try:
                    data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
                except IOError as e:
                    print(f"Error reading audio data: {e}")
                    break  # Exit the loop if we can't read the data
                dg_connection.send_audio_data(data)

                if dg_connection.transcript_ready.is_set():  # Non-blocking check for the event
                    print(f"final: {dg_connection.final_transcription}\ttime: {datetime.utcnow().isoformat(timespec='milliseconds') + 'Z'}")
                    dg_connection.final_transcription = ""
                    dg_connection.transcript_ready.clear()  # Reset the event
        except Exception as e:
            print(f"Unexpected error: {e}")
        finally:
            stream.stop_stream()
            stream.close()
            p.terminate()
            dg_connection.finish()

    audio_thread = threading.Thread(target=audio_stream_thread)
    audio_thread.start()

    input("Press Enter to stop recording...\n")

    exit_flag = True
    audio_thread.join()

    print("Finished recording and processing.")

if __name__ == "__main__":
    main()

Output:

Received behaviour:

$ python -m test
Connection opened
Connection started. Begin speaking now.
Press Enter to stop recording...
Speech started
Interim sentence:  Early one morning,
Interim sentence:  Early one morning, while the sun was
Interim sentence:  Early one morning, while the sun was just
Interim sentence:  Early one morning, while the sun was just starting to rise, a
Interim sentence:  Early one morning, while the sun was just starting to rise, a young and energetic dog
Speech started
Interim sentence:  excitedly ran around
Interim sentence:  excitedly ran around the park. Juncker gave a
Interim sentence:  excitedly ran around the park, jumping over small bushes, and chasing
Interim sentence:  excitedly ran around the park, jumping over small bushes and chasing after brightly colored
Speech started
Interim sentence:  a group of children
Interim sentence:  a group of children laughed and played nearby
Interim sentence:  a group of children laughed and played nearby, enjoying the
Interim sentence:  a group of children laughed and played nearby, enjoying the warm weather and the free
Speech started
Interim sentence:  before school started.
final: Early one morning, while the sun was just starting to rise, a young and energetic dog excitedly ran around the park, jumping over small bushes and chasing after brightly colored butterflies a group of children laughed and played nearby, enjoying the warm weather and the freedom of being outside before school started.  time: 2024-05-22T11:44:15.713Z
Finalize sent due to 2 seconds of silence
Speech started

Connection closed



{
    "type": "Metadata",
    "transaction_key": "deprecated",
    "request_id": "457e4f1b-e9a7-4e99-a704-d2f0f045d00a",
    "sha256": "f91f59bcb63d46d4ea6e3a9b647d65e940d83373d9f929f71ff32940342c578e",
    "created": "2024-05-22T11:43:53.803Z",
    "duration": 23.6,
    "channels": 1,
    "models": [
        "1dbdfb4d-85b2-4659-9831-16b3c76229aa"
    ],
    "model_info": {
        "1dbdfb4d-85b2-4659-9831-16b3c76229aa": {
            "name": "2-general-nova",
            "version": "2024-01-11.36317",
            "arch": "nova-2"
        }
    }
}



Finished

Another output:

$ python -m test
Connection opened
Connection started. Begin speaking now.
Press Enter to stop recording...
Speech started
Interim sentence:  Early one more
Interim sentence:  Early one morning, while the sun was just
Interim sentence:  Early one morning, while the sun was just starting to rise, a
Interim sentence:  Early one morning, while the sun was just starting to rise, a young and energetic
Speech started
Interim sentence:  a young and energetic dog excited
Interim sentence:  a young and energetic dog excitedly ran around the path
Interim sentence:  a young and energetic dog excitedly ran around the park jumping over small
Speech started
Interim sentence:  and chasing after prey
Interim sentence:  and chasing up brightly colored butterflies.
Interim sentence:  and chasing up brightly colored butterflies as a group of children
Interim sentence:  and chasing after brightly colored butterflies as a group of children laughed and played near
Speech started
Interim sentence:  and played nearby, enjoying the
Interim sentence:  and played nearby, enjoying the warm weather and the free
Interim sentence:  and played nearby, enjoying the warm weather and the freedom of being outside
Interim sentence:  and played nearby, enjoying the warm weather and the freedom of being outside before school started.
Speech started
Finalize sent due to 2 seconds of silence


Connection closed



{
    "type": "Metadata",
    "transaction_key": "deprecated",
    "request_id": "d3380ca8-175b-470f-b514-84f4199b5baa",
    "sha256": "798f63a6df80a3ae1bee4548708d5ea0190e5508e4d357debe807402cf944e31",
    "created": "2024-05-22T11:43:07.732Z",
    "duration": 25.4,
    "channels": 1,
    "models": [
        "1dbdfb4d-85b2-4659-9831-16b3c76229aa"
    ],
    "model_info": {
        "1dbdfb4d-85b2-4659-9831-16b3c76229aa": {
            "name": "2-general-nova",
            "version": "2024-01-11.36317",
            "arch": "nova-2"
        }
    }
}


Finished

Expected output:

$ python -m test
Connection opened
Connection started. Begin speaking now.
Press Enter to stop recording...
Speech started
Interim sentence:  Early one more
Interim sentence:  Early one morning, while the sun was just
Interim sentence:  Early one morning, while the sun was just starting to rise, a
Interim sentence:  Early one morning, while the sun was just starting to rise, a young and energetic
Speech started
Interim sentence:  a young and energetic dog excited
Interim sentence:  a young and energetic dog excitedly ran around the path
Interim sentence:  a young and energetic dog excitedly ran around the park jumping over small
Speech started
Interim sentence:  and chasing after prey
Interim sentence:  and chasing up brightly colored butterflies.
Interim sentence:  and chasing up brightly colored butterflies as a group of children
Interim sentence:  and chasing after brightly colored butterflies as a group of children laughed and played near
Speech started
Interim sentence:  and played nearby, enjoying the
Interim sentence:  and played nearby, enjoying the warm weather and the free
Interim sentence:  and played nearby, enjoying the warm weather and the freedom of being outside
Interim sentence:  and played nearby, enjoying the warm weather and the freedom of being outside before school started.
Speech started
Finalize sent due to 2 seconds of silence
final: Early one morning, while the sun was just starting to rise, a young and energetic dog excitedly ran around the park, jumping over small bushes and chasing after brightly colored butterflies a group of children laughed and played nearby, enjoying the warm weather and the freedom of being outside before school started.  time: 2024-05-22T11:44:15.713Z


Connection closed



{
    "type": "Metadata",
    "transaction_key": "deprecated",
    "request_id": "d3380ca8-175b-470f-b514-84f4199b5baa",
    "sha256": "798f63a6df80a3ae1bee4548708d5ea0190e5508e4d357debe807402cf944e31",
    "created": "2024-05-22T11:43:07.732Z",
    "duration": 25.4,
    "channels": 1,
    "models": [
        "1dbdfb4d-85b2-4659-9831-16b3c76229aa"
    ],
    "model_info": {
        "1dbdfb4d-85b2-4659-9831-16b3c76229aa": {
            "name": "2-general-nova",
            "version": "2024-01-11.36317",
            "arch": "nova-2"
        }
    }
}


Finished

@dvonthenen
Contributor Author

If I understand the output correctly, in the first example I wouldn't expect anything to happen, since the final: result arrived just before the flush.

The second doesn't seem right, but I haven't experimented with the feature much. There are people using this in production, so it seems like there might be an issue in your code.
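
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the on_message handler posted above only signals a finished transcript when both is_final and speech_final are true, and it returns early on empty transcripts. If the reply to Finalize arrives as an is_final result without speech_final, it would be accumulated without ever being printed. The sketch below shows one way to flush on either condition. `TranscriptAccumulator` and its methods are hypothetical names, and the behavior of the server after Finalize is assumed, not taken from this thread.

```python
class TranscriptAccumulator:
    """Collect is_final segments and release them as one utterance."""

    def __init__(self):
        self._segments = []
        self.finalize_pending = False

    def mark_finalize_sent(self):
        """Record that a Finalize message was just sent."""
        self.finalize_pending = True

    def on_result(self, transcript, is_final, speech_final):
        """Return the completed utterance, or None if it is not complete yet."""
        if not is_final:
            # Interim results can still change, so they are never committed.
            return None
        if transcript:
            self._segments.append(transcript)
        # Flush on a natural endpoint OR on the first final result that
        # follows a Finalize (assumed to be the flushed audio).
        if speech_final or self.finalize_pending:
            self.finalize_pending = False
            utterance = " ".join(self._segments)
            self._segments.clear()
            return utterance or None
        return None
```

In the handler from this thread, the three arguments would come from result.channel.alternatives[0].transcript, result.is_final, and result.speech_final.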

@saleshwaram

In the first example, the 'final' transcript is received before 'Finalize' is sent. Since the final transcript has already arrived, I am not expecting anything further.

However, in the second example, I tried several times and it never produced a final response. If you could provide a working sample, I could test it on my side, because I don't see any issue in my code.
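
For reference, the 2-second silence timer described in this thread can be written as a small debounce helper. This is a minimal sketch under stated assumptions, not a verified working sample for the flush feature: `SilenceWatchdog` and `on_silence` are hypothetical names, and the callback is assumed to be whatever sends the Finalize message.

```python
import threading


class SilenceWatchdog:
    """Call `on_silence` once if reset() is not called again within `timeout` seconds."""

    def __init__(self, timeout, on_silence):
        self._timeout = timeout
        self._on_silence = on_silence
        self._timer = None
        # Results arrive on the SDK's thread while timers fire on their own
        # threads, so timer bookkeeping is guarded by a lock.
        self._lock = threading.Lock()

    def reset(self):
        """Restart the countdown; call this on every interim or final result."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # always cancel, so only one timer is pending
            self._timer = threading.Timer(self._timeout, self._on_silence)
            self._timer.daemon = True
            self._timer.start()

    def cancel(self):
        """Stop the countdown, e.g. once a complete transcript has arrived."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

Calling cancel() when a complete transcript arrives would avoid sending Finalize right after a result that already had speech_final set, which is what the first log in this thread shows.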
