[Bug]: Browser Unable to Decode and Play Partial Speech Segments due to Missing Header Information #823

Open
rohit1coding opened this issue May 9, 2024 · 1 comment
Labels
pending close Ready for closure pending follow-up or prolonged inactivity question Further information is requested

rohit1coding commented May 9, 2024

What happened?

Issue Description:
I am currently working on a project where real-time audio playback is required while text-to-speech conversion is still in progress. To achieve lower latency, I am attempting to play partial speech segments as they become available instead of waiting for the complete text-to-speech data.

Problem:
The issue arises with the partial audio buffers: these buffers are raw and lack the necessary header information that browsers require for decoding and playback. Consequently, while complete speech data from the speakSsmlAsync method works correctly, partial speech data does not function as expected due to this missing header.

Code Implementation:
Backend Code:
const pushStream = SpeechSdk.AudioOutputStream.createPullStream();
const audioConfig = SpeechSdk.AudioConfig.fromStreamOutput(pushStream);
synthesizer = new SpeechSdk.SpeechSynthesizer(speechConfig, audioConfig);
pushStream.write = (audioData) => {
  playAudio(audioData);
};

Frontend Code:
const playAudio = async (audioData) => {
  const audioDataBufferArray = Uint8Array.from(audioData).buffer;
  try {
    const decodedAudioBuffer = await audioContext.decodeAudioData(audioDataBufferArray);
  } catch (error) {
    console.error('Error decoding audio data:', error);
  }
};

Expected Behavior:
The browser should be able to decode and play partial speech segments without any issues.

Current Behavior:
The browser fails to decode the partial audio data due to the absence of header information, leading to errors and inability to play the speech segments.

Steps to Reproduce:
1. Initiate the text-to-speech conversion process.
2. Attempt to play audio as it is being synthesized.
3. Observe that while complete audio data plays without issues, partial segments fail to decode and play.

Potential Solutions:
A possible approach to resolve this issue could involve dynamically adding the necessary header information to the partial buffers before attempting playback, or implementing a method to handle raw audio data more effectively in the browser.
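The second option, handling raw audio directly in the browser, could look roughly like the sketch below. It assumes the partial buffers are 16-bit little-endian mono PCM at 16 kHz, which must be checked against the configured synthesis output format. The helper names `pcm16ToFloat32` and `schedulePcmChunk` are hypothetical and not part of the Speech SDK. Instead of calling `decodeAudioData`, the samples are converted to floats and copied into an `AudioBuffer` by hand.

```javascript
// Convert raw 16-bit little-endian PCM bytes to Float32 samples in [-1, 1).
// Assumption: mono, 16-bit PCM. A trailing odd byte is ignored.
function pcm16ToFloat32(pcmBytes) {
  const bytes = pcmBytes instanceof Uint8Array ? pcmBytes : new Uint8Array(pcmBytes);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// Schedule one raw chunk on a Web Audio context without decodeAudioData.
// Returns the time at which the next chunk should start, so chunks play back to back.
// Assumption: sampleRate matches the synthesis output format (16000 here).
function schedulePcmChunk(audioContext, pcmBytes, startTime, sampleRate = 16000) {
  const samples = pcm16ToFloat32(pcmBytes);
  if (samples.length === 0) {
    return startTime; // createBuffer rejects zero-length buffers
  }
  const audioBuffer = audioContext.createBuffer(1, samples.length, sampleRate);
  audioBuffer.getChannelData(0).set(samples);

  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);

  // Never schedule in the past; fall back to "now" if playback has caught up.
  const when = Math.max(startTime, audioContext.currentTime);
  source.start(when);
  return when + audioBuffer.duration;
}
```

In the browser, each incoming chunk would be passed through `schedulePcmChunk`, feeding the returned time into the next call. If a chunk can end in the middle of a sample, the leftover byte should be carried over to the next chunk rather than dropped as it is here.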

This issue significantly affects the usability of real-time audio features in our application, and any guidance or solutions would be greatly appreciated.

Version

1.36.0 (Latest)

What browser/platform are you seeing the problem on?

Chrome

Relevant log output

No response

@rohit1coding rohit1coding added the bug Something isn't working label May 9, 2024
@glharper
Member

@rohit1coding Thank you for using the JS Speech SDK and for writing this issue up. Roughly how many bytes are the partial speech segments you want to decode? There is code in the JS Speech SDK for creating a WAV header here, if you'd like to reuse it in your own code to prepend to the audio stream before writing to the pushStream.
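For illustration, prepending a header to a raw chunk might look like the sketch below. This is not the SDK's own header code; `wrapPcmInWav` is a hypothetical helper that writes a standard 44-byte RIFF/WAVE header for uncompressed PCM. The defaults (16 kHz, 16-bit, mono) are assumptions and need to match the synthesis output format actually in use.

```javascript
// Wrap a raw PCM chunk in a minimal 44-byte RIFF/WAVE header so that
// decodeAudioData can treat it as a complete WAV file.
// Assumed defaults: 16 kHz, mono, 16-bit PCM.
function wrapPcmInWav(pcmBytes, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const bytes = pcmBytes instanceof Uint8Array ? pcmBytes : new Uint8Array(pcmBytes);
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const buffer = new ArrayBuffer(44 + bytes.byteLength);
  const view = new DataView(buffer);

  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + bytes.byteLength, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                   // fmt chunk size for PCM
  view.setUint16(20, 1, true);                    // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, bytes.byteLength, true);     // size of the PCM payload

  new Uint8Array(buffer, 44).set(bytes);          // copy the samples after the header
  return buffer;
}
```

The returned `ArrayBuffer` can be handed to `audioContext.decodeAudioData`. Each chunk should contain a whole number of sample frames, so with 16-bit mono audio the byte length should be even.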

@glharper glharper added the question Further information is requested label May 22, 2024
@glharper glharper added pending close Ready for closure pending follow-up or prolonged inactivity and removed bug Something isn't working labels May 29, 2024