I'm using microsoft-cognitiveservices-speech-sdk@1.28.0 on Node.js.
When using some voices (en-US-SaraNeural does this the most frequently for me, but I've run across it with other voices), filler words like "um" or "uh" cause the wordBoundary event arguments to get garbled after the filler word. Both the text and timing don't match the audio when this happens.
Example code
Here's a small example that illustrates what I'm running into for the text becoming inaccurate:
    import {
      AudioConfig,
      ResultReason,
      SpeechConfig,
      SpeechSynthesisOutputFormat,
      SpeechSynthesizer,
    } from "microsoft-cognitiveservices-speech-sdk";

    // `key` and `region` are the Speech resource credentials, defined elsewhere.
    const synthesizeSpeech = async (): Promise<string[]> => {
      const speechConfig = SpeechConfig.fromSubscription(key, region);
      speechConfig.speechSynthesisOutputFormat =
        SpeechSynthesisOutputFormat.Audio48Khz192KBitRateMonoMp3;
      // No audio output is configured, hence the null AudioConfig
      const synthesizer = new SpeechSynthesizer(speechConfig, null as unknown as AudioConfig);
      const ssml = `
        <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
          <voice name="en-US-SaraNeural">
            Hi my name is uh Sara. I am a voice that gives good audio but incorrect word boundary results.
          </voice>
        </speak>
      `;
      return new Promise<string[]>((resolve, reject) => {
        const onError = (err: string | Error): void => {
          synthesizer.close();
          reject(err);
        };
        // Collect the text of every word boundary event in arrival order
        const words: string[] = [];
        synthesizer.wordBoundary = (_, event) => {
          words.push(event.text);
        };
        synthesizer.speakSsmlAsync(
          ssml,
          result => {
            if (result.errorDetails) {
              onError(result.errorDetails);
            } else if (result.reason === ResultReason.SynthesizingAudioCompleted) {
              synthesizer.close();
              resolve(words);
            }
          },
          onError
        );
      });
    };
so that await synthesizeSpeech() returns
    [
      "Hi",
      "my",
      "name",
      "is",
      "rrec", // everything after where "uh" should be is garbled
      "t",
      "w",
      "rd",
      "b",
      "undar",
      " res",
      "lts. ",
      " ",
      " ",
      " </",
      "oice>\n ",
      " ",
      "</speak>",
      "",
      ""
    ]
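To pin down where the events start to diverge, one option is to compare the reported words against the words of the input text. The following is a minimal sketch of that check, not part of the Speech SDK: `firstBoundaryMismatch` and `normalize` are hypothetical helper names, and the comparison assumes English text, ignores case, and drops punctuation-only tokens so that separately reported punctuation does not count as a mismatch.

```typescript
// Hypothetical helpers for diagnosing the issue; not part of the Speech SDK.

// Lowercase each token, strip everything except ASCII letters, digits and
// apostrophes, then drop tokens that end up empty (for example "." or " ").
function normalize(tokens: string[]): string[] {
  return tokens
    .map(t => t.replace(/[^A-Za-z0-9']/g, "").toLowerCase())
    .filter(t => t.length > 0);
}

// Returns the index (in the normalized word sequences) of the first word that
// differs between the spoken text and the reported boundary texts, or -1 if
// the two sequences match completely.
function firstBoundaryMismatch(spokenText: string, reported: string[]): number {
  const expected = normalize(spokenText.split(/\s+/));
  const actual = normalize(reported);
  const n = Math.max(expected.length, actual.length);
  for (let i = 0; i < n; i++) {
    if (expected[i] !== actual[i]) {
      return i;
    }
  }
  return -1;
}
```

Called with the sentence from the SSML above and the array returned by `synthesizeSpeech()`, this returns 4, the position where "uh" was expected.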