Filler words sometimes cause issues with wordBoundary results #680

dwegscheidTSC · 2023-05-22T15:13:25Z

I'm using microsoft-cognitiveservices-speech-sdk@1.28.0 on Node.js.

With some voices (en-US-SaraNeural does this most often for me, but I've run into it with other voices too), a filler word such as "um" or "uh" garbles the wordBoundary event arguments for everything after it. When this happens, neither the text nor the timing matches the audio.
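To compare the timing side against the audio, it can help to log each boundary's offsets in milliseconds. This is a minimal sketch assuming, as the SDK reference describes, that audioOffset and duration on the wordBoundary event arguments are given in ticks of 100 ns. BoundaryRecord and formatTimeline are hypothetical helper names, not SDK APIs.

```typescript
// Hypothetical record shape: copies the text, audioOffset and duration
// fields from each wordBoundary event (offsets assumed to be 100-ns ticks).
interface BoundaryRecord {
  text: string;
  audioOffset: number;
  duration: number;
}

// 1 ms = 10,000 ticks of 100 ns
const TICKS_PER_MS = 10_000;

// Formats each record as "start-end ms: "text"" so it can be checked
// against the audio by ear or in a waveform editor.
function formatTimeline(records: BoundaryRecord[]): string[] {
  return records.map((r) => {
    const startMs = r.audioOffset / TICKS_PER_MS;
    const endMs = (r.audioOffset + r.duration) / TICKS_PER_MS;
    return `${startMs.toFixed(0)}-${endMs.toFixed(0)} ms: ${JSON.stringify(r.text)}`;
  });
}
```

In the wordBoundary handler, push { text: event.text, audioOffset: event.audioOffset, duration: event.duration } into an array and pass it to formatTimeline once synthesis completes.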

Example code

Here's a small example that illustrates what I'm running into for the text becoming inaccurate:

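import {
    AudioConfig,
    ResultReason,
    SpeechConfig,
    SpeechSynthesisOutputFormat,
    SpeechSynthesizer,
} from "microsoft-cognitiveservices-speech-sdk";

// Assumption for a standalone run: credentials are read from environment variables
const key = process.env.SPEECH_KEY ?? "";
const region = process.env.SPEECH_REGION ?? "";

const synthesizeSpeech = async (): Promise<string[]> => {
    const speechConfig = SpeechConfig.fromSubscription(key, region);
    speechConfig.speechSynthesisOutputFormat = SpeechSynthesisOutputFormat.Audio48Khz192KBitRateMonoMp3;

    // null audio config: no output device, the repro only needs the events
    const synthesizer = new SpeechSynthesizer(speechConfig, null as unknown as AudioConfig);
    const ssml = `
       <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
          <voice name="en-US-SaraNeural">
             Hi my name is uh Sara. I am a voice that gives good audio but incorrect word boundary results. 
          </voice>
       </speak>
    `;

    return new Promise<string[]>((resolve, reject) => {
       const onError = (err: string | Error): void => {
          synthesizer.close();
          reject(err);
       };

       // Collect the text of every wordBoundary event, in order
       const words: string[] = [];
       synthesizer.wordBoundary = (_, event) => {
          words.push(event.text);
       };
       synthesizer.speakSsmlAsync(ssml, result => {
          if (result.errorDetails) {
             onError(result.errorDetails);
          } else if (result.reason === ResultReason.SynthesizingAudioCompleted) {
             synthesizer.close();
             resolve(words);
          } else {
             // Don't leave the promise pending on an unexpected result reason
             onError(`Unexpected result reason: ${ResultReason[result.reason]}`);
          }
       }, onError);
    });
};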

With this code, await synthesizeSpeech() returns:

[
   "Hi",
   "my",
   "name",
   "is",
   "rrec", // everything after where "uh" should be is garbled
   "t",
   "w",
   "rd",
   "b",
   "undar",
   " res",
   "lts. ",
   "    ",
   "     ",
   " </",
   "oice>\n   ",
   "    ",
   "</speak>",
   "",
   ""
]
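To find where the results start to drift without reading through the list by hand, the received texts can be compared with the words of the SSML. This is a rough sketch: normalizeWord, expectedWords and firstMismatch are hypothetical helpers, splitting on whitespace is an assumption (the service may tokenize differently), and punctuation-only boundaries are skipped.

```typescript
// Strips surrounding whitespace and punctuation and lowercases,
// so "Sara." compares equal to "sara".
function normalizeWord(word: string): string {
  return word.trim().replace(/^[^\w']+|[^\w']+$/g, "").toLowerCase();
}

// Splits the spoken text inside <voice> into normalized expected words.
function expectedWords(text: string): string[] {
  return text.split(/\s+/).map(normalizeWord).filter((w) => w.length > 0);
}

// Returns the index (in the word-only sequence) of the first received
// boundary that differs from the expected word, or -1 if all match.
// Boundaries that normalize to "" (punctuation, whitespace) are ignored.
function firstMismatch(expected: string[], received: string[]): number {
  const got = received.map(normalizeWord).filter((w) => w.length > 0);
  const n = Math.min(expected.length, got.length);
  for (let i = 0; i < n; i++) {
    if (got[i] !== expected[i]) return i;
  }
  return expected.length === got.length ? -1 : n;
}
```

For the output above, comparing expectedWords("Hi my name is uh Sara. ...") with the returned array flags index 4, where "rrec" appears instead of "uh".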


yulin-li · 2023-05-29T15:23:33Z

Thanks for reporting this issue. I will forward it to the service experts for further investigation.

glharper assigned yulin-li May 23, 2023
