synthesizer.speakSsmlAsync fails with mstts tags only #528

Open
gad2103 opened this issue May 1, 2022 · 13 comments
Labels: bug (Something isn't working), text-to-speech

Comments

@gad2103

gad2103 commented May 1, 2022

When I use the SDK to generate speech, it works fine with the following SSML:

let ssmlThatWorks = "<speak version=\"1.0\" xmlns=\"https://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">\n" +
        "  <voice name=\"en-US-JennyNeural\">\n" +
        toContentString(parsedXml) + " \n" +
        "  </voice>\n" +
        "</speak>"

However, if I use the variant that includes the mstts tags, I get the following error message:

SpeechSynthesisResult {
  privResultId: '147A8275E1E34BE3AD49F7892846A194',
  privReason: 1,
  privErrorDetails: "Unexpected TextToSpeech.Protocols.Universal.Messages.AudioMetadataResponseMessage' message for Reque websocket error code: 1002",
  privProperties: PropertyCollection {
    privKeys: [ 'CancellationErrorCode' ],
    privValues: [ 'ConnectionFailure' ]
  },
  privAudioData: undefined,
  privAudioDuration: undefined
}

The SSML that consistently reproduces the error looks like this (replace the bracketed content with any public audio file):

let ssmlThatProducesErrors = "<speak version=\"1.0\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xml:lang=\"en-US\">\n" +
            "<mstts:backgroundaudio src=\"https://[PUT PUBLIC AUDIO FILE HERE TO REPRODUCE].mp3\" volume=\"0.7\" fadein=\"0\" fadeout=\"0\" />  <voice name=\"en-US-JennyNeural\">\n" +
            "Hello  \n" +
            "  </voice>\n" +
            "</speak>"

If I test the failing SSML in the browser on the official Azure TTS site, everything is generated correctly.

I would love to be able to use background music in my application!

Other things I tried:

  • generating a new API key
  • upgrading my account to pay-as-you-go

Any help would be greatly appreciated! Thanks in advance.
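
For anyone reproducing this, here is a minimal sketch of assembling the failing SSML from its parts, so the audio URL, voice, and text can be swapped without hand-editing escaped strings. `buildBackgroundAudioSsml` and `escapeXml` are hypothetical helper names, not SDK functions. The element names and attribute values are copied from the snippet above. This only rebuilds the reproduction case; it is not a fix.

```javascript
// Hypothetical helper: escapes characters that are special in XML text and
// attribute values, so URLs with "&" do not break the document.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: rebuilds the SSML from the report with a background
// audio element. Defaults mirror the values used in the original snippet.
function buildBackgroundAudioSsml({ audioUrl, text, voice = "en-US-JennyNeural", volume = "0.7" }) {
  return [
    '<speak version="1.0" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">',
    `  <mstts:backgroundaudio src="${escapeXml(audioUrl)}" volume="${escapeXml(volume)}" fadein="0" fadeout="0" />`,
    `  <voice name="${escapeXml(voice)}">`,
    `    ${escapeXml(text)}`,
    "  </voice>",
    "</speak>",
  ].join("\n");
}
```

The returned string can be passed as the first argument to `synthesizer.speakSsmlAsync`.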

@glharper
Member

glharper commented May 5, 2022

@yulin-li Is there a service contact we can pass this to?

yulin-li self-assigned this May 11, 2022
@yulin-li
Contributor

Hi @gad2103, I still cannot repro your error. Could you share the result ID with us? We can check on the service side.

@gad2103
Author

gad2103 commented May 11, 2022

@yulin-li Can you share how you're trying to repro? Is the result ID not in the original error message I posted? ☝️

SpeechSynthesisResult {
  privResultId: '147A8275E1E34BE3AD49F7892846A194',
  privReason: 1,
  privErrorDetails: "Unexpected TextToSpeech.Protocols.Universal.Messages.AudioMetadataResponseMessage' message for Reque websocket error code: 1002",
  privProperties: PropertyCollection {
    privKeys: [ 'CancellationErrorCode' ],
    privValues: [ 'ConnectionFailure' ]
  },
  privAudioData: undefined,
  privAudioDuration: undefined
}

If not, where do I find the correct ID?
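
For reference, the dump above shows the private fields of the result object. A minimal sketch of pulling the same values out through the result's public `resultId`, `reason`, and `errorDetails` properties follows. `summarizeSynthesisResult` is a hypothetical helper, and the constant `REASON_CANCELED` is an assumption that mirrors `ResultReason.Canceled` in the SDK, which lines up with `privReason: 1` in the dump.

```javascript
// Assumption: numeric value of ResultReason.Canceled in the Speech SDK,
// matching `privReason: 1` in the dump above.
const REASON_CANCELED = 1;

// Hypothetical helper: reduces a synthesis result to the fields that are
// useful in a bug report. Works on any object exposing these properties.
function summarizeSynthesisResult(result) {
  const canceled = result.reason === REASON_CANCELED;
  return {
    resultId: result.resultId,
    canceled,
    // Error details are only meaningful when the request was canceled.
    errorDetails: canceled ? result.errorDetails : undefined,
  };
}
```

Logging this summary inside the `speakSsmlAsync` result callback gives a compact record to share with the service team.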

@gad2103
Author

gad2103 commented May 11, 2022

This looks possibly related: Azure-Samples/cognitive-services-speech-sdk#1492

@johnmalatras

I'm also seeing issues with background audio (the issue @gad2103 linked above is mine). I've spent several hours debugging and have yet to get it to work. Unfortunately, this is necessary for our use case.

@yulin-li
Contributor

yulin-li commented May 12, 2022

Hi @gad2103, sorry for missing the result id in your error message.

I can repro the issue now if I set the audio output format to Raw8Khz8BitMonoMULaw, as you did. I have reported this issue to the service team, and they will take a look.

As a workaround, could you try a format other than the 8 kHz ones?
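
A minimal sketch of that workaround, assuming the format is chosen by name before the synthesizer is created. `is8KhzFormat` and `pickWorkaroundFormat` are hypothetical helpers, and the default fallback name is only an illustration. This reflects the suggestion in this comment and is not a verified fix.

```javascript
// Hypothetical helper: true when a format name refers to an 8 kHz format,
// e.g. "Raw8Khz8BitMonoMULaw". The leading non-digit check keeps names
// such as "48khz" from matching.
function is8KhzFormat(formatName) {
  return /(^|[^0-9])8khz/i.test(formatName);
}

// Returns the requested format unless it is an 8 kHz one, in which case the
// fallback is returned. The default fallback is an assumption for illustration.
function pickWorkaroundFormat(requested, fallback = "Riff24Khz16BitMonoPcm") {
  return is8KhzFormat(requested) ? fallback : requested;
}
```

With the SDK loaded as `sdk`, the chosen name could then be applied with `speechConfig.speechSynthesisOutputFormat = sdk.SpeechSynthesisOutputFormat[name]`, assuming the name is a member of that enum.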

@johnmalatras

For what it's worth, I'm also seeing the issue with audio-48khz-96kbitrate-mono-mp3.

@gad2103
Author

gad2103 commented May 12, 2022

> Hi @gad2103, sorry for missing the result id in your error message.
>
> I can repro the issue now if I set the audio output format to Raw8Khz8BitMonoMULaw, as you did. I have reported this issue to the service team, and they will take a look.
>
> As a workaround, could you try a format other than the 8 kHz ones?

@yulin-li I can try to see if that resolves the error. However, that's the audio format I need for my application.

@yulin-li
Contributor

I understand. The service team is working on this bug.

yulin-li added the bug (Something isn't working) and text-to-speech labels May 13, 2022
@gad2103
Author

gad2103 commented May 24, 2022

> I understand. The service team is working on this bug.

Just checking in on the status here.

@johnmalatras

I also want to follow up on this. To add another data point: long-form synthesis fails entirely when I include the background audio tag.

@ciaran-parloa

We are also affected by this issue. Any updates, @yulin-li?

@sebvieux
Copy link

Hi, any updates on this issue?

7 participants
@gad2103 @johnmalatras @yulin-li @sebvieux @glharper @ciaran-parloa and others