
[Bug]: synthesizer.speakTextAsync not working properly with Firefox or on iOS (Safari) with continuous text input #767

Open
vpetit-reimagine opened this issue Nov 22, 2023 · 6 comments

@vpetit-reimagine

What happened?

Hi, in a current project we are using the latest cognitive-services-speech-sdk library to convert text to audible speech.

Context:

  • We have a div to which we append text as it arrives from a streamed data source (a backend server whose delay is outside our control). The goal is to use the Microsoft Cognitive Speech SDK to convert that text to speech as it arrives (the audio should be queued and played sequentially until the text is fully processed).

  • The div is observed by a MutationObserver, which calls the synthesizer.speakTextAsync() method on the newly provided text (a minimal sketch of this setup follows the list).
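
For reference, a minimal sketch of that setup, assuming the observed div has id "output" and the synthesizer has already been created (the element id and handler details are illustrative, not taken from the attached sample):

const target = document.getElementById("output"); // div receiving the streamed text

const observer = new MutationObserver(mutations => {
  for (const mutation of mutations) {
    for (const node of mutation.addedNodes) {
      const text = node.textContent;
      if (text) {
        // The same synthesizer instance is reused for every chunk so that
        // the audio is generated sequentially, in arrival order.
        synthesizer.speakTextAsync(
          text,
          result => { /* audio for this chunk is ready */ },
          error => console.error(error)
        );
      }
    }
  }
});

observer.observe(target, { childList: true });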

Behaviour:
On Chromium and Edge browsers, using the synthesizer without closing/disposing it once the audio is completed works, as it is reused automatically by the observer, producing the expected result (the audio is played sequentially until the text is fully processed).

However, on Safari and Firefox, the audio is not delivered to the browser at all as long as the synthesizer isn't closed (this is probably expected, as those browsers cannot play the streamed audio directly). As mentioned above, we have to use a single synthesizer because the audio must be played sequentially (multiple synthesizers would process the text in parallel, producing overlapping audio output). We also thought about using an array to which we append the audio as the synthesizer completes it, without any success.

The synthesizer SHOULD remain open/available until we decide it can be disposed (basically when the end user leaves the current page), as the text the user receives on the page is unknown upfront.
We can't create/open new synthesizers in parallel, as they would generate audio output at the same time, which is not what we want.

Do you have any suggestions on how we could fix this? Or would it be possible to tell the synthesizer to "flush" the current result.audioData to the browser without closing it?

(Attached, you will find the modified sample file that mimics the behavior we want to achieve. It works perfectly on Chrome/Edge, but on Firefox and iOS (Safari) the audio does not play.)

Best regards,
Vincent.
sdk-test.zip

Version

1.33.0 (Latest)

What browser/platform are you seeing the problem on?

Firefox, Safari

Relevant log output

No response

@vpetit-reimagine vpetit-reimagine added the bug Something isn't working label Nov 22, 2023
@glharper glharper assigned yulin-li and unassigned glharper Nov 22, 2023
@1014156094

+1

@k-daimon

Which combinations of browser and environment have you tried this code on?
Chrome/Edge/Firefox (PC), Chrome/Safari/Firefox (Mac), Chrome (Android), Safari/Chrome (iOS), and so on.

@vpetit-reimagine
Author

We tested it on all the possible combinations. The reason it didn't work as-is on Safari/iPhone was linked to how audio is created/played in that environment (it cannot be autoplayed, and it doesn't use the same audio API as the other browsers).

We had to find another way to make it work for our use case, as it was not possible to rely on the example.
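
For context, the autoplay restriction mentioned here is standard Safari/iOS behavior: an AudioContext starts in the "suspended" state and can only be resumed from inside a user-gesture handler. A minimal sketch (the button id is hypothetical, not from the sample):

const unlockButton = document.getElementById("start-audio"); // hypothetical id
const audioContext = new (window.AudioContext || window.webkitAudioContext)();

unlockButton.addEventListener("click", async () => {
  // Safari/iOS only allows audio playback after a user gesture.
  if (audioContext.state === "suspended") {
    await audioContext.resume();
  }
});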

@k-daimon

k-daimon commented Mar 4, 2024

This is because Safari/iPhone (iOS) cannot handle the MSE (Media Source Extensions) API.
The SDK default is to use MSE. The alternative is to use the Web Audio API.
WebAudio availability: https://caniuse.com/audio-api
MSE availability: https://caniuse.com/mediasource

import * as sdk from 'microsoft-cognitiveservices-speech-sdk';

// Web Audio API: decode and play the synthesized audio ourselves
let audioContext = new AudioContext();
let bufferSource = audioContext.createBufferSource();

// Set up the Speech SDK with a pull stream so the SDK hands us the raw
// audio instead of trying to play it through MSE
const speechConfig = sdk.SpeechConfig.fromSubscription("******************", "******");
let audioStream = sdk.PullAudioOutputStream.create();
const audioConfig = sdk.AudioConfig.fromStreamOutput(audioStream);
let synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

// Text input
let text = 'Lorem ipsum dolor sit amet ..... ';

synthesizer.speakTextAsync(text, result => {
  // result.audioData is an ArrayBuffer containing the complete audio
  audioContext.decodeAudioData(result.audioData, buffer => {
    bufferSource.buffer = buffer;
    bufferSource.connect(audioContext.destination);
    bufferSource.start(0); // note: a buffer source node can only be started once
  });
}, error => {
  // Some error process
});
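
For the sequential-playback requirement from the original post, one way to build on this snippet is to schedule each decoded buffer right after the previous one, creating a fresh AudioBufferSourceNode per utterance (a source node can only be started once). A sketch under those assumptions; enqueueText is a hypothetical helper, not SDK API:

// Assumes the audioContext and synthesizer created above.
let nextStartTime = 0;

function enqueueText(text) {
  synthesizer.speakTextAsync(text, result => {
    audioContext.decodeAudioData(result.audioData, buffer => {
      const source = audioContext.createBufferSource(); // one node per chunk
      source.buffer = buffer;
      source.connect(audioContext.destination);
      // Start this chunk either now or right after the previous chunk ends.
      const startAt = Math.max(audioContext.currentTime, nextStartTime);
      source.start(startAt);
      nextStartTime = startAt + buffer.duration;
    });
  }, error => console.error(error));
}

Note that decodeAudioData is itself asynchronous, so if chunks can finish decoding out of order, a stricter queue (awaiting each decode before scheduling the next) may be needed.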

@nosisky

nosisky commented Mar 30, 2024

(Quoting @k-daimon's explanation and code sample above.)

Thank you, this worked, but I ran into an issue where the audio volume is very low when the phone's microphone is also in use; this happens only on iOS.

@localhostd3veloper

(Quoting @nosisky's reply above.)

Have you been able to resolve that?
