
[Bug]: synthesizer.speakTextAsync not working properly with Firefox or on iOS (Safari) with continuous text input #767

Open
vpetit-reimagine opened this issue Nov 22, 2023 · 6 comments

@vpetit-reimagine

What happened?

Hi, in a current project we are using the latest cognitive-services-speech-sdk library to convert text to audible speech.

Context:

  • We have a div to which we append text as it arrives from a streamed data source (a backend server whose delay is outside our control). The goal is to use the Microsoft Cognitive Speech SDK to convert that text to speech as it arrives (the audio should be queued and played sequentially until the text is fully processed).

  • The div is observed by a MutationObserver, which calls the synthesizer.speakTextAsync() method on the newly provided text (a minimal sketch of this setup follows the list).
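
For reference, a minimal sketch of that setup, assuming the observed div has id "output" and the synthesizer has already been created (the element id and handler details are illustrative, not taken from the attached sample):

const target = document.getElementById("output"); // div receiving the streamed text

const observer = new MutationObserver(mutations => {
  for (const mutation of mutations) {
    for (const node of mutation.addedNodes) {
      const text = node.textContent;
      if (text) {
        // The same synthesizer instance is reused for every chunk so that
        // the audio is generated sequentially, in arrival order.
        synthesizer.speakTextAsync(
          text,
          result => { /* audio for this chunk is ready */ },
          error => console.error(error)
        );
      }
    }
  }
});

observer.observe(target, { childList: true });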

Behaviour:
On Chromium and Edge browsers, using the synthesizer without closing/disposing it once the audio is completed works, as it is reused automatically by the observer, producing the expected result (the audio is played sequentially until the text is fully processed).

However, on Safari and Firefox, the audio is not delivered to the browser at all as long as the synthesizer isn't closed (this is probably expected, as those browsers cannot play the streamed audio directly). As mentioned above, we have to use a single synthesizer because the audio must be played sequentially (multiple synthesizers would process the text in parallel, producing overlapping audio output). We also thought about using an array to which we append the audio as the synthesizer completes it, without any success.

The synthesizer SHOULD remain open/available until we decide it can be disposed (basically when the end user leaves the current page), as the text the user receives on the page is unknown upfront.
We can't create/open new synthesizers in parallel, as they would generate audio output at the same time, which is not what we want.

Do you have any suggestions on how we could fix this? Or would it be possible to tell the synthesizer to "flush" the current result.audioData to the browser without closing it?

(Attached, you will find the modified sample file that mimics the behavior we want to achieve. It works perfectly on Chrome/Edge, but on Firefox and iOS (Safari) the audio does not play.)

Best regards,
Vincent.
sdk-test.zip

Version

1.33.0 (Latest)

What browser/platform are you seeing the problem on?

Firefox, Safari

Relevant log output

No response

@vpetit-reimagine vpetit-reimagine added the bug Something isn't working label Nov 22, 2023
@glharper glharper assigned yulin-li and unassigned glharper Nov 22, 2023
@1014156094

+1

@k-daimon

Which combinations of browser and environment have you tried this code on?
Chrome/Edge/Firefox (PC), Chrome/Safari/Firefox (Mac), Chrome (Android), Safari/Chrome (iOS), and so on.

@vpetit-reimagine
Author

We tested it on all the possible combinations. The reason it didn't work as-is on Safari/iPhone was linked to how audio is created/played in that environment (it cannot be autoplayed, and it doesn't use the same audio API as the other browsers).

We had to find another way to make it work for our use case, as it was not possible to rely on the example.
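
For context, the autoplay restriction mentioned here is standard Safari/iOS behavior: an AudioContext starts in the "suspended" state and can only be resumed from inside a user-gesture handler. A minimal sketch (the button id is hypothetical, not from the sample):

const unlockButton = document.getElementById("start-audio"); // hypothetical id
const audioContext = new (window.AudioContext || window.webkitAudioContext)();

unlockButton.addEventListener("click", async () => {
  // Safari/iOS only allows audio playback after a user gesture.
  if (audioContext.state === "suspended") {
    await audioContext.resume();
  }
});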

@k-daimon

k-daimon commented Mar 4, 2024

This is because Safari/iPhone (iOS) cannot handle the MSE (Media Source Extensions) API.
The SDK default is to use MSE. The alternative is to use the Web Audio API.
WebAudio availability: https://caniuse.com/audio-api
MSE availability: https://caniuse.com/mediasource

import * as sdk from 'microsoft-cognitiveservices-speech-sdk';

// Web Audio API: decode and play the synthesized audio ourselves
let audioContext = new AudioContext();
let bufferSource = audioContext.createBufferSource();

// Set up the Speech SDK with a pull stream so the SDK hands us the raw
// audio instead of trying to play it through MSE
const speechConfig = sdk.SpeechConfig.fromSubscription("******************", "******");
let audioStream = sdk.PullAudioOutputStream.create();
const audioConfig = sdk.AudioConfig.fromStreamOutput(audioStream);
let synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

// Text input
let text = 'Lorem ipsum dolor sit amet ..... ';

synthesizer.speakTextAsync(text, result => {
  // result.audioData is an ArrayBuffer containing the complete audio
  audioContext.decodeAudioData(result.audioData, buffer => {
    bufferSource.buffer = buffer;
    bufferSource.connect(audioContext.destination);
    bufferSource.start(0); // note: a buffer source node can only be started once
  });
}, error => {
  // Some error process
});
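
For the sequential-playback requirement from the original post, one way to build on this snippet is to schedule each decoded buffer right after the previous one, creating a fresh AudioBufferSourceNode per utterance (a source node can only be started once). A sketch under those assumptions; enqueueText is a hypothetical helper, not SDK API:

// Assumes the audioContext and synthesizer created above.
let nextStartTime = 0;

function enqueueText(text) {
  synthesizer.speakTextAsync(text, result => {
    audioContext.decodeAudioData(result.audioData, buffer => {
      const source = audioContext.createBufferSource(); // one node per chunk
      source.buffer = buffer;
      source.connect(audioContext.destination);
      // Start this chunk either now or right after the previous chunk ends.
      const startAt = Math.max(audioContext.currentTime, nextStartTime);
      source.start(startAt);
      nextStartTime = startAt + buffer.duration;
    });
  }, error => console.error(error));
}

Note that decodeAudioData is itself asynchronous, so if chunks can finish decoding out of order, a stricter queue (awaiting each decode before scheduling the next) may be needed.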

@nosisky

nosisky commented Mar 30, 2024

(Quoting @k-daimon's explanation and code sample above.)

Thank you, this worked, but I ran into an issue where the audio volume is very low when the phone's microphone is also in use; this happens only on iOS.

@localhostd3veloper

(Quoting @nosisky's reply above.)

Have you been able to resolve that?
