How do I get the speaker's name from SpeechSynthesizer events? #819

sanonz · 2024-04-29T03:49:06Z

What happened?

Code:

const sdk = require("microsoft-cognitiveservices-speech-sdk");

const speechConfig = sdk.SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
speechConfig.enableAudioLogging();

var audioFile = "YourAudioFile.wav";
const audioConfig = sdk.AudioConfig.fromAudioFileOutput(audioFile);

const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

const ssml = `
<speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts'> 
    <voice name='en-US-AvaMultilingualNeural'>
        The rainbow has seven colors
    </voice>
    <voice name='en-US-JennyNeural'>
        What's the weather like?
    </voice>
</speak>
`;

synthesizer.wordBoundary = (s, e) => {
    // Word, Punctuation, or Sentence
    var str = `WordBoundary event: \
        \r\n\tBoundaryType: ${e.boundaryType} \
        \r\n\tAudioOffset: ${(e.audioOffset + 5000) / 10000}ms \
        \r\n\tDuration: ${e.duration} \
        \r\n\tText: \"${e.text}\" \
        \r\n\tTextOffset: ${e.textOffset} \
        \r\n\tWordLength: ${e.wordLength}`;
    console.log(str);
};

synthesizer.speakSsmlAsync(ssml,
    result => {
        if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
            console.log("Speech synthesis succeeded.");
        } else {
            console.error("Speech synthesis failed:", result.errorDetails);
        }
        synthesizer.close();
    },
    error => {
        console.error("Speech synthesis error:", error);
        // Release the synthesizer on the failure path as well.
        synthesizer.close();
    }
);

Results:

WordBoundary event:             
	BoundaryType: SentenceBoundary             
	AudioOffset: 50.5ms             
	Duration: 23750000             
	Text: "The rainbow has seven colors"             
	TextOffset: 192             
	WordLength: 28

WordBoundary event:             
	BoundaryType: SentenceBoundary             
	AudioOffset: 2475.5ms             
	Duration: 19750000             
	Text: "What's the weather like?"             
	TextOffset: 291             
	WordLength: 24

How can I get the speaker (voice) names, en-US-AvaMultilingualNeural and en-US-JennyNeural, from the wordBoundary event?
For example:

synthesizer.wordBoundary = (s, e) => {
    console.log('Speaker Name:', e.speakerName);
};

Or could a new event be added, for example:

synthesizer.tagReached = (s, e) => {
    console.log('Tag Name:', e.tag); // voice
    console.log('Speaker Name:', e.speakerName); // en-US-AvaMultilingualNeural or en-US
};
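
In the meantime, one possible workaround is to derive the voice from e.textOffset. The sketch below assumes that textOffset is a character offset into the exact SSML string passed to speakSsmlAsync, which the TextOffset values in the results above seem to suggest. buildVoiceLookup is a hypothetical helper, not part of the Speech SDK, and it uses a simple regular expression rather than a real XML parser.

```javascript
// Hypothetical helper (not an SDK API): records the character range of every
// <voice name="..."> element in the SSML and returns a lookup function that
// maps a text offset to the name of the enclosing voice.
function buildVoiceLookup(ssml) {
    const ranges = [];
    // Matches one <voice ...>...</voice> element; SSML voice elements do not nest.
    const voiceElement = /<voice\b[^>]*\bname\s*=\s*(['"])(.*?)\1[^>]*>[\s\S]*?<\/voice\s*>/g;
    let match;
    while ((match = voiceElement.exec(ssml)) !== null) {
        ranges.push({
            name: match[2],
            start: match.index,
            end: match.index + match[0].length
        });
    }
    // Returns undefined when the offset is outside every <voice> element.
    return textOffset => {
        const hit = ranges.find(r => textOffset >= r.start && textOffset < r.end);
        return hit ? hit.name : undefined;
    };
}

const sampleSsml = `
<speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis'>
    <voice name='en-US-AvaMultilingualNeural'>
        The rainbow has seven colors
    </voice>
    <voice name='en-US-JennyNeural'>
        What's the weather like?
    </voice>
</speak>
`;

const voiceAt = buildVoiceLookup(sampleSsml);
console.log(voiceAt(sampleSsml.indexOf("The rainbow"))); // en-US-AvaMultilingualNeural
console.log(voiceAt(sampleSsml.indexOf("What's")));      // en-US-JennyNeural
```

Inside the handler this would be called as voiceAt(e.textOffset), with the lookup built from the same ssml string that is passed to speakSsmlAsync. It would still be nicer if the event exposed the voice name directly.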

Version

1.36.0 (Latest)

What browser/platform are you seeing the problem on?

No response

Relevant log output

No response


sanonz added the bug label Apr 29, 2024

sanonz assigned glharper Apr 29, 2024

glharper assigned yulin-li and unassigned glharper Apr 29, 2024
