Transcribe OGG(OPUS) format #351

sutekyara · 2021-03-30T11:18:32Z

I'd like the SDK to support transcribing OGG/OPUS audio data.
The current SDK version doesn't seem to support it.

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/releasenotes
https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-codec-compressed-audio-input-streams?tabs=debian&pivots=programming-language-csharp

Or is there any solution with the current SDK version?
Here is my platform:
Windows 10
Electron@6.3.3 (Electron can use browser JS features, Node.js, and wasm)

OGG/OPUS keeps network usage low. Whether this feature is available will influence whether we choose Azure for transcription.

Thank you

rhurey · 2021-03-31T17:36:43Z

Currently there isn't support for compressing the JavaScript SDK->Service audio stream.

We have an item in our backlog for it, but it hasn't been scheduled.

yoshigev · 2021-04-21T09:25:53Z

Hi @rhurey,

We are also interested in another missing codec: MULAW.
From my understanding, the SDK itself is not the component that should perform the compression. The SDK should only forward the compressed stream that it receives from the client code to the cloud STT service (in our case, the audio stream is received from a telephony integration).
Given that, the only missing part in the SDK is a way to configure the codec of the compressed input stream and to pass this configuration to the cloud STT service.

Is it more complicated than I just described?

Thanks,
Yehoshua

rssfrncs · 2021-06-28T07:45:25Z

This would be a huge win for reducing network usage with the JS SDK, which is often used client-side. Any estimate of when it can be expected? 🙏🏼

yoshigev · 2021-10-31T14:13:36Z

For your information, there is a workaround that uses a private API of the SDK.

Sample code:

import * as sdk from 'microsoft-cognitiveservices-speech-sdk';
import {
    AudioStreamFormatImpl,
    AudioFormatTag
} from 'microsoft-cognitiveservices-speech-sdk/distrib/lib/src/sdk/Audio/AudioStreamFormat';

function startRecognition() {
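    // Note: AudioStreamFormatImpl is private (not part of the SDK's public API),
    // so this deep import may break in a future SDK release.
    // Arguments: 8000 samples/s, 8 bits per sample, 1 channel, mu-law encoding.
    const audioFormat = new AudioStreamFormatImpl(8000, 8, 1, AudioFormatTag.MuLaw);

    // Push stream that accepts the already-encoded mu-law bytes from the caller.
    const pushStream = sdk.AudioInputStream.createPushStream(audioFormat);
    const audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);

    const speechConfig = sdk.SpeechConfig.fromSubscription(/*...*/);
    const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
    // ...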
}

We have tested it with MuLaw but I guess it should also work for OPUS.
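
To put the format arguments in the workaround (8000, 8, 1) in perspective, here is a small sketch of the byte-rate arithmetic for fixed-rate formats. The `StreamFormat` type and `bytesPerSecond` helper are hypothetical names used only for illustration, not part of the SDK. The comparison assumes a baseline of 16 kHz, 16-bit mono PCM, which the SDK documentation describes as the default input format. The formula does not apply to OPUS, whose bitrate is chosen by the encoder rather than fixed by the sample format.

```typescript
// Hypothetical helper type (not an SDK type): describes a fixed-rate sample stream.
interface StreamFormat {
    samplesPerSec: number;
    bitsPerSample: number;
    channels: number;
}

// Bytes per second = samples/s * bits/sample * channels / 8.
function bytesPerSecond(f: StreamFormat): number {
    return (f.samplesPerSec * f.bitsPerSample * f.channels) / 8;
}

// Assumed baseline: 16 kHz, 16-bit, mono PCM.
const pcm16k: StreamFormat = { samplesPerSec: 16000, bitsPerSample: 16, channels: 1 };
// Format used in the workaround: 8 kHz, 8-bit, mono mu-law.
const muLaw8k: StreamFormat = { samplesPerSec: 8000, bitsPerSample: 8, channels: 1 };

console.log(bytesPerSecond(pcm16k));                           // 32000
console.log(bytesPerSecond(muLaw8k));                          // 8000
console.log(bytesPerSecond(pcm16k) / bytesPerSecond(muLaw8k)); // 4
```

Under these assumptions, the mu-law telephony stream needs a quarter of the upload bandwidth of the PCM baseline.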

glharper added the enhancement (New feature or request) label on Mar 30, 2021

orgads mentioned this issue Nov 4, 2021

Support different coders for input audio stream #452

Closed
