Transcribe OGG(OPUS) format #351

Open
sutekyara opened this issue Mar 30, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@sutekyara

I'd like to request support for transcribing OGG/OPUS audio data.
The current SDK version does not appear to support it.

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/releasenotes
https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-codec-compressed-audio-input-streams?tabs=debian&pivots=programming-language-csharp

Or is there a workaround with the current SDK version?
Here is my platform:
Windows 10
Electron@6.3.3 (Electron can use browser JS features, Node.js, and wasm)

OGG/OPUS is well suited to sending audio over the network because it is compressed. Whether this feature is available will influence our decision to choose Azure for transcription.

Thank you.

@glharper added the enhancement label on Mar 30, 2021
@rhurey
Member

rhurey commented Mar 31, 2021

Currently there isn't support for compressing the JavaScript SDK->Service audio stream.

We have an item in our backlog for it, but it hasn't been scheduled.

@yoshigev

Hi @rhurey,

We are also interested in another missing codec: MULAW.
From my understanding, the SDK itself should not be the one performing the compression. The SDK should only forward the compressed stream that it receives from the client code to the cloud STT service (in our case, the audio stream is received from a telephony integration).
Given that, the only missing part in the SDK is a way to configure the codec of the compressed input stream and to pass that configuration on to the cloud STT service.

Is it more complicated than I just described?

Thanks,
Yehoshua

@rssfrncs

rssfrncs commented Jun 28, 2021

This would be a huge win for reducing network usage in the JS SDK, which is often used client-side. Is there any estimate of when it can be expected? 🙏🏼
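
To put a rough number on the network saving, here is a back-of-the-envelope sketch. It only covers fixed-rate formats; Opus is variable-bitrate, so it is left out. The 16 kHz, 16-bit, mono PCM baseline is an assumption on my side (it is not stated anywhere in this thread), and `rawBytesPerSecond` and `toKbps` are hypothetical helper names, not SDK functions.

```typescript
// Bytes per second for an uncompressed or fixed-rate audio format.
function rawBytesPerSecond(samplesPerSec: number, bitsPerSample: number, channels: number): number {
    return (samplesPerSec * bitsPerSample * channels) / 8;
}

// Convert bytes per second to kilobits per second.
const toKbps = (bytesPerSec: number): number => (bytesPerSec * 8) / 1000;

// Assumed baseline: 16 kHz, 16-bit, mono PCM.
const pcmBytesPerSec = rawBytesPerSecond(16000, 16, 1);
// Telephony-style mu-law (G.711): 8 kHz, 8-bit, mono.
const muLawBytesPerSec = rawBytesPerSecond(8000, 8, 1);

console.log(`PCM:   ${pcmBytesPerSec} bytes/s = ${toKbps(pcmBytesPerSec)} kbit/s`);
console.log(`MuLaw: ${muLawBytesPerSec} bytes/s = ${toKbps(muLawBytesPerSec)} kbit/s`);
```

Under those assumptions, the mu-law stream needs a quarter of the bandwidth of the PCM baseline.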

@yoshigev

For your information, there is a workaround that uses a private API of the SDK.

Sample code:

import * as sdk from 'microsoft-cognitiveservices-speech-sdk';
import {
    AudioStreamFormatImpl,
    AudioFormatTag
} from 'microsoft-cognitiveservices-speech-sdk/distrib/lib/src/sdk/Audio/AudioStreamFormat';

function startRecognition() {
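    // Note: AudioStreamFormatImpl is internal (not part of the SDK's public API), so it may change between releases.
    // Arguments: 8000 Hz sample rate, 8 bits per sample, 1 channel, mu-law encoding.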
    const audioFormat = new AudioStreamFormatImpl(8000, 8, 1, AudioFormatTag.MuLaw);

    const pushStream = sdk.AudioInputStream.createPushStream(audioFormat);
    const audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
    
    const speechConfig = sdk.SpeechConfig.fromSubscription(/*...*/);
    const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
    // ...
}

We have tested it with MuLaw but I guess it should also work for OPUS.
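
For the part elided as `// ...` above, a minimal sketch of forwarding the already-encoded bytes could look like the following. It is written against a small `AudioSink` interface rather than the SDK itself, so it runs on its own; the push stream created in the sample has `write(ArrayBuffer)` and `close()` methods and should fit that shape. `AudioSink`, `forwardEncodedAudio`, and the 800-byte chunk size (100 ms of 8 kHz, 8-bit mu-law) are illustrative choices of mine, not SDK names or requirements.

```typescript
// Minimal shape of the destination this sketch needs (hypothetical interface).
interface AudioSink {
    write(dataBuffer: ArrayBuffer): void;
    close(): void;
}

// Hypothetical helper: forwards encoded audio bytes unchanged, in fixed-size
// chunks, then signals end of stream. Returns the number of chunks written.
function forwardEncodedAudio(sink: AudioSink, encoded: Uint8Array, chunkBytes: number = 800): number {
    if (chunkBytes <= 0) {
        throw new RangeError("chunkBytes must be positive");
    }
    let chunks = 0;
    for (let offset = 0; offset < encoded.byteLength; offset += chunkBytes) {
        const end = Math.min(offset + chunkBytes, encoded.byteLength);
        // Copy the chunk into its own ArrayBuffer so the sink never sees
        // bytes outside the current chunk.
        const chunk = new ArrayBuffer(end - offset);
        new Uint8Array(chunk).set(encoded.subarray(offset, end));
        sink.write(chunk);
        chunks++;
    }
    sink.close();
    return chunks;
}
```

With the sample above, the call would be something like `forwardEncodedAudio(pushStream, muLawBytes)`, where `muLawBytes` is whatever the telephony integration delivers. No encoding or decoding happens in client code; the bytes are passed through as received.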

6 participants
@yoshigev @rssfrncs @rhurey @sutekyara @glharper and others