[Bug]: JS SpeechSDK.AudioConfig.fromDefaultMicrophoneInput capturing Teams/Zoom call speaker sounds where as JAVA SpeechSDK.AudioConfig.fromDefaultMicrophoneInput not #818

ru4sam326 · 2024-04-27T04:52:53Z

What happened?

Hi Team,

I'm using the JS SDK to capture speech with SpeechSDK.AudioConfig.fromDefaultMicrophoneInput. If a Teams/Zoom call is in progress in the desktop app, the other participants' audio coming from the speakers is picked up through this microphone input. I'm using version 1.36.0.

Whereas if I do the same in Java with version 1.37.0, it does not capture the other Teams/Zoom participants' audio coming from the speakers.

Please let me know how to resolve this in JS.

Version

1.36.0 (Latest)

What browser/platform are you seeing the problem on?

No response

Relevant log output

JavaScript code used:

const audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();

this.recognizer = new SpeechSDK.SpeechRecognizer(this.speechConfig, audioConfig);

this.recognizer.sessionStarted = (s: any, e: any) => {};

this.recognizer.speechStartDetected = (s: any, e: any) => {
  console.log('speechStartDetected:', s);
};

this.recognizer.recognizing = (s: any, e: any) => {
  this.displayText = 'You are speaking...';
};

this.recognizer.recognized = async (s: any, e: any) => {
  console.log('recognized:', s, e);
};


Java code:

AudioProcessingOptions audioProcessingOptions = AudioProcessingOptions.create(AudioProcessingConstants.AUDIO_INPUT_PROCESSING_NONE);
AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput(audioProcessingOptions);
SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

stopTranslationWithFileSemaphore = new Semaphore(0);

speechRecognizer.recognizing.addEventListener((s, e) -> {
    // System.out.println("RECOGNIZING: Text=" + e.getResult().getText());
});

speechRecognizer.recognized.addEventListener((s, e) -> {
    if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
        System.out.println("RECOGNIZED: Text=" + e.getResult().getText());
    } else if (e.getResult().getReason() == ResultReason.NoMatch) {
        System.out.println("NOMATCH: Speech could not be recognized.");
    }
});

ru4sam326 · 2024-04-29T04:55:05Z

Could you please respond?

glharper · 2024-04-29T17:56:02Z

@ru4sam326 Thank you for using the JS Speech SDK, and writing this issue up. The Java Speech SDK includes an echo cancellation model that mitigates background noise, while the JS Speech SDK does not, which is why the discrepancy you've encountered exists. Implementation in the JS Speech SDK is TBD. A couple of options on your end:

- limit audio devices to headsets
- implement a "push to talk" button that mutes speaker out while on.
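A third thing that may be worth trying in the browser is asking for echo cancellation when the microphone is opened. The sketch below builds such a constraints object and opens the microphone through an injected `mediaDevices`-like object so it can be exercised outside a browser. `buildMicConstraints`, `openMicrophone`, `MicAudioConstraints`, and `MediaDevicesLike` are hypothetical names for illustration, not SDK APIs. Browsers may already enable this processing by default, and whether it removes audio played by a separate desktop app such as Teams or Zoom depends on the browser and OS, so treat this as an experiment rather than a fix.

```typescript
// Subset of the standard MediaTrackConstraints dictionary used in this sketch.
interface MicAudioConstraints {
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
  channelCount: number;
}

// Hypothetical stand-in for navigator.mediaDevices, so the sketch does not
// require a browser to run.
interface MediaDevicesLike<TStream> {
  getUserMedia(constraints: { audio: MicAudioConstraints }): Promise<TStream>;
}

// Builds constraints that request the browser's built-in audio processing.
function buildMicConstraints(): { audio: MicAudioConstraints } {
  return {
    audio: {
      echoCancellation: true,
      noiseSuppression: true,
      autoGainControl: true,
      channelCount: 1, // mono capture
    },
  };
}

// Opens the microphone with the constraints above and returns the stream.
async function openMicrophone<TStream>(
  devices: MediaDevicesLike<TStream>
): Promise<TStream> {
  return devices.getUserMedia(buildMicConstraints());
}
```

In a browser, `openMicrophone(navigator.mediaDevices)` would yield a `MediaStream`. The JS SDK's `SpeechSDK.AudioConfig.fromStreamInput` appears to accept a `MediaStream`, so the result could stand in for `fromDefaultMicrophoneInput()` in the code from the original report.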

ru4sam326 · 2024-04-30T16:38:01Z

Is there an example of using the Java API for browser-side chat?
We are building a chatbot that listens to real-time speech for analysis. We went with JS, which picks up the noise you mentioned.

So how can we stream the browser audio to the Java-side Speech SDK?

glharper · 2024-05-02T16:05:08Z

If you are implementing this chatbot on Windows, there's an "Acoustic Echo Cancellation" setting you can turn on to see if the noise for JS is mitigated, see attached picture:

For Java, if you can access the outgoing audio stream, presumably you can transform it to a 16 kHz, 16-bit PCM stream and adapt the push stream code here to send it to the Java recognizer for recognition.
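The conversion to 16 kHz, 16-bit PCM mentioned above could be sketched on the browser side as follows, assuming the audio arrives as Web Audio `Float32Array` samples in the range [-1, 1] at a higher rate such as 48 kHz. `downsample` and `floatTo16BitPcm` are hypothetical helper names, and the averaging downsampler is a crude stand-in for a proper low-pass resampler.

```typescript
// Downsamples by averaging each group of input samples that maps to one
// output sample. Simple, but not a substitute for a real resampling filter.
function downsample(input: Float32Array, inputRate: number, outputRate: number): Float32Array {
  if (outputRate > inputRate) {
    throw new Error("output rate must not exceed input rate");
  }
  if (outputRate === inputRate) {
    return input;
  }
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(input.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) {
      sum += input[j];
    }
    output[i] = end > start ? sum / (end - start) : 0;
  }
  return output;
}

// Converts float samples to signed 16-bit little-endian PCM bytes.
function floatTo16BitPcm(input: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(input.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])); // clamp out-of-range samples
    const value = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
    view.setInt16(i * 2, value, true); // true = little-endian
  }
  return buffer;
}
```

A chunk captured at 48 kHz could then be sent as `floatTo16BitPcm(downsample(chunk, 48000, 16000))`, giving raw bytes with no container header.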

ru4sam326 · 2024-05-08T02:00:58Z

Hi Team,

Could you please share some samples for sending the audio stream from the browser to Java?
It would be really helpful for us. We tried some approaches, but they are not working.

Approach tried:

JS Code:

async initRecognition() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

  const options: RecordRTC.Options = {};
  options.type = "audio";
  options.mimeType = "audio/wav";
  options.timeSlice = 3000;
  options.recorderType = StereoAudioRecorder;
  options.numberOfAudioChannels = 1;
  options.desiredSampRate = 16000;
  options.sampleRate = 16000;
  options.bitrate = 16;
  options.ondataavailable = async (blob: Blob) => this.dataavailable(blob);

  const recorder = new RecordRTCPromisesHandler(stream, options);
  recorder.startRecording();
}

async dataavailable(blob: Blob) {
  console.log('blob', blob);
  if (this.socket.OPEN === this.socket.readyState) {
    this.socket.send(blob);
  }
}

JAVA Code:

public void handleBinaryMessage(WebSocketSession session, BinaryMessage message) throws Exception {
    byte[] arr = new byte[message.getPayloadLength()];
    message.getPayload().get(arr);
    SpeechConfig speechConfig = SpeechConfig.fromSubscription("**********************", "******");
    speechConfig.setSpeechRecognitionLanguage("en-IN");

    System.out.println("Started before");
    Semaphore stopRecognitionSemaphore = new Semaphore(0);
    PushAudioInputStream pushStream = AudioInputStream.createPushStream();
    System.out.println("Started after");

    // Creates a speech recognizer using Push Stream as audio input.
    AudioConfig audioInput = AudioConfig.fromStreamInput(pushStream);

    SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioInput);
    recognizer.recognized.addEventListener((s, e) -> {
        if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
            System.out.println("RECOGNIZED: Text=" + e.getResult().getText());
        } else if (e.getResult().getReason() == ResultReason.NoMatch) {
            System.out.println("NOMATCH: Speech could not be recognized.");
        }
    });
}

Thanks,
Samba
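One thing that may be worth checking in the approach above: with `mimeType = "audio/wav"`, each blob is presumably a complete WAV file with its own RIFF header, while a Java push stream created with `AudioInputStream.createPushStream()` defaults to raw 16 kHz, 16-bit, mono PCM. The sketch below strips the container on the browser side before sending. `stripWavHeader` and `readTag` are hypothetical helpers, and the assumption that every blob is a self-contained WAV has not been verified against RecordRTC. Separately, in the excerpt shown, the Java handler never passes `arr` to `pushStream.write(...)` or starts recognition, and it builds a new recognizer for every message, so those would also need attention.

```typescript
// Reads a 4-character chunk id at the given byte offset.
function readTag(view: DataView, offset: number): string {
  let tag = "";
  for (let i = 0; i < 4; i++) {
    tag += String.fromCharCode(view.getUint8(offset + i));
  }
  return tag;
}

// Returns only the sample bytes of a RIFF/WAVE buffer. Buffers that are not
// RIFF/WAVE are returned unchanged on the assumption they are already raw PCM.
function stripWavHeader(buffer: ArrayBuffer): ArrayBuffer {
  const view = new DataView(buffer);
  if (buffer.byteLength < 12 || readTag(view, 0) !== "RIFF" || readTag(view, 8) !== "WAVE") {
    return buffer;
  }
  let offset = 12; // first chunk follows the 12-byte RIFF/WAVE preamble
  while (offset + 8 <= buffer.byteLength) {
    const id = readTag(view, offset);
    const size = view.getUint32(offset + 4, true); // chunk sizes are little-endian
    const bodyStart = offset + 8;
    if (id === "data") {
      const end = Math.min(buffer.byteLength, bodyStart + size);
      return buffer.slice(bodyStart, end);
    }
    offset = bodyStart + size + (size % 2); // chunks are padded to an even length
  }
  return new ArrayBuffer(0); // no data chunk found
}
```

In `dataavailable`, this could be used as `this.socket.send(stripWavHeader(await blob.arrayBuffer()))`, so the backend receives only sample data.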

ru4sam326 · 2024-05-11T10:38:45Z

Hi @glharper

Please ignore my previous comment.
I am now able to stream the audio from the browser to the backend, but I am still unable to cancel the acoustic echo. Could you please suggest a solution?

glharper · 2024-05-17T14:00:07Z

@ru4sam326 Since you're using the Java Speech SDK, this question is better asked in the native Speech SDK repo. This repo is specifically for the JavaScript Speech SDK.

ru4sam326 · 2024-05-19T03:48:24Z

Thanks @glharper.
Raised Azure-Samples/cognitive-services-speech-sdk#2381 for reference, in case someone follows up.

ru4sam326 added the bug label Apr 27, 2024

ru4sam326 assigned glharper Apr 27, 2024

glharper closed this as completed May 17, 2024
