[Bug]: JS SpeechSDK.AudioConfig.fromDefaultMicrophoneInput capturing Teams/Zoom call speaker sounds where as JAVA SpeechSDK.AudioConfig.fromDefaultMicrophoneInput not #818

Closed
ru4sam326 opened this issue Apr 27, 2024 · 8 comments

@ru4sam326

What happened?

Hi Team,

I'm using the JS SDK to capture speech with SpeechSDK.AudioConfig.fromDefaultMicrophoneInput. If a Teams/Zoom call is in progress in the desktop app, the other participants' audio coming from the speakers is picked up through this microphone input. I'm using version 1.36.0.

Whereas if I do the same in Java with version 1.37.0, it does not capture the other Teams/Zoom participants' audio coming from the speakers.

Please let me know how to resolve this in JS.

Version

1.36.0 (Latest)

What browser/platform are you seeing the problem on?

No response

Relevant log output

JavaScript code used:

const audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();

this.recognizer = new SpeechSDK.SpeechRecognizer(this.speechConfig, audioConfig);

this.recognizer.sessionStarted = (s: any, e: any) => {
};

this.recognizer.speechStartDetected = (s: any, e: any) => {
  console.log('speechStartDetected:', s);
};

this.recognizer.recognizing = (s: any, e: any) => {
  this.displayText = 'You are speaking...';
};

this.recognizer.recognized = async (s: any, e: any) => {
  console.log('recognized:', s, e);
};


Java code:

AudioProcessingOptions audioProcessingOptions = AudioProcessingOptions.create(AudioProcessingConstants.AUDIO_INPUT_PROCESSING_NONE);
AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput(audioProcessingOptions);
SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

stopTranslationWithFileSemaphore = new Semaphore(0);

speechRecognizer.recognizing.addEventListener((s, e) -> {
    // System.out.println("RECOGNIZING: Text=" + e.getResult().getText());
});

speechRecognizer.recognized.addEventListener((s, e) -> {
    if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
        System.out.println("RECOGNIZED: Text=" + e.getResult().getText());
    } else if (e.getResult().getReason() == ResultReason.NoMatch) {
        System.out.println("NOMATCH: Speech could not be recognized.");
    }
});

@ru4sam326 added the bug (Something isn't working) label Apr 27, 2024

@ru4sam326
Author

Could you please respond?

@glharper
Member

@ru4sam326 Thank you for using the JS Speech SDK, and writing this issue up. The Java Speech SDK includes an echo cancellation model that mitigates background noise, while the JS Speech SDK does not, which is why the discrepancy you've encountered exists. Implementation in the JS Speech SDK is TBD. A couple of options on your end:

  • limit audio devices to headsets
  • implement a "push to talk" button that mutes speaker out while on.
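
A rough sketch of the "push to talk" option, under stated assumptions: `ContinuousRecognizer` is a minimal structural type covering the two `SpeechRecognizer` methods used here, and `setSpeakerMuted` is a hypothetical hook, since the Speech SDK does not mute speaker output itself and a web page can typically only mute its own audio, not a separate desktop app.

```typescript
// Minimal structural type: both methods exist on the Speech SDK's
// SpeechRecognizer (they also accept optional callbacks, omitted here).
interface ContinuousRecognizer {
  startContinuousRecognitionAsync(): void;
  stopContinuousRecognitionAsync(): void;
}

// Hypothetical hook for muting/unmuting speaker output. How this is done is
// platform-specific and is NOT provided by the Speech SDK.
type SpeakerMute = (muted: boolean) => void;

class PushToTalk {
  private talking = false;

  constructor(
    private readonly recognizer: ContinuousRecognizer,
    private readonly setSpeakerMuted: SpeakerMute
  ) {}

  isTalking(): boolean {
    return this.talking;
  }

  // Call when the button is pressed (e.g. on pointerdown).
  press(): void {
    if (this.talking) return; // ignore repeated presses
    this.talking = true;
    this.setSpeakerMuted(true); // silence playback before recognition starts
    this.recognizer.startContinuousRecognitionAsync();
  }

  // Call when the button is released (e.g. on pointerup).
  release(): void {
    if (!this.talking) return; // ignore releases without a press
    this.talking = false;
    this.recognizer.stopContinuousRecognitionAsync();
    this.setSpeakerMuted(false); // restore playback once the mic is closed
  }
}
```

With this shape, the recognizer only listens while the button is held, so speaker audio played at other times never reaches it. A real `SpeechRecognizer` instance could be passed in as the first argument.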

@ru4sam326
Author

Is there any example of using the Java API for browser-side chat?
We are building a chatbot that listens to real-time speech for analysis. We went with JS, which picks up the noise you mentioned.

So how can we stream the browser audio to the Java-side Speech SDK?

@glharper
Member

glharper commented May 2, 2024

If you are implementing this chatbot on Windows, there's an "Acoustic Echo Cancellation" setting you can turn on to see if the noise for JS is mitigated, see attached picture:
[Attached image: Image 5-2-24 at 12 02 PM]

For Java, if you can access the outgoing audio stream, presumably you can transform it to a 16 kHz 16-bit PCM stream and adapt the push stream code here to send it to the Java recognizer for recognition.
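
As an illustration of the "transform to a 16 kHz 16-bit PCM stream" step, here is a minimal browser-side sketch. It assumes mono `Float32Array` samples in the range [-1, 1], as the Web Audio API typically produces. The function names are hypothetical, and the linear-interpolation resampler applies no low-pass filter, so it is a simplification rather than a production-quality converter.

```typescript
const TARGET_RATE = 16000;

// Resample mono float samples to 16 kHz using linear interpolation.
// Assumption: no anti-aliasing filter is applied, so some aliasing is possible.
function resampleTo16k(input: Float32Array, inputRate: number): Float32Array {
  if (inputRate === TARGET_RATE) return input;
  const outLength = Math.floor((input.length * TARGET_RATE) / inputRate);
  const output = new Float32Array(outLength);
  const step = inputRate / TARGET_RATE;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

// Convert float samples to signed 16-bit PCM, clamping out-of-range values.
function floatTo16BitPcm(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    output[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return output;
}

// Full conversion: arbitrary-rate float audio to 16 kHz 16-bit mono PCM.
function toSpeechPcm(input: Float32Array, inputRate: number): Int16Array {
  return floatTo16BitPcm(resampleTo16k(input, inputRate));
}
```

The `buffer` of the returned `Int16Array` (little-endian on typical platforms) could then be sent over a WebSocket and written to a `PushAudioInputStream` on the Java side. The transport itself is outside this sketch.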

@ru4sam326
Author

ru4sam326 commented May 8, 2024

Hi Team,

Could you please share some samples for sending the audio stream from the browser to Java?
That would be really helpful for us. We tried a few approaches, but they are not working.

Approach tried:

JS Code:

async initRecognition() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

  const options: RecordRTC.Options = {};
  options.type = "audio";
  options.mimeType = "audio/wav";
  options.timeSlice = 3000;
  options.recorderType = StereoAudioRecorder;
  options.numberOfAudioChannels = 1;
  options.desiredSampRate = 16000;
  options.sampleRate = 16000;
  options.bitrate = 16;
  options.ondataavailable = async (blob: Blob) => this.dataavailable(blob);

  const recorder = new RecordRTCPromisesHandler(stream, options);
  recorder.startRecording();
}

async dataavailable(blob: Blob) {
  console.log('blob', blob);
  if (this.socket.OPEN === this.socket.readyState) {
    this.socket.send(blob);
  }
}

JAVA Code:

public void handleBinaryMessage(WebSocketSession session, BinaryMessage message) throws Exception {
    byte[] arr = new byte[message.getPayloadLength()];
    message.getPayload().get(arr);
    SpeechConfig speechConfig = SpeechConfig.fromSubscription("**********************", "******");
    speechConfig.setSpeechRecognitionLanguage("en-IN");

    System.out.println("Started before");
    Semaphore stopRecognitionSemaphore = new Semaphore(0);
    PushAudioInputStream pushStream = AudioInputStream.createPushStream();
    System.out.println("Started after");

    // Creates a speech recognizer using Push Stream as audio input.
    AudioConfig audioInput = AudioConfig.fromStreamInput(pushStream);

    SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioInput);
    recognizer.recognized.addEventListener((s, e) -> {
        if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
            System.out.println("RECOGNIZED: Text=" + e.getResult().getText());
        } else if (e.getResult().getReason() == ResultReason.NoMatch) {
            System.out.println("NOMATCH: Speech could not be recognized.");
        }
    });
}

Thanks,
Samba

@ru4sam326
Author

Hi @glharper

Please ignore the above.
I am now able to stream the audio from the browser to the backend, but I am still unable to cancel the acoustic echo. Could you please suggest a fix?

@glharper
Member

@ru4sam326 Since you're using the Java Speech SDK, this question is better asked in the native Speech SDK repo. This repo is specifically for the JavaScript Speech SDK.

@ru4sam326
Author

Thanks @glharper.
Raised Azure-Samples/cognitive-services-speech-sdk#2381 for reference, in case someone follows up.
