[Bug]: Real-Time Speech-to-Text Lag and Synchronization Problems on Low-Power Devices #806

Open
hardik-veloxcore opened this issue Apr 2, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@hardik-veloxcore

hardik-veloxcore commented Apr 2, 2024

Real-time speech-to-text lags for me on low-power devices. It works fine on devices with a fast processor.

However, in Google Chrome with the CPU throttled to a 6x slowdown, speech recognition lags and no text appears in real time while the audio is being captured.
Screenshot 2024-04-02 181525

image

The text appears only after I click 'stop listening'. The page's frame rate (fps) also drops, because the WebSocket continuously processes the audio input.

The same issue occurs with real-time speech-to-text on https://speech.microsoft.com.

Version

1.33.1 (Default)

What browser/platform are you seeing the problem on?

Chrome

Relevant log output

X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:turn.start Content-Type:application/json; charset=utf-8 { "context": { "serviceTag": "4247a50e319645de98ab0f2891b2af9b" } }	186	
18:22:22.659
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.startDetected Content-Type:application/json; charset=utf-8 {"Offset":11200000}	140	
18:22:26.124
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.hypothesis Content-Type:application/json; charset=utf-8 {"Id":"8d17fe9b2e1c4e4a8c007dc36e5a25c7","Text":"are you","Offset":11200000,"Duration":2800000,"PrimaryLanguage":{"Language":"en-us"},"Channel":0}	264	
18:22:26.211
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.hypothesis Content-Type:application/json; charset=utf-8 {"Id":"0bd4832a28d840bf86640987a32ba715","Text":"are you rating","Offset":8400000,"Duration":6800000,"PrimaryLanguage":{"Language":"en-us"},"Channel":0}	270	
18:22:26.374
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.hypothesis Content-Type:application/json; charset=utf-8 {"Id":"f6a2b00ab81241a7b8a45f83e587c763","Text":"are you rating my point","Offset":8400000,"Duration":12000000,"PrimaryLanguage":{"Language":"en-us"},"Channel":0}	280	
18:22:26.555
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.hypothesis Content-Type:application/json; charset=utf-8 {"Id":"d82b5381f072432280caeb78ea13a65a","Text":"are you getting opponent","Offset":8400000,"Duration":12000000,"PrimaryLanguage":{"Language":"en-us"},"Channel":0}	281	
18:22:26.719
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.phrase Content-Type:application/json; charset=utf-8 {"Id":"813b56e3fa8a446ba38f715cc63cbca7","RecognitionStatus":"Success","Offset":8400000,"Duration":12000000,"Channel":0,"DisplayText":"Are you getting opponent?","NBest":[{"Confidence":0.40714973,"Lexical":"are you getting opponent","ITN":"are you getting opponent","MaskedITN":"are you getting opponent","Display":"Are you getting opponent?","Words":[{"Word":"are","Offset":8400000,"Duration":1200000},{"Word":"you","Offset":9600000,"Duration":1200000},{"Word":"getting","Offset":11200000,"Duration":4000000},{"Word":"opponent","Offset":15600000,"Duration":4800000}]},{"Confidence":0.3220464,"Lexical":"are you rating my point","ITN":"are you rating my point","MaskedITN":"are you rating my point","Display":"are you rating my point","Words":[{"Word":"are","Offset":8400000,"Duration":1600000},{"Word":"you","Offset":10400000,"Duration":1200000},{"Word":"rating","Offset":11600000,"Duration":3600000},{"Word":"my","Offset":15200000,"Duration":1200000},{"Word":"point","Offset":16400000,"Duration":4000000}]}]}	1124	
18:22:26.979
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.endDetected Content-Type:application/json; charset=utf-8 {"Offset":36000000}	138	
18:22:27.367
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.phrase Content-Type:application/json; charset=utf-8 {"Id":"60b6172d4347483a999ebb01e3594763","RecognitionStatus":"EndOfDictation","Offset":36000000,"Duration":0,"Channel":0}	235	
18:22:27.409
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:turn.end Content-Type:application/json; charset=utf-8 { "StatusCode": "Success" }
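
To make the lag in this capture easier to quantify, here is a minimal sketch that compares when each frame arrived with how much audio it covers. It rests on two assumptions that are not stated in the log itself: that each timestamp belongs to the frame printed just before it, and that audio streaming began at `turn.start`. The `Frame` and `LagRow` types and the `lagReport` helper are hypothetical names for illustration; the only fact taken from the service is that `Offset` and `Duration` are expressed in 100-nanosecond ticks.

```typescript
// One captured WebSocket frame: message path, wall-clock arrival time, and
// (when present) the audio offset/duration reported by the service.
interface Frame {
  path: string;
  receivedAt: string;     // "HH:MM:SS.mmm", as shown in the log
  offsetTicks?: number;   // 1 tick = 100 ns
  durationTicks?: number;
}

interface LagRow {
  path: string;
  elapsedSeconds: number;    // arrival time relative to turn.start
  audioEndSeconds?: number;  // end of the audio span the frame covers
  lagSeconds?: number;       // elapsedSeconds - audioEndSeconds
}

const TICKS_PER_SECOND = 10000000;

function ticksToSeconds(ticks: number): number {
  return ticks / TICKS_PER_SECOND;
}

// Parse "HH:MM:SS.mmm" into milliseconds since midnight.
function clockToMs(clock: string): number {
  const [h, m, s] = clock.split(":");
  return (Number(h) * 3600 + Number(m) * 60) * 1000 + Math.round(Number(s) * 1000);
}

// ASSUMPTION: audio upload starts at turn.start, so a frame's lag is its
// arrival time since turn.start minus the end of the audio it describes.
function lagReport(frames: Frame[]): LagRow[] {
  const start = frames.find((f) => f.path === "turn.start");
  if (start === undefined) {
    throw new Error("no turn.start frame in capture");
  }
  const startMs = clockToMs(start.receivedAt);
  return frames.map((f): LagRow => {
    const elapsedSeconds = (clockToMs(f.receivedAt) - startMs) / 1000;
    if (f.offsetTicks === undefined) {
      return { path: f.path, elapsedSeconds };
    }
    const audioEndSeconds = ticksToSeconds(f.offsetTicks + (f.durationTicks ?? 0));
    return {
      path: f.path,
      elapsedSeconds,
      audioEndSeconds,
      lagSeconds: elapsedSeconds - audioEndSeconds,
    };
  });
}

// Three frames copied from the log above.
const capture: Frame[] = [
  { path: "turn.start", receivedAt: "18:22:22.659" },
  { path: "speech.hypothesis", receivedAt: "18:22:26.211", offsetTicks: 11200000, durationTicks: 2800000 },
  { path: "speech.phrase", receivedAt: "18:22:26.979", offsetTicks: 8400000, durationTicks: 12000000 },
];

const rows = lagReport(capture);
console.log(rows);
```

Under those assumptions, the first hypothesis and the final phrase both arrive more than two seconds after the audio they describe, and all hypotheses land within roughly one second of each other, which matches text showing up in a burst rather than incrementally.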
hardik-veloxcore added the bug label Apr 2, 2024
@glharper
Member

glharper commented Apr 3, 2024

@hardik-veloxcore Thank you for using the Speech SDK and for writing this issue up.
Using the attached sample HTML file, I ran a continuous recognition session on this test file with the CPU at 6x slowdown, but saw no issues with the returned recognition results.

Would you mind standing up a web page that demonstrates this issue at 6x slowdown with that file?
index.html.zip
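
For a repro page like the one requested, a small timing probe could make the difference between the two setups visible. The sketch below is an assumption-laden illustration, not code from the attached sample: `RecognizerLike`, `RecognitionEvent`, `ProbeEntry`, and `attachLatencyProbe` are hypothetical names, and the interface only mirrors the shape of the `recognizing` and `recognized` callbacks on the SDK's `SpeechRecognizer`. Whether a real recognizer type-checks against it without a cast or adapter has not been verified here, so it is exercised with a stand-in object and a fake clock.

```typescript
// Minimal structural view of a recognizer: only the two callbacks the probe
// needs. Hypothetical stand-in, modeled on the SDK's recognizing/recognized.
interface RecognitionEvent {
  result: { text: string };
}

interface RecognizerLike {
  recognizing?: (sender: unknown, event: RecognitionEvent) => void;
  recognized?: (sender: unknown, event: RecognitionEvent) => void;
}

interface ProbeEntry {
  kind: "interim" | "final";
  text: string;
  atMs: number;   // clock reading when the callback ran
  gapMs: number;  // time since the previous callback (0 for the first one)
}

// Installs both callbacks and returns the array they append to.
// The clock is injectable so the probe can be tested deterministically.
function attachLatencyProbe(
  recognizer: RecognizerLike,
  now: () => number = () => Date.now(),
): ProbeEntry[] {
  const entries: ProbeEntry[] = [];
  let last: number | undefined;

  const record = (kind: ProbeEntry["kind"]) =>
    (_sender: unknown, event: RecognitionEvent): void => {
      const atMs = now();
      const gapMs = last === undefined ? 0 : atMs - last;
      entries.push({ kind, text: event.result.text, atMs, gapMs });
      last = atMs;
    };

  recognizer.recognizing = record("interim");
  recognizer.recognized = record("final");
  return entries;
}

// Exercise the probe with a stand-in recognizer and a fake clock.
const standIn: RecognizerLike = {};
let fakeNow = 1000;
const probeLog = attachLatencyProbe(standIn, () => fakeNow);

standIn.recognizing?.(null, { result: { text: "are you" } });
fakeNow = 1250;
standIn.recognized?.(null, { result: { text: "Are you getting opponent?" } });

console.log(probeLog);
```

On a real page the probe would be attached before `startContinuousRecognitionAsync()` is called. Interim entries arriving at steady intervals would suggest recognition is keeping up, while a long silence followed by a burst of entries would match the behavior shown in the video.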

@hardik-veloxcore
Author

Speech.Studio.-.Real-time.speech.to.text.Issue.mp4

I am attaching a video of the issue so you can see the problem. The same issue occurs in my application too.

@ralph-msft
Contributor

[B-7141286]

@Hardik-Rana

Any updates on it?
