[feat(whisper)] Add recognize_whisper #625

joy-void-joy · 2022-09-28T20:24:53Z

Solves #624 by adding recognize_whisper to Recognizer.

This works by writing the audio to a tempfile, due to the input format whisper asks for.
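A minimal sketch of the tempfile approach, using only the standard library. The helper names `write_wav_to_tempfile` and `make_silent_wav` are hypothetical and are not part of this PR; the silent WAV stands in for recorded audio, and reading the frame rate stands in for any consumer that opens audio by path.

```python
import io
import os
import tempfile
import wave


def make_silent_wav(seconds: float = 0.1, rate: int = 16000) -> bytes:
    """Build a small mono 16-bit WAV in memory, standing in for recorded audio."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def write_wav_to_tempfile(wav_bytes: bytes) -> str:
    """Hypothetical helper: persist WAV bytes and return the file path."""
    # delete=False keeps the file after the `with` block so it can be reopened by path.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        f.write(wav_bytes)
        return f.name


path = write_wav_to_tempfile(make_silent_wav())
try:
    # A path-based consumer would read the file here.
    with wave.open(path, "rb") as w:
        frame_rate = w.getframerate()
finally:
    os.remove(path)  # clean up explicitly because delete=False was used
```

The explicit cleanup is one design choice among several; keeping the file open and flushing it before handing over the name would also work on platforms that allow reopening an open temporary file.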

Usage example:

import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

print("Got it, now to recognize it...")

try:
    print("Whisper thinks you said " + r.recognize_whisper(audio, language='english'))
except sr.UnknownValueError:
    print("Whisper could not understand audio")
except sr.RequestError as e:
    print("Whisper error; {0}".format(e))

Add a recognizer for https://github.com/openai/whisper

ftnext · 2022-09-28T23:49:56Z

Thanks!
I'll check this later.

ftnext · 2022-11-06T02:47:53Z

tests/test_recognition.py

+    def test_whisper_chinese(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_ZH) as source: audio = r.record(source)
+        self.assertEqual(r.recognize_whisper(audio, model="small", language="chinese", **self.WHISPER_CONFIG), u"砸自己的腳")


model="small" is required.

✍️ When I specify model="base" (the default value), this test fails because of a wrong recognition result.

======================================================================
FAIL: test_whisper_chinese (test_recognition.TestRecognition)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/.../speech_recognition-pr/tests/test_recognition.py", line 98, in test_whisper_chinese
    self.assertEqual(r.recognize_whisper(audio, language="chinese", **self.WHISPER_CONFIG), u"砸自己的腳")
AssertionError: "�<|translate|> I'm sorry." != '砸自己的腳'
- �<|translate|> I'm sorry.
+ 砸自己的腳

ftnext · 2022-11-06T02:55:20Z

examples/microphone_recognition.py

+
+# recognize speech using whisper
+try:
+    print("Whisper thinks you said " + r.recognize_whisper(audio, language="english"))


It works!🎉 Thanks.

$ python examples/microphone_recognition.py
Say something!
/.../speech_recognition-pr/venv/lib/python3.9/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Whisper thinks you said Hello whisper

ftnext

Thanks a lot for your great PR.
Whisper works with SpeechRecognition😃

I am very sorry for such a late review.
I would like to merge this once the MUST comment alone is addressed.

@joy-void-joy Can you respond to that comment?
If that is difficult for you, it is no problem.
I'll fix the MUST comment and merge this PR tonight (JST).

Let's discuss the comments other than MUST after the merge.

README.rst

ftnext · 2022-11-06T03:46:37Z

speech_recognition/__init__.py

+                **transcribe_options
+            )
+
+        if show_dict:


nits: I find that a conditional expression (x if C else y) would make this more concise, but that is just my preference.
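To illustrate the nit, here is a small sketch comparing the two forms. The diff above only shows the `if show_dict:` line, so what each branch returns is an assumption (the full result versus its "text" entry), and both function names are hypothetical.

```python
def pick_output_statement(result: dict, show_dict: bool = False):
    """if/else statement form (branch bodies are assumed, not taken from the diff)."""
    if show_dict:
        return result
    else:
        return result["text"]


def pick_output_expression(result: dict, show_dict: bool = False):
    """Same behavior written as a conditional expression: x if C else y."""
    return result if show_dict else result["text"]


sample = {"text": "Hello whisper", "language": "en"}
as_text = pick_output_expression(sample)
as_dict = pick_output_expression(sample, show_dict=True)
```

Both forms behave identically; the conditional expression only evaluates the branch that is selected, just as the statement form does.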

speech_recognition/__init__.py

ftnext · 2022-11-06T04:11:26Z

speech_recognition/__init__.py

+        assert isinstance(audio_data, AudioData), "Data must be audio data"
+        import whisper
+
+        if load_options or not hasattr(self, "whisper_model") or self.whisper_model.get(model) is None:


✍️memo: or short-circuits.

https://docs.python.org/3/reference/expressions.html#boolean-operations

The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.

When a non-empty dict is passed as load_options, the model is loaded.

When load_options is None or {} and the instance does not have a whisper_model attribute, the model is loaded.

When load_options is None or {} and the instance has a whisper_model attribute that does not include the name model, the model is loaded.
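The three cases above can be checked with a small sketch. `FakeRecognizer` and `needs_load` are hypothetical stand-ins; only the boolean condition is copied from the diff, and the `{"device": "cpu"}` value is just an illustrative non-empty dict.

```python
class FakeRecognizer:
    """Stand-in object; it starts without a whisper_model attribute."""


def needs_load(recognizer, model, load_options=None) -> bool:
    # Same condition as in the diff. Because `or` short-circuits, the third
    # operand is never evaluated when the attribute is missing, so no
    # AttributeError is raised.
    return bool(
        load_options
        or not hasattr(recognizer, "whisper_model")
        or recognizer.whisper_model.get(model) is None
    )


r = FakeRecognizer()
case_no_attribute = needs_load(r, "base")          # attribute missing

r.whisper_model = {"base": object()}               # pretend "base" is cached
case_cached = needs_load(r, "base")                # cached, no options
case_other_model = needs_load(r, "small")          # attribute exists, name absent
case_with_options = needs_load(r, "base", {"device": "cpu"})  # non-empty options
```

Only `case_cached` skips loading; every other case triggers a load, matching the memo.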

ftnext · 2022-11-06T16:35:02Z

It seems the unit tests failed because whisper was not pip-installed.
I'll fix the unittest workflow file to install it.

ModuleNotFoundError: No module named 'whisper'

https://github.com/Uberi/speech_recognition/actions/runs/3405126020/jobs/5662866082

ftnext · 2022-11-06T16:46:42Z

FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'

https://github.com/Uberi/speech_recognition/actions/runs/3405145980/jobs/5662903599

It seems that ffmpeg needs to be installed on the ubuntu-latest runner.
FYI: https://github.com/actions/runner-images/tree/main/images/linux

joy-void-joy · 2022-11-09T14:09:40Z

Thanks for the review. Is there anything I can help with? I think the fp16 bug should be fixed in the new version of whisper. I'd rather we didn't specify fp16=False if a GPU is available as there is a significant slowdown on CPU

ftnext · 2022-11-09T17:11:39Z

Thanks for your reply.

I'd rather we didn't specify fp16=False if a GPU is available as there is a significant slowdown on CPU

I agree.
I already merged #630, which uses fp16=torch.cuda.is_available(), and I believe the current implementation is close to your idea.
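A minimal sketch of that idea follows. The helper `default_fp16` is hypothetical, and the ImportError fallback is an assumption added so the snippet runs without torch installed; the merged change may be structured differently.

```python
def default_fp16() -> bool:
    """Return True only when a CUDA device is usable, so CPU runs stay on FP32."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without torch there is no GPU path at all.
        return False
    return torch.cuda.is_available()


# Options that a caller could forward to the transcription call.
transcribe_options = {"fp16": default_fp16()}
```

Choosing the value at call time avoids the "FP16 is not supported on CPU" warning on CPU-only machines while keeping half precision on GPUs.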

If you have an idea to make it even slightly better, please send us a pull request.
Pull requests are always welcome!

[feat(whisper)] Add recognize_whisper

282402b

Add a recognizer for https://github.com/openai/whisper

ftnext self-assigned this Sep 28, 2022

ftnext reviewed Nov 6, 2022

ftnext requested changes Nov 6, 2022

ftnext reviewed Nov 6, 2022

ftnext mentioned this pull request Nov 6, 2022

Support pocketsphinx 5.0.0 #626

Open

5 tasks

Fix inline code markup

aa09576

ftnext added 2 commits November 7, 2022 01:37

Install whisper before running tests

65e20dd

Merge branch 'master' into whisper_integration

68b2438

Install ffmpeg to run whisper in unit tests

b3665f4

ftnext merged commit 7461563 into Uberi:master Nov 6, 2022

ftnext added the whisper Features related to Whisper label Nov 7, 2022

This was referenced Nov 7, 2022

Add audio_transcribe example for whisper #628

Open

Add a recognizer for whisper #624

Closed

whisper: address the warning "FP16 is not supported on CPU; using FP32 instead" #629

Closed

ftnext mentioned this pull request Nov 21, 2022

In whisper implementation, tempfile is not required; In-memory stream can be used instead #633

Closed
