
Incorrect transcription #685

Closed
1 of 4 tasks
hamsipower opened this issue Jan 10, 2023 · 8 comments
Labels
bug (Something isn't working), onnxruntime (Related to ONNX Runtime)

Comments

@hamsipower

hamsipower commented Jan 10, 2023

System Info

Name: optimum
Version: 1.6.1
Google Colab

Who can help?

@JingyaHuang, @echarlaix

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

[image]

Expected behavior

The transcribed text should be longer: it's a 30-second clip with a lot of speech, and the language should be Turkish. Also tried running with #420.

@hamsipower hamsipower added the bug Something isn't working label Jan 10, 2023
@fxmarty
Collaborator

fxmarty commented Jan 10, 2023

Hi @hamsipower, thanks for the report! Did you try with the original PyTorch model from Transformers? If the output is different, it's indeed a bug; we'll have a look ASAP.

@hamsipower
Author

hamsipower commented Jan 10, 2023

[image]

@fxmarty, I tried installing directly from GitHub, but to no avail. :(
I considered using an earlier version of optimum, but the ONNX support for Seq2Seq is pretty new, so...

Edit: I also found that if I don't set the language='tr' field on model.config in the PyTorch version, the outputs of ONNX and PyTorch are not exactly the same, but somewhat similar. Not sure if it's related, but I hope it helps.

@fxmarty
Collaborator

fxmarty commented Jan 11, 2023

Thank you! Could you share the Colab or copy-pastable code so that I can have a look?

@hamsipower
Author

hamsipower commented Jan 11, 2023

Here you go; Colab.
Sample audio file.

Edit: The pipeline seems to transcribe English audio fine, but it tries to translate to English when given non-English audio. Also, unlike the whisper library, the pipeline can't translate/transcribe audio files longer than 30 seconds; it would be really nice to be able to process files that way.
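(For reference, newer versions of the transformers ASR pipeline can split long audio into overlapping windows via the chunk_length_s and stride_length_s arguments. Conceptually, the chunking just computes overlapping windows over the sample array; a toy sketch of that boundary computation, not the library's actual implementation:)

```python
def chunk_bounds(n_samples: int, chunk: int, stride: int):
    """Return (start, end) windows of length `chunk`, overlapping by
    `stride` samples on each side, covering all `n_samples` samples."""
    step = chunk - 2 * stride
    assert step > 0, "stride too large for chunk size"
    bounds = []
    start = 0
    while start < n_samples:
        bounds.append((start, min(start + chunk, n_samples)))
        if start + chunk >= n_samples:
            break
        start += step
    return bounds

# e.g. 100 samples, 30-sample windows, 5-sample stride on each side
print(chunk_bounds(100, 30, 5))
# → [(0, 30), (20, 50), (40, 70), (60, 90), (80, 100)]
```

The overlap lets the decoder see context across window edges so the per-chunk transcripts can be stitched back together.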

@fxmarty fxmarty added the onnxruntime Related to ONNX Runtime label Jan 11, 2023
@hamsipower
Author

hamsipower commented Jan 12, 2023

@fxmarty, since you added the onnxruntime tag, I changed the onnxruntime version to onnxruntime==1.12.0 and it seems to work on Colab. However, it still translates the given audio to English.

Edit: ONNX inference is slower than the original PyTorch model. For a 30-second clip: PyTorch, 41 s; optimum, 61 s.

Edit 2: Okay, got it working with the following lines (imports and processor setup included for completeness):

from transformers import WhisperProcessor
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

processor = WhisperProcessor.from_pretrained("openai/whisper-medium")
model = ORTModelForSpeechSeq2Seq.from_pretrained("openai/whisper-medium", from_transformers=True)
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="tr", task="transcribe")

but the inference time still takes a hit.
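(For context on why this fix works: forced_decoder_ids is a list of (position, token_id) pairs, and during generation the decoder is forced to emit exactly those token IDs at those positions; this is how Whisper's language and task control tokens get injected. A minimal sketch of the mechanism, with made-up token IDs, not the actual Whisper vocabulary:)

```python
def forced_or_sampled(forced_ids, position, sampled_id):
    """Return the forced token for this decoding position if one exists,
    otherwise the token the model actually sampled."""
    return dict(forced_ids).get(position, sampled_id)

# hypothetical IDs standing in for the <|tr|> and <|transcribe|> tokens
forced = [(1, 50289), (2, 50359)]
print(forced_or_sampled(forced, 1, 123))  # → 50289 (forced language token)
print(forced_or_sampled(forced, 5, 123))  # → 123 (model's own choice)
```

Without these forced tokens, Whisper has to guess the language and task itself, which is where the unwanted translation to English can come from.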

@fxmarty
Collaborator

fxmarty commented Jan 13, 2023

@hamsipower Is there anything working with the vanilla PyTorch pipeline that is not using an ORTModel? I'm not super familiar with ASR, but for me, if you don't pass model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="tr", task="transcribe"), even the PyTorch model returns garbage output (I get "I will explain the design" instead of "Bir toplantı yapmıştık bu anahtar kelimeleri daha iyi nasıl" ["We had a meeting; how to better (handle) these keywords"] etc. on the sample mp3 you shared).

I feel like the ASR pipeline doc/implementation (https://huggingface.co/docs/transformers/v4.25.1/en/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) perhaps lacks a way to specify the language and task.

For the slowdown, I guess it could be an issue similar to microsoft/onnxruntime#13808 and #524. Could you open a new issue with reproducible code and details about your CPU (lscpu) or cloud instance so that we can have a look?
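(A minimal timing harness one could include in such a report; the workload below is a stand-in, substitute the actual pipeline call or model.generate invocation:)

```python
import time

def time_call(fn, warmup: int = 1, runs: int = 3) -> float:
    """Return the mean wall-clock time of fn() over `runs` calls,
    after `warmup` untimed calls to exclude one-off startup costs."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# stand-in workload; replace with e.g. lambda: pipe("sample.mp3")
mean_s = time_call(lambda: sum(range(10_000)))
print(f"mean latency: {mean_s:.6f} s")
```

Running the same harness against both the PyTorch and the ORT model, on the same input and hardware, makes the comparison in the issue easy to reproduce.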

I'd recommend trying on GPU as well, if available, to see whether you get any speedup with ORT.

@mht-sharma I recall you had similar issues with Whisper on CPU? It would be great to open an issue to track this.

@hamsipower
Author

hamsipower commented Jan 14, 2023

@fxmarty, yes, you're correct. That was a mistake on my part; I just assumed it would detect the language in the pipeline, similar to the whisper library. Thank you for the suggestions, I've got a lot of reading to do.

I will open a new issue if the performance issue persists, thank you!

Edit: Tried GPU for the ORT pipeline. It's also weird that optimum.onnxruntime asks for onnxruntime to be installed, but once I install it, it then asks for onnxruntime-gpu, even though I had already installed that earlier. Uninstalling onnxruntime gives me the following error message:

No module named 'onnxruntime.capi.onnxruntime_inference_collection'

@fxmarty
Collaborator

fxmarty commented Jan 16, 2023

@hamsipower Yes, to run ONNX Runtime on GPU you need to install onnxruntime-gpu, not onnxruntime, which is CPU-only.

I'll close this; feel free to open a new issue!

@fxmarty fxmarty closed this as completed Jan 16, 2023