
Incorrect transcription #685

Closed
1 of 4 tasks
hamsipower opened this issue Jan 10, 2023 · 8 comments
Labels
bug (Something isn't working), onnxruntime (Related to ONNX Runtime)

Comments

@hamsipower

hamsipower commented Jan 10, 2023

System Info

Name: optimum
Version: 1.6.1
Google Colab

Who can help?

@JingyaHuang, @echarlaix

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

[image]

Expected behavior

The transcribed text should be longer: it's a 30-second clip with a lot of speech, and the language should be Turkish. Also tried running with #420.

@hamsipower hamsipower added the bug Something isn't working label Jan 10, 2023
@fxmarty
Collaborator

fxmarty commented Jan 10, 2023

Hi @hamsipower, thanks for the report! Did you try with the original PyTorch model from Transformers? If the output is different, it's indeed a bug; we'll have a look ASAP.

@hamsipower
Author

hamsipower commented Jan 10, 2023

[image]

@fxmarty, I tried installing directly from GitHub, but to no avail. :(
I considered using an earlier version of optimum, but the ONNX support for Seq2Seq is pretty new, so...

Edit: I also found that if I don't set the language='tr' field on model.config in the PyTorch version, the outputs of ONNX and PyTorch are not exactly the same, but somewhat similar. Not sure if it's related, but I hope it helps.

@fxmarty
Collaborator

fxmarty commented Jan 11, 2023

Thank you! Could you share the Colab or copy-pastable code so that I can have a look?

@hamsipower
Author

hamsipower commented Jan 11, 2023

Here you go; Colab.
Sample audio file.

Edit: The pipeline seems to transcribe English audio fine, but it tries to translate to English when given non-English audio. Also, unlike the whisper library, the pipeline can't translate/transcribe audio files longer than 30 seconds; it would be really nice to be able to process files that way.
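(For reference, newer versions of the transformers ASR pipeline can split long audio into overlapping windows via the chunk_length_s and stride_length_s arguments. Conceptually, the chunking just computes overlapping windows over the sample array; a toy sketch of that boundary computation, not the library's actual implementation:)

```python
def chunk_bounds(n_samples: int, chunk: int, stride: int):
    """Return (start, end) windows of length `chunk`, overlapping by
    `stride` samples on each side, covering all `n_samples` samples."""
    step = chunk - 2 * stride
    assert step > 0, "stride too large for chunk size"
    bounds = []
    start = 0
    while start < n_samples:
        bounds.append((start, min(start + chunk, n_samples)))
        if start + chunk >= n_samples:
            break
        start += step
    return bounds

# e.g. 100 samples, 30-sample windows, 5-sample stride on each side
print(chunk_bounds(100, 30, 5))
# → [(0, 30), (20, 50), (40, 70), (60, 90), (80, 100)]
```

The overlap lets the decoder see context across window edges so the per-chunk transcripts can be stitched back together.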

@fxmarty fxmarty added the onnxruntime Related to ONNX Runtime label Jan 11, 2023
@hamsipower
Author

hamsipower commented Jan 12, 2023

@fxmarty, since you added the onnxruntime tag, I changed the onnxruntime version to onnxruntime==1.12.0 and it seems to work on Colab. However, it still translates the given audio to English.

Edit: ONNX inference is slower than the original PyTorch model. For a 30-second clip: PyTorch, 41 s; optimum, 61 s.

Edit 2: Okay, got it working with the following lines (imports and processor setup included for completeness):

from transformers import WhisperProcessor
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

processor = WhisperProcessor.from_pretrained("openai/whisper-medium")
model = ORTModelForSpeechSeq2Seq.from_pretrained("openai/whisper-medium", from_transformers=True)
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="tr", task="transcribe")

but the inference time still takes a hit.
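(For context on why this fix works: forced_decoder_ids is a list of (position, token_id) pairs, and during generation the decoder is forced to emit exactly those token IDs at those positions; this is how Whisper's language and task control tokens get injected. A minimal sketch of the mechanism, with made-up token IDs, not the actual Whisper vocabulary:)

```python
def forced_or_sampled(forced_ids, position, sampled_id):
    """Return the forced token for this decoding position if one exists,
    otherwise the token the model actually sampled."""
    return dict(forced_ids).get(position, sampled_id)

# hypothetical IDs standing in for the <|tr|> and <|transcribe|> tokens
forced = [(1, 50289), (2, 50359)]
print(forced_or_sampled(forced, 1, 123))  # → 50289 (forced language token)
print(forced_or_sampled(forced, 5, 123))  # → 123 (model's own choice)
```

Without these forced tokens, Whisper has to guess the language and task itself, which is where the unwanted translation to English can come from.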

@fxmarty
Collaborator

fxmarty commented Jan 13, 2023

@hamsipower Is there anything working with the vanilla PyTorch pipeline that is not using an ORTModel? I'm not super familiar with ASR, but for me, if you don't pass model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="tr", task="transcribe"), even the PyTorch model returns garbage output (I get "I will explain the design" instead of "Bir toplantı yapmıştık bu anahtar kelimeleri daha iyi nasıl" ["We had a meeting; how to better (handle) these keywords"] etc. on the sample mp3 you shared).

I feel like the ASR pipeline doc/implementation (https://huggingface.co/docs/transformers/v4.25.1/en/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) perhaps lacks a way to specify the language and task.

For the slowdown, I guess it could be an issue similar to microsoft/onnxruntime#13808 and #524. Could you open a new issue with reproducible code and details about your CPU (lscpu) or cloud instance so that we can have a look?
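(A minimal timing harness one could include in such a report; the workload below is a stand-in, substitute the actual pipeline call or model.generate invocation:)

```python
import time

def time_call(fn, warmup: int = 1, runs: int = 3) -> float:
    """Return the mean wall-clock time of fn() over `runs` calls,
    after `warmup` untimed calls to exclude one-off startup costs."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# stand-in workload; replace with e.g. lambda: pipe("sample.mp3")
mean_s = time_call(lambda: sum(range(10_000)))
print(f"mean latency: {mean_s:.6f} s")
```

Running the same harness against both the PyTorch and the ORT model, on the same input and hardware, makes the comparison in the issue easy to reproduce.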

I'd recommend trying on GPU as well, if available, to see whether you get any speedup with ORT.

@mht-sharma I recall you had similar issues with Whisper on CPU? It would be great to open an issue to track this.

@hamsipower
Author

hamsipower commented Jan 14, 2023

@fxmarty, yes, you're correct. That was a mistake on my part; I just assumed it would detect the language in the pipeline, similar to the whisper library. Thank you for the suggestions, I've got a lot of reading to do.

I will open a new issue if the performance issue persists, thank you!

Edit: Tried GPU for the ORT pipeline. It's also weird that optimum.onnxruntime asks for onnxruntime to be installed, but once I install it, it then asks for onnxruntime-gpu, even though I had already installed that earlier. Uninstalling onnxruntime gives me the following error message:

No module named 'onnxruntime.capi.onnxruntime_inference_collection'

@fxmarty
Collaborator

fxmarty commented Jan 16, 2023

@hamsipower Yes, to run ONNX Runtime on GPU you need to install onnxruntime-gpu, not onnxruntime, which is CPU-only.

I'll close this; feel free to open a new issue!

@fxmarty fxmarty closed this as completed Jan 16, 2023