
Added onnx config whisper #19525

Merged — 11 commits merged into huggingface:main on Nov 1, 2022

Conversation

mht-sharma (Contributor) commented Oct 12, 2022

What does this PR do?

Fixes # (issue)

This PR adds an ONNX config and helper functions for exporting Whisper to ONNX via optimum and transformers.onnx.
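As a rough sketch of what such a config declares (class and axis names here are illustrative, following the usual pattern of encoder-decoder ONNX configs in transformers, not necessarily the exact code merged in this PR): the config lists the model's input names and their dynamic axes so the exported graph accepts any batch size and sequence length.

```python
# Hedged sketch of an ONNX config for Whisper; names are illustrative.
from collections import OrderedDict


class WhisperOnnxConfigSketch:
    @property
    def inputs(self):
        # The encoder consumes log-mel features, the decoder token ids.
        # Dynamic axes are named so the ONNX graph is shape-polymorphic.
        return OrderedDict(
            [
                ("input_features", {0: "batch", 1: "feature_size", 2: "encoder_sequence"}),
                ("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
            ]
        )
```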

HuggingFaceDocBuilderDev commented Oct 12, 2022

The documentation is not available anymore as the PR was closed or merged.

lewtun (Member) left a comment

Thanks for working on this @mht-sharma - it's looking really good 🗣!!

I left a few nits and a general question about whether we should refactor the dummy data generation into a separate function.

Let's also add the logic to enable the ONNX export to work with transformers.onnx so we can add the model to the unit tests.

Resolved review comments on: README_es.md, src/transformers/models/whisper/configuration_whisper.py, src/transformers/onnx/config.py
mht-sharma (Contributor, Author):
@lewtun @echarlaix The bug is fixed now. The incorrect results were because the export was happening with sequence length 1, due to a typo in the ONNX config's generate_dummy_inputs function.

@mht-sharma mht-sharma marked this pull request as ready for review October 26, 2022 14:18
lewtun (Member) left a comment

Thanks for iterating @mht-sharma - this looks very close to being ready 🔥 !!

I've left some nits and a question, but otherwise this all looks good :)

Resolved review comments on: src/transformers/onnx/config.py
        else:
            raise ValueError(
                "Unable to generate dummy inputs for the model. Please provide a tokenizer or a preprocessor."
            )

    def generate_dummy_inputs_onnxruntime(self, reference_model_inputs: Mapping[str, Any]) -> Mapping[str, Any]:
lewtun (Member):

If I'm not mistaken, this function doesn't do anything - why do we need it?

mht-sharma (Contributor, Author):

Hi @lewtun, it is true that this function does not do anything for the existing models. However, it can be overridden in cases where the model inputs and the ONNX Runtime inputs differ. This was needed when exporting the decoder of encoder-decoder models using encoder_outputs. For example, see the DecoderOnnxConfig in optimum, where I use this function.

Since we are waiting for the optimum PR to merge before migrating the changes, this is no longer needed for the current merge.

But we still need to update the VisionEncoderDecoderConfig to use encoder_outputs, and in that case we would need such a function. Should we merge it with this PR and use it in the new VisionEncoderDecoder PR (making things easier for whoever opens that PR)? Or should we remove it from here and add the function along with the new PR?

lewtun (Member):

I see, thanks for the clarification! Looking at the optimum code, it seems like we use this function to add new fields to the ORT inputs - is there a reason we can't capture that logic in a single generate_dummy_inputs() function associated with the ONNX config?

(I'm happy to include this function as is, just trying to understand if we really need it or not)

mht-sharma (Contributor, Author):

The function (at least in this case) updates the existing keys in the input dict; the values remain the same.

For exporting these models we require two different input sets. This is because model.forward() has a different input signature (since it is the full model before export) than the generated ONNX model (only the decoder is exported). Therefore we need a way to alter the existing model inputs to run inference with both models in validate_model_outputs.

I think creating a separate function is a much cleaner way, but I am open to suggestions. We could instead add the logic to generate_dummy_inputs via a new argument, maybe ort_inputs=True/False, and return different input sets for these models. But that would require updating all the OnnxConfigs that have this function.
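A minimal sketch of the pattern being discussed (class and key names are illustrative, not the actual optimum code): the base hook returns the reference inputs unchanged, and a decoder config overrides it to rename keys, so the same dummy inputs can drive both model.forward() and the exported decoder graph during validation.

```python
from typing import Any, Mapping


class OnnxConfigSketch:
    def generate_dummy_inputs_onnxruntime(
        self, reference_model_inputs: Mapping[str, Any]
    ) -> Mapping[str, Any]:
        # Default: ONNX Runtime consumes exactly the same inputs as the model.
        return reference_model_inputs


class DecoderOnnxConfigSketch(OnnxConfigSketch):
    def generate_dummy_inputs_onnxruntime(
        self, reference_model_inputs: Mapping[str, Any]
    ) -> Mapping[str, Any]:
        # The exported decoder graph names its token-id input differently from
        # the full model's forward(), so remap the key; values are unchanged.
        remapped = dict(reference_model_inputs)
        remapped["decoder_input_ids"] = remapped.pop("input_ids")
        return remapped
```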

lewtun (Member):

OK great, now I understand well why we need this - thanks. IMO it's fine to include this function in this PR if we add a small note in the docstring like "Override to run inference with seq2seq models which have the encoder and decoder exported as separate ONNX files."

mht-sharma (Contributor, Author):

Done

lewtun (Member) left a comment

Thanks for the final round of iterations @mht-sharma - this now LGTM!

Let's wait for final approval from @sgugger before merging

sgugger (Collaborator) left a comment

Thanks for adding this! There are a few details to fix, but then we should be good to merge.

from ...utils import logging


if TYPE_CHECKING:
    from ... import PreTrainedTokenizerBase, TensorType
sgugger (Collaborator):

Let's put the right module here :-)

mht-sharma (Contributor, Author):

Updated

@@ -297,6 +314,12 @@ def generate_dummy_inputs(
            The width of the generated images.
        image_height (`int`, *optional*, defaults to 40):
            The height of the generated images.
        sampling_rate (`int`, *optional* defaults to 22050)
            The sampling rate for audio data generation.
        time_duration (`int`, *optional* defaults to 5 sec)
sgugger (Collaborator):

Suggested change:
-        time_duration (`int`, *optional* defaults to 5 sec)
+        time_duration (`float`, *optional* defaults to 5.0)

Let's be consistent with the signature!
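For context, a dummy-audio generator along these lines (a sketch with illustrative defaults matching the docstring, not the exact library code) shows why time_duration is naturally a float:

```python
import math


def generate_dummy_audio(batch_size=2, sampling_rate=22050, time_duration=5.0, frequency=220.0):
    # One sine wave per batch entry. The sample count is duration * rate,
    # so a fractional duration like 5.0 (or 0.5) works naturally.
    num_samples = int(time_duration * sampling_rate)
    return [
        [0.5 * math.sin(2 * math.pi * frequency * n / sampling_rate) for n in range(num_samples)]
        for _ in range(batch_size)
    ]
```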

mht-sharma (Contributor, Author):

Done

@@ -325,7 +348,8 @@ def generate_dummy_inputs(
            seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
        )
        # Generate dummy inputs according to compute batch and sequence
        dummy_input = [" ".join([preprocessor.unk_token]) * seq_length] * batch_size
        input_token = preprocessor.unk_token if preprocessor.unk_token else "0"
sgugger (Collaborator):

Suggested change:
-        input_token = preprocessor.unk_token if preprocessor.unk_token else "0"
+        input_token = preprocessor.unk_token if preprocessor.unk_token is not None else "0"

We don't rely on Python bool magic conversion in the library, so let's test explicitly.

mht-sharma (Contributor, Author):

Done, added explicit checks for None and the empty string.
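To illustrate the final behavior (a sketch with made-up helper names, not the merged code): the token fallback checks None and the empty string explicitly rather than relying on Python's truthiness conversion, and the dummy text is built by space-joining that token.

```python
def pick_input_token(unk_token):
    # Explicit tests, per the review: no reliance on implicit bool() conversion.
    if unk_token is None or len(unk_token) == 0:
        return "0"
    return unk_token


def build_dummy_text_inputs(unk_token, batch_size, seq_length):
    token = pick_input_token(unk_token)
    # One space-joined string of `seq_length` tokens per batch entry.
    return [" ".join([token] * seq_length)] * batch_size
```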

lewtun (Member) commented Nov 1, 2022

Thanks for the review @sgugger and for catching those last issues - I've checked the latest changes and think this can now be merged @mht-sharma.

@sgugger sgugger merged commit c796b6d into huggingface:main Nov 1, 2022
sgugger pushed a commit that referenced this pull request Nov 1, 2022
* Added onnx config whisper

* added whisper support onnx

* add audio input data

* added whisper support onnx

* fixed the seqlength value

* Updated the whisper onnx ocnfig

* restore files to old version

* removed attention mask from inputs

* Updated get_dummy_input_onnxruntime docstring

* Updated relative imports and token generation

* update docstring
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Nov 3, 2022
mpierrau pushed a commit to mpierrau/transformers that referenced this pull request Dec 15, 2022
zara0m commented Feb 7, 2023

Hello,

I tried to generate the ONNX model using the docs. For inference I passed the audio features and decoder_input_ids, and the output was two arrays with shapes (1, 2, 768) and (1, 1500, 768). Could you please help me with how to use these outputs to generate a transcription?

Thank you.

mht-sharma (Contributor, Author):
Hi @zara0m, please follow the example in this PR for export and inference using the ONNX model: huggingface/optimum#420

zara0m commented Feb 7, 2023

> Hi @zara0m please follow the example in this PR for export and inference using ONNX model. huggingface/optimum#420

Thank you very much for your help and quick response!

I tested it with the base and small models on some non-English audio, but the outputs were not similar to the Whisper model's - maybe it translates instead of transcribing. How can I fix this?
Also, is there any way to get the begin/end time of each sentence (like the transcribe function of the Whisper model)?

Thank you.
