
Added onnx config whisper #19525

Merged — 11 commits merged into huggingface:main on Nov 1, 2022

Conversation

mht-sharma (Contributor) commented Oct 12, 2022

What does this PR do?

Fixes # (issue)

This PR adds an ONNX config and helper functions for exporting Whisper to ONNX via optimum and transformers.onnx.
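As a rough sketch of what such a config declares (class and axis names here are illustrative, following the usual pattern of encoder-decoder ONNX configs in transformers, not necessarily the exact code merged in this PR): the config lists the model's input names and their dynamic axes so the exported graph accepts any batch size and sequence length.

```python
# Hedged sketch of an ONNX config for Whisper; names are illustrative.
from collections import OrderedDict


class WhisperOnnxConfigSketch:
    @property
    def inputs(self):
        # The encoder consumes log-mel features, the decoder token ids.
        # Dynamic axes are named so the ONNX graph is shape-polymorphic.
        return OrderedDict(
            [
                ("input_features", {0: "batch", 1: "feature_size", 2: "encoder_sequence"}),
                ("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
            ]
        )
```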

HuggingFaceDocBuilderDev commented Oct 12, 2022

The documentation is not available anymore as the PR was closed or merged.

lewtun (Member) left a comment

Thanks for working on this @mht-sharma - it's looking really good 🗣!!

I left a few nits and a general question about whether we should refactor the dummy data generation into a separate function.

Let's also add the logic to enable the ONNX export to work with transformers.onnx so we can add the model to the unit tests.

Resolved review comments on: README_es.md, src/transformers/models/whisper/configuration_whisper.py, src/transformers/onnx/config.py
mht-sharma (Contributor, Author):
@lewtun @echarlaix The bug is fixed now. The incorrect results were because the export was happening with sequence length 1, due to a typo in the ONNX config's generate_dummy_inputs function.

@mht-sharma mht-sharma marked this pull request as ready for review October 26, 2022 14:18
lewtun (Member) left a comment

Thanks for iterating @mht-sharma - this looks very close to being ready 🔥 !!

I've left some nits and a question, but otherwise this all looks good :)

Resolved review comments on: src/transformers/onnx/config.py
        else:
            raise ValueError(
                "Unable to generate dummy inputs for the model. Please provide a tokenizer or a preprocessor."
            )

    def generate_dummy_inputs_onnxruntime(self, reference_model_inputs: Mapping[str, Any]) -> Mapping[str, Any]:
lewtun (Member):

If I'm not mistaken, this function doesn't do anything - why do we need it?

mht-sharma (Contributor, Author):

Hi @lewtun, it is true that this function does not do anything for the existing models. However, it can be overridden in cases where the model inputs and the ONNX Runtime inputs differ. This was needed when exporting the decoder of encoder-decoder models using encoder_outputs. For example, see the DecoderOnnxConfig in optimum, where I use this function.

Since we are waiting for the optimum PR to merge before migrating the changes, this is no longer needed for the current merge.

But we still need to update the VisionEncoderDecoderConfig to use encoder_outputs, and in that case we would need such a function. Should we merge it with this PR and use it in the new VisionEncoderDecoder PR (making things easier for whoever opens that PR)? Or should we remove it from here and add the function along with the new PR?

lewtun (Member):

I see, thanks for the clarification! Looking at the optimum code, it seems like we use this function to add new fields to the ORT inputs - is there a reason we can't capture that logic in a single generate_dummy_inputs() function associated with the ONNX config?

(I'm happy to include this function as is, just trying to understand if we really need it or not)

mht-sharma (Contributor, Author):

The function (at least in this case) updates the existing keys in the input dict; the values remain the same.

For exporting these models we require two different input sets. This is because model.forward() has a different input signature (since it is the full model before export) than the generated ONNX model (only the decoder is exported). Therefore we need a way to alter the existing model inputs to run inference with both models in validate_model_outputs.

I think creating a separate function is a much cleaner way, but I am open to suggestions. We could instead add the logic to generate_dummy_inputs via a new argument, maybe ort_inputs=True/False, and return different input sets for these models. But that would require updating all the OnnxConfigs that have this function.
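A minimal sketch of the pattern being discussed (class and key names are illustrative, not the actual optimum code): the base hook returns the reference inputs unchanged, and a decoder config overrides it to rename keys, so the same dummy inputs can drive both model.forward() and the exported decoder graph during validation.

```python
from typing import Any, Mapping


class OnnxConfigSketch:
    def generate_dummy_inputs_onnxruntime(
        self, reference_model_inputs: Mapping[str, Any]
    ) -> Mapping[str, Any]:
        # Default: ONNX Runtime consumes exactly the same inputs as the model.
        return reference_model_inputs


class DecoderOnnxConfigSketch(OnnxConfigSketch):
    def generate_dummy_inputs_onnxruntime(
        self, reference_model_inputs: Mapping[str, Any]
    ) -> Mapping[str, Any]:
        # The exported decoder graph names its token-id input differently from
        # the full model's forward(), so remap the key; values are unchanged.
        remapped = dict(reference_model_inputs)
        remapped["decoder_input_ids"] = remapped.pop("input_ids")
        return remapped
```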

lewtun (Member):

OK great, now I understand well why we need this - thanks. IMO it's fine to include this function in this PR if we add a small note in the docstring like "Override to run inference with seq2seq models which have the encoder and decoder exported as separate ONNX files."

mht-sharma (Contributor, Author):

Done

lewtun (Member) left a comment

Thanks for the final round of iterations @mht-sharma - this now LGTM!

Let's wait for final approval from @sgugger before merging

sgugger (Collaborator) left a comment

Thanks for adding this! There are a few details to fix, but then we should be good to merge.

from ...utils import logging


if TYPE_CHECKING:
    from ... import PreTrainedTokenizerBase, TensorType
sgugger (Collaborator):

Let's put the right module here :-)

mht-sharma (Contributor, Author):

Updated

@@ -297,6 +314,12 @@ def generate_dummy_inputs(
            The width of the generated images.
        image_height (`int`, *optional*, defaults to 40):
            The height of the generated images.
        sampling_rate (`int`, *optional* defaults to 22050)
            The sampling rate for audio data generation.
        time_duration (`int`, *optional* defaults to 5 sec)
sgugger (Collaborator):

Suggested change:
-        time_duration (`int`, *optional* defaults to 5 sec)
+        time_duration (`float`, *optional* defaults to 5.0)

Let's be consistent with the signature!
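For context, a dummy-audio generator along these lines (a sketch with illustrative defaults matching the docstring, not the exact library code) shows why time_duration is naturally a float:

```python
import math


def generate_dummy_audio(batch_size=2, sampling_rate=22050, time_duration=5.0, frequency=220.0):
    # One sine wave per batch entry. The sample count is duration * rate,
    # so a fractional duration like 5.0 (or 0.5) works naturally.
    num_samples = int(time_duration * sampling_rate)
    return [
        [0.5 * math.sin(2 * math.pi * frequency * n / sampling_rate) for n in range(num_samples)]
        for _ in range(batch_size)
    ]
```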

mht-sharma (Contributor, Author):

Done

@@ -325,7 +348,8 @@ def generate_dummy_inputs(
            seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
        )
        # Generate dummy inputs according to compute batch and sequence
        dummy_input = [" ".join([preprocessor.unk_token]) * seq_length] * batch_size
        input_token = preprocessor.unk_token if preprocessor.unk_token else "0"
sgugger (Collaborator):

Suggested change:
-        input_token = preprocessor.unk_token if preprocessor.unk_token else "0"
+        input_token = preprocessor.unk_token if preprocessor.unk_token is not None else "0"

We don't rely on Python bool magic conversion in the library, so let's test explicitly.

mht-sharma (Contributor, Author):

Done, added explicit checks for None and the empty string.
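To illustrate the final behavior (a sketch with made-up helper names, not the merged code): the token fallback checks None and the empty string explicitly rather than relying on Python's truthiness conversion, and the dummy text is built by space-joining that token.

```python
def pick_input_token(unk_token):
    # Explicit tests, per the review: no reliance on implicit bool() conversion.
    if unk_token is None or len(unk_token) == 0:
        return "0"
    return unk_token


def build_dummy_text_inputs(unk_token, batch_size, seq_length):
    token = pick_input_token(unk_token)
    # One space-joined string of `seq_length` tokens per batch entry.
    return [" ".join([token] * seq_length)] * batch_size
```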

lewtun (Member) commented Nov 1, 2022

Thanks for the review @sgugger and for catching those last issues - I've checked the latest changes and think this can now be merged @mht-sharma.

@sgugger sgugger merged commit c796b6d into huggingface:main Nov 1, 2022
sgugger pushed a commit that referenced this pull request Nov 1, 2022
* Added onnx config whisper

* added whisper support onnx

* add audio input data

* added whisper support onnx

* fixed the seqlength value

* Updated the whisper onnx ocnfig

* restore files to old version

* removed attention mask from inputs

* Updated get_dummy_input_onnxruntime docstring

* Updated relative imports and token generation

* update docstring
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Nov 3, 2022
mpierrau pushed a commit to mpierrau/transformers that referenced this pull request Dec 15, 2022
zara0m commented Feb 7, 2023

Hello,

I tried to generate the ONNX model using the docs. For inference I passed the audio features and decoder_input_ids, and the output was two arrays with shapes (1, 2, 768) and (1, 1500, 768). Could you please help me with how to use these outputs to generate a transcription?

Thank you.

mht-sharma (Contributor, Author):
Hi @zara0m, please follow the example in this PR for export and inference using the ONNX model: huggingface/optimum#420

zara0m commented Feb 7, 2023

> Hi @zara0m please follow the example in this PR for export and inference using ONNX model. huggingface/optimum#420

Thank you very much for your help and quick response!

I tested it with the base and small models on some non-English audio, but the outputs were not similar to the Whisper model's - maybe it translates instead of transcribing. How can I fix this?
Also, is there any way to get the begin/end time of each sentence (like the transcribe function of the Whisper model)?

Thank you.
