When fine-tuning facebook/wav2vec2-large-robust-ft-swbd-300h I noticed I could not reproduce training results obtained with transformers version 4.9.2 after upgrading to 4.10. In the new version, inputs are no longer correctly normalized to zero mean and unit variance. This seems to happen when return_attention_mask=True, the audios in a batch have different lengths, and no padding is applied.
The problem arises when using:
the official example scripts: (give details below)
my own modified scripts: (give details below)
The task I am working on is:
an official GLUE/SQuAD task: (give the name)
my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
Load Wav2Vec2Processor from facebook/wav2vec2-large-robust-ft-swbd-300h
Call processor with batched inputs of individual different lengths
Sample code to replicate the error:
import numpy as np
from transformers import Wav2Vec2Processor
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-robust-ft-swbd-300h")
sample_rate = 16000
length_1 = 10
length_2 = 20
# Generate dummy audios at the same sample rate: input_1 and input_2
# share the same length, while input_3 is twice as long
input_1 = np.random.rand(sample_rate * length_1)
input_2 = np.random.rand(sample_rate * length_1)
input_3 = np.random.rand(sample_rate * length_2)
same_length_result = processor([input_1, input_2], sampling_rate=sample_rate)
different_length_result = processor([input_1, input_3], sampling_rate=sample_rate)
# Show normalized batched audios when using same length
print(same_length_result)
# Show normalized batched audios when using different length
print(different_length_result)
# Check that the same audio receives the same transformation regardless of
# the lengths of the other audios in the batch (fails on 4.10.0)
np.testing.assert_array_equal(same_length_result["input_values"][0], different_length_result["input_values"][0])
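The symptom is consistent with per-utterance statistics being replaced by statistics that depend on the rest of the batch. The following pure-NumPy snippet is only an illustration of that effect, not the actual transformers code path:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(160000)   # 10 s of audio at 16 kHz
b = rng.random(320000)   # 20 s of audio at 16 kHz

# Correct behavior: each utterance is normalized with its own mean/variance,
# so the result for `a` does not depend on `b`
own = (a - a.mean()) / np.sqrt(a.var() + 1e-7)

# Hypothetical buggy variant: statistics computed over the whole batch,
# so the result for `a` changes whenever the other batch items change
batch = np.concatenate([a, b])
shared = (a - batch.mean()) / np.sqrt(batch.var() + 1e-7)

print(np.allclose(own, shared))  # the two normalizations differ
```

In the per-utterance case the output for a given audio is a pure function of that audio, which is exactly what the failing assert above checks.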
Expected behavior
A successful assert. Both processed inputs should be equal, with a mean close to 0 and a standard deviation close to 1.
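For reference, zero-mean unit-variance normalization of a single utterance can be sketched as follows (the epsilon value is an assumption; the exact constant used inside transformers may differ):

```python
import numpy as np

def zero_mean_unit_var(x, eps=1e-7):
    # Normalize one utterance with its own statistics; eps guards
    # against division by zero for silent (constant) inputs
    return (x - x.mean()) / np.sqrt(x.var() + eps)

rng = np.random.default_rng(0)
audio = rng.random(16000 * 10)  # 10 s of dummy audio at 16 kHz
normed = zero_mean_unit_var(audio)
print(normed.mean(), normed.std())  # approximately 0 and 1
```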
Hi @patrickvonplaten,
Thank you so very much! Sorry for not responding earlier. I've tried the latest patch release version and everything works as it should!
Environment info
transformers version: 4.10.0

Who can help
@patrickvonplaten
@sgugger
Information
Model I am using (Bert, XLNet ...): Wav2Vec 2.0