Adding a Wav2Vec2ForSpeechClassification class 🚀
Right now, taking any of the Wav2Vec 2.0 models available on the 🤗 Hub and fine-tuning it for a speech classification task means creating a new class that inherits its behaviour from the Wav2Vec2PreTrainedModel class. Although creating this type of model can be done with a bit of research, I find it too complicated to simply use a fine-tuned model once it is shared on the 🤗 Hub, because you need access to the code of the model class in order to instantiate it and retrieve the model with the from_pretrained() method (which may or may not be available at that time).
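For context, this is roughly the kind of custom class one currently has to write by hand. It is a minimal sketch, not code from the library: the class name is the one proposed in this issue, and the mean-pooling head is an assumption about one reasonable design.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Config, Wav2Vec2Model, Wav2Vec2PreTrainedModel


class Wav2Vec2ForSpeechClassification(Wav2Vec2PreTrainedModel):
    """Hypothetical custom class: base Wav2Vec 2.0 encoder + classification head."""

    def __init__(self, config):
        super().__init__(config)
        self.wav2vec2 = Wav2Vec2Model(config)
        # Simple head: one linear layer on top of the pooled hidden states.
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        self.init_weights()

    def forward(self, input_values):
        hidden_states = self.wav2vec2(input_values).last_hidden_state
        pooled = hidden_states.mean(dim=1)  # mean-pool over the time axis
        return self.classifier(pooled)


# Randomly initialised model, so no checkpoint download is needed for the demo.
config = Wav2Vec2Config(num_labels=4)  # e.g. 4 emotion classes for SER
model = Wav2Vec2ForSpeechClassification(config)
logits = model(torch.randn(1, 16000))  # one second of 16 kHz audio
print(logits.shape)  # torch.Size([1, 4])
```

The point of the issue is that anyone wanting to load such a fine-tuned checkpoint from the Hub needs this class definition available locally first.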
I think that adding a class like Wav2Vec2ForSpeechClassification to the 🤗 Transformers library (i.e. working the same way as BertForSequenceClassification and similar models) would be a very nice feature: not only would it make it possible to fine-tune Wav2Vec 2.0 for classification tasks, it would also simplify and speed up the way one can use a shared model.
Motivation
Speech has always been a fascinating field of research, both in how a user interacts with a physical system and vice versa. With the great news of the new Wav2Vec 2.0 model being integrated into the 🤗 Transformers library 🎉, I started a research project on Speech Emotion Recognition (SER) with the idea of fine-tuning a Wav2Vec 2.0 model on this type of emotional dataset. The results I've obtained are very promising and the model seems to work extremely well, so I decided to put the fine-tuned model on the 🤗 Hub (wip). Additionally, I saw a topic on the 🤗 discussion forums about this same SER task, with its corresponding model on the 🤗 Hub, which has the same issue when importing it.
With all this, I think the number of use cases for the Wav2Vec2 model in speech classification tasks is huge, and having a feature like this implemented would greatly simplify the way other developers and researchers can work with this kind of pretrained model.
Your contribution
I can start working on a new PR to address this by implementing the Wav2Vec2ForSpeechClassification class mentioned above in the library. I already have the code working, and in fact it's pretty similar to the other NLP models that include a SequenceClassification head.
The idea behind this is to have a much simpler and more generalized way to use and train these models, ending up with a short snippet being enough for straightforward use of them.
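The snippet referenced above isn't reproduced in this thread; as an illustration of the simplified flow being requested, here is a hypothetical sketch. It assumes the proposed class follows the standard save_pretrained()/from_pretrained() round trip, and uses a randomly initialised model so nothing is downloaded from the Hub.

```python
import tempfile

import torch
import torch.nn as nn
from transformers import Wav2Vec2Config, Wav2Vec2Model, Wav2Vec2PreTrainedModel


class Wav2Vec2ForSpeechClassification(Wav2Vec2PreTrainedModel):
    """Proposed class (name from this issue, not the final library API)."""

    def __init__(self, config):
        super().__init__(config)
        self.wav2vec2 = Wav2Vec2Model(config)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        self.init_weights()

    def forward(self, input_values):
        hidden_states = self.wav2vec2(input_values).last_hidden_state
        return self.classifier(hidden_states.mean(dim=1))


config = Wav2Vec2Config(num_labels=4)
model = Wav2Vec2ForSpeechClassification(config)

with tempfile.TemporaryDirectory() as tmp:
    model.save_pretrained(tmp)
    # Once the class ships with the library, loading a shared checkpoint
    # becomes this one-liner -- no custom class definition required:
    reloaded = Wav2Vec2ForSpeechClassification.from_pretrained(tmp)

logits = reloaded(torch.randn(1, 16000))  # one second of 16 kHz audio
print(logits.shape)  # torch.Size([1, 4])
```

With the class in the library, the local directory in from_pretrained() could simply be a Hub model id.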
Let me know if this feature fits the needs of the library in terms of simplicity and integration, and I will open a new PR with these changes. Also let me know whether it is useful and covers an adequate number of use cases, making it worth implementing.
Thank you all for your amazing work 🤗
I'm only seeing your issue now sadly :-/ Super sorry not to have answered sooner. @anton-l is working on official Wav2Vec2- and HubertForSequenceClassification models at the moment, here: #13153, which should serve your needs :-)
It would be great if you could take a look at #13153 to see whether this design/architecture fits your needs.
Thanks a lot for your answer! From what I see in issue #13153, it's pretty much the same as what I was proposing here, so I think it'll do the job for this kind of audio classification task. I'll try it when it comes out, but it looks fine for the moment. Great!
Just one thing: I've worked mostly in PyTorch, but as I was checking the code I saw that there's no TensorFlow version of these models (neither for Hubert nor for Wav2Vec2). Do you think it's relevant to implement them? If so, maybe I can help with that, but I don't know if it's something critical.
Anyway, is there anything else I can do to help you with this? Just let me know.