Add Data2Vec #15507
Changes from 57 commits
@@ -0,0 +1,110 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Data2Vec

## Overview

The Data2Vec model was proposed in [data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://scontent-sjc3-1.xx.fbcdn.net/v/t39.8562-6/271974914_483120576492438_4239522333319653600_n.pdf?_nc_cat=107&ccb=1-5&_nc_sid=ae5e01&_nc_ohc=7huShTb_QZIAX-N7SYx&_nc_ht=scontent-sjc3-1.xx&oh=00_AT_lXXL69mjqmdVWbaLh4Ro6DY17aFeO5vA9I-mIpyNieg&oe=6205C411) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli.
Data2Vec proposes a unified framework for self-supervised learning across different data modalities: text, audio and images.
Importantly, the predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

The abstract from the paper is the following:

*While the general idea of self-supervised learning is identical across modalities, the actual algorithms and
objectives differ widely because they were developed with a single modality in mind. To get us closer to general
self-supervised learning, we present data2vec, a framework that uses the same learning method for either speech,
NLP or computer vision. The core idea is to predict latent representations of the full input data based on a
masked view of the input in a self-distillation setup using a standard Transformer architecture.
Instead of predicting modality-specific targets such as words, visual tokens or units of human speech which
are local in nature, data2vec predicts contextualized latent representations that contain information from
the entire input. Experiments on the major benchmarks of speech recognition, image classification, and
natural language understanding demonstrate a new state of the art or competitive performance to predominant approaches.
Models and code are available at www.github.com/pytorch/fairseq/tree/master/examples/data2vec.*
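In the self-distillation setup described in the abstract, the teacher that produces the latent targets is not trained directly: its weights track the student's via an exponential moving average (EMA). A minimal stdlib sketch of that update rule (the decay value and flat parameter lists are illustrative, not the library's API):

```python
def ema_update(teacher_params, student_params, decay=0.999):
    """Move each teacher parameter a small step toward its student counterpart.

    new_teacher = decay * teacher + (1 - decay) * student
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher_params, student_params)]


# With decay=0.9, a teacher weight of 1.0 paired with a student weight of 0.0
# moves to 0.9 after one update.
updated = ema_update([1.0], [0.0], decay=0.9)
```

A high decay keeps the teacher's targets stable while still letting them improve as the student learns.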
Tips:

- This implementation has a shared encoder for all different modalities and different pre-processors for each modality.
> **Reviewer:** Think this needs to be updated. Maybe we can just write that both Data2VecAudio and Data2VecText have been pretrained using the same method.
>
> **Author:** I changed it, let me know what you think.
For example, in the case of text, preprocessing is identical to [`RobertaModel`], including tokenization.

This model was contributed by [edugp](https://huggingface.co/edugp).
The original code can be found [here](https://github.com/pytorch/fairseq/tree/main/examples/data2vec).
## Data2VecTextConfig

[[autodoc]] Data2VecTextConfig

## Data2VecAudioConfig

[[autodoc]] Data2VecAudioConfig

## Data2VecAudioModel

[[autodoc]] Data2VecAudioModel
- forward

## Data2VecAudioForAudioFrameClassification

[[autodoc]] Data2VecAudioForAudioFrameClassification
- forward

## Data2VecAudioForCTC

[[autodoc]] Data2VecAudioForCTC
- forward
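`Data2VecAudioForCTC` emits per-frame token logits that are decoded with the standard CTC rules: take the greedy path, merge repeated ids, and drop blanks. A stdlib sketch of that collapse step (the id values and blank id are illustrative):

```python
def ctc_collapse(frame_ids, blank_id=0):
    """Collapse a greedy CTC path: merge consecutive duplicates, drop blanks."""
    collapsed = []
    previous = None
    for token_id in frame_ids:
        # Keep a token only when it differs from the previous frame and is not blank
        if token_id != previous and token_id != blank_id:
            collapsed.append(token_id)
        previous = token_id
    return collapsed


# The greedy path [0, 7, 7, 0, 4, 4, 4, 0, 7] collapses to [7, 4, 7]
decoded = ctc_collapse([0, 7, 7, 0, 4, 4, 4, 0, 7])
```

In practice this step is handled by the processor's `batch_decode`, but the collapse logic is what makes repeated frames and blanks disappear from the transcription.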
## Data2VecAudioForSequenceClassification

[[autodoc]] Data2VecAudioForSequenceClassification
- forward

## Data2VecAudioForXVector

[[autodoc]] Data2VecAudioForXVector
- forward
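`Data2VecAudioForXVector` produces utterance-level speaker embeddings that are typically compared with cosine similarity for speaker verification. A stdlib sketch of that comparison (the vectors are illustrative stand-ins for model outputs):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Identical embeddings score 1.0; orthogonal embeddings score 0.0.
same = cosine_similarity([1.0, 0.0], [1.0, 0.0])
different = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

A threshold on this score (tuned on held-out pairs) then decides whether two utterances come from the same speaker.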
## Data2VecTextModel

[[autodoc]] Data2VecTextModel
- forward

## Data2VecTextForCausalLM

[[autodoc]] Data2VecTextForCausalLM
- forward

## Data2VecTextForMaskedLM

[[autodoc]] Data2VecTextForMaskedLM
- forward

## Data2VecTextForSequenceClassification

[[autodoc]] Data2VecTextForSequenceClassification
- forward

## Data2VecTextForMultipleChoice

[[autodoc]] Data2VecTextForMultipleChoice
- forward

## Data2VecTextForTokenClassification

[[autodoc]] Data2VecTextForTokenClassification
- forward

## Data2VecTextForQuestionAnswering

[[autodoc]] Data2VecTextForQuestionAnswering
- forward
@@ -40,6 +40,7 @@
    convnext,
    cpm,
    ctrl,
    data2vec,
    deberta,
    deberta_v2,
    deit,
@@ -121,6 +121,8 @@
        ("unispeech-sat", "UniSpeechSatConfig"),
        ("unispeech", "UniSpeechConfig"),
        ("wavlm", "WavLMConfig"),
        ("data2vec-audio", "Data2VecAudioConfig"),
        ("data2vec-text", "Data2VecTextConfig"),
    ]
)
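The entries added above extend the model-type-to-config-name table that the auto classes consult. A simplified stand-in for that lookup (only the subset touched by this PR; `config_class_for` is a hypothetical helper, not the library's API):

```python
from collections import OrderedDict

# Subset of CONFIG_MAPPING_NAMES touched by this PR: model_type -> config class name
CONFIG_MAPPING_NAMES = OrderedDict(
    [
        ("wavlm", "WavLMConfig"),
        ("data2vec-audio", "Data2VecAudioConfig"),
        ("data2vec-text", "Data2VecTextConfig"),
    ]
)


def config_class_for(model_type):
    # AutoConfig-style resolution, sketched: unknown model types raise a KeyError
    return CONFIG_MAPPING_NAMES[model_type]
```

With these entries in place, loading a checkpoint whose config declares `"model_type": "data2vec-audio"` resolves to `Data2VecAudioConfig`.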
@@ -177,6 +179,8 @@
        ("xlnet", "XLNET_PRETRAINED_CONFIG_ARCHIVE_MAP"),
        ("xlm", "XLM_PRETRAINED_CONFIG_ARCHIVE_MAP"),
        ("roberta", "ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP"),
        ("data2vec-text", "DATA2VEC_TEXT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
        ("data2vec-audio", "DATA2VEC_AUDIO_PRETRAINED_CONFIG_ARCHIVE_MAP"),
        ("distilbert", "DISTILBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
        ("albert", "ALBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
        ("camembert", "CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),

@@ -321,10 +325,14 @@
        ("xlsr_wav2vec2", "XLSR-Wav2Vec2"),
        ("mluke", "mLUKE"),
        ("layoutxlm", "LayoutXLM"),
        ("data2vec-audio", "Data2VecAudio"),
        ("data2vec-text", "Data2VecText"),
    ]
)

-SPECIAL_MODEL_TYPE_TO_MODULE_NAME = OrderedDict([("openai-gpt", "openai")])
+SPECIAL_MODEL_TYPE_TO_MODULE_NAME = OrderedDict(
+    [("openai-gpt", "openai"), ("data2vec-audio", "data2vec"), ("data2vec-text", "data2vec")]
+)

> **Author:** Is this ok @sgugger? Added it like this now.

def model_type_to_module_name(key):
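The special-case table above exists because both `data2vec-audio` and `data2vec-text` live in a single `data2vec` module. A simplified sketch of how `model_type_to_module_name` uses it, assuming the fallback normalizes dashes to underscores (as model types like `deberta-v2` suggest):

```python
from collections import OrderedDict

# Model types whose module name cannot be derived mechanically
SPECIAL_MODEL_TYPE_TO_MODULE_NAME = OrderedDict(
    [("openai-gpt", "openai"), ("data2vec-audio", "data2vec"), ("data2vec-text", "data2vec")]
)


def model_type_to_module_name(key):
    # Check the special cases first; otherwise dashes become underscores
    if key in SPECIAL_MODEL_TYPE_TO_MODULE_NAME:
        return SPECIAL_MODEL_TYPE_TO_MODULE_NAME[key]
    return key.replace("-", "_")
```

Both new model types therefore resolve to the shared `data2vec` module, while ordinary types keep the mechanical mapping.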
> **Reviewer:** The link looks a bit weird - can we change it?
>
> **Reviewer:** There is surely an arxiv link, no?
>
> **Author:** Indeed! I'll change it to this link.