Add OPT #17088

Merged: 158 commits on May 12, 2022

Commits (158)
c8cf718
First version - OPT model
younesbelkada May 4, 2022
9ee623d
Final changes
younesbelkada May 4, 2022
0484ca1
few changes
younesbelkada May 4, 2022
b931db8
few changes
younesbelkada May 4, 2022
681dfc5
fix style issues
younesbelkada May 4, 2022
1e21983
few changes
younesbelkada May 4, 2022
1363221
Update src/transformers/models/auto/tokenization_auto.py
younesbelkada May 4, 2022
8427279
add gen tests
younesbelkada May 4, 2022
5e8e2f5
few changes
younesbelkada May 4, 2022
be0e434
few changes
younesbelkada May 4, 2022
51db79e
some changes
younesbelkada May 5, 2022
99001d3
fix code quality
younesbelkada May 5, 2022
a777bbc
major changes
younesbelkada May 6, 2022
38f7463
rm useless classes
younesbelkada May 6, 2022
c6f3a69
Removed autodoc calls to non-existant classes
ArthurZucker May 6, 2022
30d3db2
Update src/transformers/__init__.py
younesbelkada May 6, 2022
f903445
Update src/transformers/__init__.py
younesbelkada May 6, 2022
bb4ab4a
Update src/transformers/models/auto/modeling_tf_auto.py
younesbelkada May 6, 2022
2a6e288
Replaced OPTTokeniser with GPT2 tokenizer
ArthurZucker May 6, 2022
cb853fd
added GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokeni…
ArthurZucker May 6, 2022
337e71f
Removed OPTTokenizer
ArthurZucker May 6, 2022
0d9130f
make style
ArthurZucker May 6, 2022
290b7f0
Make style replaces
ArthurZucker May 6, 2022
096eb74
make repo consistency
ArthurZucker May 6, 2022
020843a
Removed PretrainedOPTModel
ArthurZucker May 6, 2022
c63d9f8
fix opt.mdx removed other heads
ArthurZucker May 6, 2022
8b6e496
fix init, removed 3 heads
ArthurZucker May 6, 2022
0303f2b
removed heads
ArthurZucker May 6, 2022
2c0327d
finished cleaning head
ArthurZucker May 6, 2022
4aa6ab2
removed seauence classif and question answering
ArthurZucker May 6, 2022
752f512
removed unused imports
ArthurZucker May 6, 2022
14eeb13
removed useless dummy object for QA, SC and CG
ArthurZucker May 6, 2022
9c96f09
removed tests for removed useless dummy object for QA, SC and CG
ArthurZucker May 6, 2022
54fc962
Removed head_mask using encoder layers which don't exist
ArthurZucker May 6, 2022
06f42ca
fixed test
ArthurZucker May 6, 2022
76e52ac
fix line
ArthurZucker May 6, 2022
556c2f4
added OPT to toctree
ArthurZucker May 6, 2022
1460025
Updated model path with pushed weigths
ArthurZucker May 6, 2022
db100a5
fix model path
ArthurZucker May 6, 2022
d16d40d
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker May 6, 2022
c10f347
fixed code quality
ArthurZucker May 6, 2022
f1fe820
fixed embeddings and generation tests
ArthurZucker May 6, 2022
9b9c65b
update paths
ArthurZucker May 7, 2022
4fb9608
clean comments
ArthurZucker May 7, 2022
ab57047
removed OPTClassificationHead for sentence classification
ArthurZucker May 8, 2022
0c1c791
renamed hidden layer
ArthurZucker May 9, 2022
ac50b44
renamed num layers to standard num_hidden_layers
ArthurZucker May 9, 2022
1505de5
num_attention_heads fix
ArthurZucker May 9, 2022
8ace67b
changes for 125m
younesbelkada May 9, 2022
80296cb
Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor…
younesbelkada May 9, 2022
752c1d2
add first version for 125m
younesbelkada May 9, 2022
77e6e04
add first version - flax
younesbelkada May 9, 2022
1564dac
Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor…
younesbelkada May 9, 2022
abd1f3c
add new version
younesbelkada May 9, 2022
23ff89c
Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor…
younesbelkada May 9, 2022
5c5c858
causal LM output
ArthurZucker May 9, 2022
41fad01
Merge branch 'opt-350-m' of github.com:younesbelkada/transformers int…
ArthurZucker May 9, 2022
27b55c9
replace output type with BaseModelOutputWithPastAndCrossAttentions
ArthurZucker May 9, 2022
aebd19e
revert working config from 150m to 350m
ArthurZucker May 9, 2022
d0723aa
clean
ArthurZucker May 9, 2022
7575749
removed decoder input ids
ArthurZucker May 9, 2022
66e8298
fixed embed dim
ArthurZucker May 9, 2022
8d4920e
more embed_dim issues
ArthurZucker May 9, 2022
c005840
make style + removed enc_dec test
ArthurZucker May 9, 2022
84eb497
update falx model
ArthurZucker May 9, 2022
043a109
removed troublesome copy
ArthurZucker May 9, 2022
8ba7cbc
added is_encoder_decoder=False to config
ArthurZucker May 9, 2022
2099b5f
added set_input emb fuinction to model class
ArthurZucker May 9, 2022
1c9580f
requires torch on embed test
ArthurZucker May 9, 2022
9f6291d
use head mask instead of decoder head mask input param solves a test
ArthurZucker May 9, 2022
740fcf5
8 test remaining, update
ArthurZucker May 9, 2022
f8c276b
Updated create_and_check_decoder_model_past_large_inputs
ArthurZucker May 9, 2022
fff035f
Make style
ArthurZucker May 9, 2022
30ed9f6
update op tokenizer with condition
ArthurZucker May 9, 2022
69c7ae6
make style
ArthurZucker May 9, 2022
ff09958
See if I can push
patrickvonplaten May 10, 2022
0555b92
some clean up
patrickvonplaten May 10, 2022
5491431
remove linear head hack
patrickvonplaten May 10, 2022
521822f
save intermediate
patrickvonplaten May 10, 2022
61e8023
save correct attention
patrickvonplaten May 10, 2022
7b27a91
add copied from from bart
patrickvonplaten May 10, 2022
26729d7
Merge branch 'main' of https://github.com/huggingface/transformers in…
patrickvonplaten May 10, 2022
7661453
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
25a40b1
fix part of the reviewss
ArthurZucker May 11, 2022
aefa63d
Merge pull request #2 from younesbelkada/opt_branch/opt-350-m
ArthurZucker May 11, 2022
f3b5e24
same changes in naming / conversion
patrickvonplaten May 11, 2022
0365e27
correct mask
patrickvonplaten May 11, 2022
929be23
more fixes
patrickvonplaten May 11, 2022
f6b032b
delete FlaxOPT and TfOPT
ArthurZucker May 11, 2022
d633832
clean traces of Flax and Tf
ArthurZucker May 11, 2022
85ce8e8
fix mask
patrickvonplaten May 11, 2022
d6fc7f3
Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor…
patrickvonplaten May 11, 2022
95d7ead
fixed positionnal embedding length when past key value is provoded
ArthurZucker May 11, 2022
412bdab
get 125m, 6.7b to work
patrickvonplaten May 11, 2022
974d44c
Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor…
patrickvonplaten May 11, 2022
cc1b4c9
Added do_layer_norm
ArthurZucker May 11, 2022
156866d
solved mismatch in load dictionnary
ArthurZucker May 11, 2022
849afd3
clean up preapre opt input dict
ArthurZucker May 11, 2022
1acb47a
fixed past key value as bool
ArthurZucker May 11, 2022
668246b
fix previus
ArthurZucker May 11, 2022
769b9d6
fixed return dict False tuple issue
ArthurZucker May 11, 2022
def917e
All tests are passing
ArthurZucker May 11, 2022
5131932
Make style
ArthurZucker May 11, 2022
5ec0766
Ignore OPTDecoder non tested
ArthurZucker May 11, 2022
2ed32a8
make fix-copies
ArthurZucker May 11, 2022
1db5f2b
make repo consistency
ArthurZucker May 11, 2022
f57a0b5
small fix
ArthurZucker May 11, 2022
5f96836
removed uselss @torch.no_grad decorator
ArthurZucker May 11, 2022
70c2196
make styl;e
ArthurZucker May 11, 2022
49e905d
fix previous opt test
ArthurZucker May 11, 2022
2c1bce4
style
ArthurZucker May 11, 2022
9c3f0c0
make style
ArthurZucker May 11, 2022
29987ed
added opt documentation
ArthurZucker May 11, 2022
145838f
update OPT_PRETRAINED_MODEL_ARCHIVE_LIST
ArthurZucker May 11, 2022
e2c932b
up
patrickvonplaten May 11, 2022
3bf333d
more fixes
patrickvonplaten May 11, 2022
b24ac4b
model & config work
patrickvonplaten May 11, 2022
2e1d4f4
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
6737d09
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
994c104
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
136983b
added comment on padding hack (+2)
ArthurZucker May 11, 2022
6834c7b
cleaup
ArthurZucker May 11, 2022
014674d
review update
ArthurZucker May 11, 2022
2c7102b
docstring for missing arg
ArthurZucker May 11, 2022
598ef8d
Update docs/source/en/model_doc/opt.mdx
ArthurZucker May 11, 2022
0a58092
Update docs/source/en/model_doc/opt.mdx
ArthurZucker May 11, 2022
66c807a
Update docs/source/en/model_doc/opt.mdx
ArthurZucker May 11, 2022
fd91198
Update src/transformers/models/opt/__init__.py
ArthurZucker May 11, 2022
dfb00c0
update pretrained map
ArthurZucker May 11, 2022
f6c587c
Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor…
ArthurZucker May 11, 2022
7923a46
update path and tests
ArthurZucker May 11, 2022
192c407
make style
ArthurZucker May 11, 2022
ab1c4fb
styling
ArthurZucker May 11, 2022
0215920
make consistency
ArthurZucker May 11, 2022
6de5a2d
add gpt2 tok new
patrickvonplaten May 11, 2022
4dbd565
Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor…
patrickvonplaten May 11, 2022
484f24f
more tok fixes
patrickvonplaten May 11, 2022
e3b7c4b
Update src/transformers/models/auto/tokenization_auto.py
patrickvonplaten May 11, 2022
46f6401
Update docs/source/en/model_doc/opt.mdx
ArthurZucker May 11, 2022
27437f7
Update docs/source/en/model_doc/opt.mdx
ArthurZucker May 11, 2022
c325b0a
Update docs/source/en/model_doc/opt.mdx
ArthurZucker May 11, 2022
5fc7b7b
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
133a465
Update tests/models/opt/test_modeling_opt.py
ArthurZucker May 11, 2022
109abdc
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
d69db00
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
51cba40
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
6554537
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
200ac36
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
368620a
Update based on reviews
ArthurZucker May 11, 2022
e1bbc22
Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor…
ArthurZucker May 11, 2022
e53e8f7
Apply suggestions from code review
patrickvonplaten May 12, 2022
4c1e494
make style
patrickvonplaten May 12, 2022
4c9c360
make tokenizer auto tests pass
patrickvonplaten May 12, 2022
6055da9
apply Lysandre suggestion
patrickvonplaten May 12, 2022
22c89b4
Merge branch 'main' of https://github.com/huggingface/transformers in…
patrickvonplaten May 12, 2022
776b42c
finish tests
patrickvonplaten May 12, 2022
c39f5bd
add some good tokenizer tests
patrickvonplaten May 12, 2022
d8070cd
improve docs slighly
patrickvonplaten May 12, 2022
1 change: 1 addition & 0 deletions README.md
@@ -294,6 +294,7 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1 change: 1 addition & 0 deletions README_ko.md
@@ -273,6 +273,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1 change: 1 addition & 0 deletions README_zh-hans.md
@@ -297,6 +297,7 @@ conda install -c huggingface transformers
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (来自 Microsoft Research) 伴随论文 [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) 由 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu 发布。
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (来自 Google AI) 伴随论文 [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) 由 Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel 发布。
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (来自 the University of Wisconsin - Madison) 伴随论文 [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) 由 Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh 发布。
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (来自 Meta AI) 伴随论文 [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) 由 Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al 发布。
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (来自 Google) 伴随论文 [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) 由 Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu 发布。
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (来自 Deepmind) 伴随论文 [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) 由 Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira 发布。
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (来自 VinAI Research) 伴随论文 [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) 由 Dat Quoc Nguyen and Anh Tuan Nguyen 发布。
1 change: 1 addition & 0 deletions README_zh-hant.md
@@ -309,6 +309,7 @@ conda install -c huggingface transformers
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -270,6 +270,8 @@
title: Nyströmformer
- local: model_doc/openai-gpt
title: OpenAI GPT
- local: model_doc/opt
title: OPT
- local: model_doc/gpt2
title: OpenAI GPT2
- local: model_doc/gptj
2 changes: 2 additions & 0 deletions docs/source/en/index.mdx
@@ -115,6 +115,7 @@ The library currently contains JAX, PyTorch and TensorFlow implementations, pret
1. **[MPNet](model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[Pegasus](model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
@@ -231,6 +232,7 @@ Flax), PyTorch, and/or TensorFlow.
| Nystromformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ |
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
| OPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Pegasus | ✅ | ✅ | ✅ | ✅ | ✅ |
| Perceiver | ✅ | ❌ | ✅ | ❌ | ❌ |
| PLBart | ✅ | ❌ | ✅ | ❌ | ❌ |
47 changes: 47 additions & 0 deletions docs/source/en/model_doc/opt.mdx
@@ -0,0 +1,47 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# OPT

## Overview

The OPT model was proposed in [Open Pre-trained Transformer Language Models](https://arxiv.org/pdf/2205.01068) by Meta AI.
OPT is a series of open-sourced large causal language models whose performance is similar to that of GPT-3.


The abstract from the paper is the following:

*Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.*

Tips:
- OPT has the same architecture as [`BartDecoder`].
- Unlike GPT2, OPT adds the EOS token `</s>` to the beginning of every prompt. **Note**: Make sure to pass `use_fast=False` when loading OPT's tokenizer with [`AutoTokenizer`] to get the correct tokenizer (see the example below).
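
A minimal usage sketch based on these tips (the checkpoint name `facebook/opt-350m` is an assumption for illustration; the exact Hub paths were still being updated in this PR):

```python
from transformers import AutoTokenizer, OPTForCausalLM

# Assumed example checkpoint; the final Hub path may differ.
checkpoint = "facebook/opt-350m"

# use_fast=False so the slow GPT2-style tokenizer prepends "</s>" to the prompt.
tokenizer = AutoTokenizer.from_pretrained(checkpoint, use_fast=False)
model = OPTForCausalLM.from_pretrained(checkpoint)

prompt = "Hello, I am conscious and"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding with the causal LM head.
generated_ids = model.generate(inputs.input_ids, max_length=30)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```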

This model was contributed by [Arthur Zucker](https://huggingface.co/ArthurZ), [Younes Belkada](https://huggingface.co/ybelkada), and [Patrick von Platen](https://huggingface.co/patrickvonplaten).
The original code can be found [here](https://github.com/facebookresearch/metaseq).


## OPTConfig

[[autodoc]] OPTConfig

## OPTModel

[[autodoc]] OPTModel
- forward


## OPTForCausalLM

[[autodoc]] OPTForCausalLM
- forward

13 changes: 12 additions & 1 deletion src/transformers/__init__.py
@@ -247,6 +247,7 @@
"NystromformerConfig",
],
"models.openai": ["OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP", "OpenAIGPTConfig", "OpenAIGPTTokenizer"],
"models.opt": ["OPTConfig"],
"models.pegasus": ["PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP", "PegasusConfig", "PegasusTokenizer"],
"models.perceiver": ["PERCEIVER_PRETRAINED_CONFIG_ARCHIVE_MAP", "PerceiverConfig", "PerceiverTokenizer"],
"models.phobert": ["PhobertTokenizer"],
@@ -1323,6 +1324,14 @@
"load_tf_weights_in_openai_gpt",
]
)
_import_structure["models.opt"].extend(
[
"OPT_PRETRAINED_MODEL_ARCHIVE_LIST",
"OPTForCausalLM",
"OPTModel",
"OPTPreTrainedModel",
]
)
_import_structure["models.pegasus"].extend(
["PegasusForCausalLM", "PegasusForConditionalGeneration", "PegasusModel", "PegasusPreTrainedModel"]
)
@@ -2373,7 +2382,6 @@
"FlaxBartPreTrainedModel",
]
)

_import_structure["models.beit"].extend(
[
"FlaxBeitForImageClassification",
@@ -2382,6 +2390,7 @@
"FlaxBeitPreTrainedModel",
]
)

_import_structure["models.bert"].extend(
[
"FlaxBertForCausalLM",
@@ -2718,6 +2727,7 @@
from .models.mt5 import MT5Config
from .models.nystromformer import NYSTROMFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, NystromformerConfig
from .models.openai import OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP, OpenAIGPTConfig, OpenAIGPTTokenizer
from .models.opt import OPTConfig
from .models.pegasus import PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP, PegasusConfig, PegasusTokenizer
from .models.perceiver import PERCEIVER_PRETRAINED_CONFIG_ARCHIVE_MAP, PerceiverConfig, PerceiverTokenizer
from .models.phobert import PhobertTokenizer
@@ -3630,6 +3640,7 @@
OpenAIGPTPreTrainedModel,
load_tf_weights_in_openai_gpt,
)
from .models.opt import OPT_PRETRAINED_MODEL_ARCHIVE_LIST, OPTForCausalLM, OPTModel, OPTPreTrainedModel
from .models.pegasus import (
PegasusForCausalLM,
PegasusForConditionalGeneration,
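As a quick illustration (not part of the diff), the `_import_structure` entries above make the new classes importable from the package root:

```python
# Sketch of the user-facing effect of the lazy-import entries above.
from transformers import OPTConfig, OPTForCausalLM, OPTModel

config = OPTConfig()                # default OPT configuration
model = OPTModel(config)            # bare decoder without a language-modeling head
lm_model = OPTForCausalLM(config)   # decoder plus the language-modeling head
print(type(model).__name__, type(lm_model).__name__)
```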
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -87,6 +87,7 @@
mt5,
nystromformer,
openai,
opt,
pegasus,
perceiver,
phobert,
3 changes: 3 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -98,6 +98,7 @@
("megatron-bert", "MegatronBertConfig"),
("mpnet", "MPNetConfig"),
("bart", "BartConfig"),
("opt", "OPTConfig"),
("blenderbot", "BlenderbotConfig"),
("reformer", "ReformerConfig"),
("longformer", "LongformerConfig"),
@@ -190,6 +191,7 @@
("blenderbot-small", "BLENDERBOT_SMALL_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("bert", "BERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("bart", "BART_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("opt", "OPT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("blenderbot", "BLENDERBOT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("mbart", "MBART_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("openai-gpt", "OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@@ -301,6 +303,7 @@
("mbart", "mBART"),
("megatron-bert", "MegatronBert"),
("bart", "BART"),
("opt", "OPT"),
("reformer", "Reformer"),
("longformer", "Longformer"),
("roberta", "RoBERTa"),
2 changes: 2 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -93,6 +93,7 @@
("xlm-roberta-xl", "XLMRobertaXLModel"),
("xlm-roberta", "XLMRobertaModel"),
("bart", "BartModel"),
("opt", "OPTModel"),
("longformer", "LongformerModel"),
("roberta", "RobertaModel"),
("data2vec-text", "Data2VecTextModel"),
@@ -261,6 +262,7 @@
("xlm-prophetnet", "XLMProphetNetForCausalLM"),
("prophetnet", "ProphetNetForCausalLM"),
("bart", "BartForCausalLM"),
("opt", "OPTForCausalLM"),
("mbart", "MBartForCausalLM"),
("pegasus", "PegasusForCausalLM"),
("marian", "MarianForCausalLM"),
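Together with the `configuration_auto.py` mapping above, these entries let the Auto classes resolve the `opt` model type. A brief sketch (the checkpoint name is an assumed example, not taken from this diff):

```python
from transformers import AutoConfig, AutoModel, AutoModelForCausalLM

# Assumed example checkpoint; any repo whose config declares model_type="opt" resolves the same way.
checkpoint = "facebook/opt-350m"

config = AutoConfig.from_pretrained(checkpoint)        # -> OPTConfig
model = AutoModel.from_pretrained(checkpoint)          # -> OPTModel
lm = AutoModelForCausalLM.from_pretrained(checkpoint)  # -> OPTForCausalLM
print(type(config).__name__, type(model).__name__, type(lm).__name__)
```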
1 change: 1 addition & 0 deletions src/transformers/models/auto/tokenization_auto.py
@@ -137,6 +137,7 @@
("openai-gpt", ("OpenAIGPTTokenizer", "OpenAIGPTTokenizerFast" if is_tokenizers_available() else None)),
("gpt2", ("GPT2Tokenizer", "GPT2TokenizerFast" if is_tokenizers_available() else None)),
("gptj", ("GPT2Tokenizer", "GPT2TokenizerFast" if is_tokenizers_available() else None)),
("opt", ("GPT2Tokenizer", None)),
Reviewer note: Need to add a fast tokenizer in a follow-up PR that is able to prepend the bos_token at the beginning. For this, a new converter needs to be written.
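
One possible direction for such a converter, sketched here purely as an illustration (the tokenizer path and the presence of `</s>` in the vocabulary are assumptions), is to attach a `TemplateProcessing` post-processor from the `tokenizers` library so the fast tokenizer also prepends the BOS token:

```python
from transformers import GPT2TokenizerFast
from tokenizers.processors import TemplateProcessing

# Hypothetical path to OPT tokenizer files; assumes "</s>" exists in the vocabulary.
tok = GPT2TokenizerFast.from_pretrained("path/to/opt-tokenizer")

bos = "</s>"
bos_id = tok.convert_tokens_to_ids(bos)

# Prepend the BOS token to every encoded sequence, mirroring the slow tokenizer's behavior.
tok.backend_tokenizer.post_processor = TemplateProcessing(
    single=f"{bos} $A",
    pair=f"{bos} $A {bos} $B",
    special_tokens=[(bos, bos_id)],
)

print(tok("Hello world").input_ids[0] == bos_id)  # True once the post-processor is set
```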

("transfo-xl", ("TransfoXLTokenizer", None)),
(
"xlnet",
Expand Down