
BLOOM #17474

Merged: 195 commits, Jun 9, 2022

Changes from 188 commits

Commits (195)
c3a209e
adding template
thomwolf Mar 21, 2022
c08d29f
update model
thomwolf Mar 21, 2022
95e2500
model update
thomwolf Mar 22, 2022
90232f9
update conf for debug model
thomwolf Mar 22, 2022
a439f1a
update conversion
thomwolf Mar 22, 2022
73cfd5f
update conversion script
thomwolf Apr 11, 2022
c66f8c0
update conversion script
thomwolf Apr 11, 2022
780e7c7
fix missing keys check
thomwolf Apr 11, 2022
7c30689
add tests to test the tokenizer in the local machine
younesbelkada Apr 13, 2022
a9851a5
Change variable name
younesbelkada Apr 13, 2022
e40ddf2
add tests on xnli dataset
younesbelkada Apr 13, 2022
43173bd
add more description
younesbelkada Apr 13, 2022
919839d
add descriptions + clearer code
younesbelkada Apr 13, 2022
e8c1e82
clearer code
younesbelkada Apr 13, 2022
69c7928
adding new tests + skipping few tests because of env problems
younesbelkada Apr 13, 2022
267faac
change comment
younesbelkada Apr 15, 2022
e9ab687
add dtype on the configuration
younesbelkada Apr 15, 2022
9f30687
add test embeddings
younesbelkada Apr 15, 2022
16e018e
add hardcoded test
younesbelkada Apr 15, 2022
6923335
fix dtype issue
younesbelkada Apr 15, 2022
4db67cf
adding torch.float16 to config
younesbelkada Apr 19, 2022
4a13037
adding more metrics (min, max, mean)
younesbelkada Apr 19, 2022
32b77a4
add sum
younesbelkada Apr 19, 2022
cf02880
now the test passes with almost equal
younesbelkada Apr 19, 2022
9172116
add files for conversion - test passes on cpu gpu
younesbelkada Apr 22, 2022
32ca7bc
add final changes
younesbelkada Apr 22, 2022
a47a4e6
cleaning code
younesbelkada Apr 22, 2022
f50d041
add new args in the docstring
younesbelkada Apr 22, 2022
c719264
fix one liner function
younesbelkada Apr 22, 2022
203b17b
remove macros
younesbelkada Apr 22, 2022
3c843ac
remove forward attention
younesbelkada Apr 22, 2022
e2d845e
clean up init funtion
younesbelkada Apr 22, 2022
a2d3c0d
add comments on the issue
younesbelkada Apr 22, 2022
71240cd
rm scale mask softmax
younesbelkada Apr 22, 2022
ccf61a8
do make style
younesbelkada Apr 22, 2022
e6c802d
fix dtype in init
younesbelkada Apr 22, 2022
4c3944a
fixing for loop on att probs
younesbelkada Apr 25, 2022
44afc62
fix style with black
younesbelkada Apr 25, 2022
675c9ad
fix style + doc error
younesbelkada Apr 25, 2022
4fb6453
fix and debug CI errors (docs + style)
younesbelkada Apr 25, 2022
69e0e8f
some updates
younesbelkada Apr 29, 2022
a0ece0f
make use cache working
younesbelkada May 10, 2022
c366f27
add changes
younesbelkada May 5, 2022
975d3d6
add changes
younesbelkada May 10, 2022
a6d9231
test commit
younesbelkada May 10, 2022
e3437fb
final changes
younesbelkada May 10, 2022
3a710f4
changes - model + conversion
younesbelkada May 12, 2022
cc38b40
move to correct dir
younesbelkada May 12, 2022
bef7ff9
put ,
younesbelkada May 12, 2022
417c77c
fex fixes
younesbelkada May 12, 2022
4bf01e9
fix tokenizer autodoc
younesbelkada May 12, 2022
b79967c
fix minor CI issues
younesbelkada May 12, 2022
943e820
fix minor CI issues
younesbelkada May 12, 2022
9c8f385
fix minor CI issues
younesbelkada May 12, 2022
8400b48
Merge branch 'main' into bigscience176b
younesbelkada May 12, 2022
489cb22
fix style issue
younesbelkada May 12, 2022
3168103
fix minor import issues
younesbelkada May 12, 2022
bf936c7
fix few issues
younesbelkada May 12, 2022
4d64adc
remove def main on the test
younesbelkada May 12, 2022
8ba0768
add require torch
younesbelkada May 12, 2022
69a4675
replace decorator with 'with'
younesbelkada May 12, 2022
c210216
fix style
younesbelkada May 12, 2022
fadd48c
change to bloom
younesbelkada May 13, 2022
cc8457c
add quick fix tokenizer
younesbelkada May 13, 2022
afa817d
fix tokenizer file
younesbelkada May 13, 2022
e580cb5
fix tokenizer
younesbelkada May 13, 2022
19be820
fix import issue
younesbelkada May 13, 2022
4661d71
add bloom to readme
younesbelkada May 13, 2022
9ea7654
fix consistency
younesbelkada May 13, 2022
0c693f3
Update docs/source/en/model_doc/bloom.mdx
younesbelkada May 13, 2022
56e5aab
Apply suggestions from code review
younesbelkada May 13, 2022
7109db1
fix doc issue
younesbelkada May 13, 2022
c9e7dd8
small fix - modeling test
younesbelkada May 13, 2022
1ac3e5e
some changes
younesbelkada May 15, 2022
7f80ef3
remove useless division
younesbelkada May 15, 2022
b748343
more tests should pass
younesbelkada May 16, 2022
886737b
more tests should pass
younesbelkada May 16, 2022
5f601a4
more tests should pass
younesbelkada May 17, 2022
40a529b
let's try this one
younesbelkada May 17, 2022
3474a10
Merge branch 'main' into bigscience176b
younesbelkada May 17, 2022
21e3cb8
refactor
younesbelkada May 17, 2022
d784580
major changes
younesbelkada May 17, 2022
3687e53
modify readme
younesbelkada May 17, 2022
274b975
small fixes
younesbelkada May 17, 2022
181d011
small fix
younesbelkada May 17, 2022
9ccd19c
remove old test file from fetcher
younesbelkada May 17, 2022
a5691f1
fix small typo
younesbelkada May 17, 2022
82022d4
major change
younesbelkada May 17, 2022
5d8da14
remove onnx config
younesbelkada May 18, 2022
0668055
major changes
younesbelkada May 18, 2022
967b6f2
make style
younesbelkada May 18, 2022
cdf41e8
small change
younesbelkada May 18, 2022
20b4b32
adding a slow test + commenting old ones for now
younesbelkada May 18, 2022
2861f30
make style
younesbelkada May 18, 2022
4cfa5c7
Apply suggestions from code review
younesbelkada May 18, 2022
7f08d87
make style
younesbelkada May 18, 2022
8d49b08
fix duplicates
younesbelkada May 19, 2022
15e8509
cleaning comments on config
younesbelkada May 19, 2022
79a8611
clean a bit conversion file
younesbelkada May 19, 2022
9dea87f
refacor a bit modeling file
younesbelkada May 19, 2022
d5fc5c1
refactor tokenizer file
younesbelkada May 19, 2022
1c5b48d
fix tokenization test issue
younesbelkada May 19, 2022
5f91e98
fix tokenization issue #2
younesbelkada May 19, 2022
6305189
fix tokenization issue second try
younesbelkada May 19, 2022
734dc43
Merge branch 'bigscience176b' of https://github.com/younesbelkada/tra…
younesbelkada May 19, 2022
7d66d6f
fix test issue
younesbelkada May 19, 2022
6fa2fc1
make style + add suggestions
younesbelkada May 19, 2022
49be5c8
change test fetcher
younesbelkada May 19, 2022
c5d67c9
try this one
younesbelkada May 20, 2022
3d990f1
possible final changes
younesbelkada May 20, 2022
8a98470
make style
younesbelkada May 20, 2022
5535e34
try fix padding side issue
younesbelkada May 20, 2022
f0b5f5e
fix side
younesbelkada May 20, 2022
4284e5e
fix padding issue
younesbelkada May 20, 2022
de1f55c
fix ko-readme
younesbelkada May 20, 2022
531824a
fix config auto
younesbelkada May 20, 2022
540f579
cleaning modeling file
younesbelkada May 20, 2022
f5dcc7b
keep bloom in caps in ko
younesbelkada May 20, 2022
9a017ff
update config docs
younesbelkada May 20, 2022
40649de
remove pretraining_pp
younesbelkada May 20, 2022
743631b
remove model parallel
younesbelkada May 20, 2022
2706bf6
update config
younesbelkada May 23, 2022
5b9a94b
fix duplicates
younesbelkada May 23, 2022
8a0999b
fix fetcher
younesbelkada May 23, 2022
b4fd549
fix refactor issue
younesbelkada May 23, 2022
17638f8
try to remove alibi
younesbelkada May 24, 2022
7db6afe
small fixes
younesbelkada May 24, 2022
13a565b
put correct values
younesbelkada May 24, 2022
1450762
fix attention mask loop
younesbelkada May 24, 2022
76be6d1
small fixes:
younesbelkada May 24, 2022
ad3b871
small fixes
younesbelkada May 25, 2022
0ee5e5b
small changes
younesbelkada May 25, 2022
a9e14b5
small changes
younesbelkada May 25, 2022
ba04fec
small fixes
younesbelkada May 25, 2022
a643689
major changes
younesbelkada May 25, 2022
69904bf
fix readmes
younesbelkada May 25, 2022
3232a28
major changes
younesbelkada May 25, 2022
a967a8a
refactor a bit
younesbelkada May 25, 2022
77eabf1
refactor a bit
younesbelkada May 25, 2022
8767fe8
put correct name on test
younesbelkada May 25, 2022
307cd5f
change docstring
younesbelkada May 25, 2022
690323a
small changes
younesbelkada May 25, 2022
cce9498
fix small nit
younesbelkada May 25, 2022
88acef4
minor fix
younesbelkada May 26, 2022
b85e01a
minor fix
younesbelkada May 26, 2022
e1db789
forward contrib credits from PR14084
sIncerass May 26, 2022
401ca5e
Apply suggestions from code review
younesbelkada May 27, 2022
bf219d2
apply modifications
younesbelkada May 27, 2022
e768ab6
resolve softmax upcast
younesbelkada May 27, 2022
ffd36c7
Apply suggestions from code review
younesbelkada May 28, 2022
733d7f8
Update src/transformers/models/bloom/modeling_bloom.py
younesbelkada May 28, 2022
a3741d4
final changes modeling
younesbelkada May 28, 2022
06d98db
Merge commit 'd156898f3b9b2c990e5963f5030a7143d57921a2'
younesbelkada May 30, 2022
6844146
merge commit
younesbelkada May 30, 2022
1883390
Apply suggestions from code review
younesbelkada May 30, 2022
bd3dbfa
apply suggestions
younesbelkada May 30, 2022
ea1ab4e
Merge branch 'main' into bigscience176b
younesbelkada May 30, 2022
2969d59
Fix gradient checkpointing
younesbelkada May 31, 2022
04efa88
add slow but exact
younesbelkada May 31, 2022
8662ebf
add accelerate compatibility
younesbelkada May 31, 2022
1302638
forward contrib credits
younesbelkada May 31, 2022
fe672cc
Apply suggestions from code review
younesbelkada May 31, 2022
3db958b
fix torch device on tests
younesbelkada May 31, 2022
d998c8c
make style
younesbelkada May 31, 2022
47e5722
Apply suggestions from code review
younesbelkada May 31, 2022
cd1f5c1
fix nits
younesbelkada May 31, 2022
1d45173
remove final nits
younesbelkada May 31, 2022
41d2573
fix doc
younesbelkada May 31, 2022
a6e0b42
Update src/transformers/__init__.py
younesbelkada May 31, 2022
56e2928
Update src/transformers/models/bloom/modeling_bloom.py
younesbelkada May 31, 2022
5003f50
apply suggestions
younesbelkada May 31, 2022
8f3bffe
put test torchscript to false
younesbelkada May 31, 2022
ba1d9fc
Update src/transformers/models/bloom/modeling_bloom.py
younesbelkada Jun 1, 2022
95f8dc7
fix alibi
younesbelkada Jun 1, 2022
948e0fb
add small doc
younesbelkada Jun 1, 2022
694ee1c
make quality
younesbelkada Jun 1, 2022
7d026b4
replace torch.nn
younesbelkada Jun 2, 2022
9c2f84a
remove token type emb
younesbelkada Jun 3, 2022
1ca7c1a
fix fused op + output bias
younesbelkada Jun 3, 2022
fab4f62
add fused op
younesbelkada Jun 3, 2022
bd0bd2f
remove fused op
younesbelkada Jun 3, 2022
8b481ce
make quality
younesbelkada Jun 3, 2022
2907274
small changes
younesbelkada Jun 5, 2022
bce5132
Update src/transformers/models/bloom/modeling_bloom.py
younesbelkada Jun 5, 2022
5eecde7
fix slow
younesbelkada Jun 5, 2022
4d54e7f
make style
younesbelkada Jun 5, 2022
a24ae86
add accelerate support
younesbelkada Jun 6, 2022
f723b3e
add bloom to deepspeed tests
stas00 Jun 6, 2022
fadb367
minor changes
younesbelkada Jun 8, 2022
8ef8f6d
Apply suggestions from code review
younesbelkada Jun 8, 2022
1d63415
minor change
younesbelkada Jun 8, 2022
24c1df8
slow tests pass
younesbelkada Jun 8, 2022
de5b1c2
Apply suggestions from code review
younesbelkada Jun 8, 2022
6262d77
Update docs/source/en/model_doc/bloom.mdx
younesbelkada Jun 8, 2022
3a59eb3
minor changes:
younesbelkada Jun 8, 2022
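Several commits above wrestle with ALiBi ("try to remove alibi", "fix alibi"): BLOOM replaces learned position embeddings with ALiBi, which adds a per-head linear bias to attention scores. As background, here is a minimal sketch of the slope and bias computation, assuming a power-of-two head count; this is illustrative only, not the PR's actual implementation:

```python
def alibi_slopes(n_heads):
    # Geometric sequence of per-head slopes from the ALiBi paper;
    # this simple closed form assumes n_heads is a power of two.
    start = 2 ** (-8.0 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias_row(slope, seq_len):
    # Bias added to one head's attention scores for the last query:
    # proportional to the negative distance to each key position.
    return [slope * -(seq_len - 1 - j) for j in range(seq_len)]

print(alibi_slopes(8)[0])      # 0.5
print(alibi_bias_row(0.5, 4))  # [-1.5, -1.0, -0.5, 0.0]
```

Because the bias depends only on relative distance, it extrapolates to sequence lengths longer than those seen in training, which is one reason BLOOM adopted it.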
1 change: 1 addition & 0 deletions README.md
@@ -240,6 +240,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](https://huggingface.co/docs/transformers/main/model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1 change: 1 addition & 0 deletions README_ko.md
@@ -221,6 +221,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](https://huggingface.co/docs/transformers/main/model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1 change: 1 addition & 0 deletions README_zh-hans.md
@@ -245,6 +245,7 @@ conda install -c huggingface transformers
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (来自 Google Research) 伴随论文 [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) 由 Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed 发布。
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。
1. **[BLOOM](https://huggingface.co/docs/transformers/main/model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (来自 Alexa) 伴随论文 [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) 由 Adrian de Wynter and Daniel J. Perry 发布。
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (来自 Google Research) 伴随论文 [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) 由 Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel 发布。
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (来自 Inria/Facebook/Sorbonne) 伴随论文 [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) 由 Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot 发布。
1 change: 1 addition & 0 deletions README_zh-hant.md
@@ -257,6 +257,7 @@ conda install -c huggingface transformers
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](https://huggingface.co/docs/transformers/main/model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -174,6 +174,8 @@
title: Blenderbot
- local: model_doc/blenderbot-small
title: Blenderbot Small
- local: model_doc/bloom
title: BLOOM
- local: model_doc/bort
title: BORT
- local: model_doc/byt5
2 changes: 2 additions & 0 deletions docs/source/en/index.mdx
@@ -63,6 +63,7 @@ The library currently contains JAX, PyTorch and TensorFlow implementations, pret
1. **[BigBird-Pegasus](model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/).
1. **[BORT](model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
@@ -189,6 +190,7 @@ Flax), PyTorch, and/or TensorFlow.
| BigBirdPegasus | ❌ | ❌ | ✅ | ❌ | ❌ |
| Blenderbot | ✅ | ✅ | ✅ | ✅ | ✅ |
| BlenderbotSmall | ✅ | ✅ | ✅ | ✅ | ✅ |
| BLOOM | ❌ | ✅ | ✅ | ❌ | ❌ |
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| Canine | ✅ | ❌ | ✅ | ❌ | ❌ |
| CLIP | ✅ | ✅ | ✅ | ✅ | ✅ |
47 changes: 47 additions & 0 deletions docs/source/en/model_doc/bloom.mdx
@@ -0,0 +1,47 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BLOOM

## Overview

The BLOOM model has been proposed in several versions through the [BigScience Workshop](https://bigscience.huggingface.co/). BigScience is inspired by other open-science initiatives in which researchers pool their time and resources to collectively achieve a higher impact.
The architecture of BLOOM is essentially similar to GPT-3 (an auto-regressive model for next-token prediction), but it has been trained on 46 different languages, including code.
Several smaller versions of the model have been trained on the same dataset. BLOOM is available in the following versions:

- [bloom-350m](https://huggingface.co/bigscience/bloom-350m)
- [bloom-760m](https://huggingface.co/bigscience/bloom-760m)
- [bloom-1b3](https://huggingface.co/bigscience/bloom-1b3)
- [bloom-2b5](https://huggingface.co/bigscience/bloom-2b5)
- [bloom-6b3](https://huggingface.co/bigscience/bloom-6b3)
- [bloom](https://huggingface.co/bigscience/bloom) (175B parameters)
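The new `BloomConfig` and `BloomModel` classes documented below can be exercised without downloading any checkpoint by building a tiny, randomly initialized model. A hedged sketch — the small config values here are arbitrary, not a released checkpoint:

```python
import torch
from transformers import BloomConfig, BloomModel

# Tiny, arbitrary config (not a released checkpoint) so the forward
# pass runs quickly on CPU with random weights.
config = BloomConfig(vocab_size=256, hidden_size=64, n_layer=2, n_head=4)
model = BloomModel(config)
model.eval()

input_ids = torch.randint(0, config.vocab_size, (1, 10))
with torch.no_grad():
    outputs = model(input_ids)
print(outputs.last_hidden_state.shape)  # torch.Size([1, 10, 64])
```

For real text generation you would instead load one of the checkpoints listed above with `BloomForCausalLM.from_pretrained(...)` and its matching `BloomTokenizerFast`.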


## BloomConfig

[[autodoc]] BloomConfig
- all

## BloomModel

[[autodoc]] BloomModel
- forward

## BloomTokenizerFast

[[autodoc]] BloomTokenizerFast
- all

## BloomForCausalLM

[[autodoc]] BloomForCausalLM
- forward
18 changes: 18 additions & 0 deletions src/transformers/__init__.py
@@ -156,6 +156,7 @@
"BlenderbotSmallConfig",
"BlenderbotSmallTokenizer",
],
"models.bloom": ["BLOOM_PRETRAINED_CONFIG_ARCHIVE_MAP", "BloomConfig"],
"models.bort": [],
"models.byt5": ["ByT5Tokenizer"],
"models.camembert": ["CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "CamembertConfig"],
@@ -495,6 +496,7 @@
_import_structure["models.big_bird"].append("BigBirdTokenizerFast")
_import_structure["models.blenderbot"].append("BlenderbotTokenizerFast")
_import_structure["models.blenderbot_small"].append("BlenderbotSmallTokenizerFast")
_import_structure["models.bloom"].append("BloomTokenizerFast")
_import_structure["models.camembert"].append("CamembertTokenizerFast")
_import_structure["models.clip"].append("CLIPTokenizerFast")
_import_structure["models.convbert"].append("ConvBertTokenizerFast")
@@ -853,6 +855,14 @@
"BigBirdPegasusPreTrainedModel",
]
)
_import_structure["models.bloom"].extend(
[
"BLOOM_PRETRAINED_MODEL_ARCHIVE_LIST",
"BloomForCausalLM",
"BloomModel",
"BloomPreTrainedModel",
]
)
_import_structure["models.blenderbot"].extend(
[
"BLENDERBOT_PRETRAINED_MODEL_ARCHIVE_LIST",
@@ -2718,6 +2728,7 @@
BlenderbotSmallConfig,
BlenderbotSmallTokenizer,
)
from .models.bloom import BLOOM_PRETRAINED_CONFIG_ARCHIVE_MAP, BloomConfig
from .models.byt5 import ByT5Tokenizer
from .models.camembert import CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, CamembertConfig
from .models.canine import CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP, CanineConfig, CanineTokenizer
@@ -3025,6 +3036,7 @@
from .models.big_bird import BigBirdTokenizerFast
from .models.blenderbot import BlenderbotTokenizerFast
from .models.blenderbot_small import BlenderbotSmallTokenizerFast
from .models.bloom import BloomTokenizerFast
from .models.camembert import CamembertTokenizerFast
from .models.clip import CLIPTokenizerFast
from .models.convbert import ConvBertTokenizerFast
@@ -3340,6 +3352,12 @@
BlenderbotSmallModel,
BlenderbotSmallPreTrainedModel,
)
from .models.bloom import (
BLOOM_PRETRAINED_MODEL_ARCHIVE_LIST,
BloomForCausalLM,
BloomModel,
BloomPreTrainedModel,
)
from .models.camembert import (
CAMEMBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
CamembertForCausalLM,
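The paired edits above — registering name strings in `_import_structure` and adding the real imports under the type-checking branch — follow the library's lazy-import pattern, which defers importing heavy model code until a name is first accessed. A simplified sketch of the idea (this is illustrative, not transformers' actual `_LazyModule`):

```python
import importlib
import types

class LazyModule(types.ModuleType):
    # Minimal sketch: map attribute names to submodule names up front,
    # but only import a submodule when one of its names is touched.
    def __init__(self, name, import_structure):
        super().__init__(name)
        self._import_structure = import_structure
        self._name_to_module = {
            attr: mod for mod, attrs in import_structure.items() for attr in attrs
        }

    def __getattr__(self, attr):
        module_name = self._name_to_module[attr]
        module = importlib.import_module(module_name)  # real import happens here
        return getattr(module, attr)

# Hypothetical usage: attributes resolve to real imports on first access.
lazy = LazyModule("demo", {"json": ["dumps", "loads"]})
print(lazy.dumps({"a": 1}))  # {"a": 1}
```

This is why a new model like BLOOM shows up in two places in `__init__.py`: once as strings for the lazy machinery, once as real imports for static type checkers.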
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -31,6 +31,7 @@
bigbird_pegasus,
blenderbot,
blenderbot_small,
bloom,
bort,
byt5,
camembert,
5 changes: 3 additions & 2 deletions src/transformers/models/auto/configuration_auto.py
@@ -38,6 +38,7 @@
("bigbird_pegasus", "BigBirdPegasusConfig"),
("blenderbot", "BlenderbotConfig"),
("blenderbot-small", "BlenderbotSmallConfig"),
("bloom", "BloomConfig"),
("camembert", "CamembertConfig"),
("canine", "CanineConfig"),
("clip", "CLIPConfig"),
@@ -51,7 +52,6 @@
("deberta", "DebertaConfig"),
("deberta-v2", "DebertaV2Config"),
("decision_transformer", "DecisionTransformerConfig"),
("decision_transformer", "DecisionTransformerConfig"),
("deit", "DeiTConfig"),
("detr", "DetrConfig"),
("distilbert", "DistilBertConfig"),
@@ -153,6 +153,7 @@
("bigbird_pegasus", "BIGBIRD_PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("blenderbot", "BLENDERBOT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("blenderbot-small", "BLENDERBOT_SMALL_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("bloom", "BLOOM_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("camembert", "CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("canine", "CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("clip", "CLIP_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@@ -258,6 +259,7 @@
("bigbird_pegasus", "BigBirdPegasus"),
("blenderbot", "Blenderbot"),
("blenderbot-small", "BlenderbotSmall"),
("bloom", "BLOOM"),
("bort", "BORT"),
("byt5", "ByT5"),
("camembert", "CamemBERT"),
@@ -356,7 +358,6 @@
("van", "VAN"),
("vilt", "ViLT"),
("vision-encoder-decoder", "Vision Encoder decoder"),
("vision-encoder-decoder", "Vision Encoder decoder"),
("vision-text-dual-encoder", "VisionTextDualEncoder"),
("visual_bert", "VisualBert"),
("vit", "ViT"),
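The mapping entries added above are what let the Auto classes resolve the `bloom` model type by name. For example, a default config can be instantiated through `AutoConfig` with no checkpoint download involved:

```python
from transformers import AutoConfig

# "bloom" resolves through the ("bloom", "BloomConfig") mapping added above.
config = AutoConfig.for_model("bloom")
print(type(config).__name__)  # BloomConfig
print(config.model_type)      # bloom
```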