
Add LayoutLMv2 + LayoutXLM #12604

Merged
114 commits merged on Aug 30, 2021

Commits
4744ab8
First commit
NielsRogge Jul 6, 2021
0a3a7e4
Make style
NielsRogge Jul 6, 2021
39da694
Fix dummy objects
NielsRogge Jul 6, 2021
5b18c28
Add Detectron2 config
NielsRogge Jul 6, 2021
5348460
Add LayoutLMv2 pooler
NielsRogge Jul 6, 2021
9be733e
More improvements, add documentation
NielsRogge Jul 6, 2021
3814c54
More improvements
NielsRogge Jul 6, 2021
125ada5
Add model tests
NielsRogge Jul 6, 2021
76a5a0f
Add clarification regarding image input
NielsRogge Jul 6, 2021
480ebe1
Improve integration test
NielsRogge Jul 6, 2021
5b2f585
Fix bug
NielsRogge Jul 6, 2021
060a684
Fix another bug
NielsRogge Jul 6, 2021
5e61df4
Fix another bug
NielsRogge Jul 6, 2021
e731f67
Fix another bug
NielsRogge Jul 6, 2021
d0ca865
More improvements
NielsRogge Jul 6, 2021
604dd9b
Make more tests pass
NielsRogge Jul 6, 2021
b4c172e
Make more tests pass
NielsRogge Jul 7, 2021
7fb70b5
Improve integration test
NielsRogge Jul 7, 2021
e6c1318
Remove gradient checkpointing and add head masking
NielsRogge Jul 7, 2021
aaef300
Add integration test
NielsRogge Jul 7, 2021
b470d03
Add LayoutLMv2ForSequenceClassification to the tests
NielsRogge Jul 7, 2021
dfe5ea7
Add LayoutLMv2ForQuestionAnswering
NielsRogge Jul 8, 2021
59c1cf6
More improvements
NielsRogge Jul 8, 2021
33ffd98
More improvements
NielsRogge Jul 8, 2021
aa15dbf
Small improvements
NielsRogge Jul 9, 2021
28b576a
Fix _LazyModule
NielsRogge Jul 9, 2021
6229e02
Fix fast tokenizer
NielsRogge Jul 9, 2021
d9ff738
Move sync_batch_norm to a separate method
NielsRogge Jul 12, 2021
c681657
Replace dummies by requires_backends
NielsRogge Jul 13, 2021
fa97538
Move calculation of visual bounding boxes to separate method + update…
NielsRogge Jul 13, 2021
ba0bc0e
Add models to main init
NielsRogge Jul 13, 2021
cd67bfa
First draft
NielsRogge Jul 15, 2021
287abfa
More improvements
NielsRogge Jul 15, 2021
8c0948f
More improvements
NielsRogge Jul 15, 2021
88be5de
More improvements
NielsRogge Jul 16, 2021
373811f
More improvements
NielsRogge Jul 16, 2021
48c53c0
More improvements
NielsRogge Jul 16, 2021
b92db14
Remove is_split_into_words
NielsRogge Jul 16, 2021
fcb505a
More improvements
NielsRogge Jul 16, 2021
86bb3ab
Simplify tesseract - no use of pandas anymore
NielsRogge Jul 16, 2021
0ae53ff
Add LayoutLMv2Processor
NielsRogge Jul 16, 2021
1adbaf8
Update is_pytesseract_available
NielsRogge Jul 17, 2021
d5cf7c2
Fix bugs
NielsRogge Jul 17, 2021
0382104
Improve feature extractor
NielsRogge Jul 18, 2021
d06248d
Fix bug
NielsRogge Jul 18, 2021
075590b
Add print statement
NielsRogge Jul 18, 2021
b6b277e
Add truncation of bounding boxes
NielsRogge Jul 18, 2021
258060a
Add tests for LayoutLMv2FeatureExtractor and LayoutLMv2Tokenizer
NielsRogge Jul 19, 2021
2a166ca
Improve tokenizer tests
NielsRogge Jul 19, 2021
ab3b0ef
Make more tokenizer tests pass
NielsRogge Jul 19, 2021
214b491
Make more tests pass, add integration tests
NielsRogge Jul 19, 2021
0ae6e3b
Finish integration tests
NielsRogge Jul 19, 2021
ea84ad6
More improvements
NielsRogge Jul 19, 2021
bba6100
More improvements - update API of the tokenizer
NielsRogge Jul 20, 2021
ebc2541
More improvements
NielsRogge Jul 20, 2021
93d93b7
Remove support for VQA training
NielsRogge Jul 20, 2021
0b4c97b
Remove some files
NielsRogge Jul 20, 2021
5a24365
Improve feature extractor
NielsRogge Jul 20, 2021
f04672c
Improve documentation and one more tokenizer test
NielsRogge Jul 21, 2021
98ca2a2
Make quality and small docs improvements
NielsRogge Jul 26, 2021
7804d69
Add batched tests for LayoutLMv2Processor, remove fast tokenizer
NielsRogge Jul 26, 2021
0ea905b
Add truncation of labels
NielsRogge Jul 26, 2021
8db4e13
Apply suggestions from code review
NielsRogge Jul 27, 2021
0e7d10e
Improve processor tests
NielsRogge Jul 28, 2021
4bccc97
Fix failing tests and add suggestion from code review
NielsRogge Jul 28, 2021
fd12133
Fix tokenizer test
NielsRogge Jul 28, 2021
23d0570
Add detectron2 CI job
NielsRogge Aug 5, 2021
40c1b6d
Simplify CI job
NielsRogge Aug 5, 2021
124dd86
Comment out non-detectron2 jobs and specify number of processes
NielsRogge Aug 5, 2021
c59bffe
Add pip install torchvision
NielsRogge Aug 5, 2021
c299ff0
Add durations to see which tests are slow
NielsRogge Aug 5, 2021
f7ea2fe
Fix tokenizer test and make model tests smaller
NielsRogge Aug 5, 2021
da85fbc
First draft
NielsRogge Aug 6, 2021
0401e4d
Use setattr
NielsRogge Aug 6, 2021
2e43af8
Possible fix
LysandreJik Aug 6, 2021
e6d6efc
Proposal with configuration
LysandreJik Aug 6, 2021
546bfb9
First draft of fast tokenizer
NielsRogge Aug 5, 2021
507d724
More improvements
NielsRogge Aug 6, 2021
4101b29
Enable fast tokenizer tests
NielsRogge Aug 6, 2021
a582226
Make more tests pass
NielsRogge Aug 6, 2021
67cca2f
Make more tests pass
NielsRogge Aug 7, 2021
2379176
More improvements
NielsRogge Aug 13, 2021
c8151e7
Add padding to fast tokenizer
NielsRogge Aug 13, 2021
d6ea661
Make more tests pass
NielsRogge Aug 13, 2021
b0c7eca
Make more tests pass
NielsRogge Aug 13, 2021
7613c27
Make all tests pass for fast tokenizer
NielsRogge Aug 13, 2021
38934e9
Make fast tokenizer support overflowing boxes and labels
NielsRogge Aug 13, 2021
066a9ec
Add support for overflowing_labels to slow tokenizer
NielsRogge Aug 13, 2021
4446c8a
Add support for fast tokenizer to the processor
NielsRogge Aug 16, 2021
42ebf01
Update processor tests for both slow and fast tokenizers
NielsRogge Aug 16, 2021
5082dbd
Add head models to model mappings
NielsRogge Aug 16, 2021
b703ea2
Make style & quality
NielsRogge Aug 16, 2021
b22011d
Remove Detectron2 config file
NielsRogge Aug 16, 2021
beb6f69
Add configurable option to label all subwords
NielsRogge Aug 17, 2021
be1eaa1
Fix test
LysandreJik Aug 17, 2021
659bd94
Skip visual segment embeddings in test
NielsRogge Aug 17, 2021
66b5cbe
Use ResNet-18 backbone in tests instead of ResNet-101
NielsRogge Aug 17, 2021
ba5a44f
Proposal
LysandreJik Aug 17, 2021
a8e6997
Re-enable all jobs on CI
NielsRogge Aug 17, 2021
3ab8384
Fix installation of tesseract
NielsRogge Aug 17, 2021
c417b04
Fix failing test
NielsRogge Aug 17, 2021
84b33b8
Fix index table
NielsRogge Aug 17, 2021
6142f97
Add LayoutXLM doc page, first draft of code examples
NielsRogge Aug 18, 2021
2c5d412
Improve documentation a lot
NielsRogge Aug 18, 2021
e86b4cf
Update expected boxes for Tesseract 4.0.0 beta
NielsRogge Aug 19, 2021
9eb372b
Use offsets to create labels instead of checking if they start with ##
NielsRogge Aug 20, 2021
e5c5f61
Update expected boxes for Tesseract 4.1.1
NielsRogge Aug 26, 2021
5114199
Fix conflict
NielsRogge Aug 26, 2021
d429492
Make variable names cleaner, add docstring, add link to notebooks
NielsRogge Aug 26, 2021
2e0a4f9
Revert "Fix conflict"
NielsRogge Aug 26, 2021
748308f
Revert to make integration test pass
NielsRogge Aug 26, 2021
e273e77
Apply suggestions from @LysandreJik's review
NielsRogge Aug 26, 2021
a72f080
Address @patrickvonplaten's comments
NielsRogge Aug 27, 2021
2391ca5
Remove fixtures DocVQA in favor of dataset on the hub
NielsRogge Aug 30, 2021
39 changes: 39 additions & 0 deletions .circleci/config.yml
@@ -798,6 +798,44 @@ jobs:
- run: pip install requests
- run: python ./utils/link_tester.py

run_tests_layoutlmv2:
working_directory: ~/transformers
docker:
- image: circleci/python:3.7
environment:
OMP_NUM_THREADS: 1
TRANSFORMERS_IS_CI: yes
resource_class: xlarge
parallelism: 1
steps:
- checkout
- restore_cache:
keys:
- v0.4-torch-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
- run: pip install --upgrade pip
- run: pip install .[torch,testing,vision]
- run: pip install torchvision
- run: python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
- run: sudo apt install tesseract-ocr
- run: pip install pytesseract
- save_cache:
key: v0.4-torch-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python utils/tests_fetcher.py | tee test_preparation.txt
- store_artifacts:
path: ~/transformers/test_preparation.txt
- run: |
if [ -f test_list.txt ]; then
python -m pytest -n 1 tests/*layoutlmv2* --dist=loadfile -s --make-reports=tests_layoutlmv2 --durations=100
fi
- store_artifacts:
path: ~/transformers/tests_output.txt
- store_artifacts:
path: ~/transformers/reports

# TPU JOBS
run_examples_tpu:
docker:
@@ -852,6 +890,7 @@ workflows:
- run_tests_onnxruntime
- run_tests_hub
- build_doc
- run_tests_layoutlmv2
Comment from a repository member:

Pinging @sgugger for this especially - the LayoutLMv2 tests require detectron2 to be installed, which takes quite a while - as well as pytesseract. Therefore, we've opted for a separate job so as to not weigh down the existing tests. Eventually, in the best of worlds, these tests would only trigger if some changes have been detected in the files. It's already somewhat the case, but the whole installation step still happens regardless. This would imply running the test fetcher one step above so that it may decide on which jobs to run.

This would help for troublesome models such as TAPAS, where installations are cumbersome.

Let's discuss this in the near future!

- deploy_doc: *workflow_filters
nightly:
triggers:
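For reference, the CI job above can be approximated locally. A sketch assuming a Debian-based system with `sudo` and a checkout of the repository as the working directory — not an official setup script, just the job's steps replayed:

```shell
# Approximate the run_tests_layoutlmv2 CI job on a local Debian/Ubuntu machine.
sudo apt-get update
sudo apt-get install -y tesseract-ocr libsndfile1-dev
pip install --upgrade pip
pip install ".[torch,testing,vision]" torchvision pytesseract
pip install 'git+https://github.com/facebookresearch/detectron2.git'

# Run only the LayoutLMv2 test files, serially, mirroring the pytest invocation
# in the config (the CI wraps this in a test_list.txt check we skip here).
python -m pytest -n 1 tests/*layoutlmv2* --dist=loadfile -s --durations=100
```

The detectron2 install is the slow step the review comment below discusses, which is why it lives in its own job rather than in the shared test image.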
2 changes: 2 additions & 0 deletions README.md
@@ -244,6 +244,8 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[Hubert](https://huggingface.co/transformers/model_doc/hubert.html)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/transformers/model_doc/ibert.html)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
1. **[LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/transformers/model_doc/layoutlmv2.html)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutXLM](https://huggingface.co/transformers/model_doc/layoutlmv2.html)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LED](https://huggingface.co/transformers/model_doc/led.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LUKE](https://huggingface.co/transformers/model_doc/luke.html)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
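Like the original LayoutLM, both LayoutLMv2 and LayoutXLM consume token bounding boxes normalized to a 0-1000 scale, independent of the page's pixel dimensions. A minimal sketch of that normalization — the helper name is illustrative, not part of the library's API:

```python
def normalize_box(box, page_width, page_height):
    """Scale an (x0, y0, x1, y1) pixel box to the 0-1000 range LayoutLM expects."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]

# A word box on a 1654x2339 px page (roughly A4 scanned at 200 dpi):
print(normalize_box((82, 115, 274, 160), 1654, 2339))  # → [49, 49, 165, 68]
```

The feature extractor and processor added in this PR perform this step (and OCR via pytesseract, when no words are supplied) so that users can pass raw pixel coordinates.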
75 changes: 43 additions & 32 deletions docs/source/index.rst
@@ -202,99 +202,106 @@ Supported models
34. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li,
Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
35. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
35. :doc:`LayoutLMv2 <model_doc/layoutlmv2>` (from Microsoft Research Asia) released with the paper `LayoutLMv2:
Multi-modal Pre-training for Visually-Rich Document Understanding <https://arxiv.org/abs/2012.14740>`__ by Yang Xu,
Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min
Zhang, Lidong Zhou.
36. :doc:`LayoutXLM <model_doc/layoutlmv2>` (from Microsoft Research Asia) released with the paper `LayoutXLM:
Multimodal Pre-training for Multilingual Visually-rich Document Understanding <https://arxiv.org/abs/2104.08836>`__
by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
37. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
<https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
36. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
38. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
37. :doc:`LUKE <model_doc/luke>` (from Studio Ousia) released with the paper `LUKE: Deep Contextualized Entity
39. :doc:`LUKE <model_doc/luke>` (from Studio Ousia) released with the paper `LUKE: Deep Contextualized Entity
Representations with Entity-aware Self-attention <https://arxiv.org/abs/2010.01057>`__ by Ikuya Yamada, Akari Asai,
Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
38. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
40. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
Encoder Representations from Transformers for Open-Domain Question Answering <https://arxiv.org/abs/1908.07490>`__
by Hao Tan and Mohit Bansal.
39. :doc:`M2M100 <model_doc/m2m_100>` (from Facebook) released with the paper `Beyond English-Centric Multilingual
41. :doc:`M2M100 <model_doc/m2m_100>` (from Facebook) released with the paper `Beyond English-Centric Multilingual
   Machine Translation <https://arxiv.org/abs/2010.11125>`__ by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi
Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman
Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
40. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
42. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
Jörg Tiedemann. The `Marian Framework <https://marian-nmt.github.io/>`__ is being developed by the Microsoft
Translator Team.
41. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
43. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
Neural Machine Translation <https://arxiv.org/abs/2001.08210>`__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li,
Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
42. :doc:`MBart-50 <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Translation with Extensible
44. :doc:`MBart-50 <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Translation with Extensible
Multilingual Pretraining and Finetuning <https://arxiv.org/abs/2008.00401>`__ by Yuqing Tang, Chau Tran, Xian Li,
Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
43. :doc:`Megatron-BERT <model_doc/megatron_bert>` (from NVIDIA) released with the paper `Megatron-LM: Training
45. :doc:`Megatron-BERT <model_doc/megatron_bert>` (from NVIDIA) released with the paper `Megatron-LM: Training
Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad
Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
44. :doc:`Megatron-GPT2 <model_doc/megatron_gpt2>` (from NVIDIA) released with the paper `Megatron-LM: Training
46. :doc:`Megatron-GPT2 <model_doc/megatron_gpt2>` (from NVIDIA) released with the paper `Megatron-LM: Training
Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad
Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
45. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
47. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
Pre-training for Language Understanding <https://arxiv.org/abs/2004.09297>`__ by Kaitao Song, Xu Tan, Tao Qin,
Jianfeng Lu, Tie-Yan Liu.
46. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
48. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
text-to-text transformer <https://arxiv.org/abs/2010.11934>`__ by Linting Xue, Noah Constant, Adam Roberts, Mihir
Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
47. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
49. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
   Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`__ by Jingqing Zhang, Yao Zhao,
Mohammad Saleh and Peter J. Liu.
48. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
50. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi,
Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
49. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
51. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
Transformer <https://arxiv.org/abs/2001.04451>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
50. :doc:`RemBERT <model_doc/rembert>` (from Google Research) released with the paper `Rethinking embedding coupling in
52. :doc:`RemBERT <model_doc/rembert>` (from Google Research) released with the paper `Rethinking embedding coupling in
pre-trained language models <https://arxiv.org/pdf/2010.12821.pdf>`__ by Hyung Won Chung, Thibault Févry, Henry
Tsai, M. Johnson, Sebastian Ruder.
51. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper a `Robustly Optimized BERT
53. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper a `Robustly Optimized BERT
Pretraining Approach <https://arxiv.org/abs/1907.11692>`__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar
Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
52. :doc:`RoFormer <model_doc/roformer>` (from ZhuiyiTechnology), released together with the paper a `RoFormer:
54. :doc:`RoFormer <model_doc/roformer>` (from ZhuiyiTechnology), released together with the paper a `RoFormer:
Enhanced Transformer with Rotary Position Embedding <https://arxiv.org/pdf/2104.09864v1.pdf>`__ by Jianlin Su and
Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
53. :doc:`SpeechToTextTransformer <model_doc/speech_to_text>` (from Facebook), released together with the paper
55. :doc:`SpeechToTextTransformer <model_doc/speech_to_text>` (from Facebook), released together with the paper
`fairseq S2T: Fast Speech-to-Text Modeling with fairseq <https://arxiv.org/abs/2010.05171>`__ by Changhan Wang, Yun
Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
54. `Splinter <https://huggingface.co/transformers/master/model_doc/splinter.html>`__ (from Tel Aviv University),
56. `Splinter <https://huggingface.co/transformers/master/model_doc/splinter.html>`__ (from Tel Aviv University),
released together with the paper `Few-Shot Question Answering by Pretraining Span Selection
<https://arxiv.org/abs/2101.00438>`__ by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
55. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
57. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola, Albert E. Shaw, Ravi
Krishna, and Kurt W. Keutzer.
56. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
58. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
57. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
59. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller,
Francesco Piccinno and Julian Martin Eisenschlos.
58. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
60. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
59. :doc:`Vision Transformer (ViT) <model_doc/vit>` (from Google AI) released with the paper `An Image is Worth 16x16
61. :doc:`Vision Transformer (ViT) <model_doc/vit>` (from Google AI) released with the paper `An Image is Worth 16x16
Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`__ by Alexey Dosovitskiy,
Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias
Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
60. :doc:`VisualBERT <model_doc/visual_bert>` (from UCLA NLP) released with the paper `VisualBERT: A Simple and
62. :doc:`VisualBERT <model_doc/visual_bert>` (from UCLA NLP) released with the paper `VisualBERT: A Simple and
Performant Baseline for Vision and Language <https://arxiv.org/pdf/1908.03557>`__ by Liunian Harold Li, Mark
Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
61. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
63. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
Self-Supervised Learning of Speech Representations <https://arxiv.org/abs/2006.11477>`__ by Alexei Baevski, Henry
Zhou, Abdelrahman Mohamed, Michael Auli.
62. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
64. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
63. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
65. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
64. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
66. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
Zettlemoyer and Veselin Stoyanov.
65. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `​XLNet: Generalized Autoregressive
67. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `​XLNet: Generalized Autoregressive
Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
66. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised
68. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised
Cross-Lingual Representation Learning For Speech Recognition <https://arxiv.org/abs/2006.13979>`__ by Alexis
Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.

@@ -372,6 +379,8 @@ Flax), PyTorch, and/or TensorFlow.
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LayoutLMv2 | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
@@ -550,6 +559,8 @@ Flax), PyTorch, and/or TensorFlow.
model_doc/herbert
model_doc/ibert
model_doc/layoutlm
model_doc/layoutlmv2
model_doc/layoutxlm
model_doc/led
model_doc/longformer
model_doc/luke