Transformers

State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.

🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.

This is the documentation of our repository transformers.

Features

High performance on NLU and NLG tasks
Low barrier to entry for educators and practitioners

State-of-the-art NLP for everyone:

Deep learning researchers
Hands-on practitioners
AI/ML/NLP teachers and educators

Lower compute costs, smaller carbon footprint:

Researchers can share trained models instead of always retraining
Practitioners can reduce compute time and production costs
8 architectures with over 30 pretrained models, some in more than 100 languages

Choose the right framework for every part of a model's lifetime:

Train state-of-the-art models in 3 lines of code
Deep interoperability between TensorFlow 2.0 and PyTorch models
Move a single model between TF2.0/PyTorch frameworks at will
Seamlessly pick the right framework for training, evaluation, production

Experimental support for Flax with a few models right now, expected to grow in the coming months.

All the model checkpoints are seamlessly integrated from the huggingface.co model hub where they are uploaded directly by users and organizations.

Current number of checkpoints:

GET STARTED contains a quick tour, the installation instructions and some useful information about our philosophy and a glossary.
USING 🤗 TRANSFORMERS contains general tutorials on how to use the library.
ADVANCED GUIDES contains more advanced guides that are more specific to a given script or part of the library.
RESEARCH focuses on tutorials that have less to do with how to use the library but more about general research in transformers model
The three last section contain the documentation of each public class and function, grouped in:
- MAIN CLASSES for the main classes exposing the important APIs of the library.
- MODELS for the classes and functions related to each model implemented in the library.
- INTERNAL HELPERS for the classes and functions we use internally.

The library currently contains PyTorch, Tensorflow and Flax implementations, pretrained model weights, usage scripts and conversion utilities for the following models:

:doc:`ALBERT <model_doc/albert>` (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
:doc:`BART <model_doc/bart>` (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
:doc:`BARThez <model_doc/barthez>` (from École polytechnique) released with the paper BARThez: a Skilled Pretrained French Sequence-to-Sequence Model by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
:doc:`BERT <model_doc/bert>` (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
:doc:`BERT For Sequence Generation <model_doc/bertgeneration>` (from Google) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
:doc:`Blenderbot <model_doc/blenderbot>` (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
:doc:`CamemBERT <model_doc/camembert>` (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
:doc:`CTRL <model_doc/ctrl>` (from Salesforce) released with the paper CTRL: A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
:doc:`DeBERTa <model_doc/deberta>` (from Microsoft Research) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
:doc:`DialoGPT <model_doc/dialogpt>` (from Microsoft Research) released with the paper DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
:doc:`DistilBERT <model_doc/distilbert>` (from HuggingFace), released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT and a German version of DistilBERT.
:doc:`DPR <model_doc/dpr>` (from Facebook) released with the paper Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
:doc:`ELECTRA <model_doc/electra>` (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
:doc:`FlauBERT <model_doc/flaubert>` (from CNRS) released with the paper FlauBERT: Unsupervised Language Model Pre-training for French by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
:doc:`Funnel Transformer <model_doc/funnel>` (from CMU/Google Brain) released with the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
:doc:`GPT <model_doc/gpt>` (from OpenAI) released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
:doc:`GPT-2 <model_doc/gpt2>` (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
:doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
:doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
:doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering by Hao Tan and Mohit Bansal.
:doc:`MarianMT <model_doc/marian>` Machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
:doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
:doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
:doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
:doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization> by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
:doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
:doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper Reformer: The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
:doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper a Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. ultilingual BERT into DistilmBERT and a German version of DistilBERT.
:doc:`SqueezeBert <model_doc/squeezebert>` released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
:doc:`T5 <model_doc/t5>` (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
TAPAS (from Google AI) released with the paper TAPAS: Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
:doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
:doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
:doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
:doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
:doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.

The table below represents the current support in the library for each of those models, whether they have a Python tokenizer (called "slow"). A "fast" tokenizer backed by the 🤗 Tokenizers library, whether they have support in PyTorch, TensorFlow and/or Flax.

.. rst-class:: center-aligned-table

Model	Tokenizer slow	Tokenizer fast	PyTorch support	TensorFlow support	Flax Support
ALBERT	✅	✅	✅	✅	❌
BART	✅	✅	✅	✅	❌
BERT	✅	✅	✅	✅	✅
Bert Generation	✅	❌	✅	❌	❌
Blenderbot	✅	❌	✅	✅	❌
CTRL	✅	❌	✅	✅	❌
CamemBERT	✅	✅	✅	✅	❌
DPR	✅	✅	✅	✅	❌
DeBERTa	✅	❌	✅	❌	❌
DistilBERT	✅	✅	✅	✅	❌
ELECTRA	✅	✅	✅	✅	❌
Encoder decoder	❌	❌	✅	❌	❌
FairSeq Machine-Translation	✅	❌	✅	❌	❌
FlauBERT	✅	❌	✅	✅	❌
Funnel Transformer	✅	✅	✅	✅	❌
LXMERT	✅	✅	✅	✅	❌
LayoutLM	✅	✅	✅	❌	❌
Longformer	✅	✅	✅	✅	❌
MPNet	✅	✅	✅	✅	❌
Marian	✅	❌	✅	✅	❌
MobileBERT	✅	✅	✅	✅	❌
OpenAI GPT	✅	✅	✅	✅	❌
OpenAI GPT-2	✅	✅	✅	✅	❌
Pegasus	✅	✅	✅	✅	❌
ProphetNet	✅	❌	✅	❌	❌
RAG	✅	❌	✅	❌	❌
Reformer	✅	✅	✅	❌	❌
RetriBERT	✅	✅	✅	❌	❌
RoBERTa	✅	✅	✅	✅	✅
SqueezeBERT	✅	✅	✅	❌	❌
T5	✅	✅	✅	✅	❌
TAPAS	✅	❌	✅	❌	❌
Transformer-XL	✅	❌	✅	✅	❌
XLM	✅	❌	✅	✅	❌
XLM-RoBERTa	✅	✅	✅	✅	❌
XLMProphetNet	✅	❌	✅	❌	❌
XLNet	✅	✅	✅	✅	❌
mBART	✅	✅	✅	✅	❌
mT5	✅	✅	✅	✅	❌

.. toctree::
    :maxdepth: 2
    :caption: Get started

    quicktour
    installation
    philosophy
    glossary

.. toctree::
    :maxdepth: 2
    :caption: Using 🤗 Transformers

    task_summary
    model_summary
    preprocessing
    training
    model_sharing
    tokenizer_summary
    multilingual

.. toctree::
    :maxdepth: 2
    :caption: Advanced guides

    pretrained_models
    examples
    custom_datasets
    notebooks
    converting_tensorflow_models
    migration
    contributing
    testing
    serialization

.. toctree::
    :maxdepth: 2
    :caption: Research

    bertology
    perplexity
    benchmarks

.. toctree::
    :maxdepth: 2
    :caption: Main Classes

    main_classes/callback
    main_classes/configuration
    main_classes/logging
    main_classes/model
    main_classes/optimizer_schedules
    main_classes/output
    main_classes/pipelines
    main_classes/processors
    main_classes/tokenizer
    main_classes/trainer

.. toctree::
    :maxdepth: 2
    :caption: Models

    model_doc/albert
    model_doc/auto
    model_doc/bart
    model_doc/barthez
    model_doc/bert
    model_doc/bertgeneration
    model_doc/blenderbot
    model_doc/camembert
    model_doc/ctrl
    model_doc/deberta
    model_doc/dialogpt
    model_doc/distilbert
    model_doc/dpr
    model_doc/electra
    model_doc/encoderdecoder
    model_doc/flaubert
    model_doc/fsmt
    model_doc/funnel
    model_doc/layoutlm
    model_doc/longformer
    model_doc/lxmert
    model_doc/marian
    model_doc/mbart
    model_doc/mobilebert
    model_doc/mpnet
    model_doc/mt5
    model_doc/gpt
    model_doc/gpt2
    model_doc/pegasus
    model_doc/prophetnet
    model_doc/rag
    model_doc/reformer
    model_doc/retribert
    model_doc/roberta
    model_doc/squeezebert
    model_doc/t5
    model_doc/tapas
    model_doc/transformerxl
    model_doc/xlm
    model_doc/xlmprophetnet
    model_doc/xlmroberta
    model_doc/xlnet

.. toctree::
    :maxdepth: 2
    :caption: Internal Helpers

    internal/modeling_utils
    internal/pipelines_utils
    internal/tokenization_utils
    internal/trainer_utils
    internal/generation_utils

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

index.rst

index.rst

Transformers

Features

Contents

Files

index.rst

Latest commit

History

index.rst

File metadata and controls

Transformers

Features

Contents