Add Switch transformers #19323
Conversation
- Implemented `ExpertsChooseMaskedRouter` - added tests - 2 more routers to implement
- completed the docstring in `router.py` - added more args in the config
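A hedged sketch of the "experts choose" routing idea behind `ExpertsChooseMaskedRouter`: instead of each token picking an expert, each expert picks its highest-scoring tokens up to a fixed capacity. The function and variable names below are illustrative, not the PR's implementation.

```python
import torch
import torch.nn.functional as F


def experts_choose_routing(router_probs: torch.Tensor, expert_capacity: int) -> torch.Tensor:
    """Toy "experts choose" routing.

    router_probs: (batch, seq_len, num_experts) softmax scores from the router.
    Returns a dispatch mask of shape (batch, seq_len, num_experts, expert_capacity).
    """
    seq_len = router_probs.shape[1]
    # Score every token from each expert's point of view: (batch, num_experts, seq_len)
    probs_per_expert = router_probs.transpose(1, 2)
    # Each expert keeps its `expert_capacity` best tokens.
    _, token_indices = torch.topk(probs_per_expert, k=expert_capacity, dim=-1)
    # One-hot over the sequence dimension: (batch, num_experts, expert_capacity, seq_len)
    dispatch = F.one_hot(token_indices, num_classes=seq_len)
    # Rearrange to (batch, seq_len, num_experts, expert_capacity)
    return dispatch.permute(0, 3, 1, 2).to(router_probs.dtype)
```

With this scheme no token can overflow an expert's buffer, at the cost of some tokens possibly being selected by no expert at all.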
…to add_switch_transformers
force-pushed from 7e4ff1f to 1397231
force-pushed from 1397231 to 6ede608
…lkada/transformers into add_switch_transformers
- add better casting for `Linear8bitLt` - remove `torchscript` tests
…lkada/transformers into add_switch_transformers
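On the "better casting for `Linear8bitLt`" commit above: when the huge checkpoints are loaded in 8-bit, a common pattern is to cast everything except the bitsandbytes 8-bit linear layers. The helper below is a sketch under that assumption, not the fix that actually landed.

```python
import torch
import bitsandbytes as bnb


def cast_non_8bit_params(model: torch.nn.Module, dtype: torch.dtype = torch.float16) -> torch.nn.Module:
    # Hypothetical helper: leave `Linear8bitLt` modules untouched (their weights are
    # already quantized) and cast every other parameter to the requested dtype.
    for module in model.modules():
        if isinstance(module, bnb.nn.Linear8bitLt):
            continue
        for param in module.parameters(recurse=False):
            param.data = param.data.to(dtype)
    return model
```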
Thanks a lot @sgugger for your comments!
Looking great!
src/transformers/models/switch_transformers/configuration_switch_transformers.py
""" | ||
router_probs, router_logits = self._compute_router_probabilities(hidden_states) | ||
|
||
# Flax code for reference TODO check what happens with padded inputs here |
Flagging this just in case!
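For context, a minimal sketch of what `_compute_router_probabilities` typically does in this kind of MoE layer; the float32 cast and the names are assumptions, and the flagged TODO concerns whether padding tokens should be masked out before routing.

```python
import torch


def compute_router_probabilities_sketch(hidden_states, classifier, router_dtype=torch.float32):
    # `classifier` is assumed to be an nn.Linear(d_model, num_experts) created in
    # `router_dtype`. Logits are computed in float32 for numerical stability,
    # then softmaxed over the expert dimension.
    router_logits = classifier(hidden_states.to(router_dtype))
    router_probs = torch.softmax(router_logits, dim=-1)
    return router_probs, router_logits
```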
src/transformers/models/switch_transformers/modeling_switch_transformers.py
    _keys_to_ignore_on_load_missing = [
        r"encoder.embed_tokens.weight",
        r"decoder.embed_tokens.weight",
        r"lm_head.weight",
    ]
Nit: can take less vertical space.
Managed to change it for the attributes above in 16e7ff5, but not for this one (not sure why the decorator above is not affected by the vertical formatting, though 🤔).
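One likely explanation for the attribute that refuses to collapse: black's "magic trailing comma" keeps a collection exploded across lines whenever its last element ends with a comma. Dropping that comma should allow the one-liner the reviewer is after, roughly:

```python
# Hypothetical compact form once the trailing comma is removed:
_keys_to_ignore_on_load_missing = [r"encoder.embed_tokens.weight", r"decoder.embed_tokens.weight", r"lm_head.weight"]
```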
Impressive contribution! Great to have so many pretrained checkpoints at release
src/transformers/models/switch_transformers/configuration_switch_transformers.py
        }
        return dummy_inputs

    def _init_weights(self, module):
Impressive method!
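For readers without the diff open, a rough, hypothetical sketch of what a T5-style `_init_weights` can look like; the method in this PR covers many more module types, including the experts and the router classifier.

```python
import torch.nn as nn


def _init_weights(self, module):
    # Illustrative T5-style scaled-normal initialization, assuming the config
    # exposes `initializer_factor` and `d_model`. Not the PR's full method.
    factor = self.config.initializer_factor
    if isinstance(module, nn.Linear):
        module.weight.data.normal_(mean=0.0, std=factor * (self.config.d_model ** -0.5))
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, nn.Embedding):
        module.weight.data.normal_(mean=0.0, std=factor)
```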
src/transformers/models/switch_transformers/configuration_switch_transformers.py
…ch_transformers.py
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Failing tests seem to be unrelated to this PR, merging!
* first commit * add more comments * add router v1 * clean up - remove `tf` modeling files * clean up - remove `tf` modeling files * clean up * v0 routers * added more router - Implemented `ExpertsChooseMaskedRouter` - added tests - 2 more routers to implement * last router * improved docstring - completed the docstring in `router.py` - added more args in the config * v0 sparse mlp * replace wrong naming * forward pass run * update MOE layer * small router update * fixup * consistency * remove scatter router * remove abstract layer * update test and model for integration testing * v1 conversion * update * hardcode hack * all keys match * add gin conversion, without additional libraries * update conversion sctipy * delete router file * update tests wrt router deletion * fix router issues * update expert code * update, logits match, code needsREFACTORING * Refactor code Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> * add generate tests Co-authored-by: younesbelkada <younesbelkada@gmail.com> * add support for router loss Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> * fix forward error * refactor a bit * remove `FlaxSwitchTransformers` modules * more tests pass * Update code Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> * fixup * fix tests * fix doc * fix doc + tokenization * fix tokenizer test * fix test * fix loss output * update code for backward pass * add loss support * update documentation * fix documentation, clean tokenizer * more doc fix, cleanup example_switch * fix failing test * fix test * fix test * fix loss issue * move layer * update doc and fix router capacity usage * fixup * add sparse mlp index for documentation on hub * fixup * test sparse mix architecture * Apply suggestions from code review * Update docs/source/en/model_doc/switch_transformers.mdx * fixup on update * fix tests * fix another test * attempt fix * Update src/transformers/models/switch_transformers/configuration_switch_transformers.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/switch_transformers/convert_switch_transformers_original_flax_checkpoint_to_pytorch.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * try * all tests pass * fix jitter noise * Apply suggestions from code review * doc tests pass * Update src/transformers/models/switch_transformers/modeling_switch_transformers.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/switch_transformers/modeling_switch_transformers.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * remove assert * change config order * fix readme japanese * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * remove parallelizable tests + add one liners * remove ONNX config * fix nits - add `T5Tokenizer` in auto mapping - remove `Switch Transformers` from ONNX supported models * remove `_get_router` * remove asserts * add check in test for `router_dtype` * add `SwitchTransformersConfig` in `run_pipeline_test` * Update tests/pipelines/test_pipelines_summarization.py * add huge model conversion script * fix slow tests - add better casting for `Linear8bitLt` - remove `torchscript` tests * add make dir * style on new script * fix nits - doctest - remove `_keys_to_ignore_on_load_unexpected` * Update src/transformers/models/switch_transformers/configuration_switch_transformers.py * 
add google as authors * fix year * remove last `assert` statements * standardize vertical spaces * fix failing import * fix another failing test * Remove strange `authorized_keys` * removing todo and padding that is never used Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: ybelkada <younes@huggingface.co> Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Arthur Zucker <arthur@huggingface.co>
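Among the items above, "add support for router loss" refers to the auxiliary load-balancing loss from the Switch Transformers paper, which pushes the router toward a uniform token distribution across experts. A rough sketch of that loss (not necessarily how it is wired up in this PR):

```python
import torch
import torch.nn.functional as F


def load_balancing_loss_sketch(router_probs: torch.Tensor, expert_index: torch.Tensor) -> torch.Tensor:
    """Auxiliary loss ~ num_experts * sum_i f_i * P_i (Fedus et al., 2021).

    router_probs: (batch, seq_len, num_experts) softmax router probabilities.
    expert_index: (batch, seq_len) index of the expert each token was routed to.
    """
    num_experts = router_probs.shape[-1]
    # f_i: fraction of tokens dispatched to each expert.
    expert_mask = F.one_hot(expert_index, num_classes=num_experts).float()
    tokens_per_expert = expert_mask.mean(dim=(0, 1))
    # P_i: mean router probability assigned to each expert.
    router_prob_per_expert = router_probs.mean(dim=(0, 1))
    return num_experts * torch.sum(tokens_per_expert * router_prob_per_expert)
```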
What does this PR do?
This PR adds Switch Transformers from t5x, with @ArthurZucker & @thomwolf.
The architecture is close to T5 (the modeling code is copied from T5), with some feed-forward layers replaced by sparse Mixture of Experts (MoE) layers, making this the first MoE architecture in the `transformers` library; a short sketch of the idea follows the links below.
paper: https://arxiv.org/abs/2101.03961
weights: https://github.com/google-research/t5x/blob/eb42c2524bf65c8a46624f1a9b9e034d9bc65b14/docs/models.md#converted-mesh-tensorflow-checkpoints
original modeling code: https://github.com/google/flaxformer/tree/b725bd2a51d70e866d819c92de166fbf24425e6a/flaxformer/architectures/moe
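As mentioned in the description, a minimal sketch of the top-1 ("switch") MoE feed-forward idea. It is illustrative only: expert capacity, jitter noise, efficient dispatching and the auxiliary loss handled by the real implementation are all omitted.

```python
import torch
import torch.nn as nn


class SparseMLPSketch(nn.Module):
    """Toy top-1 Mixture-of-Experts feed-forward layer (not the PR's implementation)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Each token is routed to its single best expert (top-1 routing).
        router_probs = torch.softmax(self.router(hidden_states), dim=-1)  # (batch, seq, num_experts)
        expert_gate, expert_index = router_probs.max(dim=-1)              # (batch, seq)

        output = torch.zeros_like(hidden_states)
        for idx, expert in enumerate(self.experts):
            token_mask = expert_index == idx
            if token_mask.any():
                output[token_mask] = expert(hidden_states[token_mask])
        # Scale by the router probability so gradients flow back into the router.
        return output * expert_gate.unsqueeze(-1)
```

In the full model, sparse layers typically replace the dense feed-forward only every few blocks, and each expert processes at most `expert_capacity` tokens.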
TODOs: