Deberta tf #12972
Conversation
Thanks for your PR! My main concern is the added dependency, but I think we don't really need it. I've left pointers on how to avoid using it.
As a result of #13023, you will need to rebase your PR on master and solve the merge conflicts (basically, you will just need to re-add the models in the auto-mappings as strings). Let us know if you need any help with that.
This looks good! I'd love for @Rocketknight1 to go over the TF code, and I'm pinging @BigBird01 as he's the author of the model and contributed the PyTorch version.
Glad to see a tf version! Thank you!
Commits in this PR:
- moved weights to build and fixed name scope
- added missing ,
- bug fixes to enable graph mode execution
- updated setup.py
- fixing typo
- fix imports
- embedding mask fix
- added layer names, avoid automatic incremental names
- +XSoftmax cleanup
- added names to layer
- disable keras_serializable
- Disentangled attention output shape
- hidden_size==None using symbolic inputs
- test for Deberta tf
- make style
- Update src/transformers/models/deberta/modeling_tf_deberta.py (×7, co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
- removed tensorflow-probability
- removed blank line
- +torch_gather tf implementation from @Rocketknight1
- to same as pt model
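For context on that last commit: torch.gather has no single direct TensorFlow equivalent, so a dedicated helper is needed. Below is a minimal sketch of one way to implement it (the function name matches the commit, but the exact shape handling here is illustrative and may differ from the PR's actual code):

```python
import tensorflow as tf

def torch_gather(x, indices, gather_axis=-1):
    """Minimal TF stand-in for torch.gather: `indices` has the same rank
    as `x`, and each entry selects one element of `x` along `gather_axis`."""
    rank = len(x.shape)
    if gather_axis < 0:
        gather_axis += rank
    last = rank - 1
    if gather_axis != last:
        # Swap the gather axis into the last position so that all leading
        # axes can be treated as batch dimensions by tf.gather.
        perm = list(range(rank))
        perm[gather_axis], perm[last] = perm[last], perm[gather_axis]
        x = tf.transpose(x, perm)
        indices = tf.transpose(indices, perm)
    out = tf.gather(x, indices, axis=last, batch_dims=last)
    if gather_axis != last:
        out = tf.transpose(out, perm)  # the swap permutation is its own inverse
    return out

# Example: gather along axis 1 of a 2-D tensor, like torch.gather(x, 1, idx)
x = tf.constant([[1., 2., 3.], [4., 5., 6.]])
idx = tf.constant([[2, 0], [1, 1]])
print(torch_gather(x, idx, gather_axis=1))  # [[3. 1.], [5. 5.]]
```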
```python
self.dropout = StableDropout(config.hidden_dropout_prob, name="dropout")

def build(self, input_shape: tf.TensorShape):
    with tf.name_scope("word_embeddings"):
```
not a huge fan of the tf.name_scope(...) here as it makes the weight naming less flexible - could we instead create three new tf.keras.layers.Layer classes (one for each embedding) to have a 1-to-1 translation from PyTorch?
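To illustrate the suggestion (hypothetical class and argument names, not code from the PR), one such per-embedding layer could look like:

```python
import tensorflow as tf

class TFDebertaWordEmbeddings(tf.keras.layers.Layer):
    """One dedicated layer per embedding table, mirroring the PyTorch
    submodule layout instead of using a manual tf.name_scope(...)."""

    def __init__(self, vocab_size, hidden_size, initializer_range=0.02, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.initializer_range = initializer_range

    def build(self, input_shape):
        # The table lives inside this layer, so its checkpoint name is
        # derived from the layer's own scope rather than an explicit scope.
        self.weight = self.add_weight(
            name="weight",
            shape=[self.vocab_size, self.hidden_size],
            initializer=tf.keras.initializers.TruncatedNormal(
                stddev=self.initializer_range
            ),
        )
        super().build(input_shape)

    def call(self, input_ids):
        return tf.gather(self.weight, input_ids)
```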
Sorry - I just noticed that this has become the standard in modeling_tf_bert.py as well:

```python
def build(self, input_shape: tf.TensorShape):
```
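Roughly, that pattern builds each embedding table inside an explicit name scope so the variable names line up with the PyTorch module hierarchy. An abbreviated sketch, assuming the usual config attributes:

```python
def build(self, input_shape: tf.TensorShape):
    # Each table is created under an explicit name scope so that its
    # checkpoint name matches the corresponding PyTorch submodule.
    with tf.name_scope("word_embeddings"):
        self.weight = self.add_weight(
            name="weight",
            shape=[self.config.vocab_size, self.config.hidden_size],
            initializer=tf.keras.initializers.TruncatedNormal(
                stddev=self.config.initializer_range
            ),
        )
    # ...same pattern for the token_type and position embeddings...
    super().build(input_shape)
```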
Thanks for the great PR! The PR seems to be in very good shape already. Mostly left nits, but would be happy if we could:
- Add a TFDeberta prefix to most layer classes. This helps when looking up modules later (I know that in PyTorch we also didn't append a Deberta prefix to all modules, but we should have done this IMO); a small before/after sketch follows below.
- Avoid using with tf.name_scope and instead replicate the PyTorch weight structure 1-to-1.
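As a sketch of the naming nit (illustrative only; the PyTorch model uses the unprefixed name):

```python
import tensorflow as tf

# Before: generic name, harder to locate among other models' layers.
class DisentangledSelfAttention(tf.keras.layers.Layer):
    ...

# After: model-specific prefix, consistent with the other TF models.
class TFDebertaDisentangledSelfAttention(tf.keras.layers.Layer):
    ...
```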
Thank you, @kamalkraj! Would you be interested in contributing the TensorFlow version of the DeBERTa-v2 model too? :)
@LysandreJik
What does this PR do?
TFDeBERTa implementation
@patrickvonplaten, @LysandreJik
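Once merged, the TF model should be usable like its PyTorch counterpart. A minimal usage sketch, assuming the standard microsoft/deberta-base checkpoint:

```python
from transformers import DebertaTokenizer, TFDebertaModel

tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
model = TFDebertaModel.from_pretrained("microsoft/deberta-base")

inputs = tokenizer("Hello, DeBERTa!", return_tensors="tf")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```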
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.