Deberta tf #12972
Conversation
Thanks for your PR! My main concern is the added dependency, but I think we don't really need it. I've left pointers on how to avoid using it.
As a result of #13023, you will need to rebase your PR on master and solve the merge conflicts (basically, you will just need to re-add the models in the auto-mappings as strings). Let us know if you need any help with that.
This looks good! I'd love for @Rocketknight1 to go over the TF code, and I'm pinging @BigBird01 as he's the author of the model and contributed the PyTorch version.
Glad to see a tf version! Thank you!
Commits in this PR:
- moved weights to build and fixed name scope
- added missing ,
- bug fixes to enable graph mode execution
- updated setup.py
- fixing typo
- fix imports
- embedding mask fix
- added layer names, avoid automatic incremental names
- +XSoftmax cleanup
- added names to layer
- disable keras_serializable
- Disentangled attention output shape
- hidden_size==None using symbolic inputs
- test for Deberta tf
- make style
- Update src/transformers/models/deberta/modeling_tf_deberta.py (×7, co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
- removed tensorflow-probability
- removed blank line
- +torch_gather tf implementation from @Rocketknight1
- to same as pt model
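For context on that last commit: torch.gather has no single direct TensorFlow equivalent, so a dedicated helper is needed. Below is a minimal sketch of one way to implement it (the function name matches the commit, but the exact shape handling here is illustrative and may differ from the PR's actual code):

```python
import tensorflow as tf

def torch_gather(x, indices, gather_axis=-1):
    """Minimal TF stand-in for torch.gather: `indices` has the same rank
    as `x`, and each entry selects one element of `x` along `gather_axis`."""
    rank = len(x.shape)
    if gather_axis < 0:
        gather_axis += rank
    last = rank - 1
    if gather_axis != last:
        # Swap the gather axis into the last position so that all leading
        # axes can be treated as batch dimensions by tf.gather.
        perm = list(range(rank))
        perm[gather_axis], perm[last] = perm[last], perm[gather_axis]
        x = tf.transpose(x, perm)
        indices = tf.transpose(indices, perm)
    out = tf.gather(x, indices, axis=last, batch_dims=last)
    if gather_axis != last:
        out = tf.transpose(out, perm)  # the swap permutation is its own inverse
    return out

# Example: gather along axis 1 of a 2-D tensor, like torch.gather(x, 1, idx)
x = tf.constant([[1., 2., 3.], [4., 5., 6.]])
idx = tf.constant([[2, 0], [1, 1]])
print(torch_gather(x, idx, gather_axis=1))  # [[3. 1.], [5. 5.]]
```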
```python
self.dropout = StableDropout(config.hidden_dropout_prob, name="dropout")

def build(self, input_shape: tf.TensorShape):
    with tf.name_scope("word_embeddings"):
```
not a huge fan of the tf.name_scope(...) here as it makes the weight naming less flexible - could we instead create three new tf.keras.layers.Layer classes (one for each embedding) to have a 1-to-1 translation from PyTorch?
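To illustrate the suggestion (hypothetical class and argument names, not code from the PR), one such per-embedding layer could look like:

```python
import tensorflow as tf

class TFDebertaWordEmbeddings(tf.keras.layers.Layer):
    """One dedicated layer per embedding table, mirroring the PyTorch
    submodule layout instead of using a manual tf.name_scope(...)."""

    def __init__(self, vocab_size, hidden_size, initializer_range=0.02, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.initializer_range = initializer_range

    def build(self, input_shape):
        # The table lives inside this layer, so its checkpoint name is
        # derived from the layer's own scope rather than an explicit scope.
        self.weight = self.add_weight(
            name="weight",
            shape=[self.vocab_size, self.hidden_size],
            initializer=tf.keras.initializers.TruncatedNormal(
                stddev=self.initializer_range
            ),
        )
        super().build(input_shape)

    def call(self, input_ids):
        return tf.gather(self.weight, input_ids)
```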
Sorry - I just noticed that this has become the standard in modeling_tf_bert.py as well:

```python
def build(self, input_shape: tf.TensorShape):
```
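Roughly, that pattern builds each embedding table inside an explicit name scope so the variable names line up with the PyTorch module hierarchy. An abbreviated sketch, assuming the usual config attributes:

```python
def build(self, input_shape: tf.TensorShape):
    # Each table is created under an explicit name scope so that its
    # checkpoint name matches the corresponding PyTorch submodule.
    with tf.name_scope("word_embeddings"):
        self.weight = self.add_weight(
            name="weight",
            shape=[self.config.vocab_size, self.config.hidden_size],
            initializer=tf.keras.initializers.TruncatedNormal(
                stddev=self.config.initializer_range
            ),
        )
    # ...same pattern for the token_type and position embeddings...
    super().build(input_shape)
```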
Thanks for the great PR! The PR seems to be in very good shape already. Mostly left nits, but would be happy if we could:
- Add a TFDeberta prefix to most layer classes. This helps when looking up modules later (I know that in PyTorch we also didn't append a Deberta prefix to all modules, but we should have done this IMO); a small before/after sketch follows below.
- Avoid using with tf.name_scope and instead replicate the PyTorch weight structure 1-to-1.
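As a sketch of the naming nit (illustrative only; the PyTorch model uses the unprefixed name):

```python
import tensorflow as tf

# Before: generic name, harder to locate among other models' layers.
class DisentangledSelfAttention(tf.keras.layers.Layer):
    ...

# After: model-specific prefix, consistent with the other TF models.
class TFDebertaDisentangledSelfAttention(tf.keras.layers.Layer):
    ...
```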
Thank you, @kamalkraj! Would you be interested in contributing the TensorFlow version of the DeBERTa-v2 model too? :)
@LysandreJik
What does this PR do?
TFDeBERTa implementation
@patrickvonplaten, @LysandreJik
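Once merged, the TF model should be usable like its PyTorch counterpart. A minimal usage sketch, assuming the standard microsoft/deberta-base checkpoint:

```python
from transformers import DebertaTokenizer, TFDebertaModel

tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
model = TFDebertaModel.from_pretrained("microsoft/deberta-base")

inputs = tokenizer("Hello, DeBERTa!", return_tensors="tf")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```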
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.