Deberta_v2 tf #13120
Conversation
@Rocketknight1 If I replace the gather function with the experimental NumPy take_along_axis, it works - https://gist.github.com/kamalkraj/73ad5fa2b84de7e201e05464e11a4fec
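For reference, a minimal sketch of the kind of swap being described (the shapes are illustrative assumptions, not taken from the gist):

```python
import tensorflow as tf

# Attention-score-like tensor and an index tensor of the same shape; we select
# entries along the last axis with NumPy-style semantics instead of a
# hand-rolled gather.
scores = tf.random.normal((2, 4, 8, 8))  # (batch, heads, query_len, key_len)
index = tf.random.uniform((2, 4, 8, 8), maxval=8, dtype=tf.int64)

# gathered[b, h, q, k] = scores[b, h, q, index[b, h, q, k]]
gathered = tf.experimental.numpy.take_along_axis(scores, index, axis=-1)
print(gathered.shape)  # (2, 4, 8, 8)
```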
Hi @kamalkraj, do you know what shape the inputs are to the gather/take_along_axis? I'm going to try to construct a small test case that fails for my gather function but not for take_along_axis. If you can find a simple test case that fails, feel free to send that too so I can fix the function!
Hi @Rocketknight1,

> In all of those cases, it looks like the TF …
No. Actually, at runtime, this branch never gets called (transformers/src/transformers/models/deberta_v2/modeling_deberta_v2.py, lines 766 to 771 in e2f07c0), because both query_layer and key_layer are of the same size (transformers/src/transformers/models/deberta_v2/modeling_deberta_v2.py, lines 571 to 772 in e2f07c0).
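A small sketch of that argument (paraphrased, not the verbatim library code; assumes the standard self-attention setup where both projections are applied to the same hidden_states):

```python
import torch

# query_layer and key_layer are both projected from the same hidden_states,
# so their sequence dimensions always match in self-attention.
batch, seq_len, hidden = 2, 16, 64
hidden_states = torch.randn(batch, seq_len, hidden)
query_layer = torch.nn.Linear(hidden, hidden)(hidden_states)
key_layer = torch.nn.Linear(hidden, hidden)(hidden_states)

if query_layer.size(-2) != key_layer.size(-2):
    # Never reached: both tensors share hidden_states' sequence length.
    print("different query/key lengths")
```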
Hi @BigBird01, I was going through the code referenced above.
Because the query_layer and key_layer shapes are …, the above condition may be needed for …
Hi @kamalkraj, can you share the exact GLUE task / command you used? I still can't reproduce the bug - I tried this: …
This seemed to work fine with …
@Rocketknight1
I also opened another pull request, #13145, to remove the unused branch from the PyTorch model as well.
Hi @Rocketknight1, …
Very nice work @kamalkraj! I only left some nits - looks good to me to merge!
The only thing that confused me a bit was the if-else logic depending on whether hidden_states is a Sequence, e.g. here: https://github.com/huggingface/transformers/pull/13120/files#r694681300 -> when would that be the case?
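The kind of branch being asked about, paraphrased rather than quoted from the linked diff: the encoder accepts either a single tensor or a list/tuple of per-layer hidden states and normalizes to one tensor first (the helper name here is hypothetical):

```python
from collections.abc import Sequence

def first_hidden_state(hidden_states):
    # hidden_states may be a single tensor or a list/tuple of layer outputs.
    if isinstance(hidden_states, Sequence):
        return hidden_states[0]  # take the first element of the sequence
    return hidden_states  # already a single tensor
```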
Hi @patrickvonplaten, …
Tremendous work putting this all together @kamalkraj!
Most of my comments are regarding the # Copied from statements that could be added to most classes here; it seems even the actual model classes like TFDebertaV2ForSequenceClassification could benefit from them.
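For illustration, roughly what such a marker looks like (class and module names are assumptions based on the existing TF DeBERTa code, not the verbatim diff; the body is elided):

```python
import tensorflow as tf

# The marker lets the transformers consistency scripts keep this class in
# sync with the class it was copied from.
# Copied from transformers.models.deberta.modeling_tf_deberta.TFDebertaSelfOutput with Deberta->DebertaV2
class TFDebertaV2SelfOutput(tf.keras.layers.Layer):
    ...
```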
_CHECKPOINT_FOR_DOC = "kamalkraj/deberta-v2-xlarge"

TF_DEBERTA_V2_PRETRAINED_MODEL_ARCHIVE_LIST = [
    "kamalkraj/deberta-v2-xlarge",
We would need to migrate the TF checkpoint to the official one in microsoft/deberta-v2-xlarge.
Okay
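A hypothetical sketch of that migration (the output path is made up for illustration; loading with from_pt=True requires PyTorch to be installed):

```python
from transformers import TFDebertaV2Model

# Load the official PyTorch weights into the new TF class, then save a TF
# checkpoint that can be uploaded to the microsoft/deberta-v2-xlarge repo.
model = TFDebertaV2Model.from_pretrained("microsoft/deberta-v2-xlarge", from_pt=True)
model.save_pretrained("./deberta-v2-xlarge-tf")
```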
Hi @LysandreJik, …
Overall, this looks like a very solid PR now, and if tests are passing with good performance then I think it should be just about ready to go, assuming everyone else is in agreement!
Thanks for your work @kamalkraj!
Is this code compatible with model.fit?
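A minimal sketch of what model.fit usage would look like, assuming the standard transformers Keras integration (the checkpoint name is taken from this thread; the dummy data shapes are made up for illustration):

```python
import tensorflow as tf
from transformers import TFDebertaV2ForSequenceClassification

model = TFDebertaV2ForSequenceClassification.from_pretrained(
    "kamalkraj/deberta-v2-xlarge", num_labels=2
)

# Tiny random dataset standing in for tokenized inputs.
features = {
    "input_ids": tf.random.uniform((8, 32), maxval=1000, dtype=tf.int32),
    "attention_mask": tf.ones((8, 32), dtype=tf.int32),
}
labels = tf.random.uniform((8,), maxval=2, dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(4)

# The model outputs logits, so the loss must be built with from_logits=True.
model.compile(
    optimizer=tf.keras.optimizers.Adam(3e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(dataset, epochs=1)
```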
What does this PR do?
Deberta-v2 TF
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@LysandreJik @patrickvonplaten