
Deberta_v2 tf #13120

Merged
merged 7 commits into huggingface:master from kamalkraj:deberta_v2-tf on Aug 31, 2021

Conversation

@kamalkraj (Contributor) commented Aug 13, 2021

What does this PR do?

Deberta-v2 TF


Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@LysandreJik @patrickvonplaten

@kamalkraj (Contributor, Author)

@Rocketknight1
#12972 (comment)
The gather function fails while running run_glue.py from the examples.
[Screenshot of the error: 2021-08-13 at 9:55:59 PM]

If I replace the gather function with the experimental NumPy take_along_axis, it works: https://gist.github.com/kamalkraj/73ad5fa2b84de7e201e05464e11a4fec
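For context, a minimal sketch (not the code from the linked gist; the shapes below are illustrative) of the substitution being described: emulating a torch.gather along dim=-2 with tf.experimental.numpy.take_along_axis.

import tensorflow as tf

# Illustrative shapes: a [batch, seq_len, dim] tensor and an index tensor of the
# same rank that selects rows along the sequence (second-to-last) axis.
x = tf.random.normal((2, 8, 4))
idx = tf.random.uniform((2, 3, 4), maxval=8, dtype=tf.int32)

# torch.gather(x, dim=-2, index=idx) picks x[b, idx[b, i, j], j];
# take_along_axis performs the same per-element lookup along the given axis.
out = tf.experimental.numpy.take_along_axis(x, idx, axis=-2)
print(out.shape)  # (2, 3, 4)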

@Rocketknight1 (Member)

Hi @kamalkraj, do you know what shape the inputs are to the gather/take_along_axis? I'm going to try to construct a small test case that fails for my gather function but not for take_along_axis. If you can find a simple test case that fails, feel free to send that too so I can fix the function!

@kamalkraj (Contributor, Author)

Hi @Rocketknight1
I tried a few tests for torch.gather when you initially shared the function. Notebook link: https://colab.research.google.com/drive/1ujI6zKTuuryAO2Nfw9U1ZftyZyC4VUVS?usp=sharing

@Rocketknight1 (Member)

In all of those cases, it looks like the TF torch_gather function gets the same results as the actual torch.gather, right? Is there a difference?

@kamalkraj (Contributor, Author) commented Aug 16, 2021

No, the TF torch_gather function gets the same output as torch.gather.

Actually, at runtime this branch is never reached:

if query_layer.size(-2) != key_layer.size(-2):
    p2c_att = torch.gather(
        p2c_att,
        dim=-2,
        index=pos_index.expand(p2c_att.size()[:2] + (pos_index.size(-2), key_layer.size(-2))),
    )

because both query_layer and key_layer are the same size:

self.query_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
self.key_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
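For illustration, here is a minimal runnable sketch (the shapes and the to_heads helper are made up for this example, not the library code) of why that branch is dead: both projections act on the same hidden_states and map to all_head_size, so the two layers always agree on the sequence dimension that the condition compares.

import torch
import torch.nn as nn

batch_size, seq_len, hidden_size = 2, 16, 32
num_heads, head_size = 4, 8  # all_head_size = num_heads * head_size

hidden_states = torch.randn(batch_size, seq_len, hidden_size)
query_proj = nn.Linear(hidden_size, num_heads * head_size, bias=True)
key_proj = nn.Linear(hidden_size, num_heads * head_size, bias=True)

def to_heads(x):
    # [batch, seq, all_head_size] -> [batch * num_heads, seq, head_size]
    x = x.view(batch_size, seq_len, num_heads, head_size).permute(0, 2, 1, 3)
    return x.reshape(batch_size * num_heads, seq_len, head_size)

query_layer = to_heads(query_proj(hidden_states))
key_layer = to_heads(key_proj(hidden_states))

# Both are derived from the same hidden_states, so the compared dimension matches
# and `query_layer.size(-2) != key_layer.size(-2)` is never True.
assert query_layer.size(-2) == key_layer.size(-2)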

@kamalkraj (Contributor, Author) commented Aug 16, 2021

Hi @BigBird01,

I was going through the deberta-v2 implementation inside Hugging Face and, as per my understanding, for deberta-v2 the branch below will never be executed:

if query_layer.size(-2) != key_layer.size(-2):

because query_layer and key_layer both have shape
[batch_size * num_attention_heads, sequence_length, attention_head_size].

The condition may be needed for DeBERTa, but Hugging Face has separate implementations for DeBERTa and DeBERTa-v2.
If my assumption is correct, we can remove these never-executed control-flow branches from the deberta-v2 code.

@BigBird01 (Contributor) commented Aug 16, 2021 via email

@Rocketknight1 (Member)

> @Rocketknight1
> #12972 (comment)
> The gather function fails while running run_glue.py from the examples.
> If I replace the gather function with the experimental NumPy take_along_axis, it works: https://gist.github.com/kamalkraj/73ad5fa2b84de7e201e05464e11a4fec

Hi @kamalkraj, can you share the exact glue task / command you used? I still can't reproduce the bug - I tried this:

python run_glue.py --model_name_or_path kamalkraj/deberta-v2-xlarge --task_name mnli --do_train --do_eval --do_predict --output_dir output

This seemed to work fine with torch_gather.

@kamalkraj (Contributor, Author) commented Aug 17, 2021

@Rocketknight1
The issue is solved with commit 90c122d.

The torch_gather function under those if conditions was causing the issue. I removed the conditions since they were unnecessary.
You can see the discussion in #13120 (comment).

I also opened another pull request, #13145, to remove them from the PyTorch model as well.

@kamalkraj (Contributor, Author)

Hi @Rocketknight1,
#13145 is merged to master. The TF implementation is now the same as the torch implementation and runs without any issues.

@patrickvonplaten (Contributor) left a comment:

Very nice work @kamalkraj! I only left some nits - looks good to me to merge!

The only thing that confused me a bit was the if-else logic that depends on whether hidden_states is a sequence type, e.g. here: https://github.com/huggingface/transformers/pull/13120/files#r694681300. When would that be the case?

@kamalkraj (Contributor, Author)

Hi @patrickvonplaten,
thanks for the review. I committed the changes.

@LysandreJik (Member) left a comment:

Tremendous work putting this all together @kamalkraj!

Most of my comments are regarding the # Copied from statements that could be added to most classes here; it seems even the actual model classes like TFDebertaV2ForSequenceClassification could benefit from them.
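For readers unfamiliar with the convention being referenced, a # Copied from statement is a one-line comment that the repository's consistency checks use to keep duplicated code in sync across models. A rough illustration (the class names here are illustrative, not necessarily the exact lines added in this PR):

# Copied from transformers.models.deberta.modeling_tf_deberta.TFDebertaSelfOutput with Deberta->DebertaV2
class TFDebertaV2SelfOutput(tf.keras.layers.Layer):
    ...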

docs/source/index.rst (review thread resolved)

_CHECKPOINT_FOR_DOC = "kamalkraj/deberta-v2-xlarge"

TF_DEBERTA_V2_PRETRAINED_MODEL_ARCHIVE_LIST = [
    "kamalkraj/deberta-v2-xlarge",
A Member reviewer commented:
We would need to migrate the TF checkpoint to the official one in microsoft/deberta-v2-xlarge.

@kamalkraj (Contributor, Author) replied:
Okay

@kamalkraj (Contributor, Author)

Hi @LysandreJik,
I committed the changes.

@Rocketknight1 (Member) left a comment:

Overall, this looks like a very solid PR now, and if tests are passing with good performance then I think it should be just about ready to go, assuming everyone else is in agreement!

@LysandreJik (Member) left a comment:

Thanks for your work @kamalkraj!

@LysandreJik LysandreJik merged commit 3efcfea into huggingface:master Aug 31, 2021
@kamalkraj kamalkraj deleted the deberta_v2-tf branch September 11, 2021 10:30
@sh0416 commented May 12, 2022

Is this code compatible with model.fit?
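For context, a minimal sketch of what Keras model.fit training with the TF DeBERTa-v2 classes could look like, assuming a transformers release recent enough that compiling without an explicit loss falls back to the model's internal loss; the checkpoint, data, and hyperparameters are illustrative, and from_pt=True may be needed if the checkpoint hosts no TF weights.

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v2-xlarge", num_labels=2
)

# A tiny illustrative dataset: tokenized text features plus integer labels.
texts = ["a tiny positive example", "a tiny negative example"]
labels = [1, 0]
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="np")
dataset = tf.data.Dataset.from_tensor_slices((dict(enc), labels)).batch(2)

# With no loss passed to compile(), recent transformers versions train against
# the model's internal loss when labels are provided by the dataset.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5))
model.fit(dataset, epochs=1)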
