
Fix TFDebertaV2ConvLayer in TFDebertaV2Model #16031

Merged
ydshieh merged 1 commit into huggingface:master on Mar 10, 2022

Conversation

ydshieh (Collaborator) commented Mar 9, 2022

What does this PR do?

Fix a CI failure for TFDebertaV2Model caused by a mistake in TFDebertaV2ConvLayer.

Remark

The slow test test_inference_no_head also fails with the version from #13120, so I think it was not run manually to confirm it passed before that PR was merged to master.

Code to demonstrate the issue and the effect of this PR

This is adapted from test_inference_no_head

########## Prep ########## 

import numpy as np
import torch
import tensorflow as tf
from transformers import DebertaV2Model, TFDebertaV2Model

input_ids = np.array([[0, 31414, 232, 328, 740, 1140, 12695, 69, 46078, 1588, 2]], dtype=np.int32)
attention_mask = np.array([[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=np.int32)

########## PT ########## 

pt_model = DebertaV2Model.from_pretrained("microsoft/deberta-v2-xlarge")

input_ids_pt = torch.from_numpy(input_ids)
attention_mask_pt = torch.from_numpy(attention_mask)
pt_output = pt_model(input_ids_pt, attention_mask=attention_mask_pt)[0]

# compare the actual values for a slice.
pt_expected_slice = torch.tensor(
    [[[0.2356, 0.1948, 0.0369], [-0.1063, 0.3586, -0.5152], [-0.6399, -0.0259, -0.2525]]]
)
pt_output_slice = pt_output[:, 1:4, 1:4]

pt_slice_diff = np.abs(pt_expected_slice.detach().to("cpu").numpy() - pt_output_slice.detach().to("cpu").numpy())
max_pt_slice_diff = np.amax(pt_slice_diff)

print(f"max_pt_slice_diff = {max_pt_slice_diff}")

########## TF ##########

tf_model = TFDebertaV2Model.from_pretrained("microsoft/deberta-v2-xlarge")

input_ids_tf = tf.constant(input_ids)
attention_mask_tf = tf.constant(attention_mask)
tf_output = tf_model(input_ids_tf, attention_mask=attention_mask_tf)[0]

# compare the actual values for a slice.
tf_expected_slice = tf.constant(
    [[[0.2356, 0.1948, 0.0369], [-0.1063, 0.3586, -0.5152], [-0.6399, -0.0259, -0.2525]]]
)
tf_output_slice = tf_output[:, 1:4, 1:4]

tf_slice_diff = np.abs(tf_expected_slice.numpy() - tf_output_slice.numpy())
max_tf_slice_diff = np.amax(tf_slice_diff)

print(f"max_tf_slice_diff = {max_tf_slice_diff}")

########## PT-TF ########## 

max_pt_tf_diff = np.amax(np.abs(pt_output.detach().to("cpu").numpy() - tf_output.numpy()))
print(f"maximal pt_tf_diff = {max_pt_tf_diff}")

This script gives:

Before this PR:

max_pt_slice_diff = 5.037523806095123e-05
max_tf_slice_diff = 0.5608187317848206
maximal pt_tf_diff = 5.981985092163086

With this PR:

max_pt_slice_diff = 5.037523806095123e-05
max_tf_slice_diff = 4.8374757170677185e-05
maximal pt_tf_diff = 0.000133514404296875
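
As a sanity check, the post-fix differences are within a small numerical tolerance; a rough assertion along these lines could be appended to the script (the atol values here are my own choice, not taken from the slow test):

# Hypothetical tolerance check on the values computed above; atol values are
# assumptions chosen to accept the ~5e-5 slice differences and ~1e-4 PT-TF gap.
assert np.allclose(pt_expected_slice.numpy(), pt_output_slice.detach().cpu().numpy(), atol=1e-4)
assert np.allclose(tf_expected_slice.numpy(), tf_output_slice.numpy(), atol=1e-4)
assert np.allclose(pt_output.detach().cpu().numpy(), tf_output.numpy(), atol=1e-3)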

@@ -313,7 +313,7 @@ def call(
         rmask = tf.cast(1 - input_mask, tf.bool)
         out = tf.where(tf.broadcast_to(tf.expand_dims(rmask, -1), shape_list(out)), 0.0, out)
         out = self.dropout(out, training=training)
-        hidden_states = self.conv_act(out)
+        out = self.conv_act(out)
ydshieh (Collaborator, Author):

This is the main place being fixed.
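
For context, a minimal sketch of the data flow around this line (the residual/LayerNorm lines below are a paraphrase, not an exact quote of the file): storing the activation in hidden_states meant the tensor actually used downstream stayed pre-activation.

# Sketch of TFDebertaV2ConvLayer.call around the fix (paraphrased, not verbatim):
out = self.dropout(out, training=training)
out = self.conv_act(out)                  # fix: keep the activated tensor in `out`
layer_norm_input = residual_states + out  # the residual sum now sees the activation
output = self.LayerNorm(layer_norm_input, training=training)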


-            output_states = output * mask
+            output_states = output * input_mask
ydshieh (Collaborator, Author):

I think this is the correct logic.
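
A rough sketch of why multiplying by input_mask is the right choice here (shapes are assumptions): input_mask is the per-token mask with shape (batch, seq_len), expanded so it broadcasts over the hidden dimension and zeroes out padded positions.

# Assumed shapes: output is (batch, seq_len, hidden_size), input_mask is (batch, seq_len).
input_mask = tf.cast(tf.expand_dims(input_mask, axis=2), output.dtype)  # -> (batch, seq_len, 1)
output_states = output * input_mask  # padded positions are zeroed out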

sgugger (Collaborator) left a comment

Thanks a lot for fixing, LGTM!

HuggingFaceDocBuilderDev commented Mar 9, 2022

The documentation is not available anymore as the PR was closed or merged.

gante (Member) left a comment

🔥

@ydshieh ydshieh merged commit 2f463ef into huggingface:master Mar 10, 2022
@ydshieh ydshieh deleted the fix_tf_deberta_v2_conv_layer branch March 10, 2022 11:23