Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix batch matching for batch mat mul #7062

Merged
merged 4 commits into from Nov 21, 2022
Merged

Conversation

Linchenn
Copy link
Collaborator

@Linchenn Linchenn commented Nov 18, 2022

Fix #7061 for CPU, WASM (native implementation) and WebGL backends.

This error is because of batch mismatch when A has more dimensions than B, and vice versa.

For example: A's shape is [2,4,3,3] and B's shape is [4,3,3]. Then A's batch is the first two dimensions [2, 4] while B's batch is the first dimension [4], so B's batch is supposed to be broadcasted to be [2, 4]. Then, to compute output[1][0][0][0], we have to do dot product of A[1][0] [0][...] with B[0] [...][0], but the current algorithm is doing dot product of A[1][0][0][...] with B[3][...][0].

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


This change is Reviewable

Copy link
Member

@mattsoulanille mattsoulanille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a nit related to modulus. Thanks!

tfjs-backend-cpu/src/kernels/BatchMatMul.ts Outdated Show resolved Hide resolved
tfjs-backend-wasm/src/cc/batch_mat_mul_impl.cc Outdated Show resolved Hide resolved
Comment on lines +118 to +121
a3dValues[batchIndexA * aBatch + i * aOuterStep + k * aInnerStep];
const bVal =
b3dValues[k * bInnerStep + j * bOuterStep + batchOffsetB];
// tslint:disable-next-line: max-line-length
b3dValues[k * bInnerStep + j * bOuterStep + batchIndexB * bBatch];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering why k matched with aInnerStep but did not then match with bOuterStep, but I see now that bOuterStep and bInnerStep are opposite to aOuterStep and aInnerStep (lines 82 - 87). They don't refer to 'rows' and 'columns' of the matrices being multiplied, where you would step a's row index with b's column index for the dot product.

No action necessary.

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, we can move the batch variable assignment up to the loop of bi, given those assignment only depends on the bi value not other loop variables. It can be a separate PR.

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @Linchenn and @mattsoulanille)

@Linchenn Linchenn merged commit e6ada3a into tensorflow:master Nov 21, 2022
@Linchenn
Copy link
Collaborator Author

cc @qjia7 @xhcao In case WebGPU has similar problem. This problem happens in BatchMatMul when A's batch size does not match B's batch size. For details, you could take a look at my descriptions.

@Linchenn Linchenn deleted the fixMam branch November 21, 2022 21:24
@qjia7
Copy link
Collaborator

qjia7 commented Nov 23, 2022

cc @qjia7 @xhcao In case WebGPU has similar problem. This problem happens in BatchMatMul when A's batch size does not match B's batch size. For details, you could take a look at my descriptions.

Thanks Lin. Yes, webgpu has the similar issue. @xhcao will work on it. Thank you!

Linchenn added a commit to Linchenn/tfjs that referenced this pull request Jan 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect result by matmul 4D x 3D
4 participants