[webgpu] update perf test #6438

Merged 8 commits into tensorflow:master on Oct 12, 2022

Conversation

haoyunfeix
Contributor

@haoyunfeix haoyunfeix commented May 23, 2022

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


@haoyunfeix haoyunfeix force-pushed the webgpu_update_perf_test branch 2 times, most recently from 05264a8 to 2d2c688 on September 30, 2022 01:29
@haoyunfeix haoyunfeix marked this pull request as ready for review September 30, 2022 01:29
@haoyunfeix
Contributor Author

@qjia7 @xhcao @gyagp @axinging PTAL

@qjia7 qjia7 requested review from xhcao and gyagp September 30, 2022 04:41
Collaborator

@gyagp gyagp left a comment


LGTM

Collaborator

@xhcao xhcao left a comment


Another comment: why are there so many variables, and why are they all defined inside functions? Could we define some of them at global scope, or group related variables together, to reduce the number of definitions?

const m = tf.matMul(tensorsWarmUp.tensorA, tensorsWarmUp.tensorB);
result.dispose();
m.dispose();
}
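
The snippet above disposes each intermediate tensor explicitly. A minimal sketch of why that matters, using a hypothetical FakeTensor stand-in rather than the real tfjs API (GPU-backed tensors are not garbage-collected, so anything left undisposed leaks memory):

```javascript
// Hypothetical stand-in for a GPU-backed tensor; counts live allocations.
let liveTensors = 0;

class FakeTensor {
  constructor() { liveTensors++; this.disposed = false; }
  dispose() {
    if (!this.disposed) { this.disposed = true; liveTensors--; }
  }
}

// Stand-in for tf.matMul: allocates a new output tensor per call.
function fakeMatMul(a, b) { return new FakeTensor(); }

const a = new FakeTensor();
const b = new FakeTensor();
const result = fakeMatMul(a, b);
const m = fakeMatMul(a, b);  // warmup result, as in the snippet above

// Mirror the cleanup pattern in the snippet: dispose every intermediate.
result.dispose();
m.dispose();
a.dispose();
b.dispose();

console.log(liveTensors);  // prints 0: no leaked tensors
```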
Collaborator


In tf.profile,

  1. I think we mainly profile const result = tf.matMul(tensors[i].tensorA, tensors[i].tensorB); would including the warmup-model and warmup-GPU parts affect the correctness of the measurement?
  2. Why do we still need to execute tf.matMul(tensorsWarmUp.tensorA, tensorsWarmUp.tensorB) in the 3rd for-loop? I think its cost may be larger than that of tf.matMul(tensors[i].tensorA, tensors[i].tensorB), which is the real main workload.
  3. How do we make sure that all matMul commands are executed one by one? I think they may be executed asynchronously here?

Contributor Author


  1. Yes. The warmup-model pass covers the first run of each matmul shape (which incurs large shader-generation and buffer-creation costs that would cause a significant deviation from the average), and the warmup-GPU pass makes sure every input shape is measured at the same GPU frequency.
  2. The idea of inserting a large warmup tensor between main workloads is also to keep the frequency high; the real main workloads would pull the frequency down if they executed back to back. Please see the pictures below:
    with warmup: frequency stays at almost 1200 MHz
    (screenshot)
    without warmup: visible jitter that breaks measurement accuracy (only happens on WebGPU)
    (screenshot)
  3. You're right; serial execution will be considered in another PR.
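
The interleaving scheme described above can be sketched as follows. This is a hedged illustration with stub functions (warmupMatMul and measuredMatMul are our names, not the PR's code): a large warmup matmul is inserted after each measured case so the GPU clock stays high before the next measurement.

```javascript
// Record the submission order to show the interleaving pattern.
const schedule = [];
const warmupMatMul = () => schedule.push('warmup');       // large, frequency-holding shape
const measuredMatMul = (i) => schedule.push(`case${i}`);  // shape under test

const numCases = 3;
for (let i = 0; i < numCases; i++) {
  measuredMatMul(i);  // real workload being profiled
  warmupMatMul();     // keeps the clock from dropping before the next case
}

console.log(schedule.join(' '));
// prints: case0 warmup case1 warmup case2 warmup
```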

for (let i = 0; i < inputs.length; i++) {
let dimAOuter = parseInt(inputs[i].split(',')[0]);
let dimInner = parseInt(inputs[i].split(',')[1]);
let dimBOuter = parseInt(inputs[i].split(',')[2]);
Collaborator


Why do we pack dimAOuter / dimInner / dimBOuter into a string here, which then has to be encoded and decoded? Why not put these values into an array or a structure?

Contributor Author


Each row has a rerun button to help rerun a single case for double-checking. This button reads the shape info (e.g. [16, 512, 16]) from an HTML element, whose value must be a string, so I used the raw string data as read from the page.
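
The decode step under discussion can be sketched like this (the parseDims helper name is ours; the identifiers follow the quoted for-loop above). Since the rerun button stores the shape as a comma-separated string in an HTML attribute, the test has to parse it back into the three matMul dimensions:

```javascript
// Decode a "dimAOuter,dimInner,dimBOuter" string (e.g. from an HTML
// element's attribute) back into the three matMul dimensions.
function parseDims(input) {
  const [dimAOuter, dimInner, dimBOuter] =
      input.split(',').map((s) => parseInt(s, 10));
  return {dimAOuter, dimInner, dimBOuter};
}

console.log(parseDims('16,512,16'));
// prints: { dimAOuter: 16, dimInner: 512, dimBOuter: 16 }
```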

Collaborator

@qjia7 qjia7 left a comment


LGTM, thanks.

As we synced offline, we can do more refactoring based on it.

@qjia7 qjia7 merged commit 6391b20 into tensorflow:master Oct 12, 2022