[webgpu] update perf test #6438
Conversation
05264a8 to 2d2c688
LGTM
Another comment: why are there so many variables, all defined inside functions? Could we define some of them at global scope, or group related variables together, to reduce the number of definitions?
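The grouping suggested above could look like the following sketch. The `BenchmarkConfig` name and all of its fields are hypothetical illustrations, not code from this PR:

```typescript
// Hypothetical example: collect related per-run settings into one
// module-scope object instead of scattering `let` declarations
// inside each function. Names are illustrative only.
interface BenchmarkConfig {
  warmupRuns: number;
  profileRuns: number;
  shapes: number[][];  // each entry is [dimAOuter, dimInner, dimBOuter]
}

const config: BenchmarkConfig = {
  warmupRuns: 50,
  profileRuns: 100,
  shapes: [[16, 512, 16], [64, 64, 64]],
};

// Functions then take the config instead of redeclaring locals.
function describeRun(cfg: BenchmarkConfig): string {
  return `${cfg.shapes.length} shapes x ${cfg.profileRuns} runs`;
}
```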
const m = tf.matMul(tensorsWarmUp.tensorA, tensorsWarmUp.tensorB);
result.dispose();
m.dispose();
}
In `tf.profile`:
- I think we mainly profile `const result = tf.matMul(tensors[i].tensorA, tensors[i].tensorB)`; would including the `warmup model` and `warmup gpu` parts influence the correctness?
- Why do we still need to execute `tf.matMul(tensorsWarmUp.tensorA, tensorsWarmUp.tensorB)` in the 3rd for-loop? I think its cost may be larger than `tf.matMul(tensors[i].tensorA, tensors[i].tensorB)`, which is the real main workload.
- How do we make sure that all matMul commands are executed one by one? I think they may be executed asynchronously here?
- Yes, `warmup model` is for the first run of each matmul shape (which incurs large shader-generation and buffer-creation time, leading to a significant deviation from the average value), and `warmup gpu` is to make sure every input shape is measured at the same GPU frequency.
- The idea of inserting a large warmup tensor between each main workload is also to keep the frequency high; the real main workloads would pull the frequency down if they executed back to back. Please see the pictures below:
with: keeps the frequency at almost 1200MHz
without: visible jitter, breaking measurement accuracy (only happens on WebGPU)
- You're right; serial execution will be considered in another PR.
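The interleaving idea described above can be sketched as follows. `runWarmup` and `runWorkload` are hypothetical stand-ins for the actual `tf.matMul` calls; here they only record execution order to show the structure:

```typescript
// Sketch: run a large "warmup" workload before each measured workload
// so the GPU stays at a high clock frequency between measurements.
// The two functions below are stubs, not the PR's real code.
const order: string[] = [];
const runWarmup = () => order.push('warmup');
const runWorkload = (i: number) => order.push(`workload${i}`);

function profileInterleaved(numWorkloads: number) {
  for (let i = 0; i < numWorkloads; i++) {
    runWarmup();     // keeps clocks high before each measurement
    runWorkload(i);  // the op whose time we actually care about
  }
}

profileInterleaved(2);
// order: ['warmup', 'workload0', 'warmup', 'workload1']
```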
for (let i = 0; i < inputs.length; i++) {
  let dimAOuter = parseInt(inputs[i].split(',')[0]);
  let dimInner = parseInt(inputs[i].split(',')[1]);
  let dimBOuter = parseInt(inputs[i].split(',')[2]);
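If the string encoding is kept, the repeated `split`/`parseInt` calls above could be collapsed into a single destructuring parse. This is a sketch of an alternative, not code from the PR; `parseShape` is a hypothetical helper name:

```typescript
// Parse an "A,I,B" shape string once instead of splitting it
// three times. Behavior matches the loop above for well-formed input.
function parseShape(input: string): [number, number, number] {
  const [dimAOuter, dimInner, dimBOuter] = input.split(',').map(Number);
  return [dimAOuter, dimInner, dimBOuter];
}

// e.g. parseShape('16,512,16') yields [16, 512, 16]
```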
Why do we wrap `dimAOuter`, `dimInner`, and `dimBOuter` into a string here, which then has to be encoded and decoded? Why not put these values into an array or a structure?
Each row has a `rerun` button to re-run a single case for double-checking. This button reads the shape info (e.g. `[16, 512, 16]`) from an HTML element, whose value must be a string, so here I used the raw data as I get it from the page.
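Decoding that attribute string back into numbers could be done with `JSON.parse`, since the bracketed form shown above is valid JSON. A minimal sketch; `decodeShape` is a hypothetical helper, not the PR's actual decoder:

```typescript
// Turn an HTML attribute value like "[16, 512, 16]" back into numbers.
// The bracketed comma-separated form is valid JSON, so JSON.parse
// handles it directly.
function decodeShape(attr: string): number[] {
  return JSON.parse(attr) as number[];
}

// e.g. decodeShape('[16, 512, 16]') yields [16, 512, 16]
```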
07bc3bc to b035bec
LGTM, thanks.
As we synced offline, we can do more refactoring on top of this.