Skip to content

Conversation

@haoyunfeix
Copy link
Contributor

@haoyunfeix haoyunfeix commented May 23, 2022

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


This change is Reviewable

@haoyunfeix haoyunfeix force-pushed the webgpu_update_perf_test branch 2 times, most recently from 05264a8 to 2d2c688 Compare September 30, 2022 01:29
@haoyunfeix haoyunfeix marked this pull request as ready for review September 30, 2022 01:29
@haoyunfeix
Copy link
Contributor Author

@qjia7 @xhcao @gyagp @axinging PTAL

@qjia7 qjia7 requested review from gyagp and xhcao September 30, 2022 04:41
Copy link

@gyagp gyagp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@xhcao xhcao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another comment is that why there are some many variables and all these ones are defined in functions. Could we define some of them in the global scope or wrap some variables together to reduce some definations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In tf.profile,

  1. I think we mainly profile const result = tf.matMul(tensors[i].tensorA, tensors[i].tensorB), would including warmup model and warmup gpu parts influence the correctness?
  2. Why we still need to execute tf.matMul(tensorsWarmUp.tensorA, tensorsWarmUp.tensorB) in the 3rd for-loop, I think the cost time of it may be larger than tf.matMul(tensors[i].tensorA, tensors[i].tensorB), which is the real main workload.
  3. How to make sure that all matMul commands are executed one by one? I think they may be executed synchronously here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes, warmup model is for the first run of each matmul shape(costing large shader generate and buffer creating time ... which leads to a significant deviation from the average value) and warmup gpu is to make every shape of inputs could be measured in the same frequency.
  2. The idea to insert a large warmup tensor between each main workload is also to keep high frequency, the real main workload will pull down the frequency if they execute together, please see below pictures:
    with: keep frequency at almost 1200M
    image
    without: Visible jitter, break the measurement accurate (only happens on WebGPU)
    image
  3. You're right, serial execution would be considered in another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we wrap the dimAOuter | dimInner | dimBOuter into a string here, and should encode and decode the string? Why not wrap this value into an array or a structure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each row has a rerun button to help rerun a single case for double check. And this button catches shape infos(e.g. [16, 512, 16]) from HTML element, which value must be a string. So here I used the raw data what I get from the page.

@haoyunfeix haoyunfeix force-pushed the webgpu_update_perf_test branch from 07bc3bc to b035bec Compare October 11, 2022 07:49
Copy link
Contributor

@qjia7 qjia7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

As we synced offline, we can do more refactor based on it.

@qjia7 qjia7 merged commit 6391b20 into tensorflow:master Oct 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants