Add script to fetch benchmark results for execuTorch #11734

yangw-dev · 2025-06-16T18:29:30Z

Summary

Provide methods and a script to fetch all ExecuTorch benchmark data from the HUD API into two datasets, private and public. The script will:

fetch all data from the HUD API for the input time range (in UTC)
clean out records and tables that contain only FAILURE_REPORT due to job-level failures
get all private table metrics, generate table_name, and find the intersecting public table metrics
generate private and public table groups
output the data
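
The cleanup step above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the table shape (a dict with a `rows` list whose entries carry a `metric` field) and the function name `drop_failure_only_tables` are assumptions made for the example.

```python
from typing import Any

FAILURE_METRIC = "FAILURE_REPORT"


def drop_failure_only_tables(tables: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Remove FAILURE_REPORT rows, then drop tables that have no rows left.

    Assumed (hypothetical) table shape:
        {"groupInfo": {...}, "rows": [{"metric": "...", ...}, ...]}
    """
    cleaned = []
    for table in tables:
        # Keep only rows that report a real metric.
        rows = [r for r in table.get("rows", []) if r.get("metric") != FAILURE_METRIC]
        if rows:
            # Copy the table so the caller's input is not mutated.
            cleaned.append({**table, "rows": rows})
    return cleaned
```

A table whose rows are all FAILURE_REPORT disappears entirely, while a table with mixed rows keeps only its real metrics.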

OutputType:

run with excel-sheet export
run with csv export
run with dataframe format print
run with json format print

See more guidance in README.md

The data is similar to the Excel sheet generated manually in #10982.
The result should match the HUD per-model data table.

helper methods: common.py

Provide a common.py helper method to convert CSV and Excel sheets back to the {"groupInfo":{}, "df":df.DataFrame} format.

Signed-off-by: Yang Wang <[email protected]>

pytorch-bot · 2025-06-16T18:29:34Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11734

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures

As of commit 99df0fe with merge base 9bb0735:

NEW FAILURES - The following jobs have failed:

Build Linux Wheels / pytorch/executorch / build-manywheel-py3_10-cpu (gh)
ModuleNotFoundError: No module named 'torchvision'
Build Linux Wheels / pytorch/executorch / upload / upload-manywheel-py3_10-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_x86_64
Build macOS Wheels / pytorch/executorch / build-wheel-py3_10-cpu (gh)
ModuleNotFoundError: No module named 'torchvision'
Build macOS Wheels / pytorch/executorch / upload / upload-wheel-py3_10-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_
pull / test-llava-runner-linux / linux-job (gh)
RuntimeError: Command docker exec -t 7db129c6c587e8f72ee1db745cffc55aca1528bcf25527963711c5a4d678aa36 /exec failed with exit code 139

This comment was automatically generated by Dr. CI and updates every 15 minutes.

yangw-dev · 2025-06-17T00:54:40Z

FYI, this method could be more general, but since only ExecuTorch is using it, I just made it ExecuTorch-specific. @huydhn

yangw-dev · 2025-06-17T01:15:18Z

Excel limits sheet names to at most 31 characters, which could easily break in the future. @huydhn @guangy10, I think instead of generating one file per category, maybe we can generate a list of Excel files stored in folders [private, public].

But right now, with the hard-coded abbreviations, this works fine. The Excel sheet option is there in case people want to use it.
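
As a rough illustration of the sheet-name limit, one alternative to hard-coded abbreviations is to truncate names and add a short suffix on collision. This is a sketch only: `make_sheet_names` is a hypothetical helper, not part of this PR, and it ignores Excel's other naming rules (such as forbidden characters).

```python
MAX_SHEET_NAME_LEN = 31  # Excel's limit on worksheet name length


def make_sheet_names(table_names: list[str]) -> dict[str, str]:
    """Map each table name to a unique sheet name of at most 31 characters.

    Assumes the input table names are themselves unique.
    """
    used: set[str] = set()
    mapping: dict[str, str] = {}
    for name in table_names:
        candidate = name[:MAX_SHEET_NAME_LEN]
        counter = 1
        # On collision, reserve room at the end for a "~N" suffix.
        while candidate in used:
            suffix = f"~{counter}"
            candidate = name[: MAX_SHEET_NAME_LEN - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping
```

The returned mapping would have to be saved alongside the workbook so that truncated sheet names can be traced back to the full table names.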

.ci/scripts/benchmark_tooling/README.md

huydhn

Stamped to unblock! Let's start using the script and improve it along the way

guangy10 · 2025-06-17T19:45:12Z

We configured a fixed list of matching names to list limited tables

I think we should make it more flexible, as new models, recipes, and devices are always being added. For example, we recently added more models (see on dash) from huggingface/optimum-executorch to the benchmark infra, and the list will keep expanding.

Similarly, as new "devices" or "backends" become available, we want to be able to query their results via the script as well.

Excel limits sheet names to at most 31 characters, which could easily break in the future. @huydhn @guangy10, I think instead of generating one file per category, maybe we can generate a list of Excel files stored in folders [private, public].

Yeah, I noticed the limits when I manually created the Excel sheet. Ideally, I'd like to get rid of the Excel sheet by wiring the outputs from the DB to the analysis script directly. Given what is currently supported in this PR, what does the workflow look like if I want to rerun the analysis? That is, how does this script interface with the analysis script?

.ci/scripts/benchmark_tooling/README.md

.ci/scripts/benchmark_tooling/get_benchmark_analysis_data.py

yangw-dev · 2025-06-18T16:50:19Z

@pytorchbot label "release notes: none"

.ci/scripts/benchmark_tooling/README.md

guangy10 · 2025-06-18T21:05:41Z

.ci/scripts/benchmark_tooling/README.md

+Notice, the filter needs full name matchings with correct format, to see all the options of the filter choices, please run the script with `--print-all-table-info`, and pay attention to section `Full list of table info from HUD API` with the field 'info', which contains normalized data we use to filter records from the original metadata 'groupInfo'.
+The filter block any record if it does not in any of the filter keywords.
+
+- `--devices`: Filter by specific device names (e.g., "samsung-galaxy-s22-5g", "samsung-galaxy-s22plus-5g")


Does this field always refer to private devices?

What is the format for specifying multiple devices? Comma-separated or something else? Should it be clarified here?

All supported devices can be found here (note that it mixes private and public). Should --help list all supported devices so users can copy/paste easily?

We should be consistent on names of these devices in all places. https://github.com/pytorch/executorch/blob/main/.ci/scripts/gather_benchmark_configs.py#L20-L29

I think this can be general. I can tie it to that list to provide a list of choices when users try to add to it. If the underscores need to be consistent, I can change something, but

I changed everything to private devices now, since we filter by private devices first. I only added two private_device_pool options there.
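
On the question of how to pass multiple devices, one conventional argparse pattern is a space-separated list via `nargs="+"`. The sketch below only illustrates that pattern and is not necessarily how the script in this PR parses `--devices`; `build_parser` is a hypothetical name, and the device names are the ones quoted from the README above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser with a multi-value --devices option."""
    parser = argparse.ArgumentParser(description="Illustrative filter options")
    parser.add_argument(
        "--devices",
        nargs="+",  # accept one or more space-separated values
        default=[],
        metavar="DEVICE",
        help="One or more device names separated by spaces, e.g. "
        "--devices samsung-galaxy-s22-5g samsung-galaxy-s22plus-5g",
    )
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["--devices", "samsung-galaxy-s22-5g", "samsung-galaxy-s22plus-5g"]
)
print(args.devices)
```

If the supported device names were available as a list, passing it as `choices=` to the same `add_argument` call would make `--help` show them and reject unknown names.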

.ci/scripts/benchmark_tooling/README.md

guangy10

Thank you for iterating on this! There are a few critical issues to be fixed before this can be merged.

yangw-dev · 2025-06-19T01:04:34Z

@guangy10
I updated the logic:

Fetch and process benchmark data from the HUD API for ExecutorchBenchmark.

This class provides methods to:
1. Fetch all benchmark data for a specified time range
2. Get all private device info within the time range
3. Filter the private device data if a filter is provided. For backends and models, we expect human-readable names. Note that everything is based on private devices first.
4. Use the filtered private device data to find the matching public device data using [model, backend, device, arch]
5. Export results in various formats (JSON, DataFrame, Excel, CSV)
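
The private-to-public matching in step 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the class in this PR: it assumes each table carries a normalized `info` dict with `model`, `backend`, `device`, and `arch` fields that are directly comparable between private and public tables, and the names `match_key` and `find_matching_public` are hypothetical.

```python
from typing import Any

MATCH_KEYS = ("model", "backend", "device", "arch")


def match_key(info: dict[str, Any]) -> tuple[str, ...]:
    """Build a case-insensitive comparison key from the normalized info fields."""
    return tuple(str(info.get(k, "")).lower() for k in MATCH_KEYS)


def find_matching_public(
    private_tables: list[dict[str, Any]],
    public_tables: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return the public tables whose key also appears among the private tables."""
    wanted = {match_key(t["info"]) for t in private_tables}
    return [t for t in public_tables if match_key(t["info"]) in wanted]
```

Because filters are applied to the private tables first, any public table without a private counterpart is excluded from the result.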

final

3e8fa8f

Signed-off-by: Yang Wang <[email protected]>

facebook-github-bot added the "CLA Signed" label Jun 16, 2025

final

07896ea

Signed-off-by: Yang Wang <[email protected]>

yangw-dev changed the title from "final" to "Add script to fetch benchmark results for execuTorch" Jun 17, 2025

yangw-dev added 4 commits June 16, 2025 17:21

final

79f7788

Signed-off-by: Yang Wang <[email protected]>

final

101d631

Signed-off-by: Yang Wang <[email protected]>

final

83e76fe

Signed-off-by: Yang Wang <[email protected]>

final

3b4047f

Signed-off-by: Yang Wang <[email protected]>

yangw-dev requested review from guangy10 and huydhn June 17, 2025 00:54

final

1da3e87

Signed-off-by: Yang Wang <[email protected]>

yangw-dev marked this pull request as ready for review June 17, 2025 01:02

huydhn reviewed Jun 17, 2025

.ci/scripts/benchmark_tooling/README.md (outdated)

yangw-dev requested a review from huydhn June 17, 2025 17:41

yangw-dev self-assigned this Jun 17, 2025

yangw-dev added 9 commits June 17, 2025 10:58

final

2f604a0

Signed-off-by: Yang Wang <[email protected]>

final

9fa50a4

Signed-off-by: Yang Wang <[email protected]>

final

b56863d

Signed-off-by: Yang Wang <[email protected]>

final

b2ad5b6

Signed-off-by: Yang Wang <[email protected]>

final

4aced24

Signed-off-by: Yang Wang <[email protected]>

final

ab6e6cf

Signed-off-by: Yang Wang <[email protected]>

final

8e6956d

Signed-off-by: Yang Wang <[email protected]>

final

87ba460

Signed-off-by: Yang Wang <[email protected]>

final

5d22567

Signed-off-by: Yang Wang <[email protected]>

huydhn approved these changes Jun 17, 2025

guangy10 requested changes Jun 17, 2025

yangw-dev added 3 commits June 18, 2025 09:20

final

f8ba103

Signed-off-by: Yang Wang <[email protected]>

final

0fb1b2b

Signed-off-by: Yang Wang <[email protected]>

Merge branch 'main' into addScript

2878e96

pytorch-bot bot added the "release notes: none" label Jun 18, 2025

yangw-dev added 8 commits June 18, 2025 10:36

final

04dbd97

Signed-off-by: Yang Wang <[email protected]>

final

95b30a4

Signed-off-by: Yang Wang <[email protected]>

fix error test

d22af04

Signed-off-by: Yang Wang <[email protected]>

fix error test

25769da

Signed-off-by: Yang Wang <[email protected]>

Merge branch 'main' into addScript

48d9d34

fix error test

51b326a

Signed-off-by: Yang Wang <[email protected]>

fix error test

13261da

Signed-off-by: Yang Wang <[email protected]>

Merge branch 'main' into addScript

ffe6839

guangy10 reviewed Jun 18, 2025

.ci/scripts/benchmark_tooling/README.md (outdated)

guangy10 requested changes Jun 18, 2025

yangw-dev requested a review from guangy10 June 18, 2025 21:30

yangw-dev added 5 commits June 18, 2025 17:43

fix error test

d7a6652

Signed-off-by: Yang Wang <[email protected]>

fix error test

9182682

Signed-off-by: Yang Wang <[email protected]>

fix error test

33de04f

Signed-off-by: Yang Wang <[email protected]>

fix error test

68bf6f5

Signed-off-by: Yang Wang <[email protected]>

fix error test

3c2cbd2

Signed-off-by: Yang Wang <[email protected]>

yangw-dev added 4 commits June 18, 2025 18:04

Merge branch 'main' into addScript

8de90bf

fix error test

990ff44

Signed-off-by: Yang Wang <[email protected]>

Merge branch 'main' into addScript

ed48f5f

fix error test

99df0fe

Signed-off-by: Yang Wang <[email protected]>
