[Benchmark][feature request] Prepare for execuTorch failure handling #6391

Merged
merged 23 commits into from
Mar 13, 2025

yangw-dev
Contributor

@yangw-dev yangw-dev commented Mar 12, 2025

Description

Issue: #6294
Prepare the mobile_job yml to generate a benchmark record when a job fails.

Background:

When a git benchmark job fails (or some of its mobile jobs fail), we need to generate a benchmark record indicating that the model has failures.

For instance, take a benchmark job named benchmark-on-device (ic3, coreml_fp16, apple_iphone_15, arn:aws:devicefarm:us-west-2:308535385114... / mobile-job (ios):
  • When the whole job fails, we want to indicate that the model ic3 with backend coreml_fp16 on iOS failed for all metrics.
  • When one of the devices in the job fails (iPhone 15 with OS 17.1), we want to indicate that the model ic3 with backend coreml_fp16 failed on iPhone 15 with OS 17.1, while the others succeeded.

Key: always generate the artifact JSON with the git job name.
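
To illustrate why the git job name is worth carrying around, here is a minimal sketch of recovering model info from a name shaped like the example above. The function name `parse_git_job_name`, the returned keys, and the assumed field order (model, backend, device) are hypothetical and inferred only from that single example; the real parsing logic in the repo may differ.

```python
import re
from typing import Optional

# Assumed layout, inferred from one example job name:
#   "<job> (<model>, <backend>, <device>, <arn>...) / <sub-job> (<platform>)"
_PLATFORM_RE = re.compile(r"/\s*[^/()]*\(([^()]*)\)\s*$")


def parse_git_job_name(name: str) -> Optional[dict]:
    """Best-effort extraction of model info from a git job name.

    Returns None when the name does not look like the assumed layout.
    """
    _, sep, rest = name.partition("(")
    if not sep:
        return None
    fields = [f.strip() for f in rest.split(",")]
    if len(fields) < 3:
        return None
    # The third field may be followed by ")" when no further arguments exist.
    device = fields[2].split(")")[0].strip()
    platform_match = _PLATFORM_RE.search(name)
    return {
        "model": fields[0],
        "backend": fields[1],
        "device": device,
        "platform": platform_match.group(1).strip() if platform_match else None,
    }
```

Because the sketch only reads the first three comma-separated fields and the trailing parenthesized group, it still works when the middle of the name (such as the device pool ARN) is truncated.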

Change Details

  • [yaml] Add logic to generate artifact.json if any previous step fails and the expected artifact.json is missing. This ensures we always have an artifact JSON with the git job name.
  • [script] Add a flag --new-json-output-format to toggle whether the mobile job generates artifact.json in the new format.
    • See an example of the new JSON result (s3 link).
  • [script] Add git_job_name, run_report and job_reports to artifacts.json.
    • git_job_name: used to build a benchmark record if a git job failed (a trick for grabbing the model info).
    • job_reports & run_report: we currently have no extra info about mobile job conclusions; these can be uploaded to time_series or a notification system to provide failure details.
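
The fallback described in the first bullet could look roughly like the sketch below. In this PR the logic lives in the workflow yml; the Python helper `ensure_artifact_json` is a hypothetical stand-in, and the empty values for artifacts, run_report and job_reports are an assumption about what a placeholder file would contain.

```python
import json
from pathlib import Path


def ensure_artifact_json(path: str, git_job_name: str) -> bool:
    """Write a placeholder artifact file when none exists.

    Returns True if a placeholder was written, False if a file was already there.
    """
    target = Path(path)
    if target.exists():
        # A real artifact was produced by an earlier step; leave it untouched.
        return False
    placeholder = {
        "git_job_name": git_job_name,
        "artifacts": [],    # assumption: no artifacts were produced
        "run_report": {},   # assumption: no run-level report is available
        "job_reports": [],  # assumption: no per-device reports are available
    }
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(placeholder, indent=2))
    return True
```

Running such a step unconditionally at the end of the job means a downstream consumer always finds a file carrying the git job name, even when the benchmark step never ran.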

PRs that simulate failure cases for the generation logic

Mimic a step failing before the benchmark test (no JSON generated): #6397
Mimic the benchmark test step failing, but with an artifact: #6398
ExecuTorch Sync Test: pytorch/executorch#9204

Details

When the flag is on, artifact.json is converted from

[ 
   ....
]

to

{
    "git_job_name": str,
    "artifacts": [ ],
    "run_report": {},
    "job_reports": [....]
}

This flag is temporary, to make sure the logic stays in sync between repos.
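
While the flag is in transition, a consumer may see either shape. The sketch below shows one way to normalize both into the new layout; `normalize_artifact_payload` is a hypothetical helper, and defaulting the missing fields to empty values is an assumption rather than documented behavior.

```python
from typing import Any


def normalize_artifact_payload(payload: Any) -> dict:
    """Return the payload in the new dict layout, whichever format it arrived in."""
    if isinstance(payload, list):
        # Old format: a bare list of artifacts, with no job metadata.
        return {
            "git_job_name": None,
            "artifacts": payload,
            "run_report": {},
            "job_reports": [],
        }
    if isinstance(payload, dict):
        # New format: fill in any missing key with an empty default.
        return {
            "git_job_name": payload.get("git_job_name"),
            "artifacts": payload.get("artifacts", []),
            "run_report": payload.get("run_report", {}),
            "job_reports": payload.get("job_reports", []),
        }
    raise TypeError(f"unexpected artifact payload type: {type(payload).__name__}")
```

A caller would pass in the result of `json.load` on the artifact file and then read `artifacts` the same way regardless of which format the job emitted.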

@facebook-github-bot facebook-github-bot added the CLA Signed label Mar 12, 2025
@yangw-dev yangw-dev requested a review from ZainRizvi March 12, 2025 05:08
@yangw-dev yangw-dev marked this pull request as ready for review March 12, 2025 05:08
@yangw-dev yangw-dev marked this pull request as draft March 12, 2025 19:12
@yangw-dev yangw-dev removed the request for review from ZainRizvi March 12, 2025 19:12
@yangw-dev yangw-dev changed the title [Benchmark] Add flag to enbale/disable new json format in mobile artifacts file [Benchmark] Prepare for execuTorch failure handling Mar 12, 2025
@yangw-dev yangw-dev requested a review from ZainRizvi March 12, 2025 21:02
@yangw-dev yangw-dev requested a review from huydhn March 12, 2025 21:03
@yangw-dev yangw-dev marked this pull request as ready for review March 12, 2025 21:03
@yangw-dev yangw-dev merged commit 136177d into main Mar 13, 2025
10 checks passed
@yangw-dev yangw-dev deleted the addNewJsonOutput branch March 13, 2025 00:37
Camyll pushed a commit that referenced this pull request Mar 13, 2025
yangw-dev added a commit that referenced this pull request Apr 3, 2025
@yangw-dev yangw-dev changed the title [Benchmark] Prepare for execuTorch failure handling [Benchmark][feature request] Prepare for execuTorch failure handling May 21, 2025
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
4 participants