Add script to fetch benchmark results for execuTorch #11734


Open

wants to merge 45 commits into main from addScript

Commits (45)
- 3e8fa8f  final (yangw-dev, Jun 16, 2025)
- 07896ea  final (yangw-dev, Jun 17, 2025)
- 79f7788  final (yangw-dev, Jun 17, 2025)
- 101d631  final (yangw-dev, Jun 17, 2025)
- 83e76fe  final (yangw-dev, Jun 17, 2025)
- 3b4047f  final (yangw-dev, Jun 17, 2025)
- 1da3e87  final (yangw-dev, Jun 17, 2025)
- 2f604a0  final (yangw-dev, Jun 17, 2025)
- 9fa50a4  final (yangw-dev, Jun 17, 2025)
- b56863d  final (yangw-dev, Jun 17, 2025)
- b2ad5b6  final (yangw-dev, Jun 17, 2025)
- 4aced24  final (yangw-dev, Jun 17, 2025)
- ab6e6cf  final (yangw-dev, Jun 17, 2025)
- 8e6956d  final (yangw-dev, Jun 17, 2025)
- 87ba460  final (yangw-dev, Jun 17, 2025)
- 5d22567  final (yangw-dev, Jun 17, 2025)
- 7a032e1  final (yangw-dev, Jun 18, 2025)
- d33ce72  final (yangw-dev, Jun 18, 2025)
- 702478b  final (yangw-dev, Jun 18, 2025)
- 03cff40  final (yangw-dev, Jun 18, 2025)
- 44b792b  final (yangw-dev, Jun 18, 2025)
- 37965f3  final (yangw-dev, Jun 18, 2025)
- 908ee26  final (yangw-dev, Jun 18, 2025)
- 4d30776  final (yangw-dev, Jun 18, 2025)
- e27779e  final (yangw-dev, Jun 18, 2025)
- f8ba103  final (yangw-dev, Jun 18, 2025)
- 0fb1b2b  final (yangw-dev, Jun 18, 2025)
- 2878e96  Merge branch 'main' into addScript (yangw-dev, Jun 18, 2025)
- 04dbd97  final (yangw-dev, Jun 18, 2025)
- 95b30a4  final (yangw-dev, Jun 18, 2025)
- d22af04  fix error test (yangw-dev, Jun 18, 2025)
- 25769da  fix error test (yangw-dev, Jun 18, 2025)
- 48d9d34  Merge branch 'main' into addScript (yangw-dev, Jun 18, 2025)
- 51b326a  fix error test (yangw-dev, Jun 18, 2025)
- 13261da  fix error test (yangw-dev, Jun 18, 2025)
- ffe6839  Merge branch 'main' into addScript (yangw-dev, Jun 18, 2025)
- d7a6652  fix error test (yangw-dev, Jun 19, 2025)
- 9182682  fix error test (yangw-dev, Jun 19, 2025)
- 33de04f  fix error test (yangw-dev, Jun 19, 2025)
- 68bf6f5  fix error test (yangw-dev, Jun 19, 2025)
- 3c2cbd2  fix error test (yangw-dev, Jun 19, 2025)
- 8de90bf  Merge branch 'main' into addScript (yangw-dev, Jun 19, 2025)
- 990ff44  fix error test (yangw-dev, Jun 19, 2025)
- ed48f5f  Merge branch 'main' into addScript (yangw-dev, Jun 19, 2025)
- 99df0fe  fix error test (yangw-dev, Jun 19, 2025)

3 changes: 3 additions & 0 deletions .ci/docker/requirements-ci.txt
@@ -28,3 +28,6 @@ matplotlib>=3.9.4
myst-parser==0.18.1
sphinx_design==0.4.1
sphinx-copybutton==0.5.0

# script unit test requirements
yaspin==3.1.0
166 changes: 166 additions & 0 deletions .ci/scripts/benchmark_tooling/README.md
@@ -0,0 +1,166 @@
# Executorch Benchmark Tooling

A library providing tools for fetching, processing, and analyzing ExecutorchBenchmark data from the HUD Open API. This tooling helps compare performance metrics between private and public devices with identical settings.

## Table of Contents

- [Overview](#overview)
- [Installation](#installation)
- [Tools](#tools)
- [get_benchmark_analysis_data.py](#get_benchmark_analysis_datapy)
- [Quick Start](#quick-start)
- [Command Line Options](#command-line-options)
- [Example Usage](#example-usage)
- [Working with Output Files](#working-with-output-files-csv-and-excel)
- [Python API Usage](#python-api-usage)
- [Running Unit Tests](#running-unit-tests)

## Overview

The Executorch Benchmark Tooling provides a suite of utilities designed to:

- Fetch benchmark data from the HUD Open API for specified time ranges
- Clean and process data by filtering out failures
- Compare metrics between private and public devices with matching configurations
- Generate analysis reports in various formats (CSV, Excel, JSON)
- Support filtering by device pools, backends, and models

This tooling is particularly useful for performance analysis, regression testing, and cross-device comparisons.

## Installation

Install dependencies:

```bash
pip install -r requirements.txt
```

## Tools

### get_benchmark_analysis_data.py

This script is mainly used to generate analysis data comparing private devices with public devices using the same settings.

It fetches benchmark data from the HUD Open API for a specified time range, cleans the data by removing entries with FAILURE indicators, and retrieves all private device metrics along with the equivalent public device metrics based on matching [model, backend, device_pool_names, arch] configurations. Users can filter the data by specifying private device_pool_names, backends, and models.
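
Conceptually, private and public results are grouped and then matched on the same configuration key. A sketch of such a key is shown below; the field names come from the matching described above, while the values are purely illustrative:

```python
# Fields used to match a private device group with its public counterpart
group_key = {
    "model": "mv3",
    "backend": "qnn_q8",
    "device_pool_names": "apple_iphone_15_private",
    "arch": "iOS 17",  # illustrative value
}
```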

#### Quick Start

```bash
python3 .ci/scripts/benchmark_tooling/get_benchmark_analysis_data.py \
  --startTime "2025-06-11T00:00:00" \
  --endTime "2025-06-17T18:00:00" \
  --outputType "csv"
```

#### Command Line Options

##### Basic Options:
- `--startTime`: Start time in ISO format (e.g., "2025-06-11T00:00:00") (required)
- `--endTime`: End time in ISO format (e.g., "2025-06-17T18:00:00") (required)
- `--env`: Choose environment ("local" or "prod", default: "prod")
- `--no-silent`: Show processing logs (default: show only results and minimal logging)

##### Output Options:
- `--outputType`: Choose output format (default: "print")
  - `print`: Display results in the console
  - `json`: Generate a JSON file
  - `df`: Return results as DataFrames: `{'private': List[{'groupInfo': Dict, 'df': DF}, ...], 'public': List[{'groupInfo': Dict, 'df': DF}, ...]}`
  - `excel`: Generate Excel files with multiple sheets; in each sheet, the cell in the first row and first column contains the JSON string of the raw metadata
  - `csv`: Generate CSV files in separate folders; in each file, the field in the first row and first column contains the JSON string of the raw metadata (see the illustrative layout below)
- `--outputDir`: Directory to save output files (default: current directory)
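
As an illustration of the CSV/Excel layout, each generated file or sheet starts with a single metadata cell followed by a normal table. The metadata keys and metric columns below are hypothetical; the actual fields depend on the fetched benchmark data:

```
{"model": "mv3", "backend": "qnn_q8", "device": "samsung_s22_private"}
metric,avg,min,max
avg_inference_latency(ms),12.3,11.8,13.1
```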

##### Filtering Options:

- `--private-device-pools`: Filter by private device pool names (e.g., "samsung-galaxy-s22-5g", "samsung-galaxy-s22plus-5g")
- `--backends`: Filter by specific backend names (e.g., "qnn-q8", "llama3-spinquant")
- `--models`: Filter by specific model names (e.g., "mv3", "meta-llama-llama-3.2-1b-instruct-qlora-int4-eo8")

#### Example Usage

Filter by multiple private device pools and models:
```bash
# This fetches all private table data for models 'llama-3.2-1B' and 'mv3'
python3 get_benchmark_analysis_data.py \
  --startTime "2025-06-01T00:00:00" \
  --endTime "2025-06-11T00:00:00" \
  --private-device-pools 'apple_iphone_15_private' 'samsung_s22_private' \
  --models 'meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8' 'mv3'
```

Filter by specific device pool and models:
```bash
# This fetches all private iPhone table data for models 'llama-3.2-1B' and 'mv3',
# and associated public iPhone data
python3 get_benchmark_analysis_data.py \
  --startTime "2025-06-01T00:00:00" \
  --endTime "2025-06-11T00:00:00" \
  --private-device-pools 'apple_iphone_15_private' \
  --models 'meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8' 'mv3'
```

#### Working with Output Files (CSV and Excel)

You can use the helpers in `common.py` to convert the exported files back to DataFrames. These methods read the JSON metadata from the first row of the CSV/Excel files and return a list of `{"groupInfo": dict, "df": pandas.DataFrame}` entries.

```python
import logging
import sys

logging.basicConfig(level=logging.INFO)

# The ".ci" directory is not a valid Python package name, so add the tooling
# directory to sys.path (assumes you run from the executorch root)
sys.path.append(".ci/scripts/benchmark_tooling")
from common import read_all_csv_with_metadata, read_excel_with_json_header

# For CSV files (assuming the 'private' folder is in the current directory)
folder_path = './private'
res = read_all_csv_with_metadata(folder_path)
logging.info(res)

# For Excel files (assuming the Excel file is in the current directory)
file_path = "./private.xlsx"
res = read_excel_with_json_header(file_path)
logging.info(res)
```
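
Each returned entry pairs the metadata dict with its DataFrame, so you can filter groups directly. A minimal sketch, assuming the `backend` key (one of the matched configuration fields) is present in `groupInfo`:

```python
# Keep only groups whose metadata indicates a QNN backend
qnn_groups = [r for r in res if "qnn" in str(r["groupInfo"].get("backend", "")).lower()]
print(f"found {len(qnn_groups)} QNN group(s)")
```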

#### Python API Usage

To use the benchmark fetcher in your own scripts:

```python
import sys

# The ".ci" directory is not a valid Python package name, so add the tooling
# directory to sys.path (assumes you run from the executorch root)
sys.path.append(".ci/scripts/benchmark_tooling")
from get_benchmark_analysis_data import ExecutorchBenchmarkFetcher

# Initialize the fetcher
fetcher = ExecutorchBenchmarkFetcher(env="prod", disable_logging=False)

# Fetch data for a specific time range
fetcher.run(
    start_time="2025-06-11T00:00:00",
    end_time="2025-06-17T18:00:00"
)

# Get results in different formats
# As DataFrames
df_results = fetcher.to_df()

# Export to Excel
fetcher.to_excel(output_dir="./results")

# Export to CSV
fetcher.to_csv(output_dir="./results")

# Export to JSON
json_path = fetcher.to_json(output_dir="./results")

# Get raw dictionary results
dict_results = fetcher.to_dict()

# Use the output_data method for flexible output
results = fetcher.output_data(output_type="excel", output_dir="./results")
```
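
The DataFrame output follows the same `{'private': [...], 'public': [...]}` structure described under `--outputType df`. A sketch of iterating over it, assuming that structure:

```python
# Walk the private-device groups returned by to_df()
for entry in df_results.get("private", []):
    print(entry["groupInfo"])   # metadata dict identifying the model/backend/device group
    print(entry["df"].head())   # benchmark rows for that group as a pandas DataFrame
```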

## Running Unit Tests

The benchmark tooling includes unit tests to ensure functionality.

### Using pytest for unit tests

```bash
# From the executorch root directory
pytest -c /dev/null .ci/scripts/tests/test_get_benchmark_analysis_data.py
```
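
For more detail while iterating, pytest's standard flags apply as well; a sketch using generic pytest options:

```bash
# From the executorch root directory: verbose output, stop at the first failure
pytest -c /dev/null .ci/scripts/tests/test_get_benchmark_analysis_data.py -v -x
```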
Empty file.
50 changes: 50 additions & 0 deletions .ci/scripts/benchmark_tooling/common.py
@@ -0,0 +1,50 @@
import json
import os
from typing import Any, Dict, List

import pandas as pd


def read_excel_with_json_header(path: str) -> List[Dict[str, Any]]:
    # Read all sheets into a dict of DataFrames without treating any row as a header
    all_sheets = pd.read_excel(path, sheet_name=None, header=None, engine="openpyxl")

    results = []
    for sheet, df in all_sheets.items():
        # Extract the JSON metadata string from cell A1 (row 0, col 0)
        json_str = df.iat[0, 0]
        meta = json.loads(json_str) if isinstance(json_str, str) else {}

        # The actual data starts from the next row; treat row 1 as the header
        df_data = pd.read_excel(path, sheet_name=sheet, skiprows=1, engine="openpyxl")
        results.append({"groupInfo": meta, "df": df_data})
    print(f"successfully fetched {len(results)} sheets from {path}")
    return results


def read_all_csv_with_metadata(folder_path: str) -> List[Dict[str, Any]]:
    # Collect entries of the form {"groupInfo": dict, "df": DataFrame}, one per CSV file
    results = []
    for fname in os.listdir(folder_path):
        if not fname.lower().endswith(".csv"):
            continue
        path = os.path.join(folder_path, fname)
        # The first line of each CSV holds the JSON metadata string
        with open(path, "r", encoding="utf-8") as f:
            first_line = f.readline().strip()
        try:
            meta = json.loads(first_line)
        except json.JSONDecodeError:
            meta = {}
        # The remaining rows form a normal CSV table; skip the metadata line
        df = pd.read_csv(path, skiprows=1)
        results.append({"groupInfo": meta, "df": df})
    print(f"successfully fetched {len(results)} CSV files from {folder_path}")
    return results


if __name__ == "__main__":
    # Example usage (assumes "./private.xlsx" exists in the current directory);
    # guarded so it does not run when the module is imported
    import logging

    logging.basicConfig(level=logging.INFO)

    file_path = "./private.xlsx"
    res = read_excel_with_json_header(file_path)
    logging.info(res)