Misc fixes for release scripts to make them easier to use
Summary:
* Removed the requirement of setting the `VLLM_DIR` environment variable, since the benchmark is now a CLI command
* Reordered the evals and summarization of results to better match the order of the model card
Test Plan:
Local manual runs achieving the desired results.
Reviewers:
Subscribers:
Tasks:
Tags:
By default, we release FP8, INT4, and INT8-INT4 checkpoints, with the model card pre-filled with template content that can be modified later, once we have eval results.
@@ -12,10 +55,10 @@ Examples:
# the logged in user
# release with default quant options (FP8, INT4, INT8-INT4)
This will update `pytorch/Phi-4-mini-instruct-FP8` without changing the model card.
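As a rough illustration of the default release flow described above, a hypothetical invocation is sketched below. The `release.sh` name and the `--model_id` flag are assumptions, not the verified interface; check the scripts in `.github/scripts/torchao_model_releases/` for the actual options.

```sh
# Hypothetical sketch: release FP8, INT4 and INT8-INT4 checkpoints for one model,
# pushing to the logged-in Hugging Face user with a template model card.
# Script name and flag are assumptions.
sh release.sh --model_id Qwen/Qwen3-8B
```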
-## Eval
+## Eval Scripts
After we run the release script for a model, we can find the new models on the Hugging Face Hub page for the user, e.g. https://huggingface.co/torchao-testing. The models will have a model card filled in with template content, such as information about the model and eval instructions. There are a few things we still need to fill in: 1. peak memory usage, 2. latency when running the model with vLLM, and 3. quality measurement using lm-eval.
### Single Script
@@ -64,15 +107,15 @@ sh eval.sh --eval_type memory --model_ids Qwen/Qwen3-8B
```
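Building on the memory example above, a sketch of how the other two evals from the list (latency and quality) might be invoked with the same script. Only the `memory` eval type appears verbatim in this README; the `latency` and `quality` eval type names below are assumptions.

```sh
# 1. peak memory usage (shown verbatim above)
sh eval.sh --eval_type memory --model_ids Qwen/Qwen3-8B
# 2. latency with vLLM and 3. quality with lm-eval (eval type names assumed)
sh eval.sh --eval_type latency --model_ids Qwen/Qwen3-8B
sh eval.sh --eval_type quality --model_ids Qwen/Qwen3-8B
```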
#### Latency Eval
-For latency eval, make sure vllm is cloned and installed from source,
-and `VLLM_DIR` should be set to the source directory of the cloned vllm repo.
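With this change, the latency eval no longer needs a source checkout of vLLM or the `VLLM_DIR` environment variable; a minimal sketch of the simplified setup:

```sh
# A plain PyPI install of vllm is now sufficient for the latency eval;
# no source checkout or VLLM_DIR is required anymore.
pip install vllm
```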
`.github/scripts/torchao_model_releases/eval_env_checks.sh` (1 addition & 6 deletions)
@@ -12,13 +12,8 @@ check_torch() {
 }

check_vllm() {
-  # Check if VLLM_DIR is set
-  if [ -z "$VLLM_DIR" ]; then
-    echo "Error: VLLM_DIR environment variable is not set. Please set it before running this script."
-    exit 1
-  fi
   if ! pip show vllm > /dev/null 2>&1; then
-    echo "Error: vllm package is NOT installed. please install from source: https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html#set-up-using-python-only-build-without-compilation" >&2
+    echo "Error: vllm package is NOT installed. please install with \`pip install vllm\`" >&2
Once we have the checkpoint, we export it to ExecuTorch with a max_seq_length/max_context_length of 1024 to the XNNPACK backend as follows.
[TODO: fix config path in note where necessary]
(Note: the ExecuTorch LLM export script requires config.json to have certain key names. The correct config to use for the LLM export script is located at examples/models/qwen3/config/4b_config.json within the ExecuTorch repo.)