Note: for initial release, please include `--populate_model_card_template` to populate model card template.
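For example, following the `release.sh` invocation pattern shown later in this document, the flag is appended to the release command (the model id and quant value here are illustrative; substitute your own):

```shell
# Release a model and populate the model card template
# (flag required for the initial release of a model).
# Model id and quant value are illustrative examples.
sh release.sh --model_id microsoft/Phi-4-mini-instruct --quants FP8 --push_to_hub --populate_model_card_template
```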
### AWQ-INT4
[AWQ](https://arxiv.org/abs/2306.00978) is a technique to improve the accuracy of weight-only quantization. It preserves "salient" weight channels that have a high impact on the output accuracy: each salient weight channel is multiplied by a scale, and the corresponding activation is divided by the same scale. Since activations are not quantized, this introduces no additional loss from the activation side, while the quantization loss from the weights is reduced.
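As a sketch, releasing a checkpoint with this quant option would follow the same `release.sh` pattern used elsewhere in this document (assuming `AWQ-INT4` is an accepted value for `--quants`, which this section's heading suggests; the model id is illustrative):

```shell
# Hypothetical example: release an AWQ-INT4 checkpoint with the release script.
# Assumes AWQ-INT4 is a valid --quants value; model id is illustrative.
sh release.sh --model_id Qwen/Qwen3-8B --quants AWQ-INT4 --push_to_hub
```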
### Update checkpoints for a different user_id (e.g. pytorch)
Sometimes we may want to update the checkpoints for a different `user_id` without changing the model card. For this we can use `--push_to_user_id`, e.g.
```
sh release.sh --model_id microsoft/Phi-4-mini-instruct --quants FP8 --push_to_hub --push_to_user_id pytorch
```
This will update `pytorch/Phi-4-mini-instruct-FP8` without changing the model card.
## Eval
After we run the release script for a model, we can find the new models on the Hugging Face Hub page for the user, e.g. https://huggingface.co/torchao-testing. The models will have a model card filled in with template content, such as information about the model and eval instructions. There are a few things we still need to fill in: 1. peak memory usage, 2. latency when running the model with vLLM, and 3. quality measurement using lm-eval.
After the environment is set up, we can run eval:

```
sh eval.sh --eval_type quality --model_ids Qwen/Qwen3-8B --tasks hellaswag,mmlu
```
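For the other two measurements listed above (peak memory usage and vLLM latency), the commands might follow the same pattern — but note that the `memory` and `latency` eval types here are assumptions based on that list; only the `quality` eval type appears verbatim in this document:

```shell
# Hypothetical sketch: measure peak memory and vLLM latency with the same eval script.
# The memory and latency eval types are assumptions, not confirmed by this document.
sh eval.sh --eval_type memory --model_ids Qwen/Qwen3-8B
sh eval.sh --eval_type latency --model_ids Qwen/Qwen3-8B
```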
#### Summarize results
After we have finished all evals for each model, we can summarize the results with:
```
sh summarize_results.sh --model_ids Qwen/Qwen3-8B pytorch/Qwen3-8B-INT4
```
Once we have the checkpoint, we export it to ExecuTorch with a max_seq_length/max_context_length of 1024, targeting the XNNPACK backend, as follows.
[TODO: fix config path in note where necessary]
(Note: the ExecuTorch LLM export script requires config.json to have certain key names. The correct config to use for the LLM export script is located at examples/models/qwen3/config/4b_config.json within the ExecuTorch repo.)
After that you can run the model in a mobile app (see [Running in a mobile app](#running-in-a-mobile-app)).
(We try to keep these instructions up to date, but if you find they do not work, check our [CI test in ExecuTorch](https://github.com/pytorch/executorch/blob/main/.ci/scripts/test_torchao_huggingface_checkpoints.sh) for the latest source of truth, and let us know so we can update our model card.)