Summary:
Adds support for the new float8 scaling recipe in the official eval
scripts used to generate accuracy numbers in the README.
For now, this serves as a smoke test that the scaling works on a real
model - it does. We can add official benchmark results after we hook up
slayton's cuBLAS binding on H100, which should make the UX of running
evals much better.
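For context, the recipe name encodes the scaling granularity: activations get one scale per 1x128 row segment (`a1x128`) and weights one scale per 128x128 tile (`w128x128`), each scale derived from that block's amax. A minimal sketch of the per-block scale computation (illustration only, not torchao's implementation; the actual cast to a float8 dtype is elided):

```python
import torch

E4M3_MAX = 448.0  # max representable magnitude in float8 e4m3

def blockwise_fp8_scales(x, block_rows, block_cols):
    # Compute one scale per (block_rows x block_cols) tile so that the
    # tile's amax maps to the float8 e4m3 max value.
    r, c = x.shape
    assert r % block_rows == 0 and c % block_cols == 0
    xb = x.reshape(r // block_rows, block_rows, c // block_cols, block_cols)
    amax = xb.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12)
    scale = E4M3_MAX / amax
    # Scaled values now fit the e4m3 range; a real kernel would cast to
    # torch.float8_e4m3fn here and keep `scale` for the descale in the matmul.
    x_scaled = (xb * scale).reshape(r, c)
    return x_scaled, scale.squeeze(1).squeeze(-1)

# a1x128: activations get one scale per 1x128 row segment
a_scaled, a_scales = blockwise_fp8_scales(torch.randn(4, 256), 1, 128)
# w128x128: weights get one scale per 128x128 tile
w_scaled, w_scales = blockwise_fp8_scales(torch.randn(256, 256), 128, 128)
```

The finer activation granularity follows the usual reasoning that activations have more outliers than weights, so per-row-segment scales lose less precision than coarse tiles would.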
Test Plan:
Smoke test on Llama-3.1-8B; accuracy looks good.
```
# download checkpoint
with-proxy python scripts/download.py --hf_token {token} --repo_id meta-llama/Meta-Llama-3.1-8B
# prepare checkpoint
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3.1-8B
# run bf16 eval on a single task
with-proxy time python torchao/_models/llama/eval.py --checkpoint_path checkpoints/meta-llama/Meta-Llama-3.1-8B/model.pth --tasks 'winogrande'
...
winogrande: {'alias': 'winogrande', 'acc,none': 0.7426992896606156, 'acc_stderr,none': 0.012285989618865697}
# run float8 eval on the same task
with-proxy time python torchao/_models/llama/eval.py --checkpoint_path checkpoints/meta-llama/Meta-Llama-3.1-8B/model.pth --tasks 'winogrande' --quantization float8_a1x128_w128x128 --compile
...
winogrande: {'alias': 'winogrande', 'acc,none': 0.7419100236779794, 'acc_stderr,none': 0.012298278833972477}
```
Reviewers:
Subscribers:
Tasks:
Tags:
ghstack-source-id: e87609a
ghstack-comment-id: 3474380821
Pull-Request: #3269