Commit d47f9fe

cccclai authored and facebook-github-bot committed
Docs for lower smaller models to mps/coreml/qnn (#3146)
Summary: Pull Request resolved: #3146 ghstack-source-id: 223235858 Reviewed By: mcr229, kirklandsign Differential Revision: D56340028 fbshipit-source-id: ef06142546ac54105ae87007cd82369917a22b3e
1 parent 0800594 commit d47f9fe

File tree

1 file changed

+10
-0
lines changed

examples/models/llama2/README.md

Lines changed: 10 additions & 0 deletions
@@ -260,6 +260,16 @@ Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-de
 ### Android
 Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-demo-android.html) for full instructions on building the Android LLAMA Demo App.
 
+## Optional: Smaller models delegated to other backends
+Currently we support lowering the stories model to other backends, including CoreML, MPS, and QNN. Please refer to the instructions
+for each backend ([CoreML](https://pytorch.org/executorch/main/build-run-coreml.html), [MPS](https://pytorch.org/executorch/main/build-run-mps.html), [QNN](https://pytorch.org/executorch/main/build-run-qualcomm.html)) before trying to lower to them. After the backend library is installed, the script to export a lowered model is:
+
+- Lower to CoreML: `python -m examples.models.llama2.export_llama -kv --coreml -c stories110M.pt -p params.json`
+- MPS: `python -m examples.models.llama2.export_llama -kv --mps -c stories110M.pt -p params.json`
+- QNN: `python -m examples.models.llama2.export_llama -kv --qnn -c stories110M.pt -p params.json`
+
+The iOS LLAMA app supports the CoreML and MPS models, and the Android LLAMA app supports the QNN model. On Android, you can also cross-compile the llama runner binary, push it to the device, and run it.
+
 # What is coming next?
 ## Quantization
 - Enabling FP16 model to leverage smaller groupsize for 4-bit quantization.
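
The three export commands added in this diff differ only in the backend flag. A minimal shell sketch of that shared structure, assuming the ExecuTorch repo checkout with `stories110M.pt` and `params.json` in the working directory (the helper function name `build_export_cmd` is illustrative, not part of the repo):

```shell
# Build the export command line for a given backend flag.
# $1 is one of the backend flags from the diff: --coreml, --mps, or --qnn.
build_export_cmd() {
  echo "python -m examples.models.llama2.export_llama -kv $1 -c stories110M.pt -p params.json"
}

build_export_cmd --coreml
# → python -m examples.models.llama2.export_llama -kv --coreml -c stories110M.pt -p params.json
```

Running the emitted command for a given backend still requires that backend's library to be installed first, per the linked per-backend setup pages.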
