
Commit 36eb9c8

Docs for lower smaller models to mps/coreml/qnn (#3146) (#3178)
Summary: Pull Request resolved: #3146
ghstack-source-id: 223235858
Reviewed By: mcr229, kirklandsign
Differential Revision: D56340028
fbshipit-source-id: ef06142546ac54105ae87007cd82369917a22b3e
(cherry picked from commit d47f9fe)
1 parent efb7cf3 commit 36eb9c8

File tree

1 file changed, +10 −0 lines


examples/models/llama2/README.md

Lines changed: 10 additions & 0 deletions
@@ -238,6 +238,16 @@ Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-de
 ### Android
 Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-demo-android.html) for full instructions on building the Android LLAMA Demo App.
+
+## Optional: Smaller models delegated to other backends
+Currently we support lowering the stories model to other backends, including CoreML, MPS, and QNN. Please refer to the instructions
+for each backend ([CoreML](https://pytorch.org/executorch/main/build-run-coreml.html), [MPS](https://pytorch.org/executorch/main/build-run-mps.html), [QNN](https://pytorch.org/executorch/main/build-run-qualcomm.html)) before trying to lower them. After the backend library is installed, the script to export a lowered model is:
+
+- Lower to CoreML: `python -m examples.models.llama2.export_llama -kv --coreml -c stories110M.pt -p params.json`
+- MPS: `python -m examples.models.llama2.export_llama -kv --mps -c stories110M.pt -p params.json`
+- QNN: `python -m examples.models.llama2.export_llama -kv --qnn -c stories110M.pt -p params.json`
+
+The iOS LLAMA app supports the CoreML and MPS models, and the Android LLAMA app supports the QNN model. On Android, you can also cross-compile the llama runner binary, push it to the device, and run it there.
 
 # What is coming next?
 ## Quantization
 - Enabling FP16 model to leverage smaller groupsize for 4-bit quantization.
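The diff mentions cross-compiling the llama runner, pushing it to an Android device, and running it, but does not spell out the commands. Below is a minimal hedged sketch of that push-and-run flow using `adb`. The binary path, model filename, and runner flags are assumptions for illustration, not taken from the official docs; it is written as a dry run (the adb commands are printed, not executed) so it can be inspected anywhere.

```shell
# Hypothetical sketch: push a cross-compiled llama runner plus an exported
# model to an Android device and run it. All paths, filenames, and flags
# below are assumptions -- adjust them to your actual build output.
ADB="echo adb"   # dry run: prints each adb command; set ADB=adb on a real setup
RUNNER="cmake-out-android/examples/models/llama2/llama_main"  # assumed build output path
MODEL="stories110M_qnn.pte"     # assumed name of the QNN-lowered model
DEVICE_DIR="/data/local/tmp/llama"

# Create a working directory on the device, copy the artifacts, then run.
$ADB shell mkdir -p "$DEVICE_DIR"
$ADB push "$RUNNER" "$DEVICE_DIR/"
$ADB push "$MODEL" "$DEVICE_DIR/"
$ADB shell "cd $DEVICE_DIR && ./llama_main --model_path $MODEL --prompt 'Once upon a time'"
```

On a real device you would drop the `echo` dry-run indirection and point `RUNNER` and `MODEL` at the artifacts your own build and export steps produced.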

0 commit comments
