docs/compilation/compile_models.rst (10 additions, 10 deletions)
@@ -4,14 +4,14 @@ Compile Models via MLC
 ======================

 This page describes how to compile a model with MLC LLM. Model compilation takes model inputs, produces quantized model weights,
-and optimized model lib for a given platform. It enables users to bring their own new model weights, try different quantization modes,
+and optimizes model lib for a given platform. It enables users to bring their own new model weights, try different quantization modes,
 and customize the overall model optimization flow.

 .. note::
     Before you proceed, please make sure that you have :ref:`install-tvm-unity` correctly installed on your machine.
     TVM-Unity is the necessary foundation for us to compile models with MLC LLM.
     If you want to build webgpu, please also complete :ref:`install-web-build`.
-    Please also follow the instruction in :ref:`deploy-cli` to obtain the CLI app that can be used to chat with the compiled model.
+    Please also follow the instructions in :ref:`deploy-cli` to obtain the CLI app that can be used to chat with the compiled model.
     Finally, we strongly recommend you read :ref:`project-overview` first to get familiarized with the high-level terminologies.

@@ -25,7 +25,7 @@ Install MLC-LLM Package
 Work with Source Code
 ^^^^^^^^^^^^^^^^^^^^^

-The easiest way is to use MLC-LLM is to clone the repository, and compile models under the root directory of the repository.
+The easiest way to use MLC-LLM is to clone the repository, and compile models under the root directory of the repository.

 .. code:: bash

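For reference, a minimal sketch of that clone step; the repository URL and the ``--recursive`` flag for submodules are assumptions based on the public MLC-LLM repository layout:

.. code-block:: bash

   # Clone MLC-LLM together with its submodules (URL assumed), then work from the repo root
   git clone --recursive https://github.com/mlc-ai/mlc-llm.git
   cd mlc-llm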
@@ -106,7 +106,7 @@ your personal computer.
     xcrun: error: unable to find utility "metallib", not a developer tool or in PATH

 , please check and make sure you have Command Line Tools for Xcode installed correctly.
-You can use ``xcrun metal`` to validate: when it prints ``metal: error: no input files``, it means the Command Line Tools for Xcode is installed and can be found, and you can proceed the model compiling.
+You can use ``xcrun metal`` to validate: when it prints ``metal: error: no input files``, it means the Command Line Tools for Xcode is installed and can be found, and you can proceed with the model compiling.

 .. group-tab:: Android

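A quick sketch of that validation check; the expected error text is the one quoted above, and any other output may indicate the Command Line Tools are missing or not on ``PATH``:

.. code-block:: bash

   # Should fail with "metal: error: no input files" when the toolchain is present
   xcrun metal

   # If metallib still cannot be found, (re)install the Command Line Tools
   xcode-select --install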
@@ -172,7 +172,7 @@ We can check the output with the commands below:
     tokenizer_config.json

 We now chat with the model using the command line interface (CLI) app.
-Follow the build from source instruction
+Follow the build from the source instruction

 .. code:: shell

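As a rough sketch of that chat step, assuming the CLI binary built from source is named ``mlc_chat_cli`` and selects the compiled model by its quantized id via ``--local-id`` (both names are assumptions; check the CLI's ``--help`` for the exact invocation):

.. code-block:: shell

   # Hypothetical invocation; binary name and flag are assumptions
   mlc_chat_cli --local-id RedPajama-INCITE-Chat-3B-v1-q4f16_1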
@@ -271,7 +271,7 @@ We can check the output with the commands below:
     tokenizer_config.json

 The model lib ``dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1/RedPajama-INCITE-Chat-3B-v1-q4f16_1-webgpu.wasm``
-can be uploaded to internet. You can pass a ``model_lib_map`` field to WebLLM app config to use this library.
+can be uploaded to the internet. You can pass a ``model_lib_map`` field to WebLLM app config to use this library.


 Each compilation target produces a specific model library for the given platform. The model weight is shared across
@@ -311,7 +311,7 @@ In other cases you need to specify the model via ``--model``.
-                                    When running the compile command using ``--model``, please make sure you have placed the model to compile under ``dist/models/`` or other location on the disk.
+                                    When running the compile command using ``--model``, please make sure you have placed the model to compile under ``dist/models/`` or another location on the disk.

 --hf-path HUGGINGFACE_NAME         The name of the model's Hugging Face repository.
                                    We will download the model to ``dist/models/HUGGINGFACE_NAME`` and load the model from this directory.
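As an illustration of the two ways to point the compiler at a model, here is a hedged sketch; the ``python3 -m mlc_llm.build`` entry point and the ``--target``/``--quantization`` values are assumptions and may differ from the exact invocation in your checkout:

.. code-block:: bash

   # Local weights already placed under dist/models/ (entry point and flags assumed)
   python3 -m mlc_llm.build --model RedPajama-INCITE-Chat-3B-v1 --quantization q4f16_1 --target cuda

   # Or let the build fetch the weights from Hugging Face instead
   python3 -m mlc_llm.build --hf-path togethercomputer/RedPajama-INCITE-Chat-3B-v1 --quantization q4f16_1 --target cuda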
@@ -336,11 +336,11 @@ The following arguments are optional:
                          we will use the maximum sequence length from the ``config.json`` in the model directory.
 --reuse-lib LIB_NAME     Specifies the previously generated library to reuse.
                          This is useful when building the same model architecture with different weights.
-                         You can refer to the :ref:`model distribution <distribute-model-step3-specify-model-lib>` page for detail of this argument.
+                         You can refer to the :ref:`model distribution <distribute-model-step3-specify-model-lib>` page for details of this argument.
 --use-cache              When ``--use-cache=0`` is specified,
                          the model compilation will not use cached file from previous builds,
                          and will compile the model from the very start.
-                         Using cache can help reduce the time needed to compile.
+                         Using a cache can help reduce the time needed to compile.
 --debug-dump             Specifies whether to dump debugging files during compilation.
 --use-safetensors        Specifies whether to use ``.safetensors`` instead of the default ``.bin`` when loading in model weights.

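A hedged example combining a few of these optional flags; the entry point and the ``--target`` value are the same assumptions as above, while the flag spellings follow the table:

.. code-block:: bash

   # Rebuild from scratch, reuse a previously generated library, and load .safetensors weights
   python3 -m mlc_llm.build --model RedPajama-INCITE-Chat-3B-v1 --quantization q4f16_1 --target iphone \
       --reuse-lib RedPajama-INCITE-Chat-3B-v1-q4f16_1 --use-cache=0 --use-safetensors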
@@ -354,7 +354,7 @@ This section lists compile commands for more models that you can try out.
 .. tab:: Model: Llama-2-7B

    Please `request for access <https://huggingface.co/meta-llama>`_ to the Llama-2 weights from Meta first.
-   After granted the access, please create directory ``dist/models`` and download the model to the directory.
+   After granted access, please create directory ``dist/models`` and download the model to the directory.
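A sketch of that download step, assuming the gated Hugging Face repository is ``meta-llama/Llama-2-7b-chat-hf`` and that ``git-lfs`` plus an approved, logged-in Hugging Face account are available:

.. code-block:: bash

   # Fetch the Llama-2 weights into dist/models/ (repo name assumed; requires granted access)
   mkdir -p dist/models && cd dist/models
   git lfs install
   git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
   cd ../..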
 For iOS app, model libraries are statically packed into the app at the time of app building.
-Therefore, the iOS app supports running any models whose model libraries are integrated into the app.
+Therefore, the iOS app supports running any model whose model libraries are integrated into the app.
 You can check the :ref:`list of supported model libraries <using-prebuilt-models-ios>`.

 To download and run the compiled RedPajama-3B instruct model on iPhone, we need to reuse the integrated ``RedPajama-INCITE-Chat-3B-v1-q4f16_1`` model library.
@@ -198,7 +198,7 @@ Now we can download the model weights in iOS app and run the model by following

 .. tab:: Step 4

-   When the download is finished, click into the model and enjoy.
+   When the download is finished, click on the model and enjoy.
docs/compilation/get-vicuna-weight.rst (4 additions, 4 deletions)
@@ -5,7 +5,7 @@ Getting Vicuna Weights
    :local:
    :depth: 2

-`Vicuna <https://lmsys.org/blog/2023-03-30-vicuna/>`_ is a open-source chatbot trained by fine-tuning `LLaMA <https://ai.facebook.com/blog/large-language-model-llama-meta-ai/>`_ on `ShartGPT <https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered>`_ data.
+`Vicuna <https://lmsys.org/blog/2023-03-30-vicuna/>`_ is an open-source chatbot trained by fine-tuning `LLaMA <https://ai.facebook.com/blog/large-language-model-llama-meta-ai/>`_ on `ShartGPT <https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered>`_ data.

 Please note that the official Vicuna weights are delta weights applied to the LLaMA weights in order to comply with the LLaMA license. Users are responsible for applying these delta weights themselves.

@@ -14,7 +14,7 @@ In this tutorial, we will show how to apply the delta weights to LLaMA weights t
 Install FastChat
 ----------------

-FastChat offers convenient utility functions for applying delta to LLaMA weights. You can easily install it using pip.
+FastChat offers convenient utility functions for applying the delta to LLaMA weights. You can easily install it using pip.

 .. code-block:: bash

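A sketch of the install plus the later delta-application step; the ``fschat`` pip package name and the ``fastchat.model.apply_delta`` module path follow FastChat's own documentation at the time, while the local paths are placeholders you should replace:

.. code-block:: bash

   # Install FastChat, then merge the Vicuna delta into the base LLaMA weights (paths are placeholders)
   pip install fschat
   python3 -m fastchat.model.apply_delta \
       --base-model-path /path/to/llama-7b-hf \
       --target-model-path /path/to/vicuna-7b \
       --delta-path lmsys/vicuna-7b-delta-v1.1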
@@ -38,14 +38,14 @@ Then download the weights (both the LLaMA weight and Vicuna delta weight):