Commit abd24dd

doc: clarify that --quantize is not needed for pre-quantized models (#2536)

1 parent c103760

3 files changed: +9 −2 lines changed

docs/source/reference/launcher.md — 3 additions, 1 deletion

````diff
@@ -55,7 +55,9 @@ Options:
 ## QUANTIZE
 ```shell
       --quantize <QUANTIZE>
-          Whether you want the model to be quantized
+          Quantization method to use for the model. It is not necessary to specify this option for pre-quantized models, since the quantization method is read from the model configuration.
+
+          Marlin kernels will be used automatically for GPTQ/AWQ models.
 
           [env: QUANTIZE=]
````
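The documented behavior can be illustrated with a launch command. A sketch, not taken from the commit itself — the model IDs below are illustrative, and it assumes `text-generation-launcher` is installed:

```shell
# Pre-quantized checkpoint: no --quantize needed; the quantization method
# is read from the model configuration, and Marlin kernels are used
# automatically for GPTQ/AWQ models.
text-generation-launcher --model-id TheBloke/Llama-2-7B-GPTQ

# Quantizing an unquantized checkpoint on the fly still requires the flag.
text-generation-launcher --model-id meta-llama/Llama-2-7b-hf --quantize bitsandbytes
```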

flake.nix — 1 addition, 0 deletions

```diff
@@ -157,6 +157,7 @@
           pyright
           pytest
           pytest-asyncio
+          redocly
           ruff
           syrupy
         ]);
```

launcher/src/main.rs — 5 additions, 1 deletion

```diff
@@ -367,7 +367,11 @@ struct Args {
     #[clap(long, env)]
     num_shard: Option<usize>,
 
-    /// Whether you want the model to be quantized.
+    /// Quantization method to use for the model. It is not necessary to specify this option
+    /// for pre-quantized models, since the quantization method is read from the model
+    /// configuration.
+    ///
+    /// Marlin kernels will be used automatically for GPTQ/AWQ models.
     #[clap(long, env, value_enum)]
     quantize: Option<Quantization>,
```
