
Commit cd0e3d5

reuank authored and teleprint-me committed
[Server] Added --verbose option to README [no ci] (ggml-org#7335)
1 parent 657f980 commit cd0e3d5


examples/server/README.md

Lines changed: 3 additions & 4 deletions
```diff
@@ -17,7 +17,8 @@ The project is under active development, and we are [looking for feedback and co
 
 **Command line options:**
 
-- `--threads N`, `-t N`: Set the number of threads to use during generation. Not used if model layers are offloaded to GPU. The server is using batching. This parameter is used only if one token is to be processed on CPU backend.
+- `-v`, `--verbose`: Enable verbose server output. When using the `/completion` endpoint, this includes the tokenized prompt, the full request and the full response.
+- `-t N`, `--threads N`: Set the number of threads to use during generation. Not used if model layers are offloaded to GPU. The server is using batching. This parameter is used only if one token is to be processed on CPU backend.
 - `-tb N, --threads-batch N`: Set the number of threads to use during batch and prompt processing. If not specified, the number of threads will be set to the number of threads used for generation. Not used if model layers are offloaded to GPU.
 - `--threads-http N`: Number of threads in the http server pool to process requests. Default: `max(std::thread::hardware_concurrency() - 1, --parallel N + 2)`
 - `-m FNAME`, `--model FNAME`: Specify the path to the LLaMA model file (e.g., `models/7B/ggml-model.gguf`).
```
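As a quick illustration of the new `-v`/`--verbose` flag added above, here is a minimal sketch of starting the server with verbose output and exercising the `/completion` endpoint. The model path, port, and prompt below are placeholders for illustration, not part of this commit:

```sh
# Start the server with verbose logging enabled
# (model path and port are placeholders; adjust to your setup).
./server -m models/7B/ggml-model.gguf --port 8080 -v

# In another shell, send a completion request; with -v the server
# also logs the tokenized prompt, the full request, and the full response.
curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 64}'
```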
```diff
@@ -36,9 +37,7 @@ The project is under active development, and we are [looking for feedback and co
 - `--numa STRATEGY`: Attempt one of the below optimization strategies that may help on some NUMA systems
 - `--numa distribute`: Spread execution evenly over all nodes
 - `--numa isolate`: Only spawn threads on CPUs on the node that execution started on
-- `--numa numactl`: Use the CPU map provided by numactl. If run without this previously, it is recommended to drop the system
-page cache before using this. See https://github.com/ggerganov/llama.cpp/issues/1437
-
+- `--numa numactl`: Use the CPU map provided by numactl. If run without this previously, it is recommended to drop the system page cache before using this. See https://github.com/ggerganov/llama.cpp/issues/1437
 - `--numa`: Attempt optimizations that may help on some NUMA systems.
 - `--lora FNAME`: Apply a LoRA (Low-Rank Adaptation) adapter to the model (implies --no-mmap). This allows you to adapt the pretrained model to specific tasks or domains.
 - `--lora-base FNAME`: Optional model to use as a base for the layers modified by the LoRA adapter. This flag is used in conjunction with the `--lora` flag, and specifies the base model for the adaptation.
```
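As a rough sketch of the `--numa numactl` advice above (dropping the page cache before running under numactl), something like the following should work on Linux; the node numbers and model path are assumptions for illustration:

```sh
# Drop the page cache so freshly mapped model pages are allocated on the
# nodes numactl binds to (requires root; see issue #1437 linked above).
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

# Run the server pinned to NUMA node 0 and tell it to honor the CPU map
# provided by numactl (model path is a placeholder).
numactl --cpunodebind=0 --membind=0 \
    ./server -m models/7B/ggml-model.gguf --numa numactl
```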
