
Commit 236c4cf

Merge pull request ggml-org#456 from AgentJ-WR/patch-1
Show how to adjust context window in README.md
2 parents: 7952ca5 + ea4fbad

File tree

1 file changed (+9, -0)

README.md

Lines changed: 9 additions & 0 deletions
````diff
@@ -105,6 +105,15 @@ Below is a short example demonstrating how to use the high-level API to generate
 }
 ```
 
+### Adjusting the Context Window
+The context window of the Llama models determines the maximum number of tokens that can be processed at once. By default, this is set to 512 tokens, but it can be adjusted based on your requirements.
+
+For instance, if you want to work with larger contexts, you can expand the context window by setting the `n_ctx` parameter when initializing the `Llama` object:
+
+```python
+llm = Llama(model_path="./models/7B/ggml-model.bin", n_ctx=2048)
+```
+
 ## Web Server
 
 `llama-cpp-python` offers a web server which aims to act as a drop-in replacement for the OpenAI API.
````
