Commit c1eeae1

docs: update s390x build docs to reflect nnpa disable

Signed-off-by: Aaron Teo <[email protected]>

1 parent 412f4c7

File tree: 1 file changed, +12 −5 lines


docs/build-s390x.md

Lines changed: 12 additions & 5 deletions
````diff
@@ -42,14 +42,14 @@ cmake --build build --config Release -j $(nproc)
 cmake --build build --config Release -j $(nproc)
 ```
 
-By default, NNPA is enabled when available. To disable it (not recommended):
+By default, NNPA is disabled. To enable it:
 
 ```bash
 cmake -S . -B build \
     -DCMAKE_BUILD_TYPE=Release \
     -DGGML_BLAS=ON \
     -DGGML_BLAS_VENDOR=OpenBLAS \
-    -DGGML_NNPA=OFF
+    -DGGML_NNPA=ON
 
 cmake --build build --config Release -j $(nproc)
 ```
````
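Since NNPA is only present on IBM z16 or later, it can help to confirm the facility actually exists before configuring with `-DGGML_NNPA=ON`. A minimal sketch, assuming a Linux on IBM Z kernel that reports an `nnpa` feature flag in `/proc/cpuinfo` (the flag name is an assumption; verify on your system):

```shell
#!/bin/sh
# Sketch: probe for the NNPA facility before enabling -DGGML_NNPA=ON.
# Assumes the kernel exposes an "nnpa" feature flag in /proc/cpuinfo
# (typical on recent Linux on IBM Z kernels; verify on your system).
if grep -qw nnpa /proc/cpuinfo 2>/dev/null; then
    echo "NNPA facility detected: -DGGML_NNPA=ON should work"
else
    echo "NNPA facility not detected: keep -DGGML_NNPA=OFF (default)"
fi
```

On machines without the facility (or non-s390x hosts) the script simply reports that NNPA was not detected, matching the scalar-fallback behaviour described later in the document.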
````diff
@@ -86,7 +86,7 @@ All models need to be converted to Big-Endian. You can achieve this in three cas
 
    You can find popular models pre-converted and verified at [s390x Verified Models](https://huggingface.co/collections/taronaeo/s390x-verified-models-672765393af438d0ccb72a08) or [s390x Runnable Models](https://huggingface.co/collections/taronaeo/s390x-runnable-models-686e951824198df12416017e).
 
-   These models have already been converted from `safetensors` to `GGUF Big-Endian` and their respective tokenizers verified to run correctly on IBM z15 and later system.
+   These models have already been converted from `safetensors` to `GGUF` Big-Endian and their respective tokenizers verified to run correctly on IBM z15 and later systems.
 
 2. **Convert safetensors model to GGUF Big-Endian directly (recommended)**
 
````
````diff
@@ -95,11 +95,13 @@ All models need to be converted to Big-Endian. You can achieve this in three cas
    The model you are trying to convert must be in `safetensors` file format (for example [IBM Granite 3.3 2B](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)). Make sure you have downloaded the model repository for this case.
 
    Ensure that you have installed the required packages in advance
+
    ```bash
    pip3 install -r requirements.txt
    ```
 
    Convert the `safetensors` model to `GGUF`
+
    ```bash
    python3 convert_hf_to_gguf.py \
      --outfile model-name-be.f16.gguf \
````
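The Big-Endian conversion above matters because s390x stores multi-byte values most-significant byte first, while GGUF files produced on x86 are little-endian. A minimal standard-library illustration of the byte-order difference (illustrative only, not part of the conversion script):

```python
import struct

# The same 32-bit value laid out in the two byte orders. s390x is a
# big-endian platform, which is why little-endian GGUF files must be
# converted before they can be used there.
value = 0x01020304
little = struct.pack("<I", value)  # little-endian: least-significant byte first
big = struct.pack(">I", value)     # big-endian: most-significant byte first

print(little.hex())  # -> 04030201
print(big.hex())     # -> 01020304
```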
````diff
@@ -147,7 +149,7 @@ Only available in IBM z15 or later system with the `-DGGML_VXE=ON` (turned on by
 
 ### 2. NNPA Vector Intrinsics Acceleration
 
-Only available in IBM z16 or later system with the `-DGGML_NNPA=ON` (turned on when available) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z15/arch13. In such systems, the APIs can still run but will use a scalar implementation.
+Only available on IBM z16 or later systems with the `-DGGML_NNPA=ON` (turned off by default) compile flag. No hardware acceleration is possible with llama.cpp on older systems, such as IBM z15/arch13. On such systems, the APIs can still run but will use a scalar implementation.
 
 ### 3. zDNN Accelerator
 
````
````diff
@@ -206,10 +208,15 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
    ```
 
    For example,
+
    ```bash
    CXXFLAGS="-include cstdint" pip3 install -r requirements.txt
    ```
 
+5. `-DGGML_NNPA=ON` generates gibberish output
+
+   Answer: We are aware of this as detailed in [this issue](https://github.com/ggml-org/llama.cpp/issues/14877). Please either try reducing the number of threads, or disable the compile option using `-DGGML_NNPA=OFF`.
+
 ## Getting Help on IBM Z & LinuxONE
 
 1. **Bugs, Feature Requests**
````
````diff
@@ -266,4 +273,4 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
 - 🚫 - acceleration unavailable, will still run using scalar implementation
 - ❓ - acceleration unknown, please contribute if you can test it yourself
 
-Last Updated by **Aaron Teo ([email protected])** on July 21, 2025.
+Last Updated by **Aaron Teo ([email protected])** on July 25, 2025.
````
