llama-cpp-python-0.1.65 and below crashes (memory issue?) and v0.1.66-0.1.70 errors out with GPU #477
Closed
Labels: model (Model specific issue)
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
On v0.1.65, I expect GPU offloading to work.
Current Behavior
My kernel crashes, presumably due to a memory issue. On v0.1.66-0.1.70, the model fails to load.
Environment and Context
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping: 0
CPU MHz: 2299.998
BogoMIPS: 4599.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 256 KiB
L1i cache: 256 KiB
L2 cache: 2 MiB
L3 cache: 45 MiB
NUMA node0 CPU(s): 0-15
$ uname -a
Linux username-tensorflow-gpu 5.10.0-23-cloud-amd64 #1 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux
$ python3 --version
Python 3.10.10
$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
$ g++ --version
g++ (Debian 10.2.1-6) 10.2.1 20210110
Failure Information (for bugs)
On v0.1.65 and below the kernel crashes; on v0.1.66-0.1.70 the model fails to load.
Crash:
Fails to load:
Steps to Reproduce
Use CUDA 12.1 and try to run the code below:
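The original snippet did not survive extraction, so as a stand-in here is a minimal sketch of how a GPU-offload load with llama-cpp-python typically looks. The model path, layer count, and prompt are assumptions for illustration, not the original code, and this presumes a wheel built with cuBLAS support (e.g. `CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --no-cache-dir`):

```python
# Hedged sketch, NOT the original reproduction code.
# Assumes llama-cpp-python was built with cuBLAS and that the q4 GGML
# file from the linked Hugging Face repo was downloaded locally.
from llama_cpp import Llama

# Hypothetical local path to the downloaded 7B GGML q4 model file.
MODEL_PATH = "./models/llama/7B/ggml-model-q4_0.bin"

# Offloading layers to the GPU via n_gpu_layers is where the crash
# (<= v0.1.65) or load failure (v0.1.66-0.1.70) reportedly occurs.
llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=32,  # assumed value; any nonzero offload triggers the issue
)

output = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(output["choices"][0]["text"])
```

Running this requires a CUDA-capable GPU and the model weights, so it is environment-dependent; on the affected versions the `Llama(...)` constructor is the step that fails.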
Please help! I am using the model from https://huggingface.co/frankenstyle/ggml-q4-models/tree/main/models/llama/7B.
Thank you!