llama-cpp-python-0.1.65 and below crashes (memory issue?) and v0.1.66-0.1.70 errors out with GPU #477

jaymon0703 · 2023-07-14T02:49:02Z

Prerequisites

Please answer the following questions for yourself before submitting an issue.

I am running the latest code. Development is very rapid so there are no tagged versions as of now.
I carefully followed the README.md.
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

On v0.1.65 i expect GPU should work.

Current Behavior

My Kernel crashes presumably due to a memory issue. On v0.1.66-0.1.70 the model fails to load.

Environment and Context

PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

$ lscpu

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping: 0
CPU MHz: 2299.998
BogoMIPS: 4599.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 256 KiB
L1i cache: 256 KiB
L2 cache: 2 MiB
L3 cache: 45 MiB
NUMA node0 CPU(s): 0-15

$ uname -a

Linux username-tensorflow-gpu 5.10.0-23-cloud-amd64 #1 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux

$ python3 --version
Python 3.10.10

$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu

$ g++ --version
g++ (Debian 10.2.1-6) 10.2.1 20210110

Failure Information (for bugs)

Kernel crashes, or model fails to load.

Crash:

Fails to load:

Steps to Reproduce

Use CUDA 12.1 and try run below:

%%time
n_gpu_layers = 8 # Change this value based on your model and your GPU VRAM pool.
n_batch = 128 # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

llm = LlamaCpp(
    model_path="models/ggml-model-q4_0.bin",
    n_threads=8,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    n_ctx=2048,
    callback_manager=callback_manager, 
    verbose=True
)

Please help! I am using the model from https://huggingface.co/frankenstyle/ggml-q4-models/tree/main/models/llama/7B.

Thank you!

The text was updated successfully, but these errors were encountered:

jllllll · 2023-07-14T04:26:10Z

Likely an outdated model. Use this one: https://huggingface.co/TheBloke/LLaMa-7B-GGML/resolve/main/llama-7b.ggmlv3.q4_0.bin

jaymon0703 · 2023-07-14T04:53:04Z

@jllllll You are a lifesaver. Now to find an appropriate model as that one gives lofty answers to my QA over Docs project.

EDIT: After some more testing, it seems quite reasonable. Thank you!

jaymon0703 closed this as completed Jul 14, 2023

gjmulder added llama.cpp Problem with llama.cpp shared lib model Model specific issue and removed llama.cpp Problem with llama.cpp shared lib labels Jul 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

llama-cpp-python-0.1.65 and below crashes (memory issue?) and v0.1.66-0.1.70 errors out with GPU #477

llama-cpp-python-0.1.65 and below crashes (memory issue?) and v0.1.66-0.1.70 errors out with GPU #477

jaymon0703 commented Jul 14, 2023 •

edited

Loading

jllllll commented Jul 14, 2023

Uh oh!

jaymon0703 commented Jul 14, 2023 •

edited

Loading

Uh oh!

llama-cpp-python-0.1.65 and below crashes (memory issue?) and v0.1.66-0.1.70 errors out with GPU #477

llama-cpp-python-0.1.65 and below crashes (memory issue?) and v0.1.66-0.1.70 errors out with GPU #477

Comments

jaymon0703 commented Jul 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Prerequisites

Expected Behavior

Current Behavior

Environment and Context

Failure Information (for bugs)

Steps to Reproduce

jllllll commented Jul 14, 2023

Uh oh!

jaymon0703 commented Jul 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jaymon0703 commented Jul 14, 2023 •

edited

Loading

jaymon0703 commented Jul 14, 2023 •

edited

Loading