llama-cpp-python 0.3.1 didn't use GPU #1785

Comments
I have the same issue |
Same here |
I thought I had caused the problem myself, but apparently I'm not the only one. The model runs on the CPU: GPU load and GPU memory usage stay at 0. Reporting my code and logs.

llama-cpp-python version: 0.3.1

Code to load and use the Llama model:

```python
from llama_cpp import Llama  # import added for completeness

def run_model(text: str) -> str:
    model_name = 'bartowski/Mistral-Nemo-Instruct-2407-GGUF'
    model = Llama.from_pretrained(
        model_name,
        cache_dir=models_root,  # models_root is defined elsewhere in my script
        filename='Mistral-Nemo-Instruct-2407-Q6_K_L.gguf',
        # verbose=False,
        n_gpu_layers=-1,  # request offload of all layers
        n_ctx=10 * 1024,
        main_gpu=1,
    )
    output = model(prompt=f'[INST]{text}[/INST]', echo=True, max_tokens=None)
    summary = output['choices'][0]['text']
    # Drop everything up to and including the echoed '[/INST]' tag
    e_index = summary.find('[/INST]')
    summary = summary[e_index + len('[/INST]'):]
    return summary
```

nvidia-smi output:
Model initialization log:
|
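A quick way to rule out a CPU-only wheel before digging into driver issues: the low-level bindings expose llama.cpp's `llama_supports_gpu_offload`. A minimal sketch; if it prints False, the installed package was built without a GPU backend and `n_gpu_layers` is silently ignored:

```python
import llama_cpp

# If this prints False, the installed wheel was compiled without CUDA/Metal,
# so n_gpu_layers has no effect and everything runs on the CPU.
print("llama-cpp-python:", llama_cpp.__version__)
print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())
```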
That helped me solve the problem. For Windows 11, WSL2, Ubuntu 24.04 LTS (yes, and now it no longer causes Windows to freeze):

```bash
sudo apt-get -y update && sudo apt-get -y upgrade
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8
sudo apt-get -y install cmake python3-pip curl libopenblas-dev libssl-dev
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 pip install 'llama-cpp-python[server]' --break-system-packages
```

Note: UPDATE 09/04/2025: added the OpenSSL libraries required to compile and install llama-cpp-python. |
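If the CUDA rebuild took effect, loading any model with `verbose=True` should mention the CUDA backend and report offloaded layers in the initialization log. A minimal check (the model path below is hypothetical):

```python
from llama_cpp import Llama

# Hypothetical local GGUF path; with a CUDA-enabled build the verbose log
# should include the CUDA device and a line like "offloaded N/N layers to GPU".
llm = Llama(model_path="./models/any-model.gguf", n_gpu_layers=-1, verbose=True)
```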
Works for me on Ubuntu 20.
|
@AleefBilal Do you have Ubuntu 22.04 as your primary operating system? |
@blademoon No, it's Ubuntu 20 |
@AleefBilal I used the latest available version of CUDA. Have you tried the suggested solution yet? |
@artyomboyko I'm using the solution that I've suggested |
@AleefBilal OK |
This works for me, thanks so much! |
Bro, I wasted two whole days trying everything and this worked, god bless you! |
@lukaLLM You are not the only one who wasted days. Don't thank me, thank the open-source community. Hope you become a part of it as well. :) |
Yeah, I plan to actually do that once I get some experience! |
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
Run Gemma-27b-it on the GPU (script attached: test2.py.txt).
Current Behavior
The model runs on the CPU instead:
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
$ lscpu
NVIDIA driver and tools:
$ uname -a
Linux MSK-PC-01 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ python3 --version
Python 3.10.12
$ make --version
$ g++ --version
Failure Information (for bugs)
Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
```bash
pip install -U transformers bitsandbytes gradio accelerate
pip install llama-cpp-python
python3 test2.py
```
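Note that the plain `pip install llama-cpp-python` above builds a CPU-only wheel unless CUDA is requested via `CMAKE_ARGS`, which matches the behavior reported here. To see which backend features the installed build was compiled with, the bindings expose llama.cpp's system-info call (a minimal sketch):

```python
import llama_cpp

# Prints the feature flags compiled into the embedded llama.cpp
# (AVX, FMA, CUDA, etc.); no CUDA flag here means a CPU-only build.
print(llama_cpp.llama_print_system_info().decode())
```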
Note: Many issues seem to be regarding functional or performance issues / differences with llama.cpp. In these cases we need to confirm that you're comparing against the version of llama.cpp that was built with your python package, and which parameters you're passing to the context.

Try the following:

1. git clone https://github.com/abetlen/llama-cpp-python
2. cd llama-cpp-python
3. rm -rf _skbuild/ # delete any old builds
4. python -m pip install .
5. cd ./vendor/llama.cpp
6. Follow llama.cpp's instructions to cmake llama.cpp
7. Run llama.cpp's ./main with the same arguments you previously passed to llama-cpp-python and see if you can reproduce the issue. If you can, log an issue with llama.cpp (see the parameter-mapping sketch below).
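As a concrete example of step 7 for this issue, here is a hedged sketch mapping the llama-cpp-python parameters used above onto llama.cpp's ./main CLI flags (-ngl, -c, and -mg are llama.cpp's flags; the model path and prompt are hypothetical):

```python
import subprocess

# Re-run the same generation through llama.cpp's ./main binary:
#   -ngl = n_gpu_layers (use a large value to offload all layers),
#   -c   = n_ctx,
#   -mg  = main_gpu.
cmd = [
    "./main",
    "-m", "Mistral-Nemo-Instruct-2407-Q6_K_L.gguf",  # hypothetical path
    "-ngl", "999",           # equivalent of n_gpu_layers=-1 (offload everything)
    "-c", str(10 * 1024),    # n_ctx
    "-mg", "1",              # main_gpu
    "-p", "[INST]Hello[/INST]",
]
subprocess.run(cmd, check=True)
```

Failure Logs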
Please include any relevant log snippets or files. If it works under one configuration but not under another, please provide logs for both configurations and their corresponding outputs so it is easy to see where behavior changes.
Also, please try to avoid using screenshots if at all possible. Instead, copy/paste the console output and use GitHub's markdown to cleanly format your logs for easy readability.
Example environment info: