I reviewed the Discussions and have a new bug or useful enhancement to share.
Expected Behavior
I am trying to run a llama-cpp-python model within a Docker container based on the nvidia/cuda:12.5.0-devel-ubuntu22.04 image. I expect CUDA to be detected and the model to utilize the GPU for inference without needing to specify --gpus all when running the container.
Current Behavior
Instead, the container starts without GPU access and fails at startup:
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see https://docs.nvidia.com/datacenter/cloud-native/ .
RuntimeError: Failed to load shared library '/usr/local/lib/python3.10/dist-packages/llama_cpp/lib/libllama.so': libcuda.so.1: cannot open shared object file: No such file or directory
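As a quick sanity check (not part of the original report), the absence of the driver library inside the container can be confirmed directly; no output means libcuda.so.1 was never mounted in by the NVIDIA runtime:
$ ldconfig -p | grep libcuda   # empty when the container was started without GPU access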
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping: 7
CPU MHz: 2200.186
BogoMIPS: 4400.37
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 384 KiB
L1i cache: 384 KiB
L2 cache: 12 MiB
L3 cache: 38.5 MiB
NUMA node0 CPU(s): 0-23
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
$ uname -a
Linux dl-big-poc 5.10.0-33-cloud-amd64 #1 SMP Debian 5.10.226-1 (2024-10-03) x86_64 GNU/Linux
SDK version:
$ python3 --version
Python 3.10.14
$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ g++ --version
g++ (Debian 10.2.1-6) 10.2.1 20210110
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Failure Information (for bugs)
This appears to be a bug related to CUDA detection within the Docker container when not using --gpus all.
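A standard baseline check (not included in the report) is to confirm that the host-side NVIDIA Container Toolkit works at all; if this prints the GPU table, the host setup is fine and only the missing flag is at fault. The image tag is assumed to match the Dockerfile below:
$ docker run --rm --gpus all nvidia/cuda:12.5.0-base-ubuntu22.04 nvidia-smi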
Steps to Reproduce
Build the Docker image using the provided Dockerfile:
FROM nvidia/cuda:12.5.0-devel-ubuntu22.04
SHELL ["/bin/bash", "-c"]
# Set the working directory *before* copying files
WORKDIR /workspace
# Install necessary build tools
RUN apt-get update && apt-get upgrade -y \
&& apt-get install -y git build-essential \
python3 python3-pip gcc wget \
ocl-icd-opencl-dev opencl-headers clinfo \
libclblast-dev libopenblas-dev \
&& mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
ENV CUDA_DOCKER_ARCH=all
ENV GGML_CUDA=1
RUN python3 -m pip install --upgrade pip pytest cmake pydantic uvicorn fastapi
# Install llama-cpp-python with CUDA support
RUN CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all-major" FORCE_CMAKE=1 \
pip install llama-cpp-python==0.3.2 --no-cache-dir --force-reinstall --upgrade
# Set Gunicorn timeout
ENV GUNICORN_CMD_ARGS="--workers 1 --timeout 300"
# Set default environment variables
ENV MODEL_PATH="./model/test-llama-8B-abliterated.Q6_K.gguf"
ENV N_CTX="8192"
ENV N_GPU_LAYERS="-1"
ENV MAIN_GPU="1"
ENV N_THREADS="4"
ENV MAX_TOKENS="512"
ENV TEMPERATURE="0.0"
# Copy files
COPY main.py ./
COPY inference.py ./
COPY ./model ./model
# Run your FastAPI app on container startup
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
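Build the image (the build command itself was not included in the report; the tag is taken from the run command below):
$ docker build -t local-llama-fastapi .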
Run the container:
docker run -p 8000:8000 -e GUNICORN_CMD_ARGS="--workers 1 --timeout 300" -e MODEL_PATH="./model/test-llama-8B-abliterated.Q6_K.gguf" -e N_CTX=8192 -e N_GPU_LAYERS=-1 -e MAIN_GPU=1 -e TEMPERATURE=0.0 -e N_THREADS=20 -e MAX_TOKENS=512 local-llama-fastapi
Failure Logs
docker run -p 8000:8000 -e GUNICORN_CMD_ARGS="--workers 1 --timeout 300" -e MODEL_PATH="./model/test-llama-8B-abliterated.Q6_K.gguf" -e N_CTX=8192 -e N_GPU_LAYERS=-1 -e MAIN_GPU=1 -e TEMPERATURE=0.0 -e N_THREADS=20 -e MAX_TOKENS=512 local-llama-fastapi
==========
== CUDA ==
==========
CUDA Version 12.5.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/_ctypes_extensions.py", line 67, in load_shared_library
return ctypes.CDLL(str(lib_path), **cdll_args) # type: ignore
File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcuda.so.1: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/uvicorn", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/uvicorn/main.py", line 412, in main
run(
File "/usr/local/lib/python3.10/dist-packages/uvicorn/main.py", line 579, in run
server.run()
File "/usr/local/lib/python3.10/dist-packages/uvicorn/server.py", line 66, in run
return asyncio.run(self.serve(sockets=sockets))
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/dist-packages/uvicorn/server.py", line 70, in serve
await self._serve(sockets)
File "/usr/local/lib/python3.10/dist-packages/uvicorn/server.py", line 77, in _serve
config.load()
File "/usr/local/lib/python3.10/dist-packages/uvicorn/config.py", line 435, in load
self.loaded_app = import_from_string(self.app)
File "/usr/local/lib/python3.10/dist-packages/uvicorn/importer.py", line 19, in import_from_string
module = importlib.import_module(module_str)
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/workspace/main.py", line 8, in <module>
from llama_cpp import Llama
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/__init__.py", line 1, in <module>
from .llama_cpp import *
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama_cpp.py", line 38, in <module>
_lib = load_shared_library(_lib_base_name, _base_path)
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/_ctypes_extensions.py", line 69, in load_shared_library
raise RuntimeError(f"Failed to load shared library '{lib_path}': {e}")
RuntimeError: Failed to load shared library '/usr/local/lib/python3.10/dist-packages/llama_cpp/lib/libllama.so': libcuda.so.1: cannot open shared object file: No such file or directory
I expect CUDA to be detected and the model to utilize the GPU for inference without needing to specify --gpus all when running the container.
The --gpus all flag is required to expose GPU devices to the container, even when using NVIDIA CUDA base images. Without it, the container won't have access to the GPU hardware.
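For example, the reproduction command from above with only the GPU flag added (a minimal sketch; nothing else changes):
$ docker run --gpus all -p 8000:8000 -e GUNICORN_CMD_ARGS="--workers 1 --timeout 300" -e MODEL_PATH="./model/test-llama-8B-abliterated.Q6_K.gguf" -e N_CTX=8192 -e N_GPU_LAYERS=-1 -e MAIN_GPU=1 -e TEMPERATURE=0.0 -e N_THREADS=20 -e MAX_TOKENS=512 local-llama-fastapi
With the flag present, the NVIDIA container runtime mounts the host driver's libcuda.so.1 into the container, so the dlopen of libllama.so in the traceback above succeeds.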
@sergey21000 Thanks for your prompt response. I am new to this. If I must specify the --gpus all flag when running Docker, how do I specify it when I upload this to Vertex AI? I don't see an option to specify it in the parameters when uploading the model with gcloud. How does it ensure it picks up all GPUs?
@nandhiniramanan5 I don't know about Vertex AI, but locally --gpus all is essential for Docker to access CUDA. However, on servers like RunPod you can just host your Docker container and it will utilize the GPUs by default.
Hope this helps a little; if not, let me know how you fixed it.
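For later readers: on Vertex AI, GPUs are requested when deploying the model to an endpoint rather than via a docker run flag, and the platform exposes them to the custom container automatically. A sketch assuming the gcloud ai commands; ENDPOINT_ID, MODEL_ID, the region, and the accelerator type are placeholders, so check the current Vertex AI docs for the exact flag names:
$ gcloud ai endpoints deploy-model ENDPOINT_ID \
    --region=us-central1 \
    --model=MODEL_ID \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1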