Description
LocalAI version
v2.24.2
Environment
LXC container under Proxmox
Linux localAI 6.8.4-3-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-3 (2024-05-02T11:55Z) x86_64 x86_64 x86_64 GNU/Linux
The host has 2 GPUs; the first one is not passed through to the container.
card0 is a GTX 750 (not passed through)
card1 is a Quadro P4000 (PCI passthrough)
NVIDIA driver 550.142
CUDA 12.4
Problem
LocalAI does not detect the GPU driver, even though it recognizes both GPUs in the node.
I suspect LocalAI uses the information from the first GPU to determine which driver is in use. Since card0 is not passed through in this setup, it may be unable to determine the driver correctly.
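To check what the kernel itself reports per card, independent of LocalAI, a small diagnostic along these lines may help. It is only a sketch: it assumes the standard Linux sysfs layout, where `/sys/class/drm/cardN/device/driver` is a symlink to the bound kernel driver, and the function name `drm_card_drivers` is made up for this example. It is not LocalAI's detection code.

```python
import os
import re


def drm_card_drivers(drm_root="/sys/class/drm"):
    """Map each cardN entry under drm_root to its bound driver name ('' if none)."""
    drivers = {}
    if not os.path.isdir(drm_root):
        return drivers
    for entry in sorted(os.listdir(drm_root)):
        # Only look at cardN itself, not connector entries such as card1-DP-1.
        if not re.fullmatch(r"card\d+", entry):
            continue
        link = os.path.join(drm_root, entry, "device", "driver")
        if os.path.islink(link):
            # The symlink target ends in the driver name, e.g. .../drivers/nvidia
            drivers[entry] = os.path.basename(os.readlink(link))
        else:
            drivers[entry] = ""
    return drivers


if __name__ == "__main__":
    for card, driver in drm_card_drivers().items():
        print(f"{card}: driver={driver!r}")
```

If card1 shows an empty driver here as well, the missing information is already absent at the sysfs level inside the container rather than being lost inside LocalAI.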
Steps
Set up PCI passthrough on the host server and install the NVIDIA drivers on both the host and the guest. Confirm that the drivers are installed correctly, then run the install script. The installer correctly reports that it is using the GPU, but the debug logs indicate that the driver is not detected.
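One way to confirm from inside the guest that the NVIDIA kernel module is loaded is to read `/proc/driver/nvidia/version`. The sketch below assumes that file is visible inside the container and that its first line follows the usual `NVRM version: ... Kernel Module  <version>` layout; the helper names are hypothetical.

```python
import re
from pathlib import Path


def parse_nvrm_version(text):
    """Extract the driver version from the contents of /proc/driver/nvidia/version."""
    match = re.search(r"Kernel Module(?: for \S+)?\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def nvidia_kernel_module_version(path="/proc/driver/nvidia/version"):
    """Return the loaded NVIDIA kernel module version, or None if unavailable."""
    try:
        return parse_nvrm_version(Path(path).read_text())
    except OSError:
        # File missing or unreadable: module not loaded or not exposed here.
        return None


if __name__ == "__main__":
    print("NVIDIA kernel module version:", nvidia_kernel_module_version())
```

On the setup described here, the result would be expected to match the driver version reported by nvidia-smi.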
Expected
LocalAI detects the NVIDIA driver on card1.
Logs
The first 2 warnings pertain to a second GPU that is installed on the host but not passed through to the LXC container.
WARNING: failed to read int from file: open /sys/class/drm/card0/device/numa_node: no such file or directory
WARNING: error parsing the pci address "simple-framebuffer.0"
6:27AM DBG GPU count: 2   <-- 2 GPUs are indeed detected
6:27AM DBG GPU: card #0 @simple-framebuffer.0   <-- this GPU is ignored
6:27AM DBG GPU: card #1 @0000:03:00.0 -> driver: '' class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'GP104GL [Quadro P4000]'   <-- this is the main GPU; it is recognized and enabled, but the driver field is empty
nvidia-smi output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.142 Driver Version: 550.142 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Quadro P4000 Off | 00000000:03:00.0 Off | N/A |
| 46% 29C P8 5W / 105W | 2MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
GPU usage is always at 0% and there are never any active processes.
Additional context
LocalAI runs but is extremely slow because the GPU is detected but never used. The NVIDIA driver is active and detects the GPU correctly (see the nvidia-smi output above), but LocalAI apparently cannot detect the driver.