
Failure to detect GPU driver #4571

@jobidon


LocalAI version
v2.24.2

Environment
LXC container under Proxmox
Linux localAI 6.8.4-3-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-3 (2024-05-02T11:55Z) x86_64 x86_64 x86_64 GNU/Linux
The system has two GPUs, and the first one is disabled:
card0 is a GTX 750
card1 is a Quadro P4000 (PCI passthrough)
NVIDIA driver 550.142
CUDA 12.4

Problem
LocalAI does not detect the GPU driver, even though it recognizes the GPUs on the node.
I suspect LocalAI uses the information from the first GPU to determine which driver is in use. Since card0 is not passed through in this setup, it may be unable to determine the driver correctly.

Steps to reproduce
1. Set up PCI passthrough on the host server.
2. Install the NVIDIA drivers on both the host and the guest, and confirm that they are installed correctly.
3. Run the install script.
The installer correctly reports that it is using the GPU, but the debug logs indicate that the driver is not detected.

Expected
LocalAI detects the NVIDIA driver on card1.

Logs
The first two warnings relate to the other GPU installed on the host (card0), which is not passed through to the LXC container:
WARNING: failed to read int from file: open /sys/class/drm/card0/device/numa_node: no such file or directory
WARNING: error parsing the pci address "simple-framebuffer.0"
6:27AM DBG GPU count: 2
6:27AM DBG GPU: card #0 @simple-framebuffer.0
6:27AM DBG GPU: card #1 @0000:03:00.0 -> driver: '' class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'GP104GL [Quadro P4000]'

GPU count: both GPUs are indeed detected.
card #0: this GPU is ignored.
card #1: this is the main GPU. It is recognized and enabled, but the driver field is empty.
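The second warning suggests that card #0 is backed by a platform framebuffer device (`simple-framebuffer.0`) rather than a PCI device, so its name cannot be parsed as a PCI address. A minimal Go sketch, assuming the usual `domain:bus:device.function` format; the regular expression and `parsePCIAddress` are illustrative and are not the parser that LocalAI or its hardware-detection library actually uses. The error text only mirrors the wording of the warning.

```go
package main

import (
	"fmt"
	"regexp"
)

// pciAddrRe matches a full PCI address such as 0000:03:00.0:
// four hex digits for the domain, two each for bus and device,
// and a single digit 0-7 for the function.
var pciAddrRe = regexp.MustCompile(`^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$`)

// PCIAddress holds the parsed components of a PCI address.
type PCIAddress struct {
	Domain, Bus, Device, Function string
}

// parsePCIAddress (hypothetical helper) returns the parsed address, or an
// error if s is not a PCI address, e.g. a platform device name.
func parsePCIAddress(s string) (PCIAddress, error) {
	m := pciAddrRe.FindStringSubmatch(s)
	if m == nil {
		return PCIAddress{}, fmt.Errorf("error parsing the pci address %q", s)
	}
	return PCIAddress{Domain: m[1], Bus: m[2], Device: m[3], Function: m[4]}, nil
}

func main() {
	for _, s := range []string{"0000:03:00.0", "simple-framebuffer.0"} {
		addr, err := parsePCIAddress(s)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("ok: %+v\n", addr)
	}
}
```

On this reading, the warning for card #0 is expected and harmless; the empty driver field for card #1, whose address parses normally, is the part that needs explaining.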

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.142                Driver Version: 550.142        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro P4000                   Off |   00000000:03:00.0 Off |                  N/A |
| 46%   29C    P8              5W /  105W |       2MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

GPU utilization always stays at 0%, and no processes ever appear in the process list.
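To confirm that LocalAI never touches the GPU, utilization and memory use can be sampled while a request is running, rather than taken from a single snapshot. A minimal Go sketch that shells out to `nvidia-smi` with its `--query-gpu` and `--format=csv,noheader,nounits` options; the one-second interval, the ten samples, and the `parseSamples` helper are illustrative choices, not part of LocalAI. If a model were offloaded to the GPU, memory use would typically rise well above the 2 MiB idle value shown in the nvidia-smi output.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// GPUSample is one row of nvidia-smi query output.
type GPUSample struct {
	Index       int
	UtilPercent int
	MemUsedMiB  int
}

// parseSamples (hypothetical helper) parses the output of
//   nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits
// where each line looks like "0, 37, 4120".
func parseSamples(out string) ([]GPUSample, error) {
	var samples []GPUSample
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		if line == "" {
			continue
		}
		fields := strings.Split(line, ",")
		if len(fields) != 3 {
			return nil, fmt.Errorf("unexpected line %q", line)
		}
		var vals [3]int
		for i, f := range fields {
			v, err := strconv.Atoi(strings.TrimSpace(f))
			if err != nil {
				// Fields such as "[N/A]" are reported as errors.
				return nil, fmt.Errorf("bad field %q: %w", f, err)
			}
			vals[i] = v
		}
		samples = append(samples, GPUSample{Index: vals[0], UtilPercent: vals[1], MemUsedMiB: vals[2]})
	}
	return samples, nil
}

func main() {
	// Sample once per second for ten seconds while a LocalAI request is running.
	for i := 0; i < 10; i++ {
		out, err := exec.Command("nvidia-smi",
			"--query-gpu=index,utilization.gpu,memory.used",
			"--format=csv,noheader,nounits").Output()
		if err != nil {
			fmt.Println("nvidia-smi failed:", err)
			return
		}
		samples, err := parseSamples(string(out))
		if err != nil {
			fmt.Println(err)
			return
		}
		for _, s := range samples {
			fmt.Printf("gpu %d: %d%% util, %d MiB used\n", s.Index, s.UtilPercent, s.MemUsedMiB)
		}
		time.Sleep(time.Second)
	}
}
```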

Additional context
LocalAI runs, but it is extremely slow because the GPU is detected but never used. The NVIDIA drivers are active and detect the GPU correctly, but LocalAI apparently cannot detect the driver.
