clpeak and llama.cpp stuck at 100% CPU on 6.8.5 kernel #726

Closed
@notsyncing

Description

As @Disty0 wrote in #710:

The new 6.8.5 and 6.8.6 kernels, and the 6.6.27 LTS kernel, cannot run workloads on the GPU.
The application detects the GPU and tries to run on it, but gets stuck at 100% usage of a single CPU core.
This happens with any OpenCL or SYCL app. (Kernel 6.8 is using the workaround provided in this thread.)

On Arch Linux, you can downgrade to Linux 6.8.4 with these packages:
linux 6.8.4: https://archive.archlinux.org/packages/l/linux/linux-6.8.4.arch1-1-x86_64.pkg.tar.zst
linux-headers 6.8.4: https://archive.archlinux.org/packages/l/linux-headers/linux-headers-6.8.4.arch1-1-x86_64.pkg.tar.zst
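To check whether the running kernel is one of the versions reported above, a minimal sketch can compare the numeric prefix of the release string (as printed by `uname -r`, e.g. `6.8.5-301.fc40.x86_64`) against that list. The helper names and the `REPORTED` set are illustrative assumptions. Only the three versions named in the quote are included, so a `False` result does not mean a kernel is unaffected.

```python
import platform
import re
from typing import Optional, Tuple

# Only the versions named in the report; NOT an exhaustive list of affected kernels.
REPORTED = {(6, 8, 5), (6, 8, 6), (6, 6, 27)}

def kernel_version(release: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from a release string like '6.8.5-301.fc40.x86_64'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        return None
    # A missing patch component (e.g. '6.8-rc1') is treated as 0.
    return (int(m.group(1)), int(m.group(2)), int(m.group(3) or 0))

def is_reported_kernel(release: Optional[str] = None) -> bool:
    """True if `release` (default: the running kernel) matches a reported version."""
    version = kernel_version(release or platform.release())
    return version in REPORTED
```

On the reporter's Fedora system, `is_reported_kernel("6.8.5-301.fc40.x86_64")` returns `True`, while the Arch 6.8.4 kernel from the downgrade packages does not match.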

This happens to me as well, with both llama.cpp and clpeak.

clpeak output:

$ NEOReadDebugKeys=1 PrintDebugMessages=1 LogWaitingForCompletion=1 EventsDebugEnable=1 PrintKmdTimes=1 LogZEInfo=1 clpeak
WARNING: Failed to request OCL Turbo Boost
computeUnitsUsedForScratch: 4096
hwInfo: {512, 4096}: (16, 1, 32)

Platform: Intel(R) OpenCL Graphics
  Device: Intel(R) Arc(TM) A770 Graphics
    Driver version  : 24.09.28717.17 (Linux x64)
    Compute units   : 512
    Clock frequency : 2400 MHz
DeviceBinaryFormat::zebin : Unhandled SHT_NOTE section : .note.intelgt.metrics currently supports only : .note.intelgt.compat.
DeviceBinaryFormat::zebin::.ze_info : Minor version : 40 is newer than available in decoder : 39 - some features may be skipped


    Global memory bandwidth (GBPS)

Waiting for task count 0 at location 0x7fb1fbd65000 with timeout 0. Current value: 0

Waiting completed. Current value: 0

Waiting for task count 1 at location 0x7fb1fbd5f000 with timeout 0. Current value: 0

Waiting completed. Current value: 1
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0  totalPrivateMemorySize: 0
perHwThreadScratchSize: 0        totalScratchSize: 0
perHwThreadPrivateScratchSize: 0         totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0  totalPrivateMemorySize: 0
perHwThreadScratchSize: 0        totalScratchSize: 0
perHwThreadPrivateScratchSize: 0         totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0  totalPrivateMemorySize: 0
perHwThreadScratchSize: 0        totalScratchSize: 0
perHwThreadPrivateScratchSize: 0         totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0  totalPrivateMemorySize: 0
perHwThreadScratchSize: 0        totalScratchSize: 0
perHwThreadPrivateScratchSize: 0         totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0  totalPrivateMemorySize: 0
perHwThreadScratchSize: 0        totalScratchSize: 0
perHwThreadPrivateScratchSize: 0         totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0  totalPrivateMemorySize: 0
perHwThreadScratchSize: 0        totalScratchSize: 0
perHwThreadPrivateScratchSize: 0         totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0  totalPrivateMemorySize: 0
perHwThreadScratchSize: 0        totalScratchSize: 0
perHwThreadPrivateScratchSize: 0         totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0  totalPrivateMemorySize: 0
perHwThreadScratchSize: 0        totalScratchSize: 0
perHwThreadPrivateScratchSize: 0         totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0  totalPrivateMemorySize: 0
perHwThreadScratchSize: 0        totalScratchSize: 0
perHwThreadPrivateScratchSize: 0         totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0  totalPrivateMemorySize: 0
perHwThreadScratchSize: 0        totalScratchSize: 0
perHwThreadPrivateScratchSize: 0         totalPrivateScratchSize: 0
      float   : DIM:1   GWS:(33550336, 1, 1)    ELWS:(256, 1, 1)        Offset:(0, 0, 0)        AGWS:(33550336, 1, 1) LWS:(256, 1, 1) TWGS:(131056, 1, 1)     NWGS:(131056, 1, 1)     SWGS:(0, 0, 0)
devMode = 3, taskMode = 3.
devMode = 3, taskMode = 3.
preemption = 3.
DIM:1   GWS:(33550336, 1, 1)    ELWS:(256, 1, 1)        Offset:(0, 0, 0)        AGWS:(33550336, 1, 1)   LWS:(256, 1, 1)       TWGS:(131056, 1, 1)     NWGS:(131056, 1, 1)     SWGS:(0, 0, 0)
devMode = 3, taskMode = 3.
devMode = 3, taskMode = 3.
preemption = 3.

Waiting for task count 2 at location 0x7fb1fbd65000 with timeout 96. Current value: 0

Waiting completed. Current value: 0

Waiting for task count 2 at location 0x7fb1fbd65000 with timeout 0. Current value: 0

It then gets stuck here, and the clpeak process consumes one CPU core (100% usage).
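With `LogWaitingForCompletion=1`, the debug output above appears to log each wait as a `Waiting for task count ...` line and its return as a `Waiting completed` line. A small helper can pair these lines to show which wait never returned. `find_pending_wait` is a hypothetical script written for this illustration. It is not part of NEO or clpeak, and its regular expressions only assume the line format visible in this log.

```python
import re
from typing import List, Optional, Tuple

# Patterns match the line format seen in the clpeak debug output above.
WAIT_RE = re.compile(
    r"Waiting for task count (\d+) at location (0x[0-9a-fA-F]+) "
    r"with timeout (\d+)\. Current value: (\d+)")
DONE_RE = re.compile(r"Waiting completed\. Current value: (\d+)")

def find_pending_wait(lines: List[str]) -> Optional[Tuple[int, str, int]]:
    """Return (task_count, location, value_read) for the last wait that has
    no matching 'Waiting completed' line, or None if every wait returned."""
    pending = None
    for line in lines:
        m = WAIT_RE.search(line)
        if m:
            # A new wait starts; remember it until a completion line follows.
            pending = (int(m.group(1)), m.group(2), int(m.group(4)))
        elif DONE_RE.search(line):
            pending = None
    return pending
```

Fed the last five log lines above, it returns `(2, '0x7fb1fbd65000', 0)`. In other words, the runtime is waiting for task count 2 at that address, and the value it reads there is still 0.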

Running perf record -a while it is stuck gives this profile:

Samples: 537K of event 'cycles:P', Event count (approx.): 456413846417
Overhead       Samples  Command          Shared Object                                   Symbol
   9.08%         36420  clpeak           [kernel.kallsyms]                               [k] clear_bhb_loop
   5.43%         21754  clpeak           [kernel.kallsyms]                               [k] __schedule
   4.73%         18984  clpeak           libc.so.6                                       [.] __sched_yield
   4.65%         18636  clpeak           [kernel.kallsyms]                               [k] _raw_spin_lock
   4.64%         18584  clpeak           [vdso]                                          [.] __vdso_clock_gettime
   4.61%         18482  clpeak           [kernel.kallsyms]                               [k] native_sched_clock
   4.48%         17981  clpeak           [kernel.kallsyms]                               [k] psi_account_irqtime
   4.01%         16069  clpeak           [kernel.kallsyms]                               [k] update_curr
   3.71%         14894  clpeak           [kernel.kallsyms]                               [k] syscall_exit_to_user_mode
   3.69%         14808  clpeak           [kernel.kallsyms]                               [k] __calc_delta.constprop.0
   3.66%         14687  clpeak           [kernel.kallsyms]                               [k] entry_SYSRETQ_unsafe_stack
   3.50%         14043  clpeak           [kernel.kallsyms]                               [k] pick_next_task_fair
   3.23%         12965  clpeak           libigdrcl.so                                    [.] 0x000000000005f724
   2.76%         11064  clpeak           [kernel.kallsyms]                               [k] pick_eevdf
   2.61%         10453  clpeak           [kernel.kallsyms]                               [k] do_syscall_64
   2.40%          9623  clpeak           [kernel.kallsyms]                               [k] entry_SYSCALL_64
   2.25%          9021  clpeak           [kernel.kallsyms]                               [k] update_min_vruntime
   1.83%          7352  clpeak           [kernel.kallsyms]                               [k] update_curr_se
   1.78%          7126  clpeak           [kernel.kallsyms]                               [k] entry_SYSCALL_64_after_hwframe
   1.56%          6269  clpeak           [kernel.kallsyms]                               [k] __cgroup_account_cputime
   1.52%          6079  clpeak           [kernel.kallsyms]                               [k] record_times
   1.51%          6064  clpeak           [kernel.kallsyms]                               [k] update_rq_clock
   1.49%          5981  clpeak           [kernel.kallsyms]                               [k] do_sched_yield
   1.11%          4453  clpeak           [kernel.kallsyms]                               [k] rcu_note_context_switch
   1.01%          4037  clpeak           libigdrcl.so                                    [.] 0x000000000005f726
   0.91%          3651  clpeak           [kernel.kallsyms]                               [k] yield_task_fair
   0.83%          3346  clpeak           [kernel.kallsyms]                               [k] entry_SYSCALL_64_safe_stack
   0.82%          3285  clpeak           [kernel.kallsyms]                               [k] raw_spin_rq_lock_nested
   0.78%          3113  clpeak           [kernel.kallsyms]                               [k] syscall_return_via_sysret
   0.75%          3008  clpeak           [kernel.kallsyms]                               [k] schedule
   0.72%         64724  swapper          [kernel.kallsyms]                               [k] intel_idle
   0.70%          2792  clpeak           libc.so.6                                       [.] clock_gettime@@GLIBC_2.17
   0.58%          2338  clpeak           [kernel.kallsyms]                               [k] cpuacct_charge
   0.53%          2107  clpeak           [kernel.kallsyms]                               [k] sched_clock
   0.51%          2060  clpeak           [kernel.kallsyms]                               [k] sched_clock_cpu
   0.45%          1805  clpeak           libstdc++.so.6.0.33                             [.] std::chrono::_V2::system_clock::now()
   0.42%          1690  clpeak           [kernel.kallsyms]                               [k] _raw_spin_unlock
   0.39%          1567  clpeak           libigdrcl.so                                    [.] 0x0000000000534467
   0.39%          1563  clpeak           [kernel.kallsyms]                               [k] x64_sys_call
   0.37%          1469  clpeak           [kernel.kallsyms]                               [k] __list_add_valid_or_report
   0.35%          1418  clpeak           [kernel.kallsyms]                               [k] check_cfs_rq_runtime
   0.35%          1402  clpeak           [kernel.kallsyms]                               [k] pick_next_entity
   0.34%          1368  clpeak           [kernel.kallsyms]                               [k] cgroup_rstat_updated
   0.34%          1364  clpeak           [kernel.kallsyms]                               [k] __list_del_entry_valid_or_report
   0.27%          1098  clpeak           [kernel.kallsyms]                               [k] syscall_exit_to_user_mode_prepare
   0.23%          1856  pgrep            libc.so.6                                       [.] __strncpy_evex
   0.22%           882  clpeak           libigdrcl.so                                    [.] 0x000000000056e33e
   0.19%           775  clpeak           [kernel.kallsyms]                               [k] __x64_sys_sched_yield
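Most samples land in `__sched_yield`, `clock_gettime` (via the vDSO) and scheduler internals, with smaller shares inside `libigdrcl.so`. That pattern is consistent with a user-space loop that repeatedly reads a completion value, checks the clock, and calls `sched_yield()` between reads. The sketch below illustrates that pattern under this assumption. `wait_for_task_count` and its parameters are hypothetical and are not NEO's actual code. `sched_yield()` only hands the CPU over if another task is ready to run there, so a value that never reaches its target keeps one core fully busy unless a finite timeout ends the loop.

```python
import os
import time
from typing import Callable, Optional

# os.sched_yield is Unix-only; fall back to a zero-length sleep elsewhere.
_yield_cpu = getattr(os, "sched_yield", lambda: time.sleep(0))

def wait_for_task_count(read_tag: Callable[[], int], target: int,
                        timeout_s: Optional[float] = None) -> bool:
    """Poll read_tag() until it reports at least `target`.

    Hypothetical illustration of a yield-based busy-wait. Returns True once
    the target is reached, or False if timeout_s elapses first. With
    timeout_s=None it never gives up, so a tag that is never updated keeps
    one CPU core at 100%.
    """
    start = time.monotonic()  # backed by clock_gettime() on Linux
    while read_tag() < target:
        if timeout_s is not None and time.monotonic() - start >= timeout_s:
            return False
        _yield_cpu()  # lets other runnable tasks in; returns at once if there are none
    return True
```

Polling a tag that stays at 0, as in the last log line above, with `timeout_s=None` would spin forever. The sketch only illustrates the symptom. It does not show why the completion value is never updated on these kernels.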

System information:

kernel 6.8.5-301.fc40.x86_64
Fedora Kinoite 40.20240419.n.0
intel-compute-runtime 24.09.28717.17-1.fc40.x86_64 (this also happens with the latest 24.13.29138.7, installed in an Ubuntu 22.04 container)

CPU: Intel Core i9-10940X
GPU: Intel Arc A770 16GB
