
bug: Cortex-cpp continues to leave 1 layer on the CPU while using the GPU #1104

Closed
@Van-QA

Description


Describe the bug
When generating responses with a local LLM, cortex-cpp still appears to rely heavily on the CPU.
https://discord.com/channels/1107178041848909847/1149558035971321886/1253148982188838954

To Reproduce

  1. Install cortex-cpp and the CUDA toolkit locally.
  2. Turn on GPU acceleration.
  3. Generate responses using a local LLM.
  4. Observe high CPU usage.
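To put a number on "high CPU usage" in step 4 on Linux, one option is to sample the aggregate `cpu` line of `/proc/stat` once before and once during generation, then compute the busy fraction from the difference. The sketch below assumes a standard Linux `/proc/stat` layout; the helper names (`parse_cpu_line`, `busy_fraction`, `sample_cpu_busy`) are hypothetical and are not part of cortex-cpp.

```python
import time
from typing import Tuple


def parse_cpu_line(line: str) -> Tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu ...' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    values = [int(v) for v in fields[1:]]
    # idle (4th column) plus iowait (5th column) count as "not busy".
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    # Sum user..steal only; guest/guest_nice are already included in user/nice.
    total = sum(values[:8])
    return idle, total


def busy_fraction(before: str, after: str) -> float:
    """Fraction of CPU time spent busy between two /proc/stat 'cpu' samples."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0
    return 1.0 - (idle1 - idle0) / delta_total


def sample_cpu_busy(interval: float = 1.0) -> float:
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = f.readline()  # first line is the aggregate 'cpu' line
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.readline()
    return busy_fraction(before, after)
```

Calling `sample_cpu_busy()` while a generation request is in flight, and again while the server is idle, gives two comparable figures to attach to the report instead of a qualitative "high".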

Expected behavior
Because cortex-cpp runs the local LLM with the CUDA toolkit installed and GPU acceleration enabled, it should do most of the processing on the GPU and use far less CPU.

Desktop

  • OS: Linux

Additional context
The logs indicate that 32 out of 33 layers are offloaded to the GPU, but 1 layer is still processed on the CPU. This behavior will be investigated further.
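A quick way to check the offload split across runs is to scan the server log for the layer-offload line. The sketch below assumes a line of the form `offloaded 32/33 layers to GPU`, modeled on llama.cpp's log output; the exact wording cortex-cpp emits is an assumption, and `find_offload` / `cpu_layers` are hypothetical helpers.

```python
import re
from typing import Optional, Tuple

# Assumed log format, modeled on llama.cpp's "offloaded N/M layers to GPU" line.
OFFLOAD_RE = re.compile(r"offloaded\s+(\d+)\s*/\s*(\d+)\s+layers\s+to\s+GPU")


def find_offload(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (gpu_layers, total_layers) from the last matching log line, or None."""
    matches = OFFLOAD_RE.findall(log_text)
    if not matches:
        return None
    gpu, total = matches[-1]  # use the most recent model load
    return int(gpu), int(total)


def cpu_layers(log_text: str) -> Optional[int]:
    """Number of layers left on the CPU according to the last offload line."""
    result = find_offload(log_text)
    if result is None:
        return None
    gpu, total = result
    return total - gpu
```

With the numbers reported here, `cpu_layers("offloaded 32/33 layers to GPU")` returns `1`, matching the single layer still processed on the CPU.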

Metadata


Assignees

No one assigned

    Type

    No type

    Projects

    Status

    Completed

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
