Closed
Description
Describe the bug
When generating responses with a local LLM, cortex-cpp still shows high CPU usage even though GPU acceleration is enabled.
https://discord.com/channels/1107178041848909847/1149558035971321886/1253148982188838954
To Reproduce
- Install cortex-cpp and the CUDA toolkit locally.
- Turn on GPU acceleration.
- Generate responses using a local LLM.
- Observe high CPU usage.
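
To tell whether the GPU is actually doing work while CPU usage is high, it can help to sample GPU utilization during generation. The following is a minimal sketch, assuming an NVIDIA GPU with `nvidia-smi` on the PATH. The helper names `parse_gpu_sample` and `read_gpu_samples` are hypothetical and not part of cortex-cpp.

```python
import subprocess

# nvidia-smi query that prints one CSV line per GPU, e.g. "45, 1234"
# (GPU utilization in percent, used memory in MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_sample(line: str) -> tuple[int, int]:
    """Parse one CSV line into (utilization_percent, memory_used_mib).

    Assumes both fields are numeric; some GPUs may report "[N/A]" instead.
    """
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)


def read_gpu_samples() -> list[tuple[int, int]]:
    """Run nvidia-smi once and return one sample per installed GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_sample(line) for line in out.splitlines() if line.strip()]
```

Calling `read_gpu_samples()` a few times while a response is being generated, and comparing the values with the CPU usage shown by a tool such as `top`, gives a rough picture of where the work is happening.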
Expected behavior
With a local LLM and the CUDA toolkit installed, cortex-cpp should run inference primarily on the GPU, and CPU usage should stay low during generation.
Desktop
- OS: Linux
Additional context
The logs indicate that 32 out of 33 layers are offloaded to the GPU, but 1 layer is still processed on the CPU. This behavior will be investigated further.
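
The offload count can be pulled out of a log file with a short script. This is a sketch only: the log line format is an assumption (a line containing `offloaded 32/33 layers to GPU`, as printed by llama.cpp-style model loaders), and the function names are hypothetical.

```python
import re

# Assumed log format: "... offloaded <n>/<total> layers to GPU"
OFFLOAD_RE = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")


def parse_offload(log_text: str):
    """Return (offloaded, total) from the first matching line, or None."""
    match = OFFLOAD_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cpu_layers(log_text: str):
    """Return how many layers remain on the CPU, or None if no match."""
    parsed = parse_offload(log_text)
    if parsed is None:
        return None
    offloaded, total = parsed
    return total - offloaded
```

For the case described above, a line reporting 32 of 33 layers offloaded would yield one layer remaining on the CPU.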
Metadata
Status
Completed