bug: Cortex-cpp continues to have 1 layer offload to CPU while using GPU

**Describe the bug**
When generating responses with a local LLM and GPU acceleration turned on, cortex-cpp still shows high CPU usage.
https://discord.com/channels/1107178041848909847/1149558035971321886/1253148982188838954

**To Reproduce**
1. Install cortex-cpp and the CUDA toolkit locally.
2. Turn on GPU acceleration.
3. Generate responses using a local LLM.
4. Observe high CPU usage.

**Expected behavior**
Since cortex-cpp is running a local LLM with the CUDA toolkit installed and GPU acceleration turned on, it should do most of its processing on the GPU and not consume as much CPU.

**Desktop**
 - OS: Linux

**Additional context**
The logs indicate that 32 out of 33 layers are offloaded to the GPU, but 1 layer is still processed on the CPU. This behavior will be investigated further.
![image](https://github.com/janhq/cortex.cpp/assets/64197333/214346a5-03a5-4644-b9b8-ae0a15802445)
![image](https://github.com/janhq/cortex.cpp/assets/64197333/fc4353fc-e910-4393-891f-865898a22cd8)
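
To check the offload count without reading the logs by eye, a small script along these lines may help. It is only a sketch: it assumes the log contains a line of the form `offloaded 32/33 layers to GPU` (the wording llama.cpp typically prints), which may not match the exact text in the screenshots above. The function name `layers_on_cpu` and the sample line are hypothetical.

```python
import re

# Assumed log format (llama.cpp style); adjust the pattern if your logs differ.
OFFLOAD_RE = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")


def layers_on_cpu(log_text):
    """Return how many layers were NOT offloaded to the GPU, or None if no line matches."""
    match = OFFLOAD_RE.search(log_text)
    if match is None:
        return None
    offloaded, total = int(match.group(1)), int(match.group(2))
    return total - offloaded


# Hypothetical sample line, written to mirror the 32-of-33 situation described above.
sample = "llm_load_tensors: offloaded 32/33 layers to GPU"
print(layers_on_cpu(sample))
```

A non-zero result means at least one layer stayed on the CPU, which matches the behavior reported here.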


bug: Cortex-cpp continues to have 1 layer offload to CPU while using GPU #1104

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

bug: Cortex-cpp continues to have 1 layer offload to CPU while using GPU #1104

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions