
Limit default context size in the node template #435

@CrossPr0duct

Issue description

When loading an 8B model in the template created by npm create node-llama-cpp@latest, GPU memory usage saturates at 24GB.

Expected Behavior

It should only use ~8GB of VRAM.

Actual Behavior

Shouldn't this only use about 8GB of VRAM? I am using a Q8 quantization.
GPU memory usage starts at around 3GB and then jumps to 24GB.
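
A rough back-of-envelope estimate suggests why the jump is this large (assuming the model is Llama 3.1 8B with its full 128K context window and an fp16 KV cache; these are assumptions, not details confirmed in the report): the KV cache costs about 2 × 32 layers × 8 KV heads × 128 head dim × 2 bytes ≈ 128 KiB per token, so a 131,072-token context alone takes roughly 16GB, which on top of ~8.5GB of Q8 weights lands close to the observed 24GB. Capping the default context size in the template would avoid this.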

Steps to reproduce

Create an app with the latest npm create node-llama-cpp@latest, run npm install, then npm start, and load the 8B Llama model.
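
As a possible workaround (and a sketch of what the template could do by default), the context size can be capped explicitly when creating the context. This is a minimal sketch assuming the node-llama-cpp v3 API; the model path and the 4096 value are placeholders for illustration, not the template's actual code:

```typescript
import path from "path";
import {fileURLToPath} from "url";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    // hypothetical path to the downloaded 8B Q8 model
    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q8_0.gguf")
});

// Without an explicit contextSize, the context can be sized up to the model's
// trained context length, and its KV cache can then take far more VRAM than
// the model weights themselves.
const context = await model.createContext({
    contextSize: 4096 // example cap; pick a value that fits the workload
});

const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

console.log(await session.prompt("Hi there, how are you?"));
```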

My Environment

OS: Windows 10.0.26100 (x64) (reported as Windows 10, but the machine actually runs Windows 11)
Node: 22.13.0 (x64)
TypeScript: 5.7.3
node-llama-cpp: 3.6.0

CUDA: available
Vulkan: available

CUDA device: NVIDIA GeForce RTX 4090
CUDA used VRAM: 6.38% (1.53GB/23.99GB)
CUDA free VRAM: 93.61% (22.46GB/23.99GB)

Vulkan device: NVIDIA GeForce RTX 4090
Vulkan used VRAM: 6.38% (1.53GB/23.99GB)
Vulkan free VRAM: 93.61% (22.46GB/23.99GB)
Vulkan unified memory: 512MB (2.08%)

CPU model: AMD Ryzen 9 7900X 12-Core Processor
Math cores: 12
Used RAM: 50.15% (63.75GB/127.12GB)
Free RAM: 49.84% (63.37GB/127.12GB)
Used swap: 51.24% (76.41GB/149.12GB)
Max swap size: 149.12GB
mmap: supported

Additional Context

No response

Relevant Features Used

  • Metal support
  • CUDA support
  • Vulkan support
  • Grammar
  • Function calling

Are you willing to resolve this issue by submitting a Pull Request?

Yes, I have the time, and I know how to start.

Metadata

Labels

bug (Something isn't working), released
