Skip to content

Huge difference in performance between llama.cpp and llama-cpp-python #1447

Closed
@kseyhan

Description

@kseyhan

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [ X] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [ X] I carefully followed the README.md.
  • [ X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [ X] I reviewed the Discussions, and have a new bug or useful enhancement to share.

I'm running a bot on Libera IRC and the difference between llama.cpp's response time compared to the llama-cpp-python one is pretty huge when maxing out the context lenght.

this is how i run llama.cpp which with the latest update results in a response time of 3 seconds for my bot.
./server -t 8 -a llama-3-8b-instruct -m ./Meta-Llama-3-8B-Instruct-Q6_K.gguf -c 8192 -ngl 100 --timeout 10

this is how i run llama-cpp-python which results in a response time of 18 seconds for my bot
python3 -m llama_cpp.server --model ./Meta-Llama-3-8B-Instruct-Q6_K.gguf --n_threads 8 --n_gpu_layers -1 --n_ctx 8192

Am i doing something wrong or is this normal?

Environment and Context

i experienced that behaviour on linux and windows if self compiled or using the pre compiled wheels

  • Physical (or virtual) hardware you are using, e.g. for Linux:
    CPU: Model name: 13th Gen Intel(R) Core(TM) i5-13600K
    GPU: VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090]

  • Operating System, e.g. for Linux i'm at right now:

Linux b6.8.8-300.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Apr 27 17:53:31 UTC 2024 x86_64 GNU/Linu

  • SDK version, e.g. for Linux:
$ python3 --version = Python 3.11.9
$ make --version = GNU Make 4.4.1
$ g++ --version = g++ (GCC) 14.0.1 20240411 (Red Hat 14.0.1-0) 
nvcc makes use of gcc 13 = g++-13 (Homebrew GCC 13.2.0) 13.2.0
export NVCC_PREPEND_FLAGS='-ccbin /home/linuxbrew/.linuxbrew/bin/g++-13'

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions