llama-cpp manually is noticeably faster than running it through the python API #148

Closed as not planned
@rnwang04

Description

Similar to #71 (comment), I found that on my machine llama.cpp runs at 27 ms per token, while the Python bindings run at 68 ms per token, so the bindings are much slower.
Is there any explanation for this performance degradation? Is such a gap normal, and is there a way to close it?
Thanks a lot!
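For reference, a minimal sketch of how per-token latency might be measured from Python so the two numbers are comparable. The timing helper is generic; the commented usage assumes the llama-cpp-python `Llama` class and a hypothetical local model path:

```python
import time

def per_token_ms(generate, n_tokens):
    """Drive a token generator and return average milliseconds per token."""
    start = time.perf_counter()
    count = 0
    for _ in generate(n_tokens):
        count += 1
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / max(count, 1)

# Usage with llama-cpp-python (model path is hypothetical):
# from llama_cpp import Llama
# llm = Llama(model_path="./model.gguf")
# def gen(n):
#     yield from llm("Hello", max_tokens=n, stream=True)
# print(f"{per_token_ms(gen, 64):.1f} ms/token")
```

Comparing this figure against llama.cpp's own per-token timing output (measured on the same prompt and token count) should show where the overhead lies.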

Metadata

    Labels

    duplicate, performance
