Closed as not planned
Description
Similar to #71 (comment): on my machine, LLaMA CPP takes 27 ms per token, while the Python bindings take 68 ms per token, which is roughly 2.5x slower.
Is there a known cause for this performance degradation? Is a slowdown of this size expected, or is there any way to close the gap?
Thanks a lot!
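For a fair comparison, both numbers should measure the same phase, for example token generation only and not prompt processing. A minimal sketch of a per-token timer follows, under stated assumptions: `generate` is a hypothetical zero-argument callable that yields tokens one at a time (for example, a wrapper you write around a streaming generation call). It is not a real llama.cpp or Python-bindings API.

```python
import time
from typing import Callable, Iterable


def ms_per_token(generate: Callable[[], Iterable[object]]) -> float:
    """Return average wall-clock milliseconds per token for one generation run.

    `generate` is a hypothetical callable (an assumption, not a library API)
    that yields one item per generated token.
    """
    start = time.perf_counter()
    n_tokens = 0
    for _ in generate():
        n_tokens += 1
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if n_tokens == 0:
        raise ValueError("generator produced no tokens")
    return elapsed_ms / n_tokens


def slowdown(baseline_ms: float, candidate_ms: float) -> float:
    """Ratio of candidate latency to baseline latency (> 1 means the candidate is slower)."""
    return candidate_ms / baseline_ms


if __name__ == "__main__":
    # Figures reported above: 27 ms/token (LLaMA CPP) vs 68 ms/token (Python bindings).
    print(f"{slowdown(27.0, 68.0):.2f}x")  # prints 2.52x
```

Timing the loop from the caller's side includes any per-token overhead added by the bindings, which is the gap being asked about. Running the same harness over both backends with identical prompts, thread counts and sampling settings keeps the comparison like for like.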