
perf: Investigate performance discrepancy with llama-rs - 1.5x-2x slower #932

@jon-chuang

Description


Preliminary results show that llama.cpp is 1.5x-2x slower than llama-rs. Both were verified to compile with the same architecture flags and to use the same GNU toolchain.

Summary (on Vicuna 13B, 2048 ctx size, 256 predict tokens):

  • llama.cpp: 430.44 ms per run
  • llama-rs: 272.793 ms per token (per_token_duration)
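As a sanity check on the headline claim, the ratio implied by the two timings above (treating both as per-token figures, which is an assumption for the llama.cpp number since its summary reports "per run"):

```python
# Timings reported above, in milliseconds
llama_cpp_ms = 430.44
llama_rs_ms = 272.793

# Slowdown factor of llama.cpp relative to llama-rs
ratio = llama_cpp_ms / llama_rs_ms
print(f"{ratio:.2f}x")  # ~1.58x, consistent with the reported 1.5x-2x range
```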

Detailed results

An interesting observation is that CPU util is lower on llama-rs.

System Info:

llama.cpp

> make
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -march=native -mtune=native
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:  
I CC:       cc (Ubuntu 9.5.0-1ubuntu1~22.04) 9.5.0
I CXX:      g++ (Ubuntu 9.5.0-1ubuntu1~22.04) 9.5.0

./main
system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

llama-rs

warning: Using gnu
warning: Using MAVX
warning: Using AVX2
warning: Using FMA
warning: Using F16C
warning: Using SSE3

No BLAS.

Note: the llama-rs benchmarks were run on my branch.
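For context, a sketch of how the llama.cpp side of the comparison can be reproduced. The model path and prompt are placeholders (assumptions, not from this report); `-c`, `-n`, and `-t` match the ctx size, predict count, and thread count shown above:

```shell
# Build with the Makefile defaults reported above (-O3 -march=native -mtune=native)
make

# Benchmark run: 2048 ctx, 256 predicted tokens, 16 threads (as in system_info).
# The model path below is a placeholder; point -m at your Vicuna 13B ggml file.
./main -m ./models/vicuna-13b/ggml-model-q4_0.bin \
  -c 2048 -n 256 -t 16 \
  -p "Building a website can be done in 10 simple steps:"
```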
