
Commit 73a346c

Merge pull request #598 from srv1n/add-reranker-example
Added Reranker example
2 parents 773d2c0 + d789cac commit 73a346c

File tree

6 files changed: +451 −1 lines changed


Cargo.lock

Lines changed: 11 additions & 0 deletions
(generated file; diff not shown)

Cargo.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -4,7 +4,7 @@ members = [
     "llama-cpp-sys-2",
     "llama-cpp-2",
     "examples/embeddings",
-    "examples/simple",
+    "examples/simple", "examples/reranker",
 ]

 [workspace.dependencies]
```

examples/reranker/Cargo.toml

Lines changed: 20 additions & 0 deletions
```diff
@@ -0,0 +1,20 @@
+[package]
+name = "reranker"
+version = "0.1.86"
+edition = "2021"
+
+[dependencies]
+llama-cpp-2 = { path = "../../llama-cpp-2", version = "0.1.86" }
+hf-hub = { workspace = true }
+clap = { workspace = true, features = ["derive"] }
+anyhow = { workspace = true }
+encoding_rs = { workspace = true }
+
+[features]
+cuda = ["llama-cpp-2/cuda"]
+metal = ["llama-cpp-2/metal"]
+native = ["llama-cpp-2/native"]
+vulkan = ["llama-cpp-2/vulkan"]
+
+[lints]
+workspace = true
```

examples/reranker/README.md

Lines changed: 75 additions & 0 deletions
# Rust Reranker Implementation

A Rust implementation of cross-encoder reranking using llama-cpp-2. Cross-encoder reranking determines the similarity between queries and documents more accurately than traditional embedding-based approaches.

## Overview

This implementation uses the pooling type `LLAMA_POOLING_TYPE_RANK`, which enables cross-encoder reranking. Unlike traditional embedding approaches that encode the query and document separately, this method:

- Processes each query-document pair together in a single pass
- Directly evaluates the semantic relationship within the pair
- Outputs a raw similarity score indicating relevance
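For contrast, a traditional bi-encoder scores each pair by comparing independently computed embeddings, typically with cosine similarity. A minimal sketch of that baseline (the vectors below are toy stand-ins for model-produced embeddings):

```rust
/// Cosine similarity between two embedding vectors, as a bi-encoder
/// would use to score a query against a document.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Toy vectors standing in for separately computed embeddings.
    let query = [0.1, 0.9, 0.2];
    let doc = [0.2, 0.8, 0.1];
    println!("cosine similarity: {:.3}", cosine_similarity(&query, &doc));
}
```

Because the two texts never attend to each other, this approach can miss interactions that a cross-encoder sees directly.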
## Installation

```bash
# Follow the repository's instructions to clone it,
# then navigate to the reranker example
cd examples/reranker

# Build the project
cargo build --release
```
## Usage

### Command Line Interface

```bash
cargo run --release -- \
  --model-path "models/bge-reranker-v2-m3.gguf" \
  --query "what is panda?" \
  --documents "hi" \
  --documents "it's a bear" \
  --documents "The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China." \
  --pooling rank
```

Expected output (with bge-reranker-v2-m3-Q5_0):

```
rerank score 0: -6.551
rerank score 1: -3.802
rerank score 2: 4.522
```
### CLI Arguments

- `--model-path`: Path to the GGUF model file
- `--query`: The search query
- `--documents`: One or more documents to rank against the query
- `--pooling`: Pooling type (options: `none`, `mean`, `rank`)

### Pooling Types

- `rank`: Performs cross-encoder reranking

Note: The raw scores are not normalized through a sigmoid function. If you need scores between 0 and 1, apply sigmoid normalization in your application code.
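Such normalization is a one-liner: the logistic sigmoid σ(x) = 1 / (1 + e^(−x)) maps any raw logit into (0, 1). A minimal sketch, applied to the example scores shown earlier:

```rust
/// Map a raw reranker logit into (0, 1) with the logistic sigmoid.
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

fn main() {
    // Raw scores from the bge-reranker-v2-m3-Q5_0 run shown earlier.
    for score in [-6.551_f32, -3.802, 4.522] {
        println!("raw {score:>7.3} -> normalized {:.3}", sigmoid(score));
    }
}
```

Sigmoid is monotonic, so it changes the scale of the scores but never their ranking.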
## Additional notes

- Query and documents are concatenated using the format `<bos>query</eos><sep>answer</eos>`
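A sketch of building that concatenated input as a string; the literal token markers here are illustrative placeholders, since real code should use the model's own special-token IDs:

```rust
/// Build the cross-encoder input for one query-document pair using the
/// `<bos>query</eos><sep>answer</eos>` layout described above.
/// The literal `<bos>`/`<eos>`/`<sep>` strings are placeholders for the
/// model's actual special tokens.
fn build_rerank_prompt(query: &str, document: &str) -> String {
    format!("<bos>{query}</eos><sep>{document}</eos>")
}

fn main() {
    println!("{}", build_rerank_prompt("what is panda?", "it's a bear"));
}
```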
## Supported Models

Some tested models:

- [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3)
- [jinaai/jina-reranker-v1-tiny-en](https://huggingface.co/jinaai/jina-reranker-v1-tiny-en)

Other models have not been tested, but any reranker supported by llama.cpp should work.

## Implementation Details

This is a close Rust port of the reranker implementation discussed in [llama.cpp PR #9510](https://github.com/ggerganov/llama.cpp/pull/9510).

## Potential issues

The bos, eos, and sep tokens are currently hardcoded. Ideally they should be read from the model so the prompt can be built to match each specific model.
