| 1 | +# Rust Reranker Implementation |
| 2 | + |
| 3 | +A Rust implementation of cross-encoder based reranking using llama-cpp-2. Cross-encoder reranking typically scores the relevance of a document to a query more accurately than traditional embedding-based approaches, at the cost of running the model once per query-document pair. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +This implementation adds a new pooling type `LLAMA_POOLING_TYPE_RANK` which enables cross-encoder based reranking. Unlike traditional embedding approaches that encode query and document separately, this method: |
| 8 | + |
| 9 | +- Processes query and document pairs together in a single pass |
| 10 | +- Directly evaluates semantic relationships between the pairs |
| 11 | +- Outputs raw similarity scores indicating relevance |
| 12 | + |
| 13 | +## Installation |
| 14 | + |
| 15 | +```bash |
| 16 | +# Clone the repository first (see the project instructions). |
| 17 | +# Navigate to the reranker example |
| 18 | +cd examples/reranker |
| 19 | + |
| 20 | +# Build the project |
| 21 | +cargo build --release |
| 22 | +``` |
| 23 | + |
| 24 | +## Usage |
| 25 | + |
| 26 | +### Command Line Interface |
| 27 | + |
| 28 | +```bash |
| 29 | +cargo run --release -- \ |
| 30 | + --model-path "models/bge-reranker-v2-m3.gguf" \ |
| 31 | + --query "what is panda?" \ |
| 32 | + --documents "hi" \ |
| 33 | + --documents "it's a bear" \ |
| 34 | + --documents "The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China." \ |
| 35 | + --pooling rank |
| 36 | +``` |
| 37 | +Expected output (with bge-reranker-v2-m3-Q5_0): |
| 38 | +rerank score 0: -6.551 |
| 39 | +rerank score 1: -3.802 |
| 40 | +rerank score 2: 4.522 |
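The scores are printed in input order, so the caller still has to turn them into a ranking. A minimal sketch of that step, assuming the scores have been collected into a slice of `f32` (the function name `rank_by_score` is hypothetical and not part of this example's code):

```rust
/// Return document indices ordered from most to least relevant.
/// Hypothetical helper: it only assumes one raw score per document.
fn rank_by_score(scores: &[f32]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..scores.len()).collect();
    // Sort descending by score; total_cmp gives a total order even if a NaN slips in.
    order.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    order
}
```

For the three scores shown above, this returns the indices `[2, 1, 0]`, placing the giant panda description first.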
| 41 | + |
| 42 | +### CLI Arguments |
| 43 | + |
| 44 | +- `--model-path`: Path to the GGUF model file |
| 45 | +- `--query`: The search query |
| 46 | +- `--documents`: One or more documents to rank against the query |
| 47 | +- `--pooling`: Pooling type (options: none, mean, rank) |
| 48 | + |
| 49 | +### Pooling Types |
| 50 | + |
| 51 | +- `rank`: Performs cross-encoder reranking |
| 52 | + |
| 53 | + |
| 54 | +Note: The raw scores are not passed through a sigmoid function. If you need scores between 0 and 1, apply sigmoid normalization in your application code. |
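A minimal sketch of that normalization, assuming the raw scores are available as `f32` values (the helper names `sigmoid` and `normalize_scores` are illustrative, not functions provided by this example or by llama-cpp-2):

```rust
/// Map a raw reranker score (a logit) to the open interval (0, 1).
fn sigmoid(score: f32) -> f32 {
    1.0 / (1.0 + (-score).exp())
}

/// Normalize a batch of raw scores. Sigmoid is monotonic, so the
/// relative ordering of the documents is unchanged.
fn normalize_scores(raw: &[f32]) -> Vec<f32> {
    raw.iter().map(|&s| sigmoid(s)).collect()
}
```

Applied to the sample scores above (-6.551, -3.802, 4.522), this yields roughly 0.001, 0.02, and 0.99. Normalization does not change the ranking; it only makes the scores easier to threshold or display.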
| 55 | + |
| 56 | +## Additional Notes |
| 57 | + |
| 58 | +- Each query and document pair is concatenated into a single prompt using the format `<bos>query</eos><sep>answer</eos>`, where `answer` is the document text. |
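The concatenation can be sketched as below. This is an illustration of the format only: the function names are hypothetical, and the literal marker strings are copied from the note above rather than taken from any particular model's vocabulary.

```rust
/// Hypothetical helper: join one query and one document into a single
/// cross-encoder prompt, using the marker strings from the note above.
fn build_rerank_prompt(query: &str, document: &str) -> String {
    format!("<bos>{query}</eos><sep>{document}</eos>")
}

/// Build one prompt per document; each pair is scored independently.
fn build_rerank_prompts(query: &str, documents: &[&str]) -> Vec<String> {
    documents
        .iter()
        .map(|doc| build_rerank_prompt(query, doc))
        .collect()
}
```

Each document produces its own prompt, which is why the model runs once per query-document pair.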
| 59 | + |
| 60 | +## Supported Models |
| 61 | + |
| 62 | +Some tested models: |
| 63 | + |
| 64 | +- [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) |
| 65 | +- [jinaai/jina-reranker-v1-tiny-en](https://huggingface.co/jinaai/jina-reranker-v1-tiny-en) |
| 66 | + |
| 67 | +Other models have not been tested, but any reranker model supported by llama.cpp should work. |
| 68 | + |
| 69 | +## Implementation Details |
| 70 | + |
| 71 | +This is a close Rust port of the reranker implementation discussed in [llama.cpp PR #9510](https://github.com/ggerganov/llama.cpp/pull/9510). |
| 72 | + |
| 73 | +## Potential Issues |
| 74 | + |
| 75 | +The BOS, EOS, and SEP tokens are currently hardcoded. Ideally, they should be read from the model so that prompts are built correctly for each specific model. |
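One possible direction, sketched under assumptions: keep the marker text in a small struct and pass it to the prompt builder, so that only the code populating the struct depends on the model. The `SpecialTokens` type and its method are hypothetical, and how the values would be read from a loaded model is deliberately left out because it depends on the bindings.

```rust
/// Hypothetical container for the special-token text of one model.
/// How these strings are obtained from the model is not shown here.
struct SpecialTokens {
    bos: String,
    eos: String,
    sep: String,
}

impl SpecialTokens {
    /// Build a query/document prompt using this model's markers
    /// instead of hardcoded strings.
    fn build_prompt(&self, query: &str, document: &str) -> String {
        format!(
            "{}{}{}{}{}{}",
            self.bos, query, self.eos, self.sep, document, self.eos
        )
    }
}
```

With this shape, supporting a model with different markers means constructing a different `SpecialTokens` value while the prompt-building logic stays unchanged.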