| 1 | | -# LlamaIndex Vector_Stores Integration: Couchbase |
| 1 | +# LlamaIndex Vector Stores Integration: Couchbase |
| 2 | + |
| 3 | +This package provides Couchbase vector store integrations for LlamaIndex, offering multiple implementation options for vector similarity search based on Couchbase Server's native vector indexing capabilities. |
| 4 | + |
| 5 | +## Installation |
| 6 | + |
| 7 | +```bash |
| 8 | +pip install llama-index-vector-stores-couchbase |
| 9 | +``` |
| 10 | + |
| 11 | +## Available Vector Store Classes |
| 12 | + |
| 13 | +### CouchbaseSearchVectorStore |
| 14 | + |
| 15 | +Implements [Search Vector Indexes](https://docs.couchbase.com/server/current/vector-index/use-vector-indexes.html) using Couchbase Full-Text Search (FTS) with vector search capabilities. Ideal for hybrid queries that combine vector, full-text, and geospatial search. |
| 16 | + |
| 17 | +### CouchbaseQueryVectorStore (Recommended) |
| 18 | + |
| 19 | +Implements both [Hyperscale Vector Indexes](https://docs.couchbase.com/server/current/vector-index/use-vector-indexes.html) and [Composite Vector Indexes](https://docs.couchbase.com/server/current/vector-index/use-vector-indexes.html) using Couchbase Query Service with SQL++ and vector search functions. Supports: |
| 20 | + |
| 21 | +- **Hyperscale Vector Indexes**: Purpose-built for pure vector searches at massive scale with minimal memory footprint |
| 22 | +- **Composite Vector Indexes**: Best for combining vector similarity with scalar filters that exclude large portions of the dataset |
| 23 | + |
| 24 | +Can scale to billions of documents. Requires Couchbase Server 8.0+. |
| 25 | + |
| 26 | +### CouchbaseVectorStore (Deprecated) |
| 2 | 27 | |
| 3 | 28 | > **Note:** `CouchbaseVectorStore` has been deprecated in version 0.4.0. Please use `CouchbaseSearchVectorStore` instead. |
| 29 | + |
| 30 | +## Requirements |
| 31 | + |
| 32 | +- Python >= 3.9, < 4.0 |
| 33 | +- Couchbase Server 7.6+ for Search Vector Indexes |
| 34 | +- Couchbase Server 8.0+ for Hyperscale and Composite Vector Indexes |
| 35 | +- couchbase >= 4.5.0 |
| 36 | + |
| 37 | +## Basic Usage |
| 38 | + |
| 39 | +### Using CouchbaseSearchVectorStore (Search Vector Indexes) |
| 40 | + |
| 41 | +```python |
| 42 | +from llama_index.vector_stores.couchbase import CouchbaseSearchVectorStore |
| 43 | +from couchbase.cluster import Cluster |
| 44 | +from couchbase.auth import PasswordAuthenticator |
| 45 | + |
| 46 | +# Connect to Couchbase |
| 47 | +auth = PasswordAuthenticator("username", "password") |
| 48 | +cluster = Cluster("couchbase://localhost", auth) |
| 49 | + |
| 50 | +# Initialize vector store |
| 51 | +vector_store = CouchbaseSearchVectorStore( |
| 52 | +    cluster=cluster, |
| 53 | +    bucket_name="my_bucket", |
| 54 | +    scope_name="my_scope", |
| 55 | +    collection_name="my_collection", |
| 56 | +    index_name="my_vector_index", |
| 57 | +    text_key="text", |
| 58 | +    embedding_key="embedding", |
| 59 | +    metadata_key="metadata", |
| 60 | +    scoped_index=True, |
| 61 | +) |
| 62 | +``` |
| 63 | + |
| 64 | +### Using CouchbaseQueryVectorStore (Hyperscale & Composite Vector Indexes) |
| 65 | + |
| 66 | +```python |
| 67 | +from llama_index.vector_stores.couchbase import ( |
| 68 | +    CouchbaseQueryVectorStore, |
| 69 | +    QueryVectorSearchType, |
| 70 | +    QueryVectorSearchSimilarity, |
| 71 | +) |
| 72 | + |
| 73 | +# Initialize Query Service-based vector store (reuses the `cluster` object from the previous example) |
| 74 | +# Works with both Hyperscale Vector Indexes (pure vector search) |
| 75 | +# and Composite Vector Indexes (vector + scalar filters) |
| 76 | +vector_store = CouchbaseQueryVectorStore( |
| 77 | +    cluster=cluster, |
| 78 | +    bucket_name="my_bucket", |
| 79 | +    scope_name="my_scope", |
| 80 | +    collection_name="my_collection", |
| 81 | +    search_type=QueryVectorSearchType.ANN,  # or QueryVectorSearchType.KNN |
| 82 | +    similarity=QueryVectorSearchSimilarity.COSINE,  # Can also use string: "cosine", "euclidean", "dot_product" |
| 83 | +    nprobes=10,  # Optional: number of probes (ANN only) |
| 84 | +    text_key="text", |
| 85 | +    embedding_key="embedding", |
| 86 | +    metadata_key="metadata", |
| 87 | +) |
| 88 | +``` |
| 89 | + |
| 90 | +## Configuration Options |
| 91 | + |
| 92 | +### Search Types |
| 93 | + |
| 94 | +The `QueryVectorSearchType` enum defines the type of vector search to perform: |
| 95 | + |
| 96 | +- `QueryVectorSearchType.ANN` - Approximate Nearest Neighbor (recommended for large datasets) |
| 97 | +- `QueryVectorSearchType.KNN` - K-Nearest Neighbor (exact search) |
| 98 | + |
| 99 | +### Similarity Metrics |
| 100 | + |
| 101 | +The `QueryVectorSearchSimilarity` enum provides the following similarity and distance metrics: |
| 102 | + |
| 103 | +- `QueryVectorSearchSimilarity.COSINE` - Cosine similarity (range: -1 to 1) |
| 104 | +- `QueryVectorSearchSimilarity.DOT` - Dot product similarity |
| 105 | +- `QueryVectorSearchSimilarity.L2` or `EUCLIDEAN` - Euclidean distance |
| 106 | +- `QueryVectorSearchSimilarity.L2_SQUARED` or `EUCLIDEAN_SQUARED` - Squared Euclidean distance |
| 107 | + |
| 108 | +You can also use lowercase strings: `"cosine"`, `"dot_product"`, `"euclidean"`, etc. |
| 109 | + |
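To make the differences between these metrics concrete, here is a minimal pure-Python sketch of what each one computes. The function names are illustrative assumptions and are not part of this package; the actual scoring happens inside Couchbase.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: a larger value means the vectors are more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the length-normalized vectors; always within [-1, 1].

    Undefined for zero-length vectors (this sketch would raise ZeroDivisionError).
    """
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_squared(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance: a smaller value means the vectors are closer."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: the square root of the squared distance."""
    return math.sqrt(l2_squared(a, b))


query = [1.0, 0.0]
doc = [3.0, 4.0]

print(dot(query, doc))                # 3.0
print(cosine_similarity(query, doc))  # 0.6
print(l2_squared(query, doc))         # 20.0
print(l2(query, doc))                 # about 4.47
```

Because the square root is monotonic, Euclidean and squared Euclidean distance rank results in the same order; only the reported distance values differ.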
| 110 | +## Features |
| 111 | + |
| 112 | +- **Multiple Index Types**: Support for all three Couchbase vector index types: |
| 113 | + - Hyperscale Vector Indexes (Query Service-based, 8.0+) |
| 114 | + - Composite Vector Indexes (Query Service-based, 8.0+) |
| 115 | + - Search Vector Indexes (FTS-based, 7.6+) |
| 116 | +- **Flexible Similarity Metrics**: Choice of similarity and distance metrics: |
| 117 | + - COSINE (Cosine similarity) |
| 118 | + - DOT (Dot product) |
| 119 | + - L2 / EUCLIDEAN (Euclidean distance) |
| 120 | + - L2_SQUARED / EUCLIDEAN_SQUARED (Squared Euclidean distance) |
| 121 | +- **Metadata Filtering**: Advanced filtering capabilities using LlamaIndex MetadataFilters |
| 122 | +- **Batch Operations**: Efficient batch insertion with configurable batch sizes |
| 123 | +- **High Performance**: ANN search for fast approximate queries and KNN search for exact results |
| 124 | +- **Massive Scalability**: Hyperscale and Composite indexes can scale to billions of documents |
| 125 | + |
| 126 | +## Implementation Details |
| 127 | + |
| 128 | +### Query Service-Based Vector Indexes (`CouchbaseQueryVectorStore`) |
| 129 | + |
| 130 | +`CouchbaseQueryVectorStore` supports both **Hyperscale Vector Indexes** and **Composite Vector Indexes**, which use the Couchbase Query Service with SQL++ queries and vector search functions. |
| 131 | + |
| 132 | +#### Hyperscale Vector Indexes |
| 133 | + |
| 134 | +Purpose-built for pure vector searches at massive scale: |
| 135 | + |
| 136 | +**When to Use:** |
| 137 | + |
| 138 | +- Pure vector similarity searches without complex scalar filtering |
| 139 | +- Content discovery, recommendations, reverse image search |
| 140 | +- Chatbot context matching (e.g., RAG workflows) |
| 141 | +- Anomaly detection in IoT sensor networks |
| 142 | +- Datasets from tens of millions to billions of documents |
| 143 | + |
| 144 | +**Key Characteristics:** |
| 145 | + |
| 146 | +- Optimized specifically for vector searches |
| 147 | +- Higher accuracy at lower quantizations |
| 148 | +- Low memory footprint (most index data on disk) |
| 149 | +- Best TCO for huge datasets |
| 150 | +- Excellent for concurrent updates and searches |
| 151 | +- Scalar values and vectors compared simultaneously |
| 152 | + |
| 153 | +#### Composite Vector Indexes |
| 154 | + |
| 155 | +Combine a Global Secondary Index (GSI) with vector search functions: |
| 156 | + |
| 157 | +**When to Use:** |
| 158 | + |
| 159 | +- Searches that combine vector similarity with scalar filters |
| 160 | +- When scalar filters can exclude large portions (>20%) of the dataset |
| 161 | +- Applications requiring compliance-based restrictions on results |
| 162 | +- Content recommendations, job searches, supply chain management |
| 163 | +- Datasets from tens of millions to billions of documents |
| 164 | + |
| 165 | +**Key Characteristics:** |
| 166 | + |
| 167 | +- Scalar filters are applied _before_ vector search, reducing vectors to compare |
| 168 | +- Most efficient when scalar filters are highly selective (exclude more than 20% of the dataset) |
| 169 | +- Can exclude nearest neighbors based on scalar values (useful for compliance) |
| 170 | +- Can scale to billions of documents |
| 171 | + |
| 172 | +#### Search Types (Both Hyperscale & Composite) |
| 173 | + |
| 174 | +- **ANN (Approximate Nearest Neighbor)**: Faster approximate search with configurable `nprobes` parameter for accuracy/speed tradeoff |
| 175 | +- **KNN (K-Nearest Neighbor)**: Exact nearest neighbor search for maximum accuracy |
| 176 | + |
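The accuracy/speed tradeoff behind `nprobes` can be illustrated with a toy example. This sketch assumes a partition-based (inverted-file style) index, which is what a probe count usually implies; the text above does not describe Couchbase's internals, and all function names here are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query, vectors, k):
    """Exact search: compare the query against every stored vector."""
    return sorted(vectors, key=lambda v: l2(query, v))[:k]


def build_partitions(vectors, centroids):
    """Assign each vector to the partition of its nearest centroid."""
    partitions = {i: [] for i in range(len(centroids))}
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))
        partitions[nearest].append(v)
    return partitions


def ann(query, centroids, partitions, k, nprobes):
    """Approximate search: scan only the nprobes partitions closest to the query."""
    probe_order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [v for i in probe_order[:nprobes] for v in partitions[i]]
    return knn(query, candidates, k)


vectors = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
centroids = [(0.5, 0.0), (4.5, 0.0), (10.0, 0.0)]
partitions = build_partitions(vectors, centroids)
query = (2.4, 0.0)

print(knn(query, vectors, k=2))                            # [(1.0, 0.0), (4.0, 0.0)]
print(ann(query, centroids, partitions, k=2, nprobes=1))   # [(1.0, 0.0), (0.0, 0.0)]
print(ann(query, centroids, partitions, k=2, nprobes=2))   # [(1.0, 0.0), (4.0, 0.0)]
```

With one probe the search misses a true neighbor that lives in an adjacent partition; with two probes it matches the exact result at the cost of scanning more vectors.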
| 177 | +### Search Vector Indexes (`CouchbaseSearchVectorStore`) |
| 178 | + |
| 179 | +Search Vector Indexes combine Full-Text Search (FTS) with vector search capabilities: |
| 180 | + |
| 181 | +**When to Use:** |
| 182 | + |
| 183 | +- Hybrid queries that combine vector, full-text, and geospatial search |
| 184 | +- Applications like e-commerce product search, travel recommendations, or real estate searches |
| 185 | +- Datasets up to tens of millions of documents |
| 186 | + |
| 187 | +**Key Characteristics:** |
| 188 | + |
| 189 | +- Combines semantic search with keyword and geospatial searches in a single query |
| 190 | +- Supports both scoped and global indexes |
| 191 | +- Ideal for multi-modal search scenarios |
| 192 | + |
| 193 | +### Metadata Filtering |
| 194 | + |
| 195 | +Both implementations support metadata filtering: |
| 196 | + |
| 197 | +- Filter by document attributes using standard LlamaIndex `MetadataFilters` |
| 198 | +- Supports operators: `==`, `!=`, `>`, `<`, `>=`, `<=`, `IN`, `NIN` |
| 199 | +- Combine filters with `AND`/`OR` conditions |
| 200 | + |
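As a rough illustration of how the listed operators and `AND`/`OR` conditions combine, the following pure-Python sketch evaluates filters against a metadata dictionary. It does not use the LlamaIndex `MetadataFilters` API; the tuple format, the `matches` helper, and the rule that a missing key never matches are all assumptions made for this example.

```python
import operator

# Operator names mirror the list above; the mapping itself is illustrative.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "IN": lambda value, options: value in options,
    "NIN": lambda value, options: value not in options,
}


def matches(metadata, filters, condition="AND"):
    """Return True if the metadata satisfies the filters.

    filters is a list of (key, operator, value) tuples.
    Assumption: a filter on a key that is absent from the metadata never matches.
    """
    results = [
        key in metadata and OPERATORS[op](metadata[key], value)
        for key, op, value in filters
    ]
    return all(results) if condition == "AND" else any(results)


doc_metadata = {"category": "news", "year": 2024}
filters = [("category", "IN", ["news", "blog"]), ("year", ">=", 2025)]

print(matches(doc_metadata, filters, "AND"))  # False: the year filter fails
print(matches(doc_metadata, filters, "OR"))   # True: the category filter passes
```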
| 201 | +### Choosing the Right Index Type |
| 202 | + |
| 203 | +The same `CouchbaseQueryVectorStore` class works with both Hyperscale and Composite Vector Indexes. The choice of which underlying index type to use is determined by the index you create on your Couchbase collection. |
| 204 | + |
| 205 | +| Feature | Hyperscale (via QueryVectorStore) | Composite (via QueryVectorStore) | Search (via SearchVectorStore) | |
| 206 | +| ------------------- | ------------------------------------ | -------------------------------- | ---------------------------------- | |
| 207 | +| **Index Type** | Hyperscale Vector Index | Composite Vector Index | Search Vector Index | |
| 208 | +| **Best For** | Pure vector searches | Vector + scalar filters | Vector + full-text + geospatial | |
| 209 | +| **Available Since** | Couchbase Server 8.0 | Couchbase Server 8.0 | Couchbase Server 7.6 | |
| 210 | +| **Scalar Handling** | Compared with vectors simultaneously | Pre-filters before vector search | Searches in parallel | |
| 211 | +| **Use Cases** | Content discovery, RAG, image search | Job search, compliance filtering | E-commerce, travel recommendations | |
| 212 | + |
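The guidance in the table can be condensed into a small decision helper. This is a sketch of the rules of thumb stated in this README (the 20% figure comes from the Composite Vector Index guidance), not an official selection algorithm; `suggest_vector_store` and its parameters are hypothetical names.

```python
def suggest_vector_store(needs_text_or_geo_search: bool, excluded_fraction: float) -> str:
    """Suggest a vector store class and index type from two rough inputs.

    needs_text_or_geo_search: whether queries mix vector search with
        full-text or geospatial search.
    excluded_fraction: approximate share of documents (0.0 to 1.0) that
        scalar filters remove before the vector comparison.
    """
    if not 0.0 <= excluded_fraction <= 1.0:
        raise ValueError("excluded_fraction must be between 0.0 and 1.0")
    if needs_text_or_geo_search:
        return "CouchbaseSearchVectorStore with a Search Vector Index"
    if excluded_fraction > 0.20:
        return "CouchbaseQueryVectorStore with a Composite Vector Index"
    return "CouchbaseQueryVectorStore with a Hyperscale Vector Index"


print(suggest_vector_store(needs_text_or_geo_search=False, excluded_fraction=0.0))
print(suggest_vector_store(needs_text_or_geo_search=False, excluded_fraction=0.6))
print(suggest_vector_store(needs_text_or_geo_search=True, excluded_fraction=0.1))
```

Dataset size also matters: Search Vector Indexes are described above as suited to datasets up to tens of millions of documents, so very large collections may favor the Query Service-based indexes even when hybrid search is attractive.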
| 213 | +For more information, refer to: [Couchbase Vector Search Documentation](https://docs.couchbase.com/server/current/vector-index/use-vector-indexes.html) |
| 214 | + |
| 215 | +## License |
| 216 | + |
| 217 | +MIT |