Commit 59313d7

Add Hyperscale and Composite Vector Indexes support for Couchbase vector-store (#20170)
1 parent 4f7d867 commit 59313d7

7 files changed (+1726, -169 lines)

Lines changed: 215 additions & 1 deletion
# LlamaIndex Vector Stores Integration: Couchbase

This package provides Couchbase vector store integrations for LlamaIndex. It offers several implementations of vector similarity search, each built on one of Couchbase Server's native vector index types.

## Installation

```bash
pip install llama-index-vector-stores-couchbase
```

## Available Vector Store Classes

### CouchbaseSearchVectorStore

Implements [Search Vector Indexes](https://docs.couchbase.com/server/current/vector-index/use-vector-indexes.html) using Couchbase Full-Text Search (FTS) with vector search capabilities. Ideal for hybrid searches that combine vector, full-text, and geospatial search.

### CouchbaseQueryVectorStore (Recommended)

Implements both [Hyperscale Vector Indexes](https://docs.couchbase.com/server/current/vector-index/use-vector-indexes.html) and [Composite Vector Indexes](https://docs.couchbase.com/server/current/vector-index/use-vector-indexes.html) using the Couchbase Query Service with SQL++ and its vector search functions. Supports:

- **Hyperscale Vector Indexes**: Purpose-built for pure vector search at massive scale with a minimal memory footprint
- **Composite Vector Indexes**: Best for combining vector similarity with scalar filters that exclude large portions of the dataset

Can scale to billions of documents. Requires Couchbase Server 8.0+.

### CouchbaseVectorStore (Deprecated)

> **Note:** `CouchbaseVectorStore` was deprecated in version 0.4.0. Please use `CouchbaseSearchVectorStore` instead.

## Requirements

- Python >= 3.9, < 4.0
- Couchbase Server 7.6+ for Search Vector Indexes
- Couchbase Server 8.0+ for Hyperscale and Composite Vector Indexes
- couchbase >= 4.5.0
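
The sketch below shows how these minimums combine. It checks a Python version, a `couchbase` SDK version string, and a Couchbase Server version against the list above. `unmet_requirements` and the `"search"`/`"hyperscale"`/`"composite"` labels are hypothetical names used only for illustration. They are not part of this package.

```python
# Minimum versions copied from the requirements list above.
MIN_PYTHON = (3, 9)
MIN_SDK = (4, 5, 0)
MIN_SERVER = {
    "search": (7, 6),  # Search Vector Indexes
    "hyperscale": (8, 0),  # Hyperscale Vector Indexes
    "composite": (8, 0),  # Composite Vector Indexes
}


def parse_version(text):
    """Turn '4.5.0' (or '4.5.0rc1') into (4, 5, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(version, minimum):
    """Compare version tuples, treating missing trailing components as 0."""
    width = max(len(version), len(minimum))
    padded = tuple(version) + (0,) * (width - len(version))
    padded_min = tuple(minimum) + (0,) * (width - len(minimum))
    return padded >= padded_min


def unmet_requirements(python_version, sdk_version, server_version, index_kind):
    """Return a list of unmet requirements; an empty list means everything is satisfied."""
    problems = []
    if not (at_least(python_version[:2], MIN_PYTHON) and python_version[0] < 4):
        problems.append("Python must be >= 3.9 and < 4.0")
    if not at_least(parse_version(sdk_version), MIN_SDK):
        problems.append("couchbase SDK must be >= 4.5.0")
    needed = MIN_SERVER[index_kind]
    if not at_least(parse_version(server_version), needed):
        problems.append(f"{index_kind} indexes need Couchbase Server {needed[0]}.{needed[1]}+")
    return problems


print(unmet_requirements((3, 11, 4), "4.5.0", "7.6.2", "hyperscale"))
# → ['hyperscale indexes need Couchbase Server 8.0+']
```

In a real environment you could pass `sys.version_info[:3]` and `importlib.metadata.version("couchbase")`. The server version has to come from your cluster.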

## Basic Usage

### Using CouchbaseSearchVectorStore (Search Vector Indexes)

```python
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from llama_index.vector_stores.couchbase import CouchbaseSearchVectorStore

# Connect to Couchbase (the SDK takes the authenticator wrapped in ClusterOptions)
auth = PasswordAuthenticator("username", "password")
cluster = Cluster("couchbase://localhost", ClusterOptions(auth))

# Initialize the vector store; index_name refers to your Search vector index
vector_store = CouchbaseSearchVectorStore(
    cluster=cluster,
    bucket_name="my_bucket",
    scope_name="my_scope",
    collection_name="my_collection",
    index_name="my_vector_index",
    text_key="text",
    embedding_key="embedding",
    metadata_key="metadata",
    scoped_index=True,  # True for a scoped Search index, False for a global one
)
```

### Using CouchbaseQueryVectorStore (Hyperscale & Composite Vector Indexes)

```python
from llama_index.vector_stores.couchbase import (
    CouchbaseQueryVectorStore,
    QueryVectorSearchType,
    QueryVectorSearchSimilarity,
)

# Reuses the `cluster` connection from the previous example.
# Works with both Hyperscale Vector Indexes (pure vector search)
# and Composite Vector Indexes (vector + scalar filters).
vector_store = CouchbaseQueryVectorStore(
    cluster=cluster,
    bucket_name="my_bucket",
    scope_name="my_scope",
    collection_name="my_collection",
    search_type=QueryVectorSearchType.ANN,  # or QueryVectorSearchType.KNN
    similarity=QueryVectorSearchSimilarity.COSINE,  # or a string: "cosine", "euclidean", "dot_product"
    nprobes=10,  # Optional, ANN only: number of probes (higher is more accurate but slower)
    text_key="text",
    embedding_key="embedding",
    metadata_key="metadata",
)
```

## Configuration Options

### Search Types

The `QueryVectorSearchType` enum defines the type of vector search to perform:

- `QueryVectorSearchType.ANN` - Approximate Nearest Neighbor (recommended for large datasets)
- `QueryVectorSearchType.KNN` - K-Nearest Neighbor (exact search)
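
The difference between the two search types is easiest to see from what exact KNN has to do. It compares the query against every stored vector and keeps the `k` closest. The in-memory sketch below shows only those semantics, using Euclidean distance. In `CouchbaseQueryVectorStore` the search runs server-side, and `knn` and the sample data are illustrative.

```python
import heapq
import math


def knn(query, vectors, k):
    """Exact k-nearest-neighbor search: score every vector, keep the k closest ids."""
    scored = ((math.dist(query, vec), doc_id) for doc_id, vec in vectors.items())
    return [doc_id for _, doc_id in heapq.nsmallest(k, scored)]


docs = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [5.0, 5.0],
    "d": [0.9, 0.2],
}
print(knn([1.0, 0.1], docs, k=2))  # → ['b', 'd']
```

The work grows linearly with the number of stored vectors. That is why ANN is the recommended choice for large datasets.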

### Similarity Metrics

The `QueryVectorSearchSimilarity` enum provides several distance metrics:

- `QueryVectorSearchSimilarity.COSINE` - Cosine similarity (range: -1 to 1)
- `QueryVectorSearchSimilarity.DOT` - Dot product similarity
- `QueryVectorSearchSimilarity.L2` or `EUCLIDEAN` - Euclidean distance
- `QueryVectorSearchSimilarity.L2_SQUARED` or `EUCLIDEAN_SQUARED` - Squared Euclidean distance

You can also use lowercase strings: `"cosine"`, `"dot_product"`, `"euclidean"`, etc.
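
Here are the metrics above in plain Python. This shows only the math, not how Couchbase evaluates them server-side. The toy data also shows that metrics can disagree. Cosine ignores vector length, so it prefers `x`, while Euclidean distance prefers `y`. Squared Euclidean always ranks results the same way as Euclidean, because squaring preserves the order of non-negative distances.

```python
import math


def dot_product(a, b):
    """Unnormalized dot product: higher means more similar; sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: in [-1, 1], higher means more similar."""
    return dot_product(a, b) / (math.hypot(*a) * math.hypot(*b))


def euclidean(a, b):
    """L2 / Euclidean distance: lower means more similar."""
    return math.dist(a, b)


def euclidean_squared(a, b):
    """Squared L2 distance: skips the square root, so it ranks exactly like L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = [1.0, 0.0]
candidates = {"x": [2.0, 0.0], "y": [0.6, 0.8], "z": [-1.0, 0.0]}


def rank(score, higher_is_better):
    """Order candidate ids by a metric, best match first."""
    return sorted(candidates, key=lambda k: score(query, candidates[k]), reverse=higher_is_better)


print(rank(cosine_similarity, True))  # → ['x', 'y', 'z']
print(rank(euclidean, False))  # → ['y', 'x', 'z']
print(rank(euclidean, False) == rank(euclidean_squared, False))  # → True
```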

## Features

- **Multiple Index Types**: Support for all three Couchbase vector index types:
  - Hyperscale Vector Indexes (Query Service-based, 8.0+)
  - Composite Vector Indexes (Query Service-based, 8.0+)
  - Search Vector Indexes (FTS-based, 7.6+)
- **Flexible Similarity Metrics**: Multiple distance metrics, including:
  - COSINE (Cosine similarity)
  - DOT (Dot product)
  - L2 / EUCLIDEAN (Euclidean distance)
  - L2_SQUARED / EUCLIDEAN_SQUARED (Squared Euclidean distance)
- **Metadata Filtering**: Advanced filtering using LlamaIndex `MetadataFilters`
- **Batch Operations**: Efficient batch insertion with configurable batch sizes
- **High Performance**: ANN and KNN search for efficient nearest neighbor queries
- **Massive Scalability**: Hyperscale and Composite indexes can scale to billions of documents
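
Conceptually, batch insertion splits the nodes to be written into fixed-size groups, so each round trip to the cluster carries many documents. The generic chunking helper below illustrates that step. `batched` is a hypothetical helper, not this package's code, and the package's own batch-size parameter may have a different name.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 7 documents in batches of 3: 3 round trips instead of 7
for batch in batched([f"doc-{i}" for i in range(7)], batch_size=3):
    print(len(batch), batch[0])
```

This prints `3 doc-0`, `3 doc-3`, and `1 doc-6`. Python 3.12+ ships a similar `itertools.batched`, which yields tuples instead of lists.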

## Implementation Details

### Query Service-Based Vector Indexes (`CouchbaseQueryVectorStore`)

`CouchbaseQueryVectorStore` supports both **Hyperscale Vector Indexes** and **Composite Vector Indexes**. Both use the Couchbase Query Service with SQL++ queries and vector search functions.

#### Hyperscale Vector Indexes

Purpose-built for pure vector searches at massive scale:

**When to Use:**

- Pure vector similarity searches without complex scalar filtering
- Content discovery, recommendations, reverse image search
- Chatbot context matching (e.g., RAG workflows)
- Anomaly detection in IoT sensor networks
- Datasets from tens of millions to billions of documents

**Key Characteristics:**

- Optimized specifically for vector searches
- Higher accuracy at lower quantization levels
- Low memory footprint (most index data is kept on disk)
- Best total cost of ownership (TCO) for huge datasets
- Handles concurrent updates and searches well
- Scalar values and vectors are compared simultaneously
153+
#### Composite Vector Indexes
154+
155+
Combine a Global Secondary Index (GSI) with vector search functions:
156+
157+
**When to Use:**
158+
159+
- Searches that combine vector similarity with scalar filters
160+
- When scalar filters can exclude large portions (>20%) of the dataset
161+
- Applications requiring compliance-based restrictions on results
162+
- Content recommendations, job searches, supply chain management
163+
- Datasets from tens of millions to billions of documents
164+
165+
**Key Characteristics:**
166+
167+
- Scalar filters are applied _before_ vector search, reducing vectors to compare
168+
- Efficient when scalar values have low selectivity (exclude <20% of dataset)
169+
- Can exclude nearest neighbors based on scalar values (useful for compliance)
170+
- Can scale to billions of documents
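
Pre-filtering can be pictured with a small in-memory model. Apply the scalar predicate first, then compute distances only for the documents that survive. The sketch counts those distance computations. Here the filter keeps 10 of 100 documents, so only 10 distances are computed instead of 100. It models the idea only, since the real work happens inside Couchbase, and every name in it is illustrative.

```python
import math


def prefilter_then_search(query, docs, predicate, k):
    """Composite-style search: filter on scalar fields first, then rank survivors by distance."""
    survivors = [d for d in docs if predicate(d)]
    # sorted() evaluates the key once per survivor, so len(survivors) distances are computed
    ranked = sorted(survivors, key=lambda d: math.dist(query, d["embedding"]))
    return [d["id"] for d in ranked[:k]], len(survivors)


docs = [
    {"id": f"job-{i}", "country": "US" if i % 10 == 0 else "FR", "embedding": [i / 100, 1 - i / 100]}
    for i in range(100)
]
ids, comparisons = prefilter_then_search(
    [0.5, 0.5], docs, predicate=lambda d: d["country"] == "US", k=3
)
print(ids, comparisons)
```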

#### Search Types (Both Hyperscale & Composite)

- **ANN (Approximate Nearest Neighbor)**: Faster approximate search with configurable `nprobes` parameter for accuracy/speed tradeoff
- **KNN (K-Nearest Neighbor)**: Exact nearest neighbor search for maximum accuracy
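
The following toy sketch shows what `nprobes` trades off, using an inverted-file (IVF) style ANN search. Vectors are bucketed by their nearest centroid, and a query scans only the `nprobes` buckets whose centroids are closest. IVF is an assumption chosen for illustration, since this README does not describe Couchbase's internal ANN algorithm. The centroids are hand-picked rather than trained.

```python
import math
from collections import defaultdict


def build_ivf(vectors, centroids):
    """Assign each vector id to the bucket of its nearest centroid."""
    buckets = defaultdict(list)
    for doc_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))
        buckets[nearest].append(doc_id)
    return buckets


def ann_search(query, vectors, centroids, buckets, k, nprobes):
    """Scan only the nprobes closest buckets; return (top-k ids, vectors scanned)."""
    probe_order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [doc_id for c in probe_order[:nprobes] for doc_id in buckets[c]]
    candidates.sort(key=lambda doc_id: math.dist(query, vectors[doc_id]))
    return candidates[:k], len(candidates)


centroids = [[0, 0], [10, 0], [0, 10], [10, 10]]
vectors = {
    "a": [1.0, 1.0], "b": [4.0, 4.0], "c": [6.0, 4.0],
    "d": [9.0, 1.0], "e": [1.0, 9.0], "f": [9.0, 9.0],
}
buckets = build_ivf(vectors, centroids)
query = [4.9, 4.0]
for nprobes in (1, 2):
    print(nprobes, ann_search(query, vectors, centroids, buckets, k=2, nprobes=nprobes))
```

With `nprobes=1` the search scans 2 vectors and returns `['b', 'a']`, missing the true second neighbor `c`, which sits in the adjacent bucket. With `nprobes=2` it scans 4 vectors and returns the exact top 2, `['b', 'c']`.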

### Search Vector Indexes (`CouchbaseSearchVectorStore`)

Search Vector Indexes combine Full-Text Search (FTS) with vector search capabilities:

**When to Use:**

- Hybrid searches combining vector, full-text, and geospatial searches
- Applications like e-commerce product search, travel recommendations, or real estate searches
- Datasets up to tens of millions of documents

**Key Characteristics:**

- Combines semantic search with keyword and geospatial searches in a single query
- Supports both scoped and global indexes
- Ideal for multi-modal search scenarios
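
In a hybrid search, each hit can carry both a keyword relevance score and a vector similarity score. One simple way to combine them is a weighted sum of min-max-normalized scores, sketched below. This fusion rule is an assumption chosen for illustration. It is not how Couchbase Search scores hybrid queries, and the hotel ids and scores are invented inputs.

```python
def min_max_normalize(scores):
    """Rescale {doc_id: score} to [0, 1] so scores from different scorers are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc_id: (s - lo) / span for doc_id, s in scores.items()}


def hybrid_rank(text_scores, vector_scores, text_weight=0.5):
    """Blend normalized keyword and vector scores; a doc missing from one list gets 0 there."""
    text_n = min_max_normalize(text_scores)
    vec_n = min_max_normalize(vector_scores)
    combined = {
        doc_id: text_weight * text_n.get(doc_id, 0.0) + (1 - text_weight) * vec_n.get(doc_id, 0.0)
        for doc_id in set(text_n) | set(vec_n)
    }
    # Highest combined score first; ties broken by id for a deterministic order.
    return sorted(combined, key=lambda doc_id: (-combined[doc_id], doc_id))


keyword_hits = {"hotel-1": 12.0, "hotel-2": 6.0, "hotel-4": 2.0}  # e.g. full-text relevance
semantic_hits = {"hotel-2": 0.91, "hotel-3": 0.88, "hotel-1": 0.40}  # e.g. vector similarity
print(hybrid_rank(keyword_hits, semantic_hits))
# → ['hotel-2', 'hotel-1', 'hotel-3', 'hotel-4']
```

`hotel-2` wins because it scores reasonably on both signals. Each of the other hits is strong on at most one.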

### Metadata Filtering

Both implementations support metadata filtering:

- Filter by document attributes using standard LlamaIndex `MetadataFilters`
- Supports operators: `==`, `!=`, `>`, `<`, `>=`, `<=`, `IN`, `NIN`
- Combine filters with `AND`/`OR` conditions
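
The sketch below evaluates a nested filter against one document's metadata in plain Python, to show how these semantics fit together. The dict shape (`key`, `operator`, `value`, `filters`, `condition`) loosely mirrors LlamaIndex's `MetadataFilter`/`MetadataFilters` but is a hypothetical stand-in. It illustrates semantics only, not how the stores actually apply filters against Couchbase.

```python
import operator

# Comparison operators named in the list above; "in"/"nin" test list membership.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "in": lambda field, allowed: field in allowed,
    "nin": lambda field, allowed: field not in allowed,
}


def matches(metadata, spec):
    """Evaluate a filter spec: either a leaf {key, operator, value} or a group {condition, filters}."""
    if "filters" in spec:  # group: combine children with AND or OR
        results = (matches(metadata, child) for child in spec["filters"])
        return all(results) if spec.get("condition", "and") == "and" else any(results)
    if spec["key"] not in metadata:
        return False  # simplifying assumption: a missing key never matches
    return OPS[spec["operator"]](metadata[spec["key"]], spec["value"])


spec = {
    "condition": "and",
    "filters": [
        {"key": "year", "operator": ">=", "value": 2020},
        {
            "condition": "or",
            "filters": [
                {"key": "lang", "operator": "in", "value": ["en", "fr"]},
                {"key": "source", "operator": "==", "value": "docs"},
            ],
        },
    ],
}
print(matches({"year": 2023, "lang": "de", "source": "docs"}, spec))  # → True
print(matches({"year": 2019, "lang": "en"}, spec))  # → False
```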

### Choosing the Right Index Type

The same `CouchbaseQueryVectorStore` class works with both Hyperscale and Composite Vector Indexes. Which one is used depends on the index you create on your Couchbase collection.

| Feature | Hyperscale (via `CouchbaseQueryVectorStore`) | Composite (via `CouchbaseQueryVectorStore`) | Search (via `CouchbaseSearchVectorStore`) |
| ------------------- | ------------------------------------ | -------------------------------- | ---------------------------------- |
| **Index Type** | Hyperscale Vector Index | Composite Vector Index | Search Vector Index |
| **Best For** | Pure vector searches | Vector + scalar filters | Vector + full-text + geospatial |
| **Available Since** | Couchbase Server 8.0 | Couchbase Server 8.0 | Couchbase Server 7.6 |
| **Scalar Handling** | Compared with vectors simultaneously | Pre-filters before vector search | Searches in parallel |
| **Use Cases** | Content discovery, RAG, image search | Job search, compliance filtering | E-commerce, travel recommendations |
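
The table's guidance can be condensed into a small decision helper. The function, its parameters, and the 20% cutoff are an illustrative reading of the table and the sections above, not an official rule. The fallback to Search indexes below 8.0 follows from the version requirements listed earlier.

```python
def suggest_index(needs_text_or_geo, excluded_fraction, server_version):
    """Map the comparison table onto a suggested Couchbase vector index type.

    needs_text_or_geo: whether queries also need full-text or geospatial search.
    excluded_fraction: share of documents (0.0 to 1.0) a typical scalar filter removes.
    server_version: Couchbase Server version as a (major, minor) tuple.
    """
    if server_version < (7, 6):
        return None  # none of the vector index types in this README is available
    if needs_text_or_geo or server_version < (8, 0):
        return "Search Vector Index (CouchbaseSearchVectorStore)"
    if excluded_fraction > 0.20:
        return "Composite Vector Index (CouchbaseQueryVectorStore)"
    return "Hyperscale Vector Index (CouchbaseQueryVectorStore)"


print(suggest_index(False, 0.05, (8, 0)))  # → Hyperscale Vector Index (CouchbaseQueryVectorStore)
print(suggest_index(False, 0.60, (8, 0)))  # → Composite Vector Index (CouchbaseQueryVectorStore)
print(suggest_index(True, 0.0, (7, 6)))  # → Search Vector Index (CouchbaseSearchVectorStore)
```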

For more information, see the [Couchbase Vector Search Documentation](https://docs.couchbase.com/server/current/vector-index/use-vector-indexes.html).

## License

MIT

Lines changed: 16 additions & 4 deletions
```diff
@@ -1,7 +1,19 @@
+"""Couchbase vector stores."""
+
 from llama_index.vector_stores.couchbase.base import (
-    CouchbaseVectorStore,
-    CouchbaseSearchVectorStore,
+    CouchbaseVectorStore,  # Deprecated
+    CouchbaseSearchVectorStore,  # FTS-based
+    CouchbaseQueryVectorStore,  # GSI-based with BHIVE support
+    CouchbaseVectorStoreBase,  # Base class
+    QueryVectorSearchType,  # Enum for search types
+    QueryVectorSearchSimilarity,  # Enum for similarity metrics
 )
 
-
-__all__ = ["CouchbaseVectorStore", "CouchbaseSearchVectorStore"]
+__all__ = [
+    "CouchbaseVectorStore",
+    "CouchbaseSearchVectorStore",
+    "CouchbaseQueryVectorStore",
+    "CouchbaseVectorStoreBase",
+    "QueryVectorSearchType",
+    "QueryVectorSearchSimilarity",
+]
```
