@@ -51,6 +51,7 @@ Another option is to use [synthetic `_source`](elasticsearch://reference/elasti
Here are estimates for different element types and quantization levels:

* `element_type: float`: `num_vectors * num_dimensions * 4`
* `element_type: bfloat16`: `num_vectors * num_dimensions * 2`
* `element_type: float` with `quantization: int8`: `num_vectors * (num_dimensions + 4)`
> **Review note (Member Author):** Need to check this with bfloat16 + quantization
* `element_type: float` with `quantization: int4`: `num_vectors * (num_dimensions/2 + 4)`
* `element_type: float` with `quantization: bbq`: `num_vectors * (num_dimensions/8 + 14)`
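As a worked example of the formulas above, 1 million 768-dimension vectors need roughly: `float`: 1,000,000 × 768 × 4 ≈ 3.1 GB; `bfloat16`: 1,000,000 × 768 × 2 ≈ 1.5 GB; `int8`: 1,000,000 × (768 + 4) ≈ 0.77 GB; `bbq`: 1,000,000 × (768/8 + 14) ≈ 0.11 GB. A minimal sketch of a mapping that opts into `int8` quantization (the index and field names here are hypothetical):

```console
PUT my-quantized-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 768,
        "index_options": {
          "type": "int8_hnsw"
        }
      }
    }
  }
}
```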
@@ -122,13 +123,11 @@ You can check the current value in `KiB` using `lsblk -o NAME,RA,MOUNTPOINT,TYPE`
::::


## Use Direct IO when the vector data does not fit in RAM
## Use on-disk rescoring when the vector data does not fit in RAM
```{applies_to}
stack: preview 9.1
stack: ga 9.3
serverless: unavailable
```
If your indices are of type `bbq_hnsw` and your nodes don't have enough off-heap RAM to store all vector data in memory, you may see very high query latencies. Vector data includes the HNSW graph, quantized vectors, and raw float32 vectors.
If you use quantized indices and your nodes don't have enough off-heap RAM to store all vector data in memory, you may see very high query latencies. Vector data includes the HNSW graph, quantized vectors, and raw float vectors.

In these scenarios, direct IO can significantly reduce query latency. Enable it by setting the JVM option `vector.rescoring.directio=true` on all vector search nodes in your cluster.

Only use this option if you're experiencing very high query latencies on indices of type `bbq_hnsw`. Otherwise, enabling direct IO may increase your query latencies.
In these scenarios, on-disk rescoring can significantly reduce query latency. Enable it by setting the `on_disk_rescore: true` option on your vector indices. Note that existing data must be reindexed or force-merged before subsequent searches use the new setting.
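A sketch of enabling the option and force-merging, assuming `on_disk_rescore` is exposed under the `index` settings namespace and using a hypothetical index name (check the reference docs for the exact setting path in your version):

```console
PUT my-vector-index/_settings
{
  "index": {
    "on_disk_rescore": true
  }
}

POST my-vector-index/_forcemerge?max_num_segments=1
```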
19 changes: 19 additions & 0 deletions solutions/search/vector/knn.md
@@ -325,6 +325,15 @@ POST quantized-image-index/_search
}
```

### BFloat16 vector encoding [knn-search-bfloat16]
```{applies_to}
stack: ga 9.3
```
Instead of storing raw vectors as 4-byte values, you can use `element_type: bfloat16` to store each dimension as a 2-byte value. This can be useful if your indexed vectors are at bfloat16 precision already, or if you wish to reduce the disk space required to store vector data. Elasticsearch will automatically truncate 4-byte float values to 2-byte bfloat16 values when indexing vectors.

Due to the reduced precision of bfloat16, vectors retrieved from the index may have slightly different values from those originally indexed.
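A minimal sketch of a `bfloat16` mapping and an indexed document, using hypothetical index and field names:

```console
PUT bfloat16-index
{
  "mappings": {
    "properties": {
      "image-vector": {
        "type": "dense_vector",
        "dims": 3,
        "element_type": "bfloat16"
      }
    }
  }
}

PUT bfloat16-index/_doc/1
{
  "image-vector": [0.12345, -1.98765, 3.14159]
}
```

Because each dimension is truncated to bfloat16 precision at index time, retrieving this document may return values slightly different from the exact floats that were indexed.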

### Filtered kNN search [knn-search-filter-example]

The kNN search API supports restricting vector similarity search with a filter. The request returns the top `k` nearest neighbors that also satisfy the filter query, enabling targeted, pre-filtered approximate kNN in {{es}}.
@@ -1227,6 +1236,16 @@ This example will:
* Return the top 10 (`k`) rescored candidates.
* Merge the rescored candidates from all shards, and return the top 10 (`k`) results.

#### The `on_disk_rescore` option
```{applies_to}
stack: ga 9.3
serverless: unavailable
```

By default, Elasticsearch reads raw vector data into memory to perform rescoring. This can hurt performance if the vector data is too large to fit in off-heap memory all at once. When the `on_disk_rescore: true` index setting is enabled, Elasticsearch instead reads vector data directly from disk during rescoring.

Note that this setting only applies to newly indexed vectors; to apply it to all vectors in the index, reindex or force-merge after changing the setting.
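Rescoring itself is requested per query. For example, a kNN search that oversamples candidates and rescores them against the raw vectors (with `on_disk_rescore` enabled, those rescoring reads go to disk); the index name, field name, and vector values here are illustrative:

```console
POST my-vector-index/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [0.1, -0.5, 0.3],
    "k": 10,
    "num_candidates": 100,
    "rescore_vector": {
      "oversample": 2.0
    }
  }
}
```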

#### Additional rescoring techniques [dense-vector-knn-search-rescoring-rescore-additional]

The following sections provide additional ways of rescoring: