@@ -51,6 +51,7 @@ Another option is to use [synthetic `_source`](elasticsearch://reference/elasti
Here are estimates for different element types and quantization levels:

* `element_type: float`: `num_vectors * num_dimensions * 4`
* `element_type: bfloat16`: `num_vectors * num_dimensions * 2`
* `element_type: float` with `quantization: int8`: `num_vectors * (num_dimensions + 4)`
> **Review note (Member Author):** Need to check this with bfloat16 + quantization
* `element_type: float` with `quantization: int4`: `num_vectors * (num_dimensions/2 + 4)`
* `element_type: float` with `quantization: bbq`: `num_vectors * (num_dimensions/8 + 14)`
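As a worked example of the formulas above, 1 million 768-dimension vectors need roughly: `float`: 1,000,000 × 768 × 4 ≈ 3.1 GB; `bfloat16`: 1,000,000 × 768 × 2 ≈ 1.5 GB; `int8`: 1,000,000 × (768 + 4) ≈ 0.77 GB; `bbq`: 1,000,000 × (768/8 + 14) ≈ 0.11 GB. A minimal sketch of a mapping that opts into `int8` quantization (the index and field names here are hypothetical):

```console
PUT my-quantized-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 768,
        "index_options": {
          "type": "int8_hnsw"
        }
      }
    }
  }
}
```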
@@ -122,13 +123,11 @@ You can check the current value in `KiB` using `lsblk -o NAME,RA,MOUNTPOINT,TYPE`
::::


## Use Direct IO when the vector data does not fit in RAM
## Use on-disk rescoring when the vector data does not fit in RAM
```{applies_to}
stack: preview 9.1
stack: ga 9.3
serverless: unavailable
```
If your indices are of type `bbq_hnsw` and your nodes don't have enough off-heap RAM to store all vector data in memory, you may see very high query latencies. Vector data includes the HNSW graph, quantized vectors, and raw float32 vectors.
If you use quantized indices and your nodes don't have enough off-heap RAM to store all vector data in memory, you may see very high query latencies. Vector data includes the HNSW graph, quantized vectors, and raw float vectors.

In these scenarios, direct IO can significantly reduce query latency. Enable it by setting the JVM option `vector.rescoring.directio=true` on all vector search nodes in your cluster.

Only use this option if you're experiencing very high query latencies on indices of type `bbq_hnsw`. Otherwise, enabling direct IO may increase your query latencies.
In these scenarios, on-disk rescoring can significantly reduce query latency. Enable it by setting the `on_disk_rescore: true` option on your vector indices. Note that existing data must be reindexed or force-merged before subsequent searches use the new setting.
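A sketch of enabling the option and force-merging, assuming `on_disk_rescore` is exposed under the `index` settings namespace and using a hypothetical index name (check the reference docs for the exact setting path in your version):

```console
PUT my-vector-index/_settings
{
  "index": {
    "on_disk_rescore": true
  }
}

POST my-vector-index/_forcemerge?max_num_segments=1
```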
19 changes: 19 additions & 0 deletions solutions/search/vector/knn.md
@@ -325,6 +325,15 @@ POST quantized-image-index/_search
}
```

### BFloat16 vector encoding [knn-search-bfloat16]
```{applies_to}
stack: ga 9.3
```
Instead of storing raw vectors as 4-byte values, you can use `element_type: bfloat16` to store each dimension as a 2-byte value. This can be useful if your indexed vectors are at bfloat16 precision already, or if you wish to reduce the disk space required to store vector data. Elasticsearch will automatically truncate 4-byte float values to 2-byte bfloat16 values when indexing vectors.

Due to the reduced precision of bfloat16, vectors retrieved from the index may have slightly different values from those originally indexed.
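A minimal sketch of a `bfloat16` mapping and an indexed document, using hypothetical index and field names:

```console
PUT bfloat16-index
{
  "mappings": {
    "properties": {
      "image-vector": {
        "type": "dense_vector",
        "dims": 3,
        "element_type": "bfloat16"
      }
    }
  }
}

PUT bfloat16-index/_doc/1
{
  "image-vector": [0.12345, -1.98765, 3.14159]
}
```

Because each dimension is truncated to bfloat16 precision at index time, retrieving this document may return values slightly different from the exact floats that were indexed.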

### Filtered kNN search [knn-search-filter-example]

The kNN search API supports restricting vector similarity search with a filter. The request returns the top `k` nearest neighbors that also satisfy the filter query, enabling targeted, pre-filtered approximate kNN in {{es}}.
@@ -1227,6 +1236,16 @@ This example will:
* Return the top 10 (`k`) rescored candidates.
* Merge the rescored candidates from all shards, and return the top 10 (`k`) results.

#### The `on_disk_rescore` option
```{applies_to}
stack: ga 9.3
serverless: unavailable
```

By default, Elasticsearch reads raw vector data into memory to perform rescoring. This can hurt performance if the vector data is too large to fit in off-heap memory all at once. When the `on_disk_rescore: true` index setting is enabled, Elasticsearch instead reads vector data directly from disk during rescoring.

Note that this setting only applies to newly indexed vectors; to apply it to all vectors in the index, reindex or force-merge after changing the setting.
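Rescoring itself is requested per query. For example, a kNN search that oversamples candidates and rescores them against the raw vectors (with `on_disk_rescore` enabled, those rescoring reads go to disk); the index name, field name, and vector values here are illustrative:

```console
POST my-vector-index/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [0.1, -0.5, 0.3],
    "k": 10,
    "num_candidates": 100,
    "rescore_vector": {
      "oversample": 2.0
    }
  }
}
```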

#### Additional rescoring techniques [dense-vector-knn-search-rescoring-rescore-additional]

The following sections provide additional ways of rescoring: