Commit ccc062e

Add guide on optimizing indexing performance
1 parent 33ef2db commit ccc062e

1 file changed

+101
-0
lines changed

@@ -0,0 +1,101 @@
1+
---
2+
title: Optimize indexing performance by analyzing batch statistics
3+
description: Learn how to analyze the `progressTrace` to identify and resolve indexing bottlenecks in Meilisearch.
4+
---
5+
6+
# Optimize indexing performance by analyzing batch statistics
7+
8+
Indexing performance can vary significantly depending on your dataset, index settings, and hardware. The [batch object](/reference/api/batches) provides information about the progress of asynchronous indexing operations.
9+
10+
The `progressTrace` field within the batch object offers a detailed breakdown of where time is spent during the indexing process. By analyzing this data, you can identify bottlenecks and adjust configuration settings to improve indexing speed.
11+
12+
## Understanding the `progressTrace`
13+
14+
The `progressTrace` is a hierarchical trace showing each phase of indexing and how long it took.
15+
Each entry follows the structure:
16+
17+
```json
18+
"processing tasks > indexing > extracting word proximity": "33.71s"
19+
```
20+
21+
This means:
22+
23+
- The step occurred during **indexing**.
24+
- The subtask was **extracting word proximity**.
25+
- It took **33.71 seconds**.
26+
27+
Your goal is to focus on the **longest-running steps** and understand which index settings or data characteristics influence them.
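One way to find the longest-running steps is to sort the trace entries by duration. The sketch below assumes the `progressTrace` has already been read from a batch object into a plain object of `step: duration` pairs. The helper names (`parseDurationSeconds`, `slowestSteps`) are hypothetical, the accepted unit suffixes other than `s` are an assumption, and every value in `exampleTrace` except the first is made up for illustration.

```javascript
// Assumed unit suffixes; the guide itself only shows values in seconds ("33.71s").
const UNIT_TO_SECONDS = { s: 1, ms: 1e-3, "µs": 1e-6, us: 1e-6, ns: 1e-9 };

// Hypothetical helper: convert a duration string such as "33.71s" to seconds.
function parseDurationSeconds(text) {
  const match = /^([\d.]+)\s*(s|ms|µs|us|ns)$/.exec(text.trim());
  if (!match) return NaN; // unrecognized format: the entry is skipped later
  return Number(match[1]) * UNIT_TO_SECONDS[match[2]];
}

// Return the `limit` longest-running steps, slowest first.
function slowestSteps(progressTrace, limit = 5) {
  return Object.entries(progressTrace)
    .map(([step, duration]) => ({ step, seconds: parseDurationSeconds(duration) }))
    .filter((entry) => Number.isFinite(entry.seconds))
    .sort((a, b) => b.seconds - a.seconds)
    .slice(0, limit);
}

// Illustrative data only: the second and third entries are invented.
const exampleTrace = {
  "processing tasks > indexing > extracting word proximity": "33.71s",
  "processing tasks > indexing > extracting words": "12.40s",
  "processing tasks > indexing > waiting for database writes": "850ms",
};

console.log(slowestSteps(exampleTrace, 2));
```

Entries whose duration cannot be parsed are dropped rather than guessed, so an unexpected format shows up as a missing step instead of a wrong ranking.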
28+
29+
## Key phases and how to optimize them
30+
31+
### Document processing
32+
33+
| Trace key | Description | Optimization |
34+
|------------|--------------|--------------|
35+
| `computing document changes`, `extracting documents` | Meilisearch compares incoming documents to existing ones. | No direct optimization possible. The duration scales with the number and size of incoming documents.|
36+
37+
### Filterable attributes
38+
39+
| Trace key | Description | Optimization |
40+
|------------|--------------|--------------|
41+
| `extracting facets`, `merging facet caches` | Extracts and merges filterable attributes. | Keep the number of [**filterable attributes**](/reference/api/settings#filterable-attributes) to a minimum. |
42+
43+
### Searchable attributes
44+
45+
| Trace key | Description | Optimization |
46+
|------------|--------------|--------------|
47+
| `extracting words`, `merging word caches` | Tokenizes text and builds the inverted index. | Ensure the [**searchable attributes**](/reference/api/settings#searchable-attributes) list includes only the fields you want to be checked for query word matches. |
48+
49+
### Proximity precision
50+
51+
| Trace key | Description | Optimization |
52+
|------------|--------------|--------------|
53+
| `extracting word proximity`, `merging word proximity` | Builds the data structures for phrase and attribute ranking. | Lower the precision of this operation by setting [proximity precision](/reference/api/settings#proximity-precision) to `byAttribute` instead of the default `byWord`. |
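As a rough illustration of changing this setting over HTTP, the sketch below only builds the request and leaves sending it to you. It assumes the proximity precision route of the settings API (`PUT /indexes/{index_uid}/settings/proximity-precision`); check the linked reference for your Meilisearch version. The function name `buildProximityPrecisionRequest` is hypothetical, and the host, index name, and key are placeholders.

```javascript
// Hypothetical helper: build the HTTP request that sets proximity precision
// to `byAttribute`. The route is assumed from the settings API reference.
function buildProximityPrecisionRequest(host, indexUid, apiKey) {
  return {
    url: `${host}/indexes/${encodeURIComponent(indexUid)}/settings/proximity-precision`,
    options: {
      method: "PUT",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // The request body is a bare JSON string, not an object.
      body: JSON.stringify("byAttribute"),
    },
  };
}

// Placeholder values: replace with your own host, index, and API key.
const request = buildProximityPrecisionRequest("http://localhost:7700", "movies", "MASTER_KEY");
console.log(request.url);
// To send it: await fetch(request.url, request.options);
```

Separating request construction from sending keeps the sketch runnable without a live instance and makes the exact URL and body easy to inspect.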
54+
55+
### Disk I/O and hardware bottlenecks
56+
57+
| Trace key | Description | Optimization |
58+
|------------|--------------|--------------|
59+
| `waiting for database writes` | Time spent writing data to disk. | No direct optimization possible. Either the disk is slow or the amount of data to write is large. Avoid HDDs (hard disk drives). |
60+
| `waiting for extractors` | Time spent waiting for CPU-bound extraction. | No direct optimization possible. Indicates a CPU bottleneck. Use more cores or scale horizontally with [sharding](/learn/advanced/sharding). |
61+
62+
### Facets and filterable attributes
63+
64+
| Trace key | Description | Optimization |
65+
|------------|--------------|--------------|
66+
| `post processing facets > strings bulk` / `numbers bulk` | Processes equality or comparison filters. | - Disable unused [**filter features**](/reference/api/settings#features), such as comparison operators on string values. <br/>- Keep [**sortable attributes**](/reference/api/settings#sortable-attributes) to the minimum required. |
67+
| `post processing facets > facet search` | Builds structures for the [facet search API](/reference/api/facet_search). | If you don’t use the facet search API, [disable it](/reference/api/settings#update-facet-search-settings).|
68+
69+
### Embeddings
70+
71+
| Trace key | Description | Optimization |
72+
|------------|--------------|--------------|
73+
| `writing embeddings to database` | Time spent saving vector embeddings. | - Use smaller embedding vectors when possible. <br/>- You can avoid recomputing embeddings on document update by [disabling embedding regeneration](/reference/api/documents#vectors). <br/>- Consider enabling [binary quantization](/reference/api/settings#binaryquantized) for your embedders. |
74+
75+
### Word prefixes and post-processing
76+
77+
| Trace key | Description | Optimization |
78+
|------------|--------------|--------------|
79+
| `post processing words > word prefix *` | Builds prefix data for autocomplete. This lets queries match documents containing words that begin with a query term, instead of only exact matches. | Disable [**prefix search**](/reference/api/settings#prefix-search) (`prefixSearch: disabled`) if not required. <br/> Note that this can severely impact search result relevancy. |
80+
| `post processing words > word fst` | Builds the word FST (finite state transducer). | No direct optimization possible. The duration depends on the number of distinct words in the database, so indexing fewer searchable words can improve speed. |
81+
82+
## Example analysis
83+
84+
If you see:
85+
86+
```json
87+
"processing tasks > indexing > post processing facets > facet search": "1763.06s"
88+
```
89+
90+
The [facet search feature](/learn/filtering_and_sorting/search_with_facet_filters#searching-facet-values) is consuming significant time. If your application doesn’t use it, disable it:
91+
92+
```javascript
93+
client.index('INDEX_NAME').updateFacetSearch(false);
94+
```
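The same lookup can be scripted for any slow step. The sketch below condenses the tables in this guide into a list of key fragments and hints, matched by substring. The `HINTS` list and the `hintFor` function are hypothetical names, and the hint wording is a paraphrase of the tables, not an official mapping.

```javascript
// Key fragments and hints paraphrased from the tables in this guide.
const HINTS = [
  ["extracting facets", "Keep filterable attributes to a minimum."],
  ["merging facet caches", "Keep filterable attributes to a minimum."],
  ["extracting word proximity", "Consider setting proximity precision to byAttribute."],
  ["merging word proximity", "Consider setting proximity precision to byAttribute."],
  ["extracting words", "Trim the searchable attributes list."],
  ["merging word caches", "Trim the searchable attributes list."],
  ["facet search", "Disable facet search if your application does not use it."],
  ["word prefix", "Disable prefix search if not required (this can hurt relevancy)."],
  ["writing embeddings to database", "Use smaller vectors or consider binary quantization."],
  ["waiting for database writes", "Disk-bound: avoid HDDs."],
  ["waiting for extractors", "CPU-bound: use more cores or consider sharding."],
];

// Return the first hint whose fragment appears in the trace key.
function hintFor(traceKey) {
  const found = HINTS.find(([fragment]) => traceKey.includes(fragment));
  return found ? found[1] : "No specific hint in this guide.";
}

console.log(hintFor("processing tasks > indexing > post processing facets > facet search"));
```

Substring matching is used because trace keys carry their full hierarchy (`processing tasks > indexing > ...`), while the tables list only the final segment.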
95+
96+
## Learn more
97+
98+
- [Indexing best practices](/learn/indexing/indexing_best_practices)
99+
- [Impact of RAM and multi-threading on indexing performance](/learn/indexing/ram_multithreading_performance)
101+
- [Configuring index settings](/learn/configuration/configuring_index_settings)
