Commit 67636a7

Ingest pipeline best practices (#1381)
Based on the discussions in #1052. This is my first PR against the docs, and I am building a couple of new pages; I think it makes sense to split them out. I am putting them into this part of the docs: https://www.elastic.co/docs/manage-data/ingest/transform-enrich/ingest-pipelines

The tips and tricks are generic and not specific to o11y or security.

Co-authored-by: Colleen McGinnis <[email protected]>
1 parent 25de97a commit 67636a7

5 files changed

+844
-0
lines changed

manage-data/images/icon-check.svg

Lines changed: 1 addition & 0 deletions

manage-data/images/icon-cross.svg

Lines changed: 1 addition & 0 deletions
Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
---
mapped_pages:
  - https://www.elastic.co/docs/manage-data/ingest/transform-enrich/error-handling.html
applies_to:
  stack: ga
  serverless: ga
---

# Error handling

Ingest pipelines in Elasticsearch are powerful tools for transforming and enriching data before indexing. However, errors can occur during processing. This guide outlines strategies for handling such errors effectively.

:::{important}
Ingest pipelines run before Elasticsearch indexes the document. You can handle errors that occur while processing the document (that is, while transforming the JSON object), but not errors triggered during indexing, such as mapping conflicts. For those, use the Elasticsearch failure store.
:::

Errors in ingest pipelines typically fall into the following categories:

- Parsing Errors: Occur when a processor fails to parse a field, such as a date or number.
- Missing Fields: Happen when a required field is absent in the document.

:::{tip}
Create an `error-handling-pipeline` that sets `event.kind` to `pipeline_error` and stores the error message, along with the tag from the failed processor, in the `error.message` field. Including a tag is especially helpful when using multiple `grok`, `dissect`, or `script` processors, as it helps identify which one caused the failure.
:::

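A minimal sketch of the `error-handling-pipeline` described in the tip might look like the following. The pipeline name comes from the tip; the second pipeline (`my-logs-pipeline`) is hypothetical, and calling the shared handler through a `pipeline` processor inside `on_failure` is an assumption, so verify in your environment that the `_ingest.on_failure_*` metadata is populated when the handler runs.

```json
PUT _ingest/pipeline/error-handling-pipeline
{
  "description": "Sketch of a shared error handler: flags the event and records which processor failed",
  "processors": [
    {
      "set": {
        "field": "event.kind",
        "value": "pipeline_error"
      }
    },
    {
      "append": {
        "field": "error.message",
        "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message: {{ _ingest.on_failure_message }}"
      }
    }
  ]
}

PUT _ingest/pipeline/my-logs-pipeline
{
  "description": "Hypothetical pipeline that delegates failures to the shared handler",
  "processors": [
    {
      "dissect": {
        "field": "message",
        "pattern": "%{}: %{user.name} %{}",
        "tag": "dissect for user.name"
      }
    }
  ],
  "on_failure": [
    {
      "pipeline": {
        "name": "error-handling-pipeline"
      }
    }
  ]
}
```

Keeping the handler in one place means every pipeline reports failures in the same fields, which makes `event.kind: pipeline_error` a single thing to search for.
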
The `on_failure` parameter can be defined either for individual processors or at the pipeline level to catch exceptions that may occur during document processing. The `ignore_failure` option allows a specific processor to silently skip errors without affecting the rest of the pipeline.

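As a rough illustration of `ignore_failure`, the following simulation uses a hypothetical `bytes` field whose value cannot be converted to an integer. With `ignore_failure` set to `true` on the `convert` processor, the conversion error is skipped and the `set` processor after it still runs. The field names and values are assumptions chosen for this example.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "download finished",
        "bytes": "not-a-number"
      }
    }
  ],
  "pipeline": {
    "description": "Illustrative only: the bytes field and its value are made up",
    "processors": [
      {
        "convert": {
          "field": "bytes",
          "type": "integer",
          "ignore_failure": true
        }
      },
      {
        "set": {
          "field": "event.kind",
          "value": "event"
        }
      }
    ]
  }
}
```

Because the error is skipped silently, nothing in the resulting document records that the conversion failed. Use `on_failure` instead when you need to know that something went wrong.
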
## Global vs. processor-specific

The following example demonstrates how to use the `on_failure` handler at the pipeline level rather than within individual processors. While this approach ensures the pipeline exits gracefully on failure, it also means that processing stops at the point of error.

In this example, a typo was made in the configuration of the `dissect` processor intended to extract `user.name` from the message. A comma (`,`) was used instead of the correct colon (`:`).

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "@timestamp": "2025-04-03T10:00:00.000Z",
        "message": "user: philipp has logged in"
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "dissect": {
          "field": "message",
          "pattern": "%{}, %{user.name} %{}",
          "tag": "dissect for user.name"
        }
      },
      {
        "append": {
          "field": "event.category",
          "value": "authentication"
        }
      }
    ],
    "on_failure": [
      {
        "set": {
          "field": "event.kind",
          "value": "pipeline_error"
        }
      },
      {
        "append": {
          "field": "error.message",
          "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message: {{ _ingest.on_failure_message }}"
        }
      }
    ]
  }
}
```

The second processor, which sets `event.category` to `authentication`, never runs because the `dissect` processor before it fails and triggers the global `on_failure` handler. The resulting document shows which processor caused the error, the pattern it attempted to apply, and the input it received.

```json
"@timestamp": "2025-04-03T10:00:00.000Z",
"message": "user: philipp has logged in",
"event": {
  "kind": "pipeline_error"
},
"error": {
  "message": "Processor dissect with tag dissect for user.name in pipeline _simulate_pipeline failed with message: Unable to find match for dissect pattern: %{}, %{user.name} %{} against source: user: philipp has logged in"
}
```

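For comparison, here is a sketch of the same simulation with the typo fixed, using the colon that the text above identifies as correct. With this pattern the `dissect` processor should match the sample message, so the `append` processor runs and the global `on_failure` handler is not triggered.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "@timestamp": "2025-04-03T10:00:00.000Z",
        "message": "user: philipp has logged in"
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "dissect": {
          "field": "message",
          "pattern": "%{}: %{user.name} %{}",
          "tag": "dissect for user.name"
        }
      },
      {
        "append": {
          "field": "event.category",
          "value": "authentication"
        }
      }
    ],
    "on_failure": [
      {
        "set": {
          "field": "event.kind",
          "value": "pipeline_error"
        }
      }
    ]
  }
}
```
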
We can restructure the pipeline by moving the `on_failure` handling directly into the processor itself. This allows the pipeline to continue execution, so in this case the `event.category` processor still runs. You can also retain the global `on_failure` to handle errors from other processors, while adding processor-specific error handling where needed.

:::{note}
Running both a `set` and an `append` processor inside the `dissect` error handler may not always be ideal, but it serves as a demonstration.
:::

For the `dissect` processor, consider setting a temporary field like `_tmp.error: dissect_failure`. You can then use `if` conditions in later processors to execute them only if parsing failed, allowing for more controlled and flexible error handling.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "@timestamp": "2025-04-03T10:00:00.000Z",
        "message": "user: philipp has logged in"
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "dissect": {
          "field": "message",
          "pattern": "%{}, %{user.name} %{}",
          "on_failure": [
            {
              "set": {
                "field": "event.kind",
                "value": "pipeline_error"
              }
            },
            {
              "append": {
                "field": "error.message",
                "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message: {{ _ingest.on_failure_message }}"
              }
            }
          ],
          "tag": "dissect for user.name"
        }
      },
      {
        "append": {
          "field": "event.category",
          "value": "authentication"
        }
      }
    ],
    "on_failure": [
      {
        "set": {
          "field": "event.kind",
          "value": "pipeline_error"
        }
      },
      {
        "set": {
          "field": "error.message",
          "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message: {{ _ingest.on_failure_message }}"
        }
      }
    ]
  }
}
```
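
The temporary-field approach mentioned above can be sketched as follows. The `_tmp.error` field and its `dissect_failure` value come from the text; the fallback `grok` processor, its pattern, and the final `remove` processor are assumptions added for illustration, not part of the original pipeline.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "@timestamp": "2025-04-03T10:00:00.000Z",
        "message": "user: philipp has logged in"
      }
    }
  ],
  "pipeline": {
    "description": "Sketch: mark the dissect failure in a temporary field, then branch on it",
    "processors": [
      {
        "dissect": {
          "field": "message",
          "pattern": "%{}, %{user.name} %{}",
          "tag": "dissect for user.name",
          "on_failure": [
            {
              "set": {
                "field": "_tmp.error",
                "value": "dissect_failure"
              }
            }
          ]
        }
      },
      {
        "grok": {
          "if": "ctx._tmp?.error == 'dissect_failure'",
          "field": "message",
          "patterns": ["^user: %{USERNAME:user.name} "],
          "tag": "grok fallback for user.name"
        }
      },
      {
        "remove": {
          "field": "_tmp",
          "ignore_missing": true
        }
      }
    ]
  }
}
```

The `grok` processor only runs when the `dissect` handler has set the marker, and the `remove` processor keeps the temporary field out of the indexed document.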
