Ingest pipeline best practices #1381
Merged: philippkahr merged 37 commits into elastic:main from philippkahr:best-practices-ingest-pipelines on Aug 20, 2025
Commits (37):

- a01a622 (philippkahr): initial commit, let's see if it builds
- ce87300 (philippkahr): this should help
- e42cd64 (philippkahr): reworked the md
- 4ef0b86 (philippkahr): Reworking the line breaks
- 0c7ed69 (philippkahr): Reworked, grammar, whitespace, formatting
- 545f91b (philippkahr): wrong naming
- bd76756 (philippkahr): Marius suggested to use append which makes more sense since it is an …
- ca4d379 (philippkahr): fix typo
- afe8884 (philippkahr): Added Timestamp in Logstash config
- 1afb592 (colleenmcginnis): add icons
- 572bbcc (philippkahr): Update manage-data/ingest/transform-enrich/common-mistakes.md
- a10e333 (philippkahr): Update common mistakes
- 9498933 (philippkahr): miny fixes
- 3649496 (philippkahr): Remove ingest lag from this PR
- 300fec7 (philippkahr): Added tips
- 9b9cc55 (colleenmcginnis): Merge branch 'main' into best-practices-ingest-pipelines
- bf922d8 (colleenmcginnis): fix build errors
- cebd954 (colleenmcginnis): copy edits
- 1313996 (philippkahr): Merge pull request #1 from colleenmcginnis/cmcg-best-practices-ingest…
- d304058 (philippkahr): Update general-tips.md
- 6c393aa (philippkahr): Update general-tips.md
- c0559b1 (philippkahr): Update manage-data/ingest/transform-enrich/error-handling.md
- 95237a1 (philippkahr): Update manage-data/ingest/transform-enrich/error-handling.md
- a020609 (philippkahr): Update manage-data/ingest/transform-enrich/error-handling.md
- 85a1abe (philippkahr): Reworked it as suggested
- 4c334c5 (philippkahr): Added notebox as suggested
- 1b0fbe4 (colleenmcginnis): Merge branch 'main' into best-practices-ingest-pipelines
- 609696c (colleenmcginnis): first round of edits
- 6f7c288 (colleenmcginnis): remove content
- 18a6185 (philippkahr): Merge pull request #2 from colleenmcginnis/cmcg-best-practices-ingest…
- 7f5de7c (philippkahr): Added new fields API, reworked section
- 22b42a1 (philippkahr): Update manage-data/ingest/transform-enrich/readable-maintainable-inge…
- 071be5f (philippkahr): Update manage-data/ingest/transform-enrich/readable-maintainable-inge…
- 9077e2d (philippkahr): Apply suggestions from code review
- 4a0eed0 (colleenmcginnis): Merge branch 'main' into best-practices-ingest-pipelines
- c303973 (colleenmcginnis): Merge branch 'main' into best-practices-ingest-pipelines
- e9efbb2 (philippkahr): Merge branch 'main' into best-practices-ingest-pipelines
---
mapped_pages:
  - https://www.elastic.co/docs/manage-data/ingest/transform-enrich/error-handling.html
applies_to:
  stack: ga
  serverless: ga
---

# Error handling
Ingest pipelines in Elasticsearch are powerful tools for transforming and enriching data before indexing. However, errors can occur during processing. This guide outlines strategies for handling such errors effectively.

:::{important}
Ingest pipelines run before the document is indexed by Elasticsearch. They can handle errors that occur while the document is being processed (that is, while the JSON object is transformed), but not errors triggered during indexing itself, such as mapping conflicts. Those are handled by the Elasticsearch failure store instead.
:::
Errors in ingest pipelines typically fall into the following categories:

- Parsing errors: occur when a processor fails to parse a field, such as a date or number.
- Missing fields: happen when a required field is absent in the document.

:::{tip}
Create an `error-handling-pipeline` that sets `event.kind` to `pipeline_error` and stores the error message, along with the tag from the failed processor, in the `error.message` field. Including a tag is especially helpful when using multiple `grok`, `dissect`, or `script` processors, as it helps identify which one caused the failure.
:::

The `on_failure` parameter can be defined either for individual processors or at the pipeline level to catch exceptions that may occur during document processing. The `ignore_failure` option allows a specific processor to silently skip errors without affecting the rest of the pipeline.
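A reusable version of the pipeline described in the tip could be sketched as follows. The pipeline name `error-handling-pipeline` comes from the tip; the `my-app-logs` pipeline and its `lowercase` processor on a `log.level` field are hypothetical, used only to show one possible wiring.

```json
PUT _ingest/pipeline/error-handling-pipeline
{
  "description": "Shared failure handler: annotate the failed document instead of dropping it",
  "processors": [
    {
      "set": {
        "field": "event.kind",
        "value": "pipeline_error"
      }
    },
    {
      "append": {
        "field": "error.message",
        "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} failed with message: {{ _ingest.on_failure_message }}"
      }
    }
  ]
}

PUT _ingest/pipeline/my-app-logs
{
  "processors": [
    {
      "lowercase": {
        "field": "log.level",
        "tag": "lowercase log.level"
      }
    }
  ],
  "on_failure": [
    {
      "pipeline": {
        "name": "error-handling-pipeline"
      }
    }
  ]
}
```

Delegating `on_failure` to a shared pipeline through the `pipeline` processor keeps the error-handling logic in one place, so every ingest pipeline annotates failures the same way.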
## Global vs. processor-specific

The following example demonstrates how to use the `on_failure` handler at the pipeline level rather than within individual processors. While this approach ensures the pipeline exits gracefully on failure, it also means that processing stops at the point of error.

In this example, a typo was made in the configuration of the `dissect` processor intended to extract `user.name` from the message: a comma (`,`) was used instead of the correct colon (`:`).
```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "@timestamp": "2025-04-03T10:00:00.000Z",
        "message": "user: philipp has logged in"
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "dissect": {
          "field": "message",
          "pattern": "%{}, %{user.name} %{}",
          "tag": "dissect for user.name"
        }
      },
      {
        "append": {
          "field": "event.category",
          "value": "authentication"
        }
      }
    ],
    "on_failure": [
      {
        "set": {
          "field": "event.kind",
          "value": "pipeline_error"
        }
      },
      {
        "append": {
          "field": "error.message",
          "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message: {{ _ingest.on_failure_message }}"
        }
      }
    ]
  }
}
```
The second processor, which sets `event.category` to `authentication`, is no longer executed because the first `dissect` processor fails and triggers the global `on_failure` handler. The resulting document shows which processor caused the error, the pattern it attempted to apply, and the input it received:

```json
"@timestamp": "2025-04-03T10:00:00.000Z",
"message": "user: philipp has logged in",
"event": {
  "kind": "pipeline_error"
},
"error": {
  "message": "Processor dissect with tag dissect for user.name in pipeline _simulate_pipeline failed with message: Unable to find match for dissect pattern: %{}, %{user.name} %{} against source: user: philipp has logged in"
}
```
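Annotating the document is not the only option. A global `on_failure` block can also reroute failed documents to a dedicated failure index by overwriting the `_index` metadata field, so failures are preserved for later inspection rather than mixed in with good data. The `failed-` prefix below is an illustrative naming scheme, not a fixed convention:

```json
"on_failure": [
  {
    "set": {
      "field": "event.kind",
      "value": "pipeline_error"
    }
  },
  {
    "set": {
      "field": "_index",
      "value": "failed-{{{ _index }}}"
    }
  }
]
```

With this fragment in place, a document that fails any processor is indexed into a `failed-` variant of its original target index, where it can be reviewed and replayed.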
We can restructure the pipeline by moving the `on_failure` handling directly into the processor itself. This allows the pipeline to continue execution: in this case, the `event.category` processor still runs. You can also retain the global `on_failure` to handle errors from other processors, while adding processor-specific error handling where needed.

:::{note}
Running a `set` and an `append` processor inside the `dissect` error handler may not always be ideal, but it serves as a demonstration.
:::

For the `dissect` processor, consider setting a temporary field like `_tmp.error: dissect_failure`. You can then use `if` conditions in later processors to execute them only if parsing failed, allowing for more controlled and flexible error handling.
```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "@timestamp": "2025-04-03T10:00:00.000Z",
        "message": "user: philipp has logged in"
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "dissect": {
          "field": "message",
          "pattern": "%{}, %{user.name} %{}",
          "on_failure": [
            {
              "set": {
                "field": "event.kind",
                "value": "pipeline_error"
              }
            },
            {
              "append": {
                "field": "error.message",
                "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message: {{ _ingest.on_failure_message }}"
              }
            }
          ],
          "tag": "dissect for user.name"
        }
      },
      {
        "append": {
          "field": "event.category",
          "value": "authentication"
        }
      }
    ],
    "on_failure": [
      {
        "set": {
          "field": "event.kind",
          "value": "pipeline_error"
        }
      },
      {
        "set": {
          "field": "error.message",
          "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message: {{ _ingest.on_failure_message }}"
        }
      }
    ]
  }
}
```
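The temporary-field approach could be sketched as follows. The field name `_tmp.error`, the marker value `dissect_failure`, and the fallback `user.name` value `unknown` are illustrative choices, not fixed conventions: the point is that later processors run conditionally on the marker, and the temporary field is removed at the end.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "@timestamp": "2025-04-03T10:00:00.000Z",
        "message": "user: philipp has logged in"
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "dissect": {
          "field": "message",
          "pattern": "%{}, %{user.name} %{}",
          "tag": "dissect for user.name",
          "on_failure": [
            {
              "set": {
                "field": "_tmp.error",
                "value": "dissect_failure"
              }
            }
          ]
        }
      },
      {
        "set": {
          "field": "user.name",
          "value": "unknown",
          "if": "ctx._tmp?.error == 'dissect_failure'",
          "tag": "fallback user.name"
        }
      },
      {
        "remove": {
          "field": "_tmp",
          "ignore_missing": true
        }
      }
    ]
  }
}
```

Because the `dissect` failure only sets a marker, the pipeline keeps running: the conditional `set` supplies a fallback value only when parsing failed, and the final `remove` cleans up the temporary field in either case.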
Review comment: Feel free to ignore, but in my experience, when users want global error handling they also want to reroute documents to a different failure index, which might be nice to include.
Reply: I wouldn't mention this anymore and would just wait for the failure stream feature coming out at some point. Otherwise we end up with people who implemented it manually and an out-of-the-box solution that conflict with each other.