Commit 7f5de7c: Added new fields API, reworked section (parent: 18a6185)

`manage-data/ingest/transform-enrich/readable-maintainable-ingest-pipelines.md`: 172 additions, 43 deletions

@@ -1,3 +1,4 @@

---
mapped_pages:
- https://www.elastic.co/docs/manage-data/ingest/transform-enrich/common-mistakes.html

@@ -18,75 +19,81 @@ This guide does not provide guidance on optimizing for ingest pipeline performan

When creating ingest pipelines, there are a few options for accessing fields in conditional statements and scripts. All formats can be used to reference fields, so choose the one that makes your pipeline easier to read and maintain.

| Notation | Example | Notes |
| --- | --- | --- |
| Dot notation | `ctx.event.action` | Supported in conditionals and painless scripts. |
| Square bracket notation | `ctx['event']['action']` | Supported in conditionals and painless scripts. |
| Mixed dot and bracket notation | `ctx.event['action']` | Supported in conditionals and painless scripts. |
| Field API | `field('event.action').get('')` or `$('event.action', '')` | Supported in painless scripts. Supported in conditionals only in version 9.2 and later. |

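To compare the notations from the table side by side, the following sketch reads the same `event.action` value with each of them in a script processor. It assumes a version where the field API is available in script processors. The `notation_*` target fields are hypothetical names chosen for illustration. Each of them should end up holding `login`.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "event": {
          "action": "login"
        }
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
            // Dot notation
            ctx.notation_dot = ctx.event.action;
            // Square bracket notation
            ctx.notation_bracket = ctx['event']['action'];
            // Mixed dot and bracket notation
            ctx.notation_mixed = ctx.event['action'];
            // Field API shortcut with an empty-string fallback
            ctx.notation_field_api = $('event.action', '');
          """
        }
      }
    ]
  }
}
```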
The following sections provide general guidelines for choosing the right option in a given situation.

### Field API

Starting with version [9.2](https://github.com/elastic/elasticsearch/pull/131581), you can use the field API in conditionals (the `if` statement of a processor). In earlier versions, you can still use the field API inside the script processor itself.

:::{note}
This is the preferred way to access fields.
:::

**Benefits**

- Clean and easy to read.
- Handles null values automatically.
- Provides additional functions, such as `isEmpty()`, that simplify comparisons.
- Handles dots that are part of a field name.
- Handles dots as path separators for walking nested objects.
- Handles special characters in field names.

**Limitations**

- Available in conditionals only starting with version 9.2.
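Here is a minimal sketch of the field API in a conditional. It assumes Elasticsearch 9.2 or later, because earlier versions do not support `field` or `$` in `if` statements. The `event.category` value is hypothetical. The second document has no `event` object, so `$` returns the `''` fallback, the condition is false, and the processor is skipped without an explicit null check.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    { "_source": { "event": { "action": "login" } } },
    { "_source": { "message": "no event object" } }
  ],
  "pipeline": {
    "processors": [
      {
        "set": {
          "if": "$('event.action', '') == 'login'",
          "field": "event.category",
          "value": "authentication"
        }
      }
    ]
  }
}
```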

### Dot notation [dot-notation]

**Benefits**

- Clean and easy to read.
- Supports the null safe operator (`?.`). Read more in [Use null safe operators (`?.`)](#null-safe-operators).

**Limitations**

- Does not support field names that contain a `.` or special characters such as `@`.
  Use [Bracket notation](#bracket-notation) instead.
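The following sketch shows dot notation with the null safe operator in a condition. The `user.roles` value is hypothetical. In the second document there is no `user` object, so `ctx.user?.name` evaluates to `null` rather than failing, the comparison is false, and the processor is skipped.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    { "_source": { "user": { "name": "alice" } } },
    { "_source": { "message": "no user object" } }
  ],
  "pipeline": {
    "processors": [
      {
        "set": {
          "if": "ctx.user?.name == 'alice'",
          "field": "user.roles",
          "value": "admin"
        }
      }
    ]
  }
}
```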

### Bracket notation [bracket-notation]

**Benefits**

- Supports special characters such as `@` in the field name.
  For example, for a field named `has@!%&chars`, use `ctx['has@!%&chars']`.
- Supports field names that contain `.`.
  For example, for a field named `foo.bar`, `ctx.foo.bar` tries to access the field `bar` inside the object `foo`, whereas `ctx['foo.bar']` accesses the field `foo.bar` directly.

**Limitations**

- Slightly more verbose than dot notation.
- Does not support the null safe operator (`?.`).
  If you need null safety, use [Dot notation](#dot-notation) instead.
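The following sketch applies bracket notation to the two field names from the examples above. The sample document has a top-level field that is literally named `foo.bar` and another named `has@!%&chars`. The `copied_*` target fields are hypothetical.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "foo.bar": "dotted name",
        "has@!%&chars": "special characters"
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
            // Reads the top-level key "foo.bar", not the field bar inside an object foo
            ctx.copied_dotted = ctx['foo.bar'];
            // Reads a key that dot notation cannot express
            ctx.copied_special = ctx['has@!%&chars'];
          """
        }
      }
    ]
  }
}
```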

### Mixed dot and bracket notation

**Benefits**

- Combines the benefits of both formats.
  For example, you could use `ctx.my.nested.object['has@!%&chars']`. You can then apply the `?.` operator to the parts that use dot notation while still accessing a field whose name contains special characters: `ctx.my?.nested?.object['has@!%&chars']`.

**Limitations**

- Slightly more difficult to read.
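Here is a sketch of the mixed form inside a script processor, using the nested object from the example above. Bracket access on a `null` object would fail. For that reason, the script first stores `ctx.my?.nested?.object` in a variable and applies the bracket access only when that variable is not `null`. The `special_value` target field is hypothetical.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "my": {
          "nested": {
            "object": {
              "has@!%&chars": "found"
            }
          }
        }
      }
    },
    { "_source": { "message": "no my object" } }
  ],
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
            // Dot notation with ?. evaluates to null if my or nested is missing
            def obj = ctx.my?.nested?.object;
            if (obj != null) {
              // Bracket notation reads the key that contains special characters
              ctx.special_value = obj['has@!%&chars'];
            }
          """
        }
      }
    ]
  }
}
```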

## Write concise conditionals (`if` statements) [conditionals]

Use conditionals (`if` statements) to ensure that an ingest pipeline processor is only applied when specific conditions are met.

% In an ingest pipeline, when working with conditionals inside processors. The topic around error processing is a bit more complex, most importantly any errors that are coming from null values, missing keys, missing values, inside the conditional, will lead to an error that is not captured by the `ignore_failure` handler and will exit the pipeline.

### Use null safe operators (`?.`) [null-safe-operators]

Anticipate potential problems with the data, and use the [null safe operator](elasticsearch://reference/scripting-languages/painless/painless-operators-reference.md#null-safe-operator) (`?.`) to prevent data from being processed incorrectly.

@@ -266,6 +273,29 @@ POST _ingest/pipeline/_simulate
}
}
```

:::

### Avoid excessive OR conditions

When using the [boolean OR operator](elasticsearch://reference/scripting-languages/painless/painless-operators-boolean.md#boolean-or-operator) (`||`), `if` conditions can become unnecessarily complex and difficult to maintain, especially when chaining many OR checks. Instead, consider using array-based checks like `.contains()` to simplify your logic and improve readability.

#### ![ ](../../images/icon-cross.svg) **Don't**: Run many ORs

```painless
"if": "ctx?.kubernetes?.container?.name == 'admin' || ctx?.kubernetes?.container?.name == 'def'
|| ctx?.kubernetes?.container?.name == 'demo' || ctx?.kubernetes?.container?.name == 'acme'
|| ctx?.kubernetes?.container?.name == 'wonderful'
```

#### ![ ](../../images/icon-check.svg) **Do**: Use contains to compare

```painless
["admin","def","demo","acme","wonderful"].contains(ctx.kubernetes?.container?.name)
```

:::{tip}
This example only checks for exact matches. Do not use this approach if you need to check for partial matches.
:::
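To show where the `contains()` check goes, this sketch puts it in the `if` of a set processor. The container names come from the example above. The `tags` value is hypothetical. The second and third documents are left unchanged: one has a container name that is not in the list, and the other has no `kubernetes` object, in which case `contains()` receives `null` and returns `false`.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    { "_source": { "kubernetes": { "container": { "name": "demo" } } } },
    { "_source": { "kubernetes": { "container": { "name": "other" } } } },
    { "_source": { "message": "no kubernetes metadata" } }
  ],
  "pipeline": {
    "processors": [
      {
        "set": {
          "if": "['admin', 'def', 'demo', 'acme', 'wonderful'].contains(ctx.kubernetes?.container?.name)",
          "field": "tags",
          "value": "monitored-container"
        }
      }
    ]
  }
}
```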

## Convert mb/gb values to bytes

@@ -345,10 +375,78 @@ The [rename processor](elasticsearch://reference/enrich-processor/rename-process

- `ignore_missing`: Useful when you are not sure that the field you want to rename exists.
- `ignore_failure`: Lets the pipeline continue when the processor fails. For example, the rename processor can only rename to non-existing fields. If you already have the field `abc` and you want to rename `def` to `abc`, the operation will fail.
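Here is a minimal sketch of a rename processor that sets both options, using the `abc` and `def` field names from the example above. The first document already contains `abc`, so the rename fails and `ignore_failure` lets the pipeline continue. The second document has no `def` field, which `ignore_missing` tolerates.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    { "_source": { "abc": "existing", "def": "new" } },
    { "_source": { "abc": "existing" } }
  ],
  "pipeline": {
    "processors": [
      {
        "rename": {
          "field": "def",
          "target_field": "abc",
          "ignore_missing": true,
          "ignore_failure": true
        }
      }
    ]
  }
}
```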
347377

348-
## Use a script processor
378+
## Script processor
349379

350380
If no built-in processor can achieve your goal, you may need to use a [script processor](elasticsearch://reference/enrich-processor/script-processor.md) in your ingest pipeline. Be sure to write scripts that are clear, concise, and maintainable.
351381

### Add new fields

All the ways to [access fields](#access-fields) and retrieve their values described above also apply within a script. [Null handling](#null-safe-operators) remains important when accessing fields.

:::{tip}
The field API is the recommended way to add new fields.
:::

**With the field API**

Suppose a document contains a `cpu.usage` field with a value from 0 to 100. You want to store it as `system.cpu.total.norm.pct`, which uses a scale from 0 to 1.0, where 1.0 is equivalent to 100%.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "cpu": {
          "usage": 90 <1>
        }
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
            field('system.cpu.total.norm.pct').set($('cpu.usage', 0.0) / 100.0) <2>
          """
        }
      }
    ]
  }
}
```

1. The target field expects a value from 0 to 1, not 0 to 100, so the value must be divided by 100.
2. The field API is exposed as `field(<field name>)`, and `set(<value>)` sets the value. Inside it, `$(<field name>, <fallback>)` reads the value of the existing field and returns the fallback if the field is missing. Finally, the value is divided by `100.0`. The `.0` is important: without it, Painless performs integer division and returns `0` instead of `0.9`.

**Without the field API**

You can achieve the same result without the field API, but it takes more code: you must make sure that every object along the path `system.cpu.total.norm.pct` exists before setting the value.

```json
{
  "script": {
    "source": """
      if (ctx.system == null) { <1>
        ctx.system = new HashMap(); <2>
      }
      if (ctx.system.cpu == null) {
        ctx.system.cpu = [:]; <3>
      }
      if (ctx.system.cpu.total == null) {
        ctx.system.cpu.total = [:];
      }
      if (ctx.system.cpu.total.norm == null) {
        ctx.system.cpu.total.norm = [:];
      }
      ctx.system.cpu.total.norm.pct = (ctx.cpu?.usage ?: 0.0) / 100.0; <4>
    """
  }
}
```

1. Check whether each object along the path exists, and create it if it does not.
2. Create a new `HashMap` to hold the nested objects.
3. `[:]` is a shorthand for `new HashMap()`.
4. Read `cpu.usage` with the null safe operator, fall back to `0.0` with the elvis operator (`?:`) if it is missing, and perform the same calculation as above.

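If the new field should only be added when `cpu.usage` is present, rather than writing a value computed from the `0.0` fallback, you can guard the write. This sketch assumes the field API exposes an `exists()` method in your version. The second document is a hypothetical event without CPU metrics, and it is left unchanged.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    { "_source": { "cpu": { "usage": 90 } } },
    { "_source": { "message": "no cpu metrics" } }
  ],
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
            // Only write the normalized value when cpu.usage is present
            if (field('cpu.usage').exists()) {
              field('system.cpu.total.norm.pct').set($('cpu.usage', 0.0) / 100.0);
            }
          """
        }
      }
    ]
  }
}
```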
### Calculate `event.duration` in a complex manner

#### ![ ](../../images/icon-cross.svg) **Don't**: Use verbose and error-prone scripting patterns

@@ -364,6 +462,7 @@ If no built-in processor can achieve your goal, you may need to use a [script pr
}
}
```

1. Avoid accessing fields using square brackets instead of dot notation.
2. `ctx['event']['duration']`: Do not attempt to access child properties without ensuring the parent property exists.
3. `timeString.substring(0,2)`: Avoid parsing substrings manually instead of leveraging date/time parsing utilities.

@@ -405,6 +504,7 @@ POST _ingest/pipeline/_simulate
}
}
```

1. Ensure the `event` object exists before assigning to it.
2. Use `DateTimeFormatter` and `LocalTime` to parse the duration string.
3. Store the duration in nanoseconds, as expected by ECS.

@@ -428,14 +528,14 @@ When reconstructing or normalizing IP addresses in ingest pipelines, avoid unnec
}
}
```

1. Uses square bracket notation for field access instead of dot notation.
2. Unnecessary casting to `Integer` when parsing string segments.
3. Allocates an extra variable for the IP string instead of setting the field directly.
4. Does not check if `destination` is available as an object.

#### ![ ](../../images/icon-check.svg) **Do**: Use concise, readable, and safe scripting

```json
POST _ingest/pipeline/_simulate
{
@@ -463,6 +563,7 @@ POST _ingest/pipeline/_simulate
}
}
```

1. Uses dot notation for field access.
2. Avoids unnecessary casting and extra variables.
3. Uses the null safe operator (`?.`) to check for field existence.

@@ -546,3 +647,31 @@ POST _ingest/pipeline/_simulate
```

In this example, `{{tags.0}}` retrieves the first element of the `tags` array (`"cool-host"`) and assigns it to the `host.alias` field. This approach is necessary when you want to extract a specific value from an array for use elsewhere in your document. Using the correct index ensures you get the intended value, and this pattern works for any array field in your source data.

### Transform into a JSON string

To store the original `_source` as a JSON string in the `event.original` field, use the Mustache function `{{#toJson}}<field>{{/toJson}}`.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "foo": "bar",
        "key": 123
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "set": {
          "field": "event.original",
          "value": "{{#toJson}}_source{{/toJson}}"
        }
      }
    ]
  }
}
```
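The same Mustache function also accepts a single field instead of the whole `_source`. In the following sketch, a hypothetical `user` object is serialized into a hypothetical `user_json` string field.

```json
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "user": {
          "name": "alice",
          "roles": ["admin", "dev"]
        }
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "set": {
          "field": "user_json",
          "value": "{{#toJson}}user{{/toJson}}"
        }
      }
    ]
  }
}
```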
