
Document Data Flow and expression context more clearly #958


Merged: 11 commits, Aug 8, 2024
12 changes: 6 additions & 6 deletions dsl-reference.md
@@ -1519,7 +1519,7 @@ from: .order.pet

### Output

Documents the structure - and optionally configures the filtering of - workflow/task output data.
Documents the structure - and optionally configures the transformations of - workflow/task output data.

It's crucial for authors to document the schema of output data whenever feasible. This documentation empowers consuming applications to provide contextual auto-suggestions when handling runtime expressions.

@@ -1544,11 +1544,11 @@ output:
petId:
type: string
required: [ petId ]
as:
petId: '${ .pet.id }'
export:
as:
'.petList += [ . ]'
'.petList += [ $task.output ]'
```

### Export
@@ -1566,13 +1566,13 @@ Optionally, the context might have an associated schema.

#### Examples

Merge the task output into the current context.

```yaml
as: '.+$output'
```

Replace the context with the task output.

```yaml
as: $output
120 changes: 96 additions & 24 deletions dsl.md
@@ -146,39 +146,97 @@ Once the task has been executed, different things can happen:

In Serverless Workflow DSL, data flow management is crucial to ensure that the right data is passed between tasks and to the workflow itself.

Here's how data flows through a workflow based on various filtering stages:
Here's how data flows through a workflow based on various transformation stages:

1. **Filter Workflow Input**
Before the workflow starts, the input data provided to the workflow can be filtered to ensure only relevant data is passed into the workflow context. This step allows the workflow to start with a clean and focused dataset, reducing potential overhead and complexity in subsequent tasks.
1. **Transform Workflow Input**
Before the workflow starts, the input data provided to the workflow can be transformed to ensure that only relevant data in the expected format is passed into the workflow context. This can be done using the top-level `input.from` expression. It is evaluated against the raw workflow input and defaults to the identity expression, which leaves the input unchanged. This step allows the workflow to start with a clean and focused dataset, reducing potential overhead and complexity in subsequent tasks. The result of this expression is set as the initial value of the `$context` runtime expression argument and is passed to the first task (a combined example of all six stages follows this list).

*Example: If the workflow receives a JSON object as input, a filter can be applied to remove unnecessary fields and retain only those that are required for the workflow's execution.*
*Example: If the workflow receives a JSON object as input, a transformation can be applied to remove unnecessary fields and retain only those that are required for the workflow's execution.*

2. **Filter First Task Input**
The input data for the first task can be filtered to match the specific requirements of that task. This ensures that the first task receives only the necessary data it needs to perform its operations.
2. **Transform First Task Input**
The input data for the first task can be transformed to match the specific requirements of that task. This ensures that the first task receives only the data required to perform its operations. This can be done using the task's `input.from` expression. It is evaluated against the transformed workflow input and defaults to the identity expression, which leaves the input unchanged. The result of this expression is set as the `$input` runtime expression argument and is passed to the task. Any runtime expressions used within the task definition are evaluated against this transformed input.

*Example: If the first task is a function call that only needs a subset of the workflow input, a filter can be applied to provide only those fields needed for the function to execute.*
*Example: If the first task is a function call that only needs a subset of the workflow input, a transformation can be applied to provide only those fields needed for the function to execute.*

3. **Filter First Task Output**
After the first task completes, its output can be filtered before passing it to the next task or storing it in the workflow context. This helps in managing the data flow and keeping the context clean by removing any unnecessary data produced by the task.
3. **Transform First Task Output**
After the first task completes, its output can be transformed before passing it to the next task or storing it in the workflow context. Transformations are applied using the `output.as` runtime expression. It is evaluated against the raw task output and defaults to the identity expression, which leaves the output unchanged. Its result becomes the input of the next task. To update the context, one uses the `export.as` runtime expression. It is evaluated against the transformed task output and defaults to an expression that returns the existing context. The result of this runtime expression replaces the workflow's current context and the content of the `$context` runtime expression argument. This helps manage the data flow and keep the context clean by removing any unnecessary data produced by the task.

*Example: If the first task returns a large dataset, a filter can be applied to retain only the relevant results needed for subsequent tasks.*
*Example: If the first task returns a large dataset, a transformation can be applied to retain only the relevant results needed for subsequent tasks.*

4. **Filter Last Task Input**
Before the last task in the workflow executes, its input data can be filtered to ensure it receives only the necessary information. This step is crucial for ensuring that the final task has all the required data to complete the workflow successfully.
4. **Transform Last Task Input**
Before the last task in the workflow executes, its input data can be transformed to ensure it receives only the necessary information. This can be done using the task's `input.from` expression. It is evaluated against the raw task input (i.e., the transformed output of the previous task) and defaults to the identity expression, which leaves the input unchanged. The result of this expression is set as the `$input` runtime expression argument and is passed to the task. Any runtime expressions used within the task definition are evaluated against this transformed input. This step is crucial for ensuring the final task has all the required data to complete the workflow successfully.

*Example: If the last task involves generating a report, the input filter can ensure that only the data required for the report generation is passed to the task.*
*Example: If the last task involves generating a report, the input transformation can ensure that only the data required for the report generation is passed to the task.*

5. **Filter Last Task Output**
After the last task completes, its output can be filtered before it is considered as the workflow output. This ensures that the workflow produces a clean and relevant output, free from any extraneous data that might have been generated during the task execution.
5. **Transform Last Task Output**
After the last task completes, its output can be transformed before it is considered the workflow output. Transformations are applied using the `output.as` runtime expression. It is evaluated against the raw task output and defaults to the identity expression, which leaves the output unchanged. Its result is passed to the workflow's `output.as` runtime expression. This ensures that the workflow produces a clean and relevant output, free from any extraneous data that might have been generated during task execution.

*Example: If the last task outputs various statistics, a filter can be applied to retain only the key metrics that are relevant to the stakeholders.*
*Example: If the last task outputs various statistics, a transformation can be applied to retain only the key metrics that are relevant to the stakeholders.*

6. **Filter Workflow Output**
Finally, the overall workflow output can be filtered before it is returned to the caller or stored. This step ensures that the final output of the workflow is concise and relevant, containing only the necessary information that needs to be communicated or recorded.
6. **Transform Workflow Output**
Finally, the overall workflow output can be transformed before it is returned to the caller or stored. Transformations are applied using the workflow's `output.as` runtime expression. It is evaluated against the last task's transformed output and defaults to the identity expression, which leaves the output unchanged. This step ensures that the final output of the workflow is concise and relevant, containing only the necessary information that needs to be communicated or recorded.

*Example: If the workflow's final output is a summary report, a filter can ensure that the report contains only the most important summaries and conclusions, excluding any intermediate data.*
*Example: If the workflow's final output is a summary report, a transformation can ensure that the report contains only the most important summaries and conclusions, excluding any intermediate data.*

By applying filters at these strategic points, Serverless Workflow DSL ensures that data flows through the workflow in a controlled and efficient manner, maintaining clarity and relevance at each stage of execution. This approach helps in managing complex workflows and ensures that each task operates with the precise data it requires, leading to more predictable and reliable workflow outcomes.
By applying transformations at these strategic points, Serverless Workflow DSL ensures that data flows through the workflow in a controlled and efficient manner, maintaining clarity and relevance at each execution stage. This approach helps manage complex workflows and ensures that each task operates with the precise data required, leading to more predictable and reliable workflow outcomes.
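
To make these stages concrete, below is a minimal sketch of a workflow that applies a transformation at every stage. The `getPet` HTTP call, the endpoint, and the field names (`petId`, `requestedBy`, `pet`) are illustrative assumptions, not prescribed by the specification:

```yaml
# Illustrative sketch only: the endpoint, field names, and jq expressions are assumed.
document:
  dsl: '1.0.0'
  namespace: examples
  name: data-flow-example
  version: '1.0.0'
input:
  # 1. Transform Workflow Input: keep only the fields the workflow needs.
  from: '${ { petId: .petId, requestedBy: .requestedBy } }'
do:
  - getPet:
      call: http
      with:
        method: get
        endpoint: https://petstore.example.com/v2/pet/{petId}
      # 2. Transform Task Input: pass only what this task requires.
      input:
        from: '${ { petId: .petId } }'
      # 3. Transform Task Output: keep only the relevant part of the result.
      output:
        as: '${ { pet: . } }'
      # Update the shared context with the transformed output.
      export:
        as: '${ $context + { lastPet: $output.pet } }'
# 6. Transform Workflow Output: return a concise result to the caller.
output:
  as: '${ { pet: .pet } }'
```

With a single task, stages 2–5 all apply to `getPet`; in a longer workflow, stages 4 and 5 are the same `input.from` and `output.as` mechanisms applied to the last task.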

Visually, this can be represented as follows:

```mermaid
flowchart TD

initial_context_arg([<code>$context</code>])
context_arg([<code>$context</code>])
input_arg([<code>$input</code>])
output_arg([<code>$output</code>])

workflow_raw_input{{Raw Workflow Input}}
workflow_input_from[Workflow: <code>input.from</code>]
workflow_transformed_input{{Transformed Workflow Input}}

task_raw_input{{Raw Task Input}}
task_input_from[Task: <code>input.from</code>]
task_transformed_input{{Transformed Task Input}}
task_definition[Task definition]
task_raw_output{{Raw Task Output}}
task_output_as[Task: <code>output.as</code>]
task_transformed_output{{Transformed Task Output}}
task_export_as[Task: <code>export.as</code>]

workflow_raw_output{{Raw Workflow Output}}
workflow_output_as[Workflow: <code>output.as</code>]
workflow_transformed_output{{Transformed Workflow Output}}

workflow_raw_input --> workflow_input_from
workflow_input_from -- Produces --> workflow_transformed_input
workflow_transformed_input -- Set as --> initial_context_arg
workflow_transformed_input -- Passed to --> task_raw_input

subgraph Task

task_raw_input -- Passed to --> task_input_from
task_input_from -- Produces --> task_transformed_input
task_transformed_input -- Set as --> input_arg
task_transformed_input -- Passed to --> task_definition

task_definition -- Execution produces --> task_raw_output
task_raw_output -- Passed to --> task_output_as
task_output_as -- Produces --> task_transformed_output
task_output_as -- Set as --> output_arg
task_transformed_output -- Passed to --> task_export_as
end

task_transformed_output -- Passed as raw input to --> next_task

subgraph next_task [Next Task]
end

task_export_as -- Result set as --> context_arg

next_task -- Transformed output becomes --> workflow_raw_output
workflow_raw_output -- Passed to --> workflow_output_as
workflow_output_as -- Produces --> workflow_transformed_output
```

### Runtime Expressions

@@ -202,8 +260,9 @@ When the evaluation of an expression fails, runtimes **must** raise an error with

| Name | Type | Description |
|:-----|:----:|:------------|
| context | `any` | The task's context data. |
| input | `any` | The task's filtered input. |
| context | `map` | The task's context data. |
| input | `any` | The task's transformed input. |
| output | `any` | The task's transformed output. |
| secrets | `map` | A key/value map of the workflow secrets.<br>To avoid unintentional bleeding, secrets can only be used in the `input.from` runtime expression. |
| task | [`taskDescriptor`](#task-descriptor) | Describes the current task. |
| workflow | [`workflowDescriptor`](#workflow-descriptor) | Describes the current workflow. |
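
For instance, because `$secrets` is only available in `input.from`, a task that needs a credential must inject it into its input there; a minimal sketch, assuming a secret named `API_KEY` has been declared by the workflow:

```yaml
# Sketch: merges an assumed secret into the task's input so that the task
# definition can reference it; API_KEY is a hypothetical secret name.
input:
  from: '${ . + { apiKey: $secrets.API_KEY } }'
```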
@@ -225,7 +284,8 @@ This argument contains information about the runtime executing the workflow.
| Name | Type | Description | Example |
|:-----|:----:|:------------|:--------|
| name | `string` | The task's name. | `getPet` |
| definition | `map` | The tasks definition (specified under the name) as a parsed object | `{ "call": "http", "with": { ... } }` |
| input | `any` | The task's input *BEFORE* the `input.from` expression. For the result of `input.from` expression use the context of the runtime expression (for jq `.`) | - |
| input | `any` | The task's *raw* input (i.e. *BEFORE* the `input.from` expression). For the result of the `input.from` expression, use the context of the runtime expression (for jq, `.`) | - |
| output | `any` | The task's *raw* output (i.e. *BEFORE* the `output.as` expression). | - |
| startedAt.iso8601 | `string` | The start time of the task as an ISO 8601 date-time string. It uses `T` as the date-time delimiter, either UTC (`Z`) or a time zone offset (`+01:00`). The precision can be either seconds, milliseconds or nanoseconds | `2022-01-01T12:00:00Z`, `2022-01-01T12:00:00.123456Z`, `2022-01-01T12:00:00.123+01:00` |
| startedAt.epochMillis | `integer` | The start time of the task as an integer value of milliseconds since midnight of 1970-01-01 UTC | `1641024000123` (="2022-01-01T08:00:00.123Z") |
| startedAt.epochNanos | `integer` | The start time of the task as an integer value of nanoseconds since midnight of 1970-01-01 UTC | `1641024000123456` (="2022-01-01T08:00:00.123456Z") |
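
As an illustration, these descriptor fields can be referenced from a task's `output.as` expression to annotate its result; a hedged sketch:

```yaml
# Sketch: wraps the raw task output and records which task produced it and when.
output:
  as: '${ { result: ., producedBy: $task.name, startedAt: $task.startedAt.iso8601 } }'
```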
@@ -236,11 +296,23 @@ This argument contains information about the runtime executing the workflow.
| Name | Type | Description | Example |
|:-----|:----:|:------------|:--------|
| id | `string` | A unique id of the workflow execution. No specific format is imposed | UUIDv4: `4a5c8422-5868-4e12-8dd9-220810d2b9ee`, ULID: `0000004JFGDSW1H037G7J7SFB9` |
| definition | `map` | The workflow's definition as a parsed object | `{ "document": { ... }, "do": [...] }` |
| input | `any` | The workflow's input *BEFORE* the `input.from` expression. For the result of `input.from` expression use the `$input` argument | - |
| input | `any` | The workflow's *raw* input (i.e. *BEFORE* the `input.from` expression). For the result of the `input.from` expression, use the `$input` argument | - |
| startedAt.iso8601 | `string` | The start time of the execution as an ISO 8601 date-time string. It uses `T` as the date-time delimiter, either UTC (`Z`) or a time zone offset (`+01:00`). The precision can be either seconds, milliseconds or nanoseconds | `2022-01-01T12:00:00Z`, `2022-01-01T12:00:00.123456Z`, `2022-01-01T12:00:00.123+01:00` |
| startedAt.epochMillis | `integer` | The start time of the execution as an integer value of milliseconds since midnight of 1970-01-01 UTC | `1641024000123` (="2022-01-01T08:00:00.123Z") |
| startedAt.epochNanos | `integer` | The start time of the execution as an integer value of nanoseconds since midnight of 1970-01-01 UTC | `1641024000123456` (="2022-01-01T08:00:00.123456Z") |
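
Similarly, the workflow descriptor can be used, for example, to stamp the final output with the execution id; a sketch:

```yaml
# Sketch: adds the unique execution id to the transformed workflow output.
output:
  as: '${ { executionId: $workflow.id, result: . } }'
```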

The following table shows which arguments are available for each runtime expression:

| Runtime Expression | Evaluated on | Produces | `$context` | `$input` | `$output` | `$secrets` | `$task` | `$workflow` |
|:-------------------|:---------:|:---------:|:---------:|:---------:|:-------:|:---------:|:-------:|:----------:|
| Workflow `input.from` | Raw workflow input | Transformed workflow input | | | | ✔ | | ✔ |
| Task `input.from` | Raw task input (i.e. transformed workflow input for the first task, transformed output from previous task otherwise) | Transformed task input | ✔ | | | ✔ | ✔ | ✔ |
| Task `if` | Transformed task input | | ✔ | ✔ | | ✔ | ✔ | ✔ |
| Task definition | Transformed task input | | ✔ | ✔ | | ✔ | ✔ | ✔ |
| Task `output.as` | Raw task output | Transformed task output | ✔ | ✔ | | ✔ | ✔ | ✔ |
| Task `export.as` | Transformed task output | `$context` | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Workflow `output.as` | Last task's transformed output | Transformed workflow output | ✔ | | | ✔ | | ✔ |
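
For example, because `export.as` can see both `$context` and `$output`, a task can append its transformed output to a list kept in the context; a sketch assuming the context may not yet contain a `petList` array:

```yaml
# Sketch: appends the task's transformed output to a list in the shared context,
# initializing the list first if needed (jq's // alternative operator).
export:
  as: '${ $context + { petList: (($context.petList // []) + [ $output ]) } }'
```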

### Fault Tolerance

Serverless Workflow is designed with resilience in mind, acknowledging that errors are an inevitable part of any system. The DSL provides robust mechanisms to identify, describe, and handle errors effectively, ensuring the workflow can recover gracefully from failures.