Document Data Flow and expression context more clearly #958

matthias-pichler · 2024-08-04T18:54:06Z

Signed-off-by: Matthias Pichler [email protected]

Please specify parts of this PR update:

Discussion or Issue link:

Closes #921

What this PR does:

I extended the "Data Flow" section of the DSL with some clarifying statements and also added a diagram showing the data flow.

Additional information:

cdavernas · 2024-08-04T18:57:35Z

Not sure that it's needed, as it's always the filtered input that is used, everywhere but in the input.from, which is the only exception.

matthias-pichler · 2024-08-04T19:02:32Z

Not sure that it's needed, as it's always the filtered input that is used, everywhere but in the input.from, which is the only exception.

@cdavernas should I then close #921 without any additional clarifications?

I have to admit though that I always have to think a bit what the context of a given expression is so I imagine clarifications could help. Furthermore it is not inherently clear (from just the names) that if runs after input.from ... maybe this should be added in the data flow section even though it's not data flow related 🤷

cdavernas · 2024-08-04T19:07:15Z

No no, that's just my opinion, let's wait for others to kick in 😉

Anyways, more info cannot hurt, especially if you feel that's not especially intuitive or lacks clarification 😉

cdavernas · 2024-08-04T19:09:37Z

maybe this should be added in the data flow section even though it's not data flow related 🤷

Maybe, yes! Along with some explicit graphical/schema representation of what comes/happens when, as @zolero pointed out to me in PMs!

matthias-pichler · 2024-08-05T08:57:22Z

@cdavernas I removed the extra statement from the task.if description and added a general overview for runtime expressions and data flow 👍

JBBianchi · 2024-08-05T09:28:14Z

More explanation cannot hurt, especially on a topic that seems confusing (see #890 for instance).

If I can add my two cents, I would get rid of the term "filter*" in favor of "transform*". Filter conveys the idea of removing part of the dataset where transform is broader and just implies mutations.

matthias-pichler · 2024-08-05T09:34:23Z

If I can add my two cents, I would get rid of the term "filter*" in favor of "transform*". Filter conveys the idea of removing part of the dataset where transform is broader and just implies mutations.

I like it 👍 I also felt "Filtered" wasn't the right term. I will wait for feedback from the others and then change it accordingly.

cdavernas · 2024-08-05T15:10:14Z

@matthias-pichler-warrify I liked the term filter, but @JBBianchi made a valid point. I'm totally OK with replacing it withtransform or whatever you guys feel fits better!

cdavernas

Looks great to me, many thanks!

matthias-pichler · 2024-08-05T20:21:11Z

export.as sometimes uses the undocumented $output variable (e.g. https://github.com/serverlessworkflow/specification/blob/main/dsl-reference.md#examples-31)

This argument exists because the export.as expressions is evaluated on the existing context and thus there is no other way to access the output. Should I:

a) add the $output argument to the docs and evaluate export.as on the existing context
b) change export.as to be evaluated on the transformed output and remove $output from the examples

fjtirado · 2024-08-06T10:23:29Z

export.as sometimes uses the undocumented $output variable (e.g. https://github.com/serverlessworkflow/specification/blob/main/dsl-reference.md#examples-31)

This argument exists because the export.as expressions is evaluated on the existing context and thus there is no other way to access the output. Should I:

a) add the $output argument to the docs and evaluate export.as on the existing context b) change export.as to be evaluated on the transformed output and remove $output from the examples

I prefer option b). @cdavernas prefers option a). We can decide by throwing a coin, ping pong match or voting ;)

matthias-pichler · 2024-08-06T10:27:00Z

I prefer option b). @cdavernas prefers option a). We can decide by throwing a coin, ping pong match or voting ;)

haha 😂 Luckily we have some more people that might break the tie @JBBianchi @ricardozanini
I am unsure to be honest but I think I also slightly prefer removing $output

a) add the $output argument to the docs and evaluate export.as on the existing context

has the advantage that the default expressions are always the identity (i.e. .)

b) change export.as to be evaluated on the transformed output and remove $output from the examples

has the advantage that one less argument is needed and realistically one will always use the output to update the context anyways

cdavernas · 2024-08-06T10:32:39Z

As @fjtirado said, I have a priori a slight preference for a because it is a tad more agile imho. I however do not have a strong or irreversible opinion. As long as the user can use both raw and modified output, it's fine by me.

matthias-pichler · 2024-08-06T10:44:08Z

As long as the user can use both raw and modified output, it's fine by me.

That is a good point, because as I understand it, neither option allows this. Should I update #953 to include a $task.output property that holds the raw output?

fjtirado · 2024-08-06T10:51:03Z

The output is, by default, the input of the next task, isnt it?
Then yes, we should be able to dump the original task output (the result of the rest invocation, for example) to the context.
Whatever option we chose we need careful documentation and example, since this is as subttle as a wargame manual ;)

JBBianchi · 2024-08-06T11:20:31Z

a) add the $output argument to the docs and evaluate export.as on the existing context
b) change export.as to be evaluated on the transformed output and remove $output from the examples

My initial preference was option a because it makes sense for . to be the context, as it is the affected object. However, the argument for removing the extra $output argument, which only exists in export.as, is a valid point as well.

As long as the user can use both raw and modified output, it's fine by me.

From my understanding of the specification, $output refers to the raw output, and there is no way to access the "transformed" output because output and export are processed simultaneously. However, it is reasonable to consider that export might be processed after output.

If we agree with @matthias-pichler-warrify's proposal to make the raw output accessible to $task, we could have:

.: the transformed output
$context: the task context, which will be updated by the current operation
$task.output: the raw output of the task

This approach is acceptable. My only concern is that, in most cases, we would prefer to use the raw output to update the context rather than the transformed output, which is intended specifically for use by the next task.

cdavernas · 2024-08-06T11:23:13Z

From my understanding of the specification, $output refers to the raw output, and there is no way to access the "transformed" output because output and export are processed simultaneously. However, it is reasonable to consider that export might be processed after output.

That's partially incorrect imho. The raw ouput was should be supplied by $output, which is the reason why I created it. The transformed output is accessed with ..

On another note, I don't understand how you could ever evaluate two expression simultaneously. My understanding is to first evaluate output, then export, as explained in the schema @matthias-pichler-warrify has created

matthias-pichler · 2024-08-06T11:49:45Z

The transformed output is accessed with .

Currently it is not because . refers to the context object. Making it refer to the transformed output is what option b) proposes.

As it stands now only one of the raw or transformed output is available ... whatever $output refers to currently 🤷‍♂️

matthias-pichler · 2024-08-06T11:53:20Z

I think both options of simultaneous and in order evaluation are possible. Whichever we define as the intended semantics (independent of implementation).

Making them simultaneous forces export.as to use the raw output either as . or $output. I definitely think that the raw output should be available to export in some form though

fjtirado · 2024-08-06T12:02:46Z

order evaluation are possible. Whichever we define as the intended semantics (independent of implementation).

Making them simultaneous forces export.as to use the raw output either as . or $output. I definitely think that the raw output should be available to export in some form though

I think we should impose an order to avoid ambiguity, so . is the transformed output and $task.output the original output always, regardless the implementation

cdavernas · 2024-08-06T12:17:38Z

Currently it is not because . refers to the context object.

That is not as it was initially intended, because unlike the raw input, the context was accessible thanks to $context. I might have documented it incorrectly though.

@matthias-pichler-warrify Based on which info did you conclude . was the context, therefore being an exception to all other evaluations?

cdavernas · 2024-08-06T12:21:28Z

I think both options of simultaneous and in order evaluation are possible

Simultaneously makes no sense to me, but that's just my opinion. As a matter if fact, it's counter intuitive, error prone and confusing. Plus, I doubt that implementers will rely on parallel processing to do such trivial evaluations, therefore they will arbitrarily evaluate one or the other.

JBBianchi · 2024-08-06T12:42:28Z

I think both options of simultaneous and in order evaluation are possible

Simultaneously makes no sense to me, but that's just my opinion. As a matter if fact, it's counter intuitive, error prone and confusing. Plus, I doubt that implementers will rely on parallel processing to do such trivial evaluations, therefore they will arbitrarily evaluate one or the other.

I didn't choose the proper wording I think. Just to be clear, the term "simultaneously" is used figuratively here, not to imply that the operations occur in parallel threads, but rather that they are independent and can be processed in any order. Historically, output.as and export.as were two operations related to output, to and from. These operations were processed "at the same moment, after the task returned its output" (not as "parallel tasks"). There was nothing preventing an implementer from processing one before the other, as there was no order of preference.

cdavernas · 2024-08-06T13:26:26Z

To conclude, my opinion is that we should indicate that the . of the export.as should be the transformed output, and to keep $output parameter (which is more consistent, knowing we already declare an $input paramater, too).

This would ensure we make no exception to . representing the "current" data. Proceeding otherwise might be confusing, at least it is to me.

Whatever we choose to proceed with, we should clarify that output should be transformed BEFORE exporting. This is IMHO more consistent, more intuitive and more logical.

On a side note, it's IMO perfectly fine that the info might be retrieved using $task.output, too: it just is an additional convenience.

What do you guys think?

matthias-pichler · 2024-08-06T17:36:37Z

I think the split of output.from and output.to introduced some drift so the spec is no longer coherent. We should reason from the ground up and I will adapt the spec accordingly. I will try to consolidate the discussion:

I think the transformed input should be available to the output.as and export.as expressions. The logical place for me to put it is $input while the raw input might live in $task.input (or for clarity $task.rawInput)
If we keep $output it should most likely refer to the transformed output since $input also refers to the transformed input (as documented here). Otherwise it would be inconsistent for authors
If we introduce $task.output it should probably refer to the raw output since Document content of $workflow and $task arguments #951 specifies the $task.input argument as the raw input
export.as should definitely have access to the raw output since I might want to store something in the context that might not be relevant to the next task/output
export.as should probably have access to the transformed output 🤷 ? If yes then a order is automatically imposed since output.as has to produce the transformed output first. Otherwise they still wouldn't be "independent" of each other. After closer inspection order does still matter since output.as has access to $context which is modified by export.as and thus order still matters and output.as before export.as seems to make more sense/be more intuitve.
Other runtime expressions such as input.from and output.as have their evaluation context be the same as the data they are transforming. This means input.from and output.as both "default" to the identity expression. In jq this would be ${ . }. Since the export.as expression modifies the value of $context keeping this property would mean that the evaluation context of export.as should be the context itself.
I however believe the authors will not think about export.as this way and will rather reason that output.as and export.as will either "be downstream of each other" or "siblings". So I think this point is not very strong.

I think this leaves us with the following options, that we can vote on:

Option 1

export.as is evaluated on the transformed output. i.e. export.as is downstream of output.as
Order is automatically imposed (output.as -> export.as)
$input is the transformed task input
$output does NOT need to exist
$task.output will refer to the raw output and will be set in both output.as (although redundant) and export.as

👍 Pro

One less runtime argument ($output)

👎 Con

referring to the raw output is more tedious with $task.output

Option 2

export.as is evaluated on the raw output.
$input is the transformed task input
$output does NOT exist and export.as does NOT have access to the transformed output. Thus some processing might need to be duplicated
Order is still imposed since output.as relies on the old $context. (output.as -> export.as)
If we want we can still introduce $task.output to refer to the raw output but it is no longer needed

👍 Pro

One less runtime argument ($output)
referring to the raw output is concise (with .)
export.as and output.as have the same evaluation context

👎 Con

authors might have to duplicate common transformations

Option 3

export.as is evaluated on the current context
$input is the transformed task input
$output needs to exist and will refer to the transformed output to be consistent with $input
Order is automatically imposed (output.as -> export.as)
$task.output will refer to the raw output and will be set in both output.as (although redundant) and export.as

👍 Pro

default runtime expressions all are the identity (i.e. ${ . })

👎 Con

referring to the raw output is more tedious with $task.output

Option 4

export.as is evaluated on the raw output.
$input is the transformed task input
$output does exist and contains the transformed output.
Order is automatically imposed (output.as -> export.as)
If we want we can still introduce $task.output to refer to the raw output but it is no longer needed

👍 Pro

No duplication for common transformations
referring to the raw output is concise (with .)
export.as and output.as have the same evaluation context

👎 Con

One more runtime argument ($output)

----- UPDATE ----

Option 5

export.as is evaluated on the transformed output.
$output does exist and contains the transformed output.
Order is automatically imposed (output.as -> export.as)
$task.output will refer to the raw output and will be set in both output.as (although redundant) and export.as

matthias-pichler · 2024-08-06T17:44:43Z

I prefer option 4. 👍

default runtime expressions all are the identity (i.e. ${ . })

I don't think this is not a strong pro argument for option 3 and "purely theoretical" and according to #948 thus least priority

Signed-off-by: Matthias Pichler <[email protected]>

cdavernas · 2024-08-06T18:33:29Z

I prefer option #4 too, which was more or less the initial intent when writing the refactor (as has already been implemented in Synapse's alpha branch more than a month ago :p).

I don't see the problem with having "one more argument", on the contrary, as they are but a convenience.

cdavernas · 2024-08-06T18:37:55Z

If we want we can still introduce $task.output to refer to the raw output but it is no longer needed

Even though not required, I'd personally still add it for the sake of consistency. Plus, when writing an expression using $task, authors might find it clearer.

I however don't see it as an absolute must, so if you guys feel we should remove it, let's do it.

Signed-off-by: Matthias Pichler <[email protected]>

JBBianchi · 2024-08-07T08:13:18Z

Other runtime expressions such as input.from and output.as have their evaluation context be the same as the data they are transforming. This means input.from and output.as both "default" to the identity expression. In jq this would be ${ . }. Since the export.as expression modifies the value of $context keeping this property would mean that the evaluation context of export.as should be the context itself.
I however believe the authors will not think about export.as this way and will rather reason that output.as and export.as will either "be downstream of each other" or "siblings". So I think this point is not very strong.

I agree with all the points, but the last point needs some nuance IMO. An important aspect to consider is that the evaluation contexts of input.from and output.as are respectively the raw input and output because those are the results of the previous step in the execution pipeline, not because of the target of the transformation. Therefore, it's a bit of a stretch to suggest that the evaluation context for export.as should be the $context. If we choose to position export.as as the step following output.as, it would be logical for its evaluation context to be the transformed output.

So to keep the consistency of the pipeline concept, I'd rather opt for Options 1 .
Maybe with a twist of using $output instead of $task.output. The later is a bit useless (output.as already have access to the raw output as its evaluation context) and I don't think it brings more clarity than $output, it's just makes it longer to type.

matthias-pichler · 2024-08-07T08:43:25Z

@JBBianchi

Therefore, it's a bit of a stretch to suggest that the evaluation context for export.as should be the $context.

I agree. It's a weak argument I just included for completeness

Maybe with a twist of using $output instead of $task.output.

For me that is honestly a non-starter because it would be SUPER confusing to have $input be the transformed input and $output be the raw output.

I think that since export.as has access to the raw output via $task.output and the transformed output using $output I am flexible on the evaluation context of export.as being either the raw or transformed output.

Aligning with the "pipeline concept" makes sense so I updated the docs

Signed-off-by: Matthias Pichler <[email protected]>

matthias-pichler · 2024-08-07T09:13:10Z

So I think @cdavernas , @JBBianchi and I are aligned now 🎉

@fjtirado @ricardozanini what are your thoughts?

fjtirado · 2024-08-07T10:02:25Z

Im a man of principles, so I still think Option1 is the way to go

fjtirado · 2024-08-07T10:22:32Z

The use case I see for export (context) having access to the transformed output is the same one than when we want to just store the original output of the task. Maybe we have a future task (not the consecutive one) that want to do something with the transformed output. In other words, I think we cannot predict if users would like to store the original output or the transformed one in the context, it will heavily depends on the scenario. I also think forcing an order between output.as and export.as is a good thing.

matthias-pichler · 2024-08-07T10:31:48Z

@fjtirado

Im a man of principles, so I still think Option1 is the way to go

which principle drives your opinion here? The only difference between option 1 and what is currently documented is that we define $output additionally

I also think forcing an order between output.as and export.as is a good thing.

Absolutely 👍 but as I stated in my summary. Order is always output.as before export.as (in all options). This is because if output.as uses $context we need to define that it should refer to the "old" context and not the one export.as produces 👍 This is currently already documented this way in this PR

I think we cannot predict if users would like to store the original output or the transformed one in the context

I agree. Authors have to have access to both raw & transformed output in the export.as expressions. Both Option 1 and the current agreement allow this 👍

To summarize:
The only difference between Option 1 and what we currently have (option 5) is that $output is defined IN ADDITION (even though redundant). And thus my question is: Are you against defining $output additionally?

matthias-pichler · 2024-08-07T10:33:22Z

I will summarize what we have right now because it does differ from option 4. It is basically option 1 with $output defined

this PR

export.as is evaluated on the transformed output.
$output does exist and contains the transformed output.
Order is automatically imposed (output.as -> export.as)
$task.output will refer to the raw output and will be set in both output.as (although redundant) and export.as

I added it above as option 5

dsl.md

ricardozanini · 2024-08-07T16:11:30Z

I didn't go through our examples, but if we need something that documents the transformation, we should add it to this PR.

Signed-off-by: Matthias Pichler <[email protected]> Co-authored-by: Ricardo Zanini <[email protected]>

Signed-off-by: Matthias Pichler <[email protected]>

matthias-pichler · 2024-08-07T20:14:17Z

I didn't go through our examples, but if we need something that documents the transformation, we should add it to this PR.

I went through all examples and ctk cases and didn't find anything that needed updating. The one example that used export.as was added by me and already used the current spec 😅

But thanks to your reminder I found one example in the dsl-reference that needed updating 🙏

matthias-pichler requested review from ricardozanini and cdavernas as code owners August 4, 2024 18:54

cdavernas added change: fix Something isn't working. Impacts in a minor version change. change: documentation Improvements or additions to documentation. It won't impact a version change. area: spec Changes in the Specification labels Aug 4, 2024

cdavernas added this to the v1.0.0-alpha3 milestone Aug 4, 2024

matthias-pichler changed the title ~~Specify task.if context~~ Document Data Flow and expression context more clearly Aug 5, 2024

cdavernas approved these changes Aug 5, 2024

View reviewed changes

matthias-pichler added 4 commits August 6, 2024 19:50

docs: specify task.if context

67f7669

Signed-off-by: Matthias Pichler <[email protected]>

docs: describe data flow visually

87337a6

Signed-off-by: Matthias Pichler <[email protected]>

docs: add table for runtime arguments

2a36aae

Signed-off-by: Matthias Pichler <[email protected]>

refactor: use transform instead of filter

de41643

Signed-off-by: Matthias Pichler <[email protected]>

matthias-pichler force-pushed the task-if-context branch from ba2d17f to de41643 Compare August 6, 2024 17:50

docs: update dsl to reflect option 4

0a0b685

Signed-off-by: Matthias Pichler <[email protected]>

matthias-pichler added 2 commits August 7, 2024 08:47

Merge remote-tracking branch 'upstream/main' into task-if-context

dfc61a6

docs: use transformed output as context

b17752d

Signed-off-by: Matthias Pichler <[email protected]>

ricardozanini reviewed Aug 7, 2024

View reviewed changes

cdavernas and others added 4 commits August 7, 2024 18:14

Merge branch 'main' into task-if-context

0df3bb4

Apply suggestions from code review

39e53b9

Signed-off-by: Matthias Pichler <[email protected]> Co-authored-by: Ricardo Zanini <[email protected]>

Merge branch 'main' into task-if-context

61186a7

fix: update dsl reference example

9853412

Signed-off-by: Matthias Pichler <[email protected]>

matthias-pichler requested a review from ricardozanini August 7, 2024 20:14

ricardozanini approved these changes Aug 8, 2024

View reviewed changes

cdavernas merged commit 1e50891 into serverlessworkflow:main Aug 8, 2024
2 checks passed

Document Data Flow and expression context more clearly #958

Document Data Flow and expression context more clearly #958

Uh oh!

Conversation

matthias-pichler commented Aug 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cdavernas commented Aug 4, 2024

Uh oh!

matthias-pichler commented Aug 4, 2024

Uh oh!

cdavernas commented Aug 4, 2024

Uh oh!

cdavernas commented Aug 4, 2024

Uh oh!

matthias-pichler commented Aug 5, 2024

Uh oh!

JBBianchi commented Aug 5, 2024

Uh oh!

matthias-pichler commented Aug 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cdavernas commented Aug 5, 2024

Uh oh!

cdavernas left a comment

Choose a reason for hiding this comment

Uh oh!

matthias-pichler commented Aug 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fjtirado commented Aug 6, 2024

Uh oh!

matthias-pichler commented Aug 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cdavernas commented Aug 6, 2024

Uh oh!

matthias-pichler commented Aug 6, 2024

Uh oh!

fjtirado commented Aug 6, 2024

Uh oh!

JBBianchi commented Aug 6, 2024

Uh oh!

cdavernas commented Aug 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

matthias-pichler commented Aug 6, 2024

Uh oh!

matthias-pichler commented Aug 6, 2024

Uh oh!

fjtirado commented Aug 6, 2024

Uh oh!

cdavernas commented Aug 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cdavernas commented Aug 6, 2024

Uh oh!

JBBianchi commented Aug 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cdavernas commented Aug 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

matthias-pichler commented Aug 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Option 1

👍 Pro

👎 Con

Option 2

👍 Pro

👎 Con

Option 3

👍 Pro

👎 Con

Option 4

👍 Pro

👎 Con

Option 5

Uh oh!

matthias-pichler commented Aug 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

matthias-pichler commented Aug 4, 2024 •

edited

Loading

matthias-pichler commented Aug 5, 2024 •

edited

Loading

matthias-pichler commented Aug 5, 2024 •

edited

Loading

matthias-pichler commented Aug 6, 2024 •

edited

Loading

cdavernas commented Aug 6, 2024 •

edited

Loading

cdavernas commented Aug 6, 2024 •

edited

Loading

JBBianchi commented Aug 6, 2024 •

edited

Loading

cdavernas commented Aug 6, 2024 •

edited

Loading

matthias-pichler commented Aug 6, 2024 •

edited

Loading

matthias-pichler commented Aug 6, 2024 •

edited

Loading

cdavernas commented Aug 6, 2024 •

edited

Loading

cdavernas commented Aug 6, 2024 •

edited

Loading

matthias-pichler commented Aug 7, 2024 •

edited

Loading

fjtirado commented Aug 7, 2024 •

edited

Loading

matthias-pichler commented Aug 7, 2024 •

edited

Loading

matthias-pichler commented Aug 7, 2024 •

edited

Loading