Skip to content

dataFilters, insert by default instead of merging #286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jorgenj opened this issue Mar 5, 2021 · 6 comments
Closed

dataFilters, insert by default instead of merging #286

jorgenj opened this issue Mar 5, 2021 · 6 comments
Labels
area: spec Changes in the Specification

Comments

@jorgenj
Copy link
Contributor

jorgenj commented Mar 5, 2021

I'd like to propose the following:

  • Change actionDataFilter & eventDataFilter to support simple insert, no merging.
  • Add the ability for customers to reference the state data in their actionDataFilter.results or eventDataFilter.data expressions, which gives them the power to specify custom 'merge' behavior.

WHY?

Because this is much simpler to understand, there are no surprises about deep or shallow merging its just a simple insert by default. It also lets users leverage the power of JQ to do custom 'merge' strategies.

Consider the example:

results: ${.}
toStateData: .customer

This would simply insert the action result at the location .customer in state data. If something already exists there in state data, this would completely over-write that data.

What if users really want to do merge?

We can give users much more flexibility, though it requires that the runtime make the state data available as a pre-defined jq variable.

With that we can give the user all kinds of flexibility.

(These examples assume the action result, referenced by . is a customer object)

Here's an example where we do a simple merge via actionDataFilter:

results: ${$stateData.customer + .}
toStateData: .customer

The above example takes customer from stateData and shallow merges it with the action result, reassigning the results back to .customer in the state data.

Here's an example where we do a recursive merge via actionDataFilter:

results: ${$stateData.customer * .}
toStateData: .customer

The above example takes customer from stateData and recursively merges it with the action result, reassigning the results back to .customer in the state data.

@jorgenj jorgenj mentioned this issue Mar 5, 2021
5 tasks
@tsurdilo tsurdilo added the area: spec Changes in the Specification label Mar 5, 2021
@tsurdilo
Copy link
Contributor

tsurdilo commented Mar 7, 2021

couple of comments:

  1. the data actionDataFilter filter is executed against is the results of the function invoked in that action, not state data. the second example where we you seem to use state data in the "results" property expression should not be possible to do.

  2. by default if the actionDataFilter is not specified, or "result" is not specified it is assumed that the entire result of function invocation is added/merged back into state data, so results: ${.} is unnecessary

  3. the whole point of sometimes using ${ ...expr } and sometimes not is completely unclear to me (why would not not always want to do it). The "results" prop is an expression, which can be a simple selection, the "toStateData" is the same thing...selection. I think its much less confusing to just always have an expression inside ${ ... } block. much easier on runtimes as well.

I think just doing inserts could be fine if and only if we add ability for some data (if in state data) to be immutable.

@jorgenj
Copy link
Contributor Author

jorgenj commented Mar 8, 2021

  1. the data actionDataFilter filter is executed against is the results of the function invoked in that action, not state data. the second example where we you seem to use state data in the "results" property expression should not be possible to do.

I think you missed some detail here in what I wrote.

Specifically:

it requires that the runtime make the state data available as a pre-defined jq variable

It might have been more clear if I reversed the order of the operands for + in the example to ${. + $stateData.customer}.

  1. by default if the actionDataFilter is not specified, or "result" is not specified it is assumed that the entire result of function invocation is added/merged back into state data, so results: ${.} is unnecessary

Yes, I know this, but I wanted to make sure the example was very clear, I didn't want to leave anything to be assumed.

  1. the whole point of sometimes using ${ ...expr } and sometimes not is completely unclear to me (why would not not always want to do it). The "results" prop is an expression, which can be a simple selection, the "toStateData" is the same thing...selection. I think its much less confusing to just always have an expression inside ${ ... } block. much easier on runtimes as well.

Consider this example from bash:

var FOO=BAR
var $FOO=BAR

env

What do you expect the output of env to include?

Now consider the following example with what you're proposing:

toStateData: ${ .foo }

Given the following action results:

baz: 1

Given the existing state data of:

foo: .bar

Please take a moment to consider how a user might think that they should be getting the following output because the expression ${.foo} on state data in other contexts would produce a string with the value .bar:

foo: .bar
bar: 
  baz: 1

Overall, I think there's something wrong about the way the spec is using the '${}'. In any other language I've seen, something like that is used for variable interpolation embedded in a string. In fact I think I saw examples where that's how it's being used (for example : Some string ${.value1} with ${.value2} used as a parameter for a function call).

I would propose the following be considered:

  • In situations where a string value should be produced, we support embedded interpolation of expressions via ${}, where the result will be a string representation of whatever is returned by the embedded expression.
    • toStateData is one such case, the result should be a string indicating a location in the stateData where the results should be inserted. An example value should be .foo, but we could allow interpolation like .foo.${.someOtherValue} which might produce the actual string .foo.bar. If that were the case, then the action results would be inserted at .foo.bar.
  • In situations where a json result should be produced, the user must specify a string containing the text of a jq expression to be evaluated, but it should not include the ${} because we're not producing a string and they're not doing variable interpolation.
    • ActionDataFilter would be an example here, it should produce a json value. The user should indicate that via some jq syntax like .foo (note that there's no ${} because we're not doing interpolation)

I think just doing inserts could be fine if and only if we add ability for some data (if in state data) to be immutable.

I'm not sure what we're trying to accomplish with immutability.

@tsurdilo
Copy link
Contributor

tsurdilo commented Mar 11, 2021

@jorgenj it requires that the runtime make the state data available as a pre-defined jq variable
imo this is not as easy to do as its not predefined but dynamic.
different states would need different logic, e.g for event state you would need to update this "preset

  1. after the state data filter filters the input
  2. after event is consumed
  3. after each action finishes
  4. after an error happens

I think for pre-defining variables such as for example $env (environmental variables) is a good idea but that is best for immutable data which state data is not

@tsurdilo
Copy link
Contributor

tsurdilo commented Mar 11, 2021

@jorgenj
I'm not sure what we're trying to accomplish with immutability.
it could be part of the requirements that the initial "customerID" that triggers the workflow execution cannot be changed or removed. Doing that could compromise the entire integrity of the workflow solution and even cause big problems (think payment systems).

That is one of the things that the old dataInputSchema and dataOutputSchema props were trying to accomplish in a way (just covered "existence")

@jorgenj
Copy link
Contributor Author

jorgenj commented Apr 14, 2021

@jorgenj it requires that the runtime make the state data available as a pre-defined jq variable
imo this is not as easy to do as its not predefined but dynamic.

This statement is confusing, the runtimes primary job is to keep track of the current state data over the duration of a workflow. It's unclear whether the difficulty here is around maintaining stateData (something the runtime already has to do) or whether it's about exposing a reference to that data in the chosen expression language.

different states would need different logic, e.g for event state you would need to update this "preset

  1. after the state data filter filters the input
  2. after event is consumed
  3. after each action finishes
  4. after an error happens

In all of these cases, the runtime should have the stateData with the appropriate current content at the time of each of these events. This proposal doesn't seem to change that, no matter what the runtime has to merge the changes into the state in some specific order. Whether the current stateData is able to be referenced at each of these points or not doesn't seem to change what the runtime has to do.

@tsurdilo wdyt?

@jorgenj
Copy link
Contributor Author

jorgenj commented Apr 29, 2021

Judging by the lack of response, the answer is clearly that there's no point in trying to follow up further on this.

@jorgenj jorgenj closed this as completed Apr 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: spec Changes in the Specification
Projects
None yet
Development

No branches or pull requests

2 participants