Document Data Flow and expression context more clearly #958
Not sure that it's needed, as it's always the filtered input that is used, everywhere but in the |
@cdavernas should I then close #921 without any additional clarifications? I have to admit, though, that I always have to think a bit about what the context of a given expression is, so I imagine clarifications could help. Furthermore, it is not inherently clear (from just the names) that |
No no, that's just my opinion, let's wait for others to kick in 😉 Anyways, more info cannot hurt, especially if you feel that's not especially intuitive or lacks clarification 😉 |
Maybe, yes! Along with some explicit graphical/schema representation of what comes/happens when, as @zolero pointed out to me in PMs! |
@cdavernas I removed the extra statement from the |
More explanation cannot hurt, especially on a topic that seems confusing (see #890 for instance). If I can add my two cents, I would get rid of the term "filter*" in favor of "transform*". Filter conveys the idea of removing part of the dataset where transform is broader and just implies mutations. |
I like it 👍 I also felt "Filtered" wasn't the right term. I will wait for feedback from the others and then change it accordingly. |
@matthias-pichler-warrify I liked the term |
Looks great to me, many thanks!
This argument exists because the a) add the |
I prefer option b). @cdavernas prefers option a). We can decide by throwing a coin, ping pong match or voting ;) |
haha 😂 Luckily we have some more people that might break the tie @JBBianchi @ricardozanini
has the advantage that the default expressions are always the identity (i.e.
has the advantage that one less argument is needed and realistically one will always use the output to update the context anyways |
As @fjtirado said, I have a priori a slight preference for |
That is a good point, because as I understand it, neither option allows this. Should I update #953 to include a |
The output is, by default, the input of the next task, isn't it? |
My initial preference was option a because it makes sense for
From my understanding of the specification, If we agree with @matthias-pichler-warrify's proposal to make the raw output accessible to
This approach is acceptable. My only concern is that, in most cases, we would prefer to use the raw output to update the context rather than the transformed output, which is intended specifically for use by the next task. |
That's partially incorrect imho. The raw output should be supplied by On another note, I don't understand how you could ever evaluate two expressions simultaneously. My understanding is to first evaluate output, then export, as explained in the schema @matthias-pichler-warrify has created |
Currently it is not because As it stands now only one of the raw or transformed output is available ... whatever |
I think both options, simultaneous and in-order evaluation, are possible; it depends on which we define as the intended semantics (independent of implementation). Making them simultaneous forces |
I think we should impose an order to avoid ambiguity, so |
That is not as it was initially intended, because unlike the raw input, the context was accessible thanks to @matthias-pichler-warrify Based on which info did you conclude |
Simultaneously makes no sense to me, but that's just my opinion. As a matter of fact, it's counterintuitive, error-prone and confusing. Plus, I doubt that implementers will rely on parallel processing to do such trivial evaluations, therefore they will arbitrarily evaluate one or the other. |
I didn't choose the proper wording I think. Just to be clear, the term "simultaneously" is used figuratively here, not to imply that the operations occur in parallel threads, but rather that they are independent and can be processed in any order. Historically, output.as and export.as were two operations related to output, to and from. These operations were processed "at the same moment, after the task returned its output" (not as "parallel tasks"). There was nothing preventing an implementer from processing one before the other, as there was no order of preference. |
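To make the "independent, any order" reading above concrete, here is a minimal Python sketch. It is not taken from the specification or any runtime: `evaluate_independently` and the two lambdas are hypothetical stand-ins for evaluating `output.as` and `export.as`, and it assumes both expressions receive only the task's raw output.

```python
from typing import Any, Callable, Tuple

# Hypothetical stand-in for a runtime expression: a function of one value.
Expression = Callable[[Any], Any]


def evaluate_independently(
    raw_output: Any,
    output_as: Expression,
    export_as: Expression,
    output_first: bool = True,
) -> Tuple[Any, Any]:
    """Evaluate both expressions against the same raw output.

    Neither expression sees the other's result, so the order of
    evaluation cannot change what either one returns.
    """
    if output_first:
        transformed = output_as(raw_output)
        exported = export_as(raw_output)
    else:
        exported = export_as(raw_output)
        transformed = output_as(raw_output)
    return transformed, exported


raw = {"id": 1, "items": [1, 2, 3]}
output_as = lambda o: o["items"]           # illustrative output.as
export_as = lambda o: {"lastId": o["id"]}  # illustrative export.as

result_a = evaluate_independently(raw, output_as, export_as, output_first=True)
result_b = evaluate_independently(raw, output_as, export_as, output_first=False)
```

Under that assumption both orders give the same pair of results, which is why no order of preference was needed in the historical reading.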
To conclude, my opinion is that we should indicate that the This would ensure we make no exception to Whatever we choose to proceed with, we should clarify that output should be transformed BEFORE exporting. This is IMHO more consistent, more intuitive and more logical. On a side note, it's IMO perfectly fine that the info might be retrieved using What do you guys think? |
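For comparison, the "transform BEFORE exporting" ordering proposed above can be sketched as follows. This is only an illustration under assumptions: `run_output_pipeline` is a hypothetical helper, and it assumes `export.as` receives the transformed output together with the current context, which is one of the variants debated in this thread rather than a settled rule.

```python
from typing import Any, Callable, Dict, Tuple


def run_output_pipeline(
    raw_output: Any,
    output_as: Callable[[Any], Any],
    export_as: Callable[[Any, Dict[str, Any]], Dict[str, Any]],
    context: Dict[str, Any],
) -> Tuple[Any, Dict[str, Any]]:
    """Apply output.as first, then export.as (assumed ordering)."""
    # Step 1: transform the raw task output.
    transformed = output_as(raw_output)
    # Step 2: compute the new context from the transformed output
    # and the current context (assumption, see lead-in).
    new_context = export_as(transformed, context)
    return transformed, new_context


raw = {"id": 1, "items": [1, 2, 3]}
transformed, new_context = run_output_pipeline(
    raw,
    output_as=lambda o: {"count": len(o["items"])},
    export_as=lambda out, ctx: {**ctx, "count": out["count"]},
    context={"runs": 0},
)
```

With a fixed order, the export expression can rely on the transformed output being available, which removes the ambiguity of the order-independent reading.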
I think the split of
I think this leaves us with the following options, that we can vote on: Option 1
👍 Pro
👎 Con
Option 2
👍 Pro
👎 Con
Option 3
👍 Pro
👎 Con
Option 4
👍 Pro
👎 Con
----- UPDATE ---- Option 5
|
I prefer option 4. 👍
I don't think this is a strong pro argument for option 3; it is "purely theoretical" and, according to #948, thus of least priority |
Signed-off-by: Matthias Pichler <[email protected]>
ba2d17f to de41643
I prefer option #4 too, which was more or less the initial intent when writing the refactor (as has already been implemented in Synapse's alpha branch more than a month ago :p). I don't see the problem with having "one more argument", on the contrary, as they are but a convenience. |
Even though not required, I'd personally still add it for the sake of consistency. Plus, when writing an expression using I however don't see it as an absolute must, so if you guys feel we should remove it, let's do it. |
Signed-off-by: Matthias Pichler <[email protected]>
I agree with all the points, but the last point needs some nuance IMO. An important aspect to consider is that the evaluation contexts of So to keep the consistency of the pipeline concept, I'd rather opt for Option 1. |
I agree. It's a weak argument I just included for completeness
For me that is honestly a non-starter because it would be SUPER confusing to have I think that since Aligning with the "pipeline concept" makes sense so I updated the docs |
Signed-off-by: Matthias Pichler <[email protected]>
So I think @cdavernas , @JBBianchi and I are aligned now 🎉 @fjtirado @ricardozanini what are your thoughts? |
I'm a man of principles, so I still think Option 1 is the way to go |
The use case I see for export (context) having access to the transformed output is the same as when we want to just store the original output of the task. Maybe we have a future task (not the consecutive one) that wants to do something with the transformed output. In other words, I think we cannot predict whether users would like to store the original output or the transformed one in the context; it will heavily depend on the scenario. I also think forcing an order between output.as and export.as is a good thing. |
Which principle drives your opinion here? The only difference between option 1 and what is currently documented is that we define
Absolutely 👍 but as I stated in my summary. Order is always
I agree. Authors have to have access to both raw & transformed output in the To summarize: |
I will summarize what we have right now because it does differ from option 4. It is basically option 1 with this PR
I added it above as option 5 |
I didn't go through our examples, but if we need something that documents the transformation, we should add it to this PR. |
Signed-off-by: Matthias Pichler <[email protected]> Co-authored-by: Ricardo Zanini <[email protected]>
Signed-off-by: Matthias Pichler <[email protected]>
I went through all examples and ctk cases and didn't find anything that needed updating. The one example that used But thanks to your reminder I found one example in the dsl-reference that needed updating 🙏 |
Signed-off-by: Matthias Pichler [email protected]
Please specify parts of this PR update:
Discussion or Issue link:
Closes #921
What this PR does:
I extended the "Data Flow" section of the DSL with some clarifying statements and also added a diagram showing the data flow.
Additional information: