Data models #69
Conversation
I'm lifting the validator directly from @XzzX's [`PythonWorkflowDefinitionFunctionNode`](https://github.com/XzzX/python-workflow-definition/blob/fec059137d5c23a5983a798d347a50dbb911e56b/src/python_workflow_definition/models.py#L57) Co-authored-by: Sebastian Eibl <[email protected]> Signed-off-by: liamhuber <[email protected]>
Again, lifted from @XzzX's attack for the python workflow definition [`PythonWorkflowDefinitionNode`](https://github.com/XzzX/python-workflow-definition/blob/fec059137d5c23a5983a798d347a50dbb911e56b/src/python_workflow_definition/models.py#L68) Co-authored-by: Sebastian Eibl <[email protected]> Signed-off-by: liamhuber <[email protected]>
E.g. to avoid double ".." Signed-off-by: liamhuber <[email protected]>
And correct the nodes typing Signed-off-by: liamhuber <[email protected]>
Codecov Report ❌

```
@@            Coverage Diff             @@
##           type_names      #69      +/-   ##
==============================================
+ Coverage       95.50%   95.98%   +0.48%
==============================================
  Files               4        4
  Lines             667      773     +106
==============================================
+ Hits              637      742     +105
- Misses             30       31       +1
```
@XzzX, I lifted almost verbatim a couple of pydantic snippets from your python workflow definition model file, so I added you as a co-author on those commits. LMK if you'd prefer to handle this a different way.
Signed-off-by: liamhuber <[email protected]>
Without this, tuple keys ("node", "port") get transformed into "node,port" and destroy the deserialization (see the sketch after these commits).
Including child port names Signed-off-by: liamhuber <[email protected]>
And do basic model validation for their interaction with the output labels Signed-off-by: liamhuber <[email protected]>
We only need to convert the format for JSON (so far) -- for python we can retain the original dict structure (sketched below). Signed-off-by: liamhuber <[email protected]>
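A minimal, self-contained sketch of the two points above, assuming pydantic v2 (the model and method names are illustrative, not the actual flowrep code): first the failure mode with naive JSON dumping, then a `field_serializer` restricted to JSON so that `mode="python"` dumps keep the original dict structure.

```python
from pydantic import BaseModel, field_serializer


class Naive(BaseModel):
    # Without any custom serialization, the JSON dump stringifies tuple keys,
    # e.g. ("node", "port") -> "node,port", which cannot be deserialized back.
    edges: dict[tuple[str, str], str]


class EdgeHolder(BaseModel):
    edges: dict[str | tuple[str, str], str | tuple[str, str]]

    @field_serializer("edges", when_used="json")
    def _edges_for_json(self, edges) -> list:
        # Only JSON dumps get rewritten, here as a list of [target, source] pairs;
        # mode="python" dumps keep the dict with tuple keys untouched.
        def jsonable(end):
            return list(end) if isinstance(end, tuple) else end

        return [[jsonable(target), jsonable(source)] for target, source in edges.items()]


print(Naive(edges={("node", "port"): "x"}).model_dump(mode="json"))
# {'edges': {'node,port': 'x'}}

held = EdgeHolder(edges={("n", "p"): "x", "out": ("n", "q")})
print(held.model_dump(mode="python"))  # {'edges': {('n', 'p'): 'x', 'out': ('n', 'q')}}
print(held.model_dump(mode="json"))    # {'edges': [[['n', 'p'], 'x'], ['out', ['n', 'q']]]}
```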
Ok, I'm happy with this. I extended the data model to always include the inputs and outputs per this comment. The next thing I'd like to stack onto this work is to actually import the fully qualified names and validate the node model against the imported objects. I really like the file serialization helpers @XzzX made here and think they'd be great on the base model.
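A minimal sketch of what that follow-up validation could look like (the helper names `resolve` and `check_inputs` are hypothetical, not part of this PR):

```python
import importlib
import inspect


def resolve(fully_qualified_name: str):
    # Import e.g. "pkg.module.func" and return the object it points at.
    module_path, _, attr = fully_qualified_name.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)


def check_inputs(fully_qualified_name: str, inputs: list[str]) -> None:
    # Compare the declared input labels against the imported callable's signature.
    parameters = inspect.signature(resolve(fully_qualified_name)).parameters
    missing = [label for label in inputs if label not in parameters]
    if missing:
        raise ValueError(f"{fully_qualified_name} has no parameter(s) {missing}")
```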
And here's a little example I've been playing with to look at the "json" and "python" dump formats:

```python
from flowrep import model


def plus_minus(x, y):
    return x + y, x - y


def add(x, y):
    return x + y


g = model.WorkflowNode(
    inputs=["x", "y"],
    outputs=["z"],
    nodes={
        "macro": model.WorkflowNode(
            inputs=["a", "b"],
            outputs=["c", "d"],
            nodes={
                "differential": model.AtomicNode(
                    fully_qualified_name="__main__.plus_minus",
                    inputs=["x", "y"],
                    outputs=["output_0", "output_1"],
                )
            },
            edges={
                ("differential", "x"): "a",
                ("differential", "y"): "b",
                "c": ("differential", "output_0"),
                "d": ("differential", "output_1"),
            },
        ),
        "add": model.AtomicNode(
            fully_qualified_name="__main__.add",
            inputs=["x", "y"],
            outputs=["sum"],
        ),
    },
    edges={
        ("macro", "a"): "x",
        ("macro", "b"): "y",
        ("add", "x"): ("macro", "c"),
        ("add", "y"): ("macro", "d"),
        "z": ("add", "sum"),
    },
)

jdict = g.model_dump(mode="json")
jdict
```
```python
edges: dict[
    str | tuple[str, str],
    str | tuple[str, str],
]  # But dict[str, str] gets disallowed in validation
```
I know this is verbose, but I'd prefer something like this.
```python
class EdgeModel(BaseModel):
    source_node: str          # reference to a node from the nodes list
    source_port: int | None   # either port index or None for the node itself
    target_node: str          # reference to a node from the nodes list
    target_port: str          # name of the target port
```
I can live with this, but do you intentionally exclude the information graph parents need to negotiate IO with their children? Maybe we ought to store that somewhere other than "edges", or else the type hints here get very liberal and there are source/target combinations which are disallowed.
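A minimal sketch of the kind of restriction the liberal type hints then require, following the `dict[str, str]` case flagged in the diff above (illustrative names, not the PR's actual validator):

```python
from pydantic import BaseModel, model_validator

EdgeEnd = str | tuple[str, str]


class EdgeContainer(BaseModel):
    # Keys are targets, values are sources, matching the example workflow above.
    edges: dict[EdgeEnd, EdgeEnd]

    @model_validator(mode="after")
    def _forbid_label_to_label(self):
        # A plain str -> str entry would wire one workflow-level label straight
        # to another, which is the combination disallowed here.
        for target, source in self.edges.items():
            if isinstance(target, str) and isinstance(source, str):
                raise ValueError(f"Disallowed edge from {source!r} to {target!r}")
        return self
```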
Actually, I can live with this only under duress -- I'd still prefer it if the data format automatically encoded the restriction that each target port can have only one source. What about
```python
class SourceModel(BaseModel):
    source_node: str          # reference to a node from the nodes list
    source_port: int | None   # either port index or None for the node itself


class TargetModel(BaseModel):
    target_node: str  # reference to a node from the nodes list
    target_port: str  # name of the target port
```

with `edges: dict[TargetModel, SourceModel]`.
Breaking out IO handling, we'd get something like
```python
class InputModel(BaseModel):
    port: str


class OutputModel(BaseModel):
    port: int
```

and `input_transfer: dict[TargetModel, InputModel]`; `output_transfer: dict[OutputModel, SourceModel]` (or whatever we call the variable). Or something similar to this.
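Assembled, that proposal might look roughly like the sketch below. This is my own composition, not code from the thread; in particular, the frozen config is an assumption added so the models are hashable and can serve as dict keys.

```python
from pydantic import BaseModel, ConfigDict


class _HashableModel(BaseModel):
    # Pydantic models are not hashable by default; freezing them is one way to
    # make them usable as dict keys.
    model_config = ConfigDict(frozen=True)


class SourceModel(_HashableModel):
    source_node: str          # reference to a node from the nodes list
    source_port: int | None   # either port index or None for the node itself


class TargetModel(_HashableModel):
    target_node: str  # reference to a node from the nodes list
    target_port: str  # name of the target port


class InputModel(_HashableModel):
    port: str


class OutputModel(_HashableModel):
    port: int


class WorkflowWiring(BaseModel):
    edges: dict[TargetModel, SourceModel]
    input_transfer: dict[TargetModel, InputModel]
    output_transfer: dict[OutputModel, SourceModel]
```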
TODO: tests (done)

Adds pydantic models for the atomic¹ and workflow node types.
Edges are structured as dictionaries per #65, and workflows explicitly specify their IO labels per #63.
Out of scope (will stack as separate PRs):
Footnotes
¹ I'm running with "atomic" right now since both Sam and I are fine with it, but this is subject to change.