
Conversation

@liamhuber
Member

@liamhuber liamhuber commented Jan 7, 2026

TODO: tests done

Adds pydantic models for the atomic1 and workflow node types.
Edges are structured as dictionaries per #65, and workflows explicitly specify their IO labels per #63.

Out of scope (will stack as separate PRs):

  • Parsing python into these types
  • Parsers-as-decorators
  • Other graph elements (for, ...)
  • Metadata and things pertaining to node store, etc.

Footnotes

  1. I'm running with "atomic" right now since both Sam and I are fine with it, but this is subject to change.
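As a rough illustration of the kind of model this PR adds (a simplified sketch assuming pydantic v2; the field names mirror the usage example later in this thread, not necessarily the final flowrep implementation):

```python
from pydantic import BaseModel


class AtomicNode(BaseModel):
    # Sketch only: fields follow the example in this thread,
    # not necessarily the actual flowrep model.
    fully_qualified_name: str  # e.g. "__main__.add"
    inputs: list[str]          # ordered input port labels
    outputs: list[str]         # ordered output port labels


node = AtomicNode(
    fully_qualified_name="__main__.add",
    inputs=["x", "y"],
    outputs=["sum"],
)
dumped = node.model_dump(mode="json")
print(dumped)
```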

liamhuber and others added 12 commits January 6, 2026 15:56
Signed-off-by: liamhuber <[email protected]>
I'm lifting the validator directly from @XzzX's [`PythonWorkflowDefinitionFunctionNode`](https://github.com/XzzX/python-workflow-definition/blob/fec059137d5c23a5983a798d347a50dbb911e56b/src/python_workflow_definition/models.py#L57)

Co-authored-by: Sebastian Eibl <[email protected]>
Signed-off-by: liamhuber <[email protected]>
Again, lifted from @XzzX's approach for the python workflow definition [`PythonWorkflowDefinitionNode`](https://github.com/XzzX/python-workflow-definition/blob/fec059137d5c23a5983a798d347a50dbb911e56b/src/python_workflow_definition/models.py#L68)

Co-authored-by: Sebastian Eibl <[email protected]>
Signed-off-by: liamhuber <[email protected]>
E.g. to avoid double ".."

Signed-off-by: liamhuber <[email protected]>
Signed-off-by: liamhuber <[email protected]>
Signed-off-by: liamhuber <[email protected]>
Signed-off-by: liamhuber <[email protected]>
And correct the nodes typing

Signed-off-by: liamhuber <[email protected]>
Signed-off-by: liamhuber <[email protected]>
Signed-off-by: liamhuber <[email protected]>
@github-actions

github-actions bot commented Jan 7, 2026

Binder 👈 Launch a binder notebook on branch pyiron/flowrep/data_models

@codecov

codecov bot commented Jan 7, 2026

Codecov Report

❌ Patch coverage is 97.22222% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.98%. Comparing base (53bb709) to head (298e3e6).

Files with missing lines Patch % Lines
flowrep/model.py 97.22% 3 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##           type_names      #69      +/-   ##
==============================================
+ Coverage       95.50%   95.98%   +0.48%     
==============================================
  Files               4        4              
  Lines             667      773     +106     
==============================================
+ Hits              637      742     +105     
- Misses             30       31       +1     

☔ View full report in Codecov by Sentry.

@liamhuber
Member Author

@XzzX, I lifted almost verbatim a couple pydantic snippets from your python workflow definition model file, so I added you as a co-author on those commits. LMK if you'd prefer to handle this a different way.

@liamhuber liamhuber changed the base branch from main to type_names January 7, 2026 15:36
Signed-off-by: liamhuber <[email protected]>
Without this, tuple keys ("node", "port") get transformed into "node,port" and destroy the deserialization.

Signed-off-by: liamhuber <[email protected]>
Including child port names

Signed-off-by: liamhuber <[email protected]>
And do basic model validation for their interaction with the output labels

Signed-off-by: liamhuber <[email protected]>
Signed-off-by: liamhuber <[email protected]>
We only need to convert the format for JSON (so far) -- for python we can retain the original dict-structure.

Signed-off-by: liamhuber <[email protected]>
Signed-off-by: liamhuber <[email protected]>
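The tuple-key problem described in the commit messages above can be reproduced with the stdlib alone (a sketch of the issue and one possible reversible encoding, not flowrep's actual serializer):

```python
import json

# Edge dicts keyed by (node, port) tuples, as in the model
edges = {("add", "x"): ("macro", "c"), "z": ("add", "sum")}

# Plain json.dumps rejects tuple keys outright (TypeError), which is
# why JSON mode needs an explicit key conversion
try:
    json.dumps(edges)
    raise AssertionError("expected TypeError for tuple keys")
except TypeError:
    pass

# Hypothetical reversible scheme: encode each (key, value) pair as
# nested lists, then rebuild the tuples on the way back in
encoded = [
    [list(k) if isinstance(k, tuple) else k,
     list(v) if isinstance(v, tuple) else v]
    for k, v in edges.items()
]
decoded = {
    tuple(k) if isinstance(k, list) else k:
    tuple(v) if isinstance(v, list) else v
    for k, v in encoded
}
assert decoded == edges
```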
@liamhuber
Member Author

Ok, I'm happy with this. I extended the data model to always include the inputs and outputs per this comment. The next thing I'd like to stack onto this work is to actually import the fully qualified names and validate the node model against the ast inspection of the functions. This would be either a precursor to or part of writing a parser to go from a python function object to the recipe model (and from there the decorator is a trivial step).

I really like the file serialization helpers @XzzX made here and think they'd be great on the base NodeModel class, but will leave that to another PR.
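The ast-inspection step described above could look something like this (a minimal stdlib-only sketch; counting outputs by the length of a returned tuple is a naive heuristic for illustration, not the proposed validator):

```python
import ast

# Source of a function with two inputs and two return values
src = "def plus_minus(x, y):\n    return x + y, x - y\n"
fn = ast.parse(src).body[0]

# Input labels come straight from the signature
inputs = [a.arg for a in fn.args.args]

# Naive output count: length of a returned tuple, otherwise one value
ret = next(n for n in ast.walk(fn) if isinstance(n, ast.Return))
n_outputs = len(ret.value.elts) if isinstance(ret.value, ast.Tuple) else 1

print(inputs, n_outputs)  # ['x', 'y'] 2
```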

@liamhuber liamhuber changed the title (WIP) Data models Data models Jan 7, 2026
@liamhuber liamhuber marked this pull request as ready for review January 7, 2026 18:21
@liamhuber liamhuber requested review from XzzX and samwaseda January 7, 2026 18:22
@liamhuber
Member Author

liamhuber commented Jan 7, 2026

And here's a little example I've been playing with to look at the "json" and "python" dump formats:

from flowrep import model

def plus_minus(x, y):
    return x + y, x - y

def add(x, y):
    return x + y

g = model.WorkflowNode(
    inputs=["x", "y"],
    outputs=["z"],
    nodes={
        "macro": model.WorkflowNode(
            inputs=["a", "b"],
            outputs=["c", "d"],
            nodes={
                "differential": model.AtomicNode(
                    fully_qualified_name="__main__.plus_minus",
                    inputs=["x", "y"],
                    outputs=["output_0", "output_1"],
                )
            },
            edges={
                ("differential", "x"): "a",
                ("differential", "y"): "b",
                "c": ("differential", "output_0"),
                "d": ("differential", "output_1"),
            }
        ),
        "add": model.AtomicNode(
            fully_qualified_name="__main__.add",
            inputs=["x", "y"],
            outputs=["sum"],
        )
    },
    edges={
        ("macro", "a"): "x",
        ("macro", "b"): "y",
        ("add", "x"): ("macro", "c"),
        ("add", "y"): ("macro", "d"),
        "z": ("add", "sum"),
    }
)

jdict = g.model_dump(mode="json")
jdict

Comment on lines +61 to +64
edges: dict[
str | tuple[str, str],
str | tuple[str, str],
] # But dict[str, str] gets disallowed in validation
I know this is verbose, but I'd prefer something like this.

class EdgeModel(BaseModel):
    source_node: str  # reference to a node from the nodes list
    source_port: int | None  # either port index or None for the node itself
    target_node: str  # reference to a node from the nodes list
    target_port: str  # name of the target port

Member Author

I can live with this, but do you intentionally exclude the information for graph parents to negotiate IO with their children? Maybe we ought to store that somewhere other than "edges", or the type hints here get very liberal and there are source/target combinations which are disallowed.

Member Author

Actually, I can live with this only under duress -- I still prefer if the data format automatically encodes the restriction that each target can have only one port. What about

class SourceModel(BaseModel):
    source_node: str  # reference to a node from the nodes list
    source_port: int | None  # either port index or None for the node itself

class TargetModel(BaseModel):
    target_node: str  # reference to a node from the nodes list
    target_port: str  # name of the target port

with edges: dict[TargetModel, SourceModel].

Breaking out IO handling, we'd get something like

class InputModel(BaseModel):
    port: str

class OutputModel(BaseModel):
    port: int

and input_transfer: dict[TargetModel, InputModel]; output_transfer: dict[OutputModel, SourceModel] (or whatever we call the variable). Or something similar to this.
