Data models #69
Conversation
I'm lifting the validator directly from @XzzX's [`PythonWorkflowDefinitionFunctionNode`](https://github.com/XzzX/python-workflow-definition/blob/fec059137d5c23a5983a798d347a50dbb911e56b/src/python_workflow_definition/models.py#L57) Co-authored-by: Sebastian Eibl <[email protected]> Signed-off-by: liamhuber <[email protected]>
Again, lifted from @XzzX's attack for the python workflow definition [`PythonWorkflowDefinitionNode`](https://github.com/XzzX/python-workflow-definition/blob/fec059137d5c23a5983a798d347a50dbb911e56b/src/python_workflow_definition/models.py#L68) Co-authored-by: Sebastian Eibl <[email protected]> Signed-off-by: liamhuber <[email protected]>
E.g. to avoid double ".." Signed-off-by: liamhuber <[email protected]>
And correct the nodes typing Signed-off-by: liamhuber <[email protected]>
Codecov Report ❌

```
@@            Coverage Diff             @@
##           type_names      #69      +/-   ##
==============================================
+ Coverage       95.50%   95.98%   +0.48%
==============================================
  Files               4        4
  Lines             667      773     +106
==============================================
+ Hits              637      742     +105
- Misses             30       31       +1
```
@XzzX, I lifted almost verbatim a couple of pydantic snippets from your python workflow definition model file, so I added you as a co-author on those commits. LMK if you'd prefer to handle this a different way.
Signed-off-by: liamhuber <[email protected]>
Without this, tuple keys ("node", "port") get transformed into "node,port" and destroy the deserialization (see the sketch after these commits).
Including child port names Signed-off-by: liamhuber <[email protected]>
And do basic model validation for their interaction with the output labels Signed-off-by: liamhuber <[email protected]>
We only need to convert the format for JSON (so far) -- for python we can retain the original dict structure (sketched below). Signed-off-by: liamhuber <[email protected]>
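A minimal, self-contained sketch of the two points above, assuming pydantic v2 (the model and method names are illustrative, not the actual flowrep code): first the failure mode with naive JSON dumping, then a `field_serializer` restricted to JSON so that `mode="python"` dumps keep the original dict structure.

```python
from pydantic import BaseModel, field_serializer


class Naive(BaseModel):
    # Without any custom serialization, the JSON dump stringifies tuple keys,
    # e.g. ("node", "port") -> "node,port", which cannot be deserialized back.
    edges: dict[tuple[str, str], str]


class EdgeHolder(BaseModel):
    edges: dict[str | tuple[str, str], str | tuple[str, str]]

    @field_serializer("edges", when_used="json")
    def _edges_for_json(self, edges) -> list:
        # Only JSON dumps get rewritten, here as a list of [target, source] pairs;
        # mode="python" dumps keep the dict with tuple keys untouched.
        def jsonable(end):
            return list(end) if isinstance(end, tuple) else end

        return [[jsonable(target), jsonable(source)] for target, source in edges.items()]


print(Naive(edges={("node", "port"): "x"}).model_dump(mode="json"))
# {'edges': {'node,port': 'x'}}

held = EdgeHolder(edges={("n", "p"): "x", "out": ("n", "q")})
print(held.model_dump(mode="python"))  # {'edges': {('n', 'p'): 'x', 'out': ('n', 'q')}}
print(held.model_dump(mode="json"))    # {'edges': [[['n', 'p'], 'x'], ['out', ['n', 'q']]]}
```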
Ok, I'm happy with this. I extended the data model to always include the inputs and outputs per this comment. The next thing I'd like to stack onto this work is to actually import the fully qualified names and validate the node model against the imported objects. I really like the file serialization helpers @XzzX made here and think they'd be great on the base model.
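A minimal sketch of what that follow-up validation could look like (the helper names `resolve` and `check_inputs` are hypothetical, not part of this PR):

```python
import importlib
import inspect


def resolve(fully_qualified_name: str):
    # Import e.g. "pkg.module.func" and return the object it points at.
    module_path, _, attr = fully_qualified_name.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)


def check_inputs(fully_qualified_name: str, inputs: list[str]) -> None:
    # Compare the declared input labels against the imported callable's signature.
    parameters = inspect.signature(resolve(fully_qualified_name)).parameters
    missing = [label for label in inputs if label not in parameters]
    if missing:
        raise ValueError(f"{fully_qualified_name} has no parameter(s) {missing}")
```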
And here's a little example I've been playing with to look at the "json" and "python" dump formats:

```python
from flowrep import model


def plus_minus(x, y):
    return x + y, x - y


def add(x, y):
    return x + y


g = model.WorkflowNode(
    inputs=["x", "y"],
    outputs=["z"],
    nodes={
        "macro": model.WorkflowNode(
            inputs=["a", "b"],
            outputs=["c", "d"],
            nodes={
                "differential": model.AtomicNode(
                    fully_qualified_name="__main__.plus_minus",
                    inputs=["x", "y"],
                    outputs=["output_0", "output_1"],
                )
            },
            edges={
                ("differential", "x"): "a",
                ("differential", "y"): "b",
                "c": ("differential", "output_0"),
                "d": ("differential", "output_1"),
            },
        ),
        "add": model.AtomicNode(
            fully_qualified_name="__main__.add",
            inputs=["x", "y"],
            outputs=["sum"],
        ),
    },
    edges={
        ("macro", "a"): "x",
        ("macro", "b"): "y",
        ("add", "x"): ("macro", "c"),
        ("add", "y"): ("macro", "d"),
        "z": ("add", "sum"),
    },
)

jdict = g.model_dump(mode="json")
jdict
```
```python
edges: dict[
    str | tuple[str, str],
    str | tuple[str, str],
]  # But dict[str, str] gets disallowed in validation
```
I know this is verbose, but I'd prefer something like this.
```python
class EdgeModel(BaseModel):
    source_node: str          # reference to a node from the nodes list
    source_port: int | None   # either port index or None for the node itself
    target_node: str          # reference to a node from the nodes list
    target_port: str          # name of the target port
```
I can live with this, but do you intentionally exclude the information graph parents need to negotiate IO with their children? Maybe we ought to store that somewhere other than "edges", or else the type hints here get very liberal and there are source/target combinations which are disallowed.
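A minimal sketch of the kind of restriction the liberal type hints then require, following the `dict[str, str]` case flagged in the diff above (illustrative names, not the PR's actual validator):

```python
from pydantic import BaseModel, model_validator

EdgeEnd = str | tuple[str, str]


class EdgeContainer(BaseModel):
    # Keys are targets, values are sources, matching the example workflow above.
    edges: dict[EdgeEnd, EdgeEnd]

    @model_validator(mode="after")
    def _forbid_label_to_label(self):
        # A plain str -> str entry would wire one workflow-level label straight
        # to another, which is the combination disallowed here.
        for target, source in self.edges.items():
            if isinstance(target, str) and isinstance(source, str):
                raise ValueError(f"Disallowed edge from {source!r} to {target!r}")
        return self
```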
Actually, I can live with this only under duress -- I'd still prefer it if the data format automatically encoded the restriction that each target port can have only one source. What about
```python
class SourceModel(BaseModel):
    source_node: str          # reference to a node from the nodes list
    source_port: int | None   # either port index or None for the node itself


class TargetModel(BaseModel):
    target_node: str  # reference to a node from the nodes list
    target_port: str  # name of the target port
```

with `edges: dict[TargetModel, SourceModel]`.
Breaking out IO handling, we'd get something like
```python
class InputModel(BaseModel):
    port: str


class OutputModel(BaseModel):
    port: int
```

and `input_transfer: dict[TargetModel, InputModel]`; `output_transfer: dict[OutputModel, SourceModel]` (or whatever we call the variable). Or something similar to this.
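Assembled, that proposal might look roughly like the sketch below. This is my own composition, not code from the thread; in particular, the frozen config is an assumption added so the models are hashable and can serve as dict keys.

```python
from pydantic import BaseModel, ConfigDict


class _HashableModel(BaseModel):
    # Pydantic models are not hashable by default; freezing them is one way to
    # make them usable as dict keys.
    model_config = ConfigDict(frozen=True)


class SourceModel(_HashableModel):
    source_node: str          # reference to a node from the nodes list
    source_port: int | None   # either port index or None for the node itself


class TargetModel(_HashableModel):
    target_node: str  # reference to a node from the nodes list
    target_port: str  # name of the target port


class InputModel(_HashableModel):
    port: str


class OutputModel(_HashableModel):
    port: int


class WorkflowWiring(BaseModel):
    edges: dict[TargetModel, SourceModel]
    input_transfer: dict[TargetModel, InputModel]
    output_transfer: dict[OutputModel, SourceModel]
```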
TODO: tests (done)

Adds pydantic models for the atomic¹ and workflow node types.
Edges are structured as dictionaries per #65, and workflows explicitly specify their IO labels per #63.
Out of scope (will stack as separate PRs):
Footnotes
¹ I'm running with "atomic" right now since both Sam and I are fine with it, but this is subject to change.