
Conversation

@jan-janssen
Member

@jan-janssen jan-janssen commented Jan 4, 2026

As demonstrated in https://github.com/pyiron-dev/executorlib-export-python-workflow-definition/

Example:

from executorlib import SingleNodeExecutor, get_item_from_future

function_str = """
def get_sum(x, y):
    return x + y
    
def get_prod_and_div(x, y):
    return {"prod": x * y, "div": x / y}

def get_square(x):
    return x ** 2
"""

with open("workflow.py", "w") as f:
    f.write(function_str)

from workflow import get_sum, get_prod_and_div, get_square

with SingleNodeExecutor(export_workflow_filename="workflow.json") as exe:
    future_prod_and_div = exe.submit(get_prod_and_div, x=1, y=2)
    future_prod = get_item_from_future(future_prod_and_div, key="prod")
    future_div = get_item_from_future(future_prod_and_div, key="div")
    future_sum = exe.submit(get_sum, x=future_prod, y=future_div)
    future_result = exe.submit(get_square, x=future_sum)
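Exiting the `with` block above should leave a `workflow.json` on disk. As a rough, hypothetical illustration of the node/edge structure discussed later in this review (keys such as `"id"`, `"type"`, `"value"`, `"name"` for nodes and `"target"`/`"targetPort"`/`"source"`/`"sourcePort"` for edges), here is a minimal serialization sketch — the values are invented and the actual schema executorlib writes may differ:

```python
import json

# Hypothetical node and edge lists mirroring the keys quoted in the review.
nodes = [
    {"id": 0, "type": "function", "value": "get_prod_and_div", "name": "get_prod_and_div"},
    {"id": 1, "type": "input", "value": 1, "name": "x"},
    {"id": 2, "type": "input", "value": 2, "name": "y"},
]
edges = [
    {"target": 0, "targetPort": "x", "source": 1, "sourcePort": None},
    {"target": 0, "targetPort": "y", "source": 2, "sourcePort": None},
]

# Serialize the graph to a JSON workflow file.
with open("workflow.json", "w") as f:
    json.dump({"nodes": nodes, "edges": edges}, f, indent=2)
```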

Summary by CodeRabbit

  • New Features

    • Added support for exporting workflow dependency graphs to JSON files across all executor types (Flux, SLURM, and single-node executors).
  • Tests

    • Added new test suite validating workflow graph export functionality with arithmetic and NumPy array workflows, ensuring proper graph structure and output.


@coderabbitai
Contributor

coderabbitai bot commented Jan 4, 2026

📝 Walkthrough


Adds an optional export_workflow_filename parameter to multiple executors and to the interactive DependencyTaskScheduler, and implements export_dependency_graph_function to serialize workflow nodes and edges to a JSON file. The filename is used on scheduler exit to write the workflow JSON.

Changes

  • Executor constructors — src/executorlib/executor/flux.py, src/executorlib/executor/slurm.py, src/executorlib/executor/single.py
    Added an optional export_workflow_filename parameter to the public executor constructors (FluxJobExecutor, FluxClusterExecutor, SlurmClusterExecutor, SlurmJobExecutor, SingleNodeExecutor, TestClusterExecutor) and threaded the value through to the underlying scheduler/executor construction calls.
  • Scheduler integration — src/executorlib/task_scheduler/interactive/dependency.py
    Added export_workflow_filename: Optional[str] to DependencyTaskScheduler.__init__, stored it as _export_workflow_filename, updated the _generate_dependency_graph logic, and changed __exit__ to call export vs. plot depending on the provided filename. Imported the export function.
  • Graph export implementation — src/executorlib/task_scheduler/interactive/dependency_plot.py
    Added export_dependency_graph_function(node_lst, edge_lst, file_name="workflow.json"), which formats nodes and edges (handling functions, inputs, and numpy arrays) and writes a JSON workflow file. Added json and numpy as np imports.
  • Tests — tests/test_singlenodeexecutor_pwd.py
    Added tests exercising SingleNodeExecutor with export_workflow_filename="workflow.json", asserting node/edge counts and removing the generated file in teardown.

Sequence Diagram

sequenceDiagram
    participant User
    participant Executor as Executor<br/>(Flux/Slurm/Single)
    participant Scheduler as DependencyTask<br/>Scheduler
    participant Export as export_dependency<br/>_graph_function
    participant FS as File System

    User->>Executor: init(export_workflow_filename)
    Executor->>Scheduler: __init__(..., export_workflow_filename=...)
    Scheduler->>Scheduler: store _export_workflow_filename

    Note over Executor,Scheduler: Workflow runs, tasks scheduled/executed

    Executor->>Scheduler: __exit__()
    alt _export_workflow_filename provided
        Scheduler->>Export: export_dependency_graph_function(nodes, edges, filename)
        Export->>Export: build JSON structure (nodes, edges, ports)
        Export->>FS: write JSON file (filename)
    else no export filename
        Scheduler->>Scheduler: call plot_dependency_graph_function(plot_filename)
    end
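The exit-time dispatch in the diagram can be sketched as a minimal context manager. This is a simplified stand-in, not the actual DependencyTaskScheduler implementation — the attribute and parameter names are paraphrased from the walkthrough:

```python
class SchedulerSketch:
    """Toy model of the export-vs-plot dispatch performed on scheduler exit."""

    def __init__(self, export_workflow_filename=None, plot_filename=None):
        self._export_workflow_filename = export_workflow_filename
        self._plot_filename = plot_filename
        self.exported_to = None
        self.plotted_to = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Export takes precedence over plotting, mirroring the if/else
        # dispatch described in the walkthrough.
        if self._export_workflow_filename is not None:
            self.exported_to = self._export_workflow_filename
        elif self._plot_filename is not None:
            self.plotted_to = self._plot_filename
        return False

with SchedulerSketch(export_workflow_filename="workflow.json") as exe:
    pass
print(exe.exported_to)  # workflow.json
```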

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 A ribbon of nodes in a neat JSON line,

I hop and I nibble, this graph looks fine.
From Flux to Slurm and the scheduler's nest,
I save every edge so your workflows rest.
Hooray for exports — a carrot for the test! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 56.25%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: The title accurately summarizes the main feature being added: exporting workflow definitions to a JSON file format, which is the primary change across all modified files.


@jan-janssen jan-janssen marked this pull request as draft January 4, 2026 17:28
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/executorlib/task_scheduler/interactive/dependency_plot.py (1)

263-271: Non-JSON-serializable values may cause TypeError at runtime.

The else branch serializes n["value"] directly. If the value is a non-JSON-serializable object (e.g., a custom class instance, datetime, or other complex types), json.dump will raise a TypeError. Consider adding a fallback to convert such values to strings.

🔎 Example defensive approach
         else:
+            try:
+                # Test if value is JSON serializable
+                json.dumps(n["value"])
+                value = n["value"]
+            except (TypeError, ValueError):
+                value = str(n["value"])
             pwd_nodes_lst.append(
                 {
                     "id": n["id"],
                     "type": n["type"],
-                    "value": n["value"],
+                    "value": value,
                     "name": n["name"],
                 }
             )
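The fallback pattern suggested above can be exercised in isolation. A small standalone sketch (function name and structure are illustrative, not executorlib API):

```python
import json
from datetime import datetime

def to_json_safe(value):
    """Return value unchanged if json can encode it, else its str() form.

    This is the defensive pattern suggested in the review: probe with
    json.dumps and fall back to a string representation on failure.
    """
    try:
        json.dumps(value)
        return value
    except (TypeError, ValueError):
        return str(value)

print(to_json_safe(42))                     # 42
print(to_json_safe(datetime(2026, 1, 4)))   # 2026-01-04 00:00:00
print(to_json_safe({"prod": 2, "div": 0.5}))  # {'prod': 2, 'div': 0.5}
```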
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b357c5 and 8683249.

📒 Files selected for processing (5)
  • src/executorlib/executor/flux.py
  • src/executorlib/executor/single.py
  • src/executorlib/executor/slurm.py
  • src/executorlib/task_scheduler/interactive/dependency.py
  • src/executorlib/task_scheduler/interactive/dependency_plot.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/executorlib/task_scheduler/interactive/dependency.py (1)
src/executorlib/task_scheduler/interactive/dependency_plot.py (2)
  • export_dependency_graph_function (237-307)
  • plot_dependency_graph_function (206-234)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: unittest_openmpi (ubuntu-24.04-arm, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.12)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.11)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-22.04-arm, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-24.04-arm, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
  • GitHub Check: unittest_openmpi (macos-latest, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-22.04-arm, 3.13)
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
  • GitHub Check: unittest_win
  • GitHub Check: unittest_old
  • GitHub Check: unittest_flux_openmpi
  • GitHub Check: minimal
🔇 Additional comments (10)
src/executorlib/executor/slurm.py (2)

105-106: LGTM!

The export_workflow_filename parameter is correctly added to the constructor signature and properly propagated to DependencyTaskScheduler. The documentation is appropriately updated.

Also applies to: 231-231


320-321: LGTM!

The export_workflow_filename parameter is correctly added and propagated in SlurmJobExecutor, consistent with the pattern in SlurmClusterExecutor.

Also applies to: 404-404

src/executorlib/executor/flux.py (2)

109-110: LGTM!

The export_workflow_filename parameter is correctly integrated into FluxJobExecutor with proper documentation and propagation to the underlying DependencyTaskScheduler.

Also applies to: 195-195


301-302: LGTM!

The export_workflow_filename parameter is correctly integrated into FluxClusterExecutor, following the same consistent pattern as other executor classes.

Also applies to: 430-430

src/executorlib/executor/single.py (2)

98-99: LGTM!

The export_workflow_filename parameter is correctly integrated into SingleNodeExecutor with proper documentation and propagation.

Also applies to: 177-177


270-271: LGTM!

The export_workflow_filename parameter is correctly integrated into TestClusterExecutor, maintaining consistency with the other executor implementations.

Also applies to: 368-368

src/executorlib/task_scheduler/interactive/dependency.py (3)

68-74: Verify the conditional logic for _generate_dependency_graph.

The logic sets _generate_dependency_graph = True when plot_dependency_graph_filename is not None OR when export_workflow_filename is None. This means if neither filename is provided, the graph is still generated (but not saved anywhere useful since plot_dependency_graph_function with filename=None displays inline in Jupyter).

Was the intention to only generate the graph when at least one filename is provided, or is the inline Jupyter display the expected fallback behavior?


219-230: LGTM!

The __exit__ method correctly dispatches to either export_dependency_graph_function or plot_dependency_graph_function based on the provided filename. The export path correctly uses the new JSON export function when export_workflow_filename is specified.


15-20: LGTM!

The import of export_dependency_graph_function is correctly added alongside the existing imports from the same module.

src/executorlib/task_scheduler/interactive/dependency_plot.py (1)

2-8: Add numpy to the project dependencies.

The code imports numpy at line 8 for handling np.ndarray serialization in the new export_dependency_graph_function function. However, numpy is not declared as a dependency in pyproject.toml. Since this function is called from src/executorlib/task_scheduler/interactive/dependency.py and numpy is not transitively available through the graph optional dependencies, add numpy to either the main dependencies or the graph optional dependency group.
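For context, the usual way to make an np.ndarray JSON-serializable is ndarray.tolist(), which is presumably what the export function relies on. A minimal sketch (the dict keys mirror the node schema quoted in this review; the values are invented):

```python
import json
import numpy as np

arr = np.arange(4).reshape(2, 2)

# np.ndarray is not JSON-serializable directly; .tolist() converts it to
# nested Python lists that json.dumps accepts.
payload = {"id": 0, "type": "input", "value": arr.tolist(), "name": "x"}
print(json.dumps(payload))
```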

Comment on lines +293 to +300
pwd_edges_lst.append(
{
"target": final_node["id"],
"targetPort": None,
"source": max([e["target"] for e in pwd_edges_lst]),
"sourcePort": None,
}
)
Contributor


⚠️ Potential issue | 🟠 Major

Handle empty edge list to prevent ValueError on max().

If edge_lst is empty, pwd_edges_lst will be empty, and max([e["target"] for e in pwd_edges_lst]) will raise ValueError: max() arg is an empty sequence. This can occur with single-node workflows or graphs without edges.

🔎 Proposed fix
-    pwd_edges_lst.append(
-        {
-            "target": final_node["id"],
-            "targetPort": None,
-            "source": max([e["target"] for e in pwd_edges_lst]),
-            "sourcePort": None,
-        }
-    )
+    if pwd_edges_lst:
+        pwd_edges_lst.append(
+            {
+                "target": final_node["id"],
+                "targetPort": None,
+                "source": max(e["target"] for e in pwd_edges_lst),
+                "sourcePort": None,
+            }
+        )
🤖 Prompt for AI Agents
In src/executorlib/task_scheduler/interactive/dependency_plot.py around lines
293-300, the code calls max() on pwd_edges_lst which raises ValueError when
pwd_edges_lst is empty; update the logic to handle an empty list by computing
source only if pwd_edges_lst has elements (e.g., source = max(e["target"] for e
in pwd_edges_lst)), otherwise set source to a safe fallback (for example
final_node["id"] or None) or skip adding the edge entirely; implement a short
conditional that selects the fallback and then append the dict using that source
to avoid the max() on an empty sequence.

@codecov

codecov bot commented Jan 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.41%. Comparing base (7b357c5) to head (c0cf665).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #882      +/-   ##
==========================================
+ Coverage   93.34%   93.41%   +0.07%     
==========================================
  Files          38       38              
  Lines        1817     1837      +20     
==========================================
+ Hits         1696     1716      +20     
  Misses        121      121              


@jan-janssen jan-janssen marked this pull request as ready for review January 5, 2026 08:05
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/executorlib/task_scheduler/interactive/dependency.py (1)

37-38: Docstring contains duplicate attribute name with incorrect type.

Line 37 describes _generate_dependency_graph (bool) and line 38 reuses the same attribute name with type (str). The second entry should be _plot_dependency_graph_filename (str) based on the actual attribute.

🔎 Proposed fix
     Attributes:
         _future_hash_dict (Dict[str, Future]): A dictionary mapping task hash to future object.
         _task_hash_dict (Dict[str, Dict]): A dictionary mapping task hash to task dictionary.
         _generate_dependency_graph (bool): Whether to generate the dependency graph.
-        _generate_dependency_graph (str): Name of the file to store the plotted graph in.
+        _plot_dependency_graph_filename (str): Name of the file to store the plotted graph in.
+        _export_workflow_filename (str): Name of the file to store the exported workflow graph in.
🧹 Nitpick comments (2)
src/executorlib/task_scheduler/interactive/dependency.py (1)

216-227: Export and plot are mutually exclusive when both filenames are provided.

When both export_workflow_filename and plot_dependency_graph_filename are specified, only the export is performed due to the if/else structure. If this is intentional, consider documenting this precedence in the class docstring. Otherwise, consider supporting both operations when both filenames are provided.

🔎 Proposed fix to support both operations
         if self._generate_dependency_graph:
             node_lst, edge_lst = generate_nodes_and_edges_for_plotting(
                 task_hash_dict=self._task_hash_dict,
                 future_hash_inverse_dict={
                     v: k for k, v in self._future_hash_dict.items()
                 },
             )
             if self._export_workflow_filename is not None:
-                return export_dependency_graph_function(
+                export_dependency_graph_function(
                     node_lst=node_lst,
                     edge_lst=edge_lst,
                     file_name=self._export_workflow_filename,
                 )
-            else:
-                return plot_dependency_graph_function(
+            if self._plot_dependency_graph_filename is not None:
+                plot_dependency_graph_function(
                     node_lst=node_lst,
                     edge_lst=edge_lst,
                     filename=self._plot_dependency_graph_filename,
                 )
-        else:
-            return None
+        return None
tests/test_singlenodeexecutor_pwd.py (1)

8-15: Minor: Trailing whitespace on line 10.

Line 10 has trailing whitespace after return x + y. Pre-commit hooks should catch this, but worth noting.

🔎 Proposed fix
 def get_sum(x, y):
-    return x + y
-    
+    return x + y
+
 def get_prod_and_div(x, y):
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8683249 and c0cf665.

📒 Files selected for processing (2)
  • src/executorlib/task_scheduler/interactive/dependency.py
  • tests/test_singlenodeexecutor_pwd.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/executorlib/task_scheduler/interactive/dependency.py (1)
src/executorlib/task_scheduler/interactive/dependency_plot.py (2)
  • export_dependency_graph_function (237-307)
  • plot_dependency_graph_function (206-234)
tests/test_singlenodeexecutor_pwd.py (2)
src/executorlib/executor/single.py (1)
  • SingleNodeExecutor (20-194)
src/executorlib/standalone/select.py (1)
  • get_item_from_future (42-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
  • GitHub Check: notebooks_integration
🔇 Additional comments (5)
src/executorlib/task_scheduler/interactive/dependency.py (2)

16-16: LGTM!

Import addition aligns with the new export functionality.


49-49: LGTM!

The new export_workflow_filename parameter is correctly added and the initialization logic properly enables dependency graph generation when either filename is provided.

Also applies to: 67-71

tests/test_singlenodeexecutor_pwd.py (3)

18-21: LGTM!

Proper tearDown implementation to clean up the generated workflow.json file after each test.


23-36: LGTM!

The test effectively validates the arithmetic workflow export with chained futures using get_item_from_future. The assertion that future_result.result() is None correctly reflects the graph generation mode behavior where tasks are recorded but not executed.


38-47: LGTM!

Good coverage for NumPy array handling in workflow export. The test verifies that numpy arrays are properly serialized in the workflow graph (converted to lists per export_dependency_graph_function implementation).

