-
Notifications
You must be signed in to change notification settings - Fork 3
[Feature] Export to Python Workflow Definition #882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds an optional Changes
Sequence DiagramsequenceDiagram
participant User
participant Executor as Executor<br/>(Flux/Slurm/Single)
participant Scheduler as DependencyTask<br/>Scheduler
participant Export as export_dependency<br/>_graph_function
participant FS as File System
User->>Executor: init(export_workflow_filename)
Executor->>Scheduler: __init__(..., export_workflow_filename=...)
Scheduler->>Scheduler: store _export_workflow_filename
Note over Executor,Scheduler: Workflow runs, tasks scheduled/executed
Executor->>Scheduler: __exit__()
alt _export_workflow_filename provided
Scheduler->>Export: export_dependency_graph_function(nodes, edges, filename)
Export->>Export: build JSON structure (nodes, edges, ports)
Export->>FS: write JSON file (filename)
else no export filename
Scheduler->>Scheduler: call plot_dependency_graph_function(plot_filename)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/executorlib/task_scheduler/interactive/dependency_plot.py (1)
263-271: Non-JSON-serializable values may causeTypeErrorat runtime.The
elsebranch serializesn["value"]directly. If the value is a non-JSON-serializable object (e.g., a custom class instance, datetime, or other complex types),json.dumpwill raise aTypeError. Consider adding a fallback to convert such values to strings.🔎 Example defensive approach
else: + try: + # Test if value is JSON serializable + json.dumps(n["value"]) + value = n["value"] + except (TypeError, ValueError): + value = str(n["value"]) pwd_nodes_lst.append( { "id": n["id"], "type": n["type"], - "value": n["value"], + "value": value, "name": n["name"], } )
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/executorlib/executor/flux.pysrc/executorlib/executor/single.pysrc/executorlib/executor/slurm.pysrc/executorlib/task_scheduler/interactive/dependency.pysrc/executorlib/task_scheduler/interactive/dependency_plot.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/executorlib/task_scheduler/interactive/dependency.py (1)
src/executorlib/task_scheduler/interactive/dependency_plot.py (2)
export_dependency_graph_function(237-307)plot_dependency_graph_function(206-234)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: unittest_openmpi (ubuntu-24.04-arm, 3.13)
- GitHub Check: unittest_mpich (ubuntu-latest, 3.13)
- GitHub Check: unittest_mpich (ubuntu-latest, 3.12)
- GitHub Check: unittest_mpich (ubuntu-latest, 3.11)
- GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
- GitHub Check: unittest_mpich (ubuntu-22.04-arm, 3.13)
- GitHub Check: unittest_mpich (ubuntu-24.04-arm, 3.13)
- GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
- GitHub Check: unittest_openmpi (macos-latest, 3.13)
- GitHub Check: unittest_openmpi (ubuntu-22.04-arm, 3.13)
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
- GitHub Check: unittest_win
- GitHub Check: unittest_old
- GitHub Check: unittest_flux_openmpi
- GitHub Check: minimal
🔇 Additional comments (10)
src/executorlib/executor/slurm.py (2)
105-106: LGTM!The
export_workflow_filenameparameter is correctly added to the constructor signature and properly propagated toDependencyTaskScheduler. The documentation is appropriately updated.Also applies to: 231-231
320-321: LGTM!The
export_workflow_filenameparameter is correctly added and propagated inSlurmJobExecutor, consistent with the pattern inSlurmClusterExecutor.Also applies to: 404-404
src/executorlib/executor/flux.py (2)
109-110: LGTM!The
export_workflow_filenameparameter is correctly integrated intoFluxJobExecutorwith proper documentation and propagation to the underlyingDependencyTaskScheduler.Also applies to: 195-195
301-302: LGTM!The
export_workflow_filenameparameter is correctly integrated intoFluxClusterExecutor, following the same consistent pattern as other executor classes.Also applies to: 430-430
src/executorlib/executor/single.py (2)
98-99: LGTM!The
export_workflow_filenameparameter is correctly integrated intoSingleNodeExecutorwith proper documentation and propagation.Also applies to: 177-177
270-271: LGTM!The
export_workflow_filenameparameter is correctly integrated intoTestClusterExecutor, maintaining consistency with the other executor implementations.Also applies to: 368-368
src/executorlib/task_scheduler/interactive/dependency.py (3)
68-74: Verify the conditional logic for_generate_dependency_graph.The logic sets
_generate_dependency_graph = Truewhenplot_dependency_graph_filename is not NoneOR whenexport_workflow_filename is None. This means if neither filename is provided, the graph is still generated (but not saved anywhere useful sinceplot_dependency_graph_functionwithfilename=Nonedisplays inline in Jupyter).Was the intention to only generate the graph when at least one filename is provided, or is the inline Jupyter display the expected fallback behavior?
219-230: LGTM!The
__exit__method correctly dispatches to eitherexport_dependency_graph_functionorplot_dependency_graph_functionbased on the provided filename. The export path correctly uses the new JSON export function whenexport_workflow_filenameis specified.
15-20: LGTM!The import of
export_dependency_graph_functionis correctly added alongside the existing imports from the same module.src/executorlib/task_scheduler/interactive/dependency_plot.py (1)
2-8: Addnumpyto the project dependencies.The code imports
numpyat line 8 for handlingnp.ndarrayserialization in the newexport_dependency_graph_functionfunction. However,numpyis not declared as a dependency inpyproject.toml. Since this function is called fromsrc/executorlib/task_scheduler/interactive/dependency.pyand numpy is not transitively available through the graph optional dependencies, addnumpyto either the main dependencies or the graph optional dependency group.
| pwd_edges_lst.append( | ||
| { | ||
| "target": final_node["id"], | ||
| "targetPort": None, | ||
| "source": max([e["target"] for e in pwd_edges_lst]), | ||
| "sourcePort": None, | ||
| } | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle empty edge list to prevent ValueError on max().
If edge_lst is empty, pwd_edges_lst will be empty, and max([e["target"] for e in pwd_edges_lst]) will raise ValueError: max() arg is an empty sequence. This can occur with single-node workflows or graphs without edges.
🔎 Proposed fix
- pwd_edges_lst.append(
- {
- "target": final_node["id"],
- "targetPort": None,
- "source": max([e["target"] for e in pwd_edges_lst]),
- "sourcePort": None,
- }
- )
+ if pwd_edges_lst:
+ pwd_edges_lst.append(
+ {
+ "target": final_node["id"],
+ "targetPort": None,
+ "source": max(e["target"] for e in pwd_edges_lst),
+ "sourcePort": None,
+ }
+ )🤖 Prompt for AI Agents
In src/executorlib/task_scheduler/interactive/dependency_plot.py around lines
293-300, the code calls max() on pwd_edges_lst which raises ValueError when
pwd_edges_lst is empty; update the logic to handle an empty list by computing
source only if pwd_edges_lst has elements (e.g., source = max(e["target"] for e
in pwd_edges_lst)), otherwise set source to a safe fallback (for example
final_node["id"] or None) or skip adding the edge entirely; implement a short
conditional that selects the fallback and then append the dict using that source
to avoid the max() on an empty sequence.
for more information, see https://pre-commit.ci
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #882 +/- ##
==========================================
+ Coverage 93.34% 93.41% +0.07%
==========================================
Files 38 38
Lines 1817 1837 +20
==========================================
+ Hits 1696 1716 +20
Misses 121 121 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/executorlib/task_scheduler/interactive/dependency.py (1)
37-38: Docstring contains duplicate attribute name with incorrect type.Line 37 describes
_generate_dependency_graph (bool)and line 38 reuses the same attribute name with type(str). The second entry should be_plot_dependency_graph_filename (str)based on the actual attribute.🔎 Proposed fix
Attributes: _future_hash_dict (Dict[str, Future]): A dictionary mapping task hash to future object. _task_hash_dict (Dict[str, Dict]): A dictionary mapping task hash to task dictionary. _generate_dependency_graph (bool): Whether to generate the dependency graph. - _generate_dependency_graph (str): Name of the file to store the plotted graph in. + _plot_dependency_graph_filename (str): Name of the file to store the plotted graph in. + _export_workflow_filename (str): Name of the file to store the exported workflow graph in.
🧹 Nitpick comments (2)
src/executorlib/task_scheduler/interactive/dependency.py (1)
216-227: Export and plot are mutually exclusive when both filenames are provided.When both
export_workflow_filenameandplot_dependency_graph_filenameare specified, only the export is performed due to the if/else structure. If this is intentional, consider documenting this precedence in the class docstring. Otherwise, consider supporting both operations when both filenames are provided.🔎 Proposed fix to support both operations
if self._generate_dependency_graph: node_lst, edge_lst = generate_nodes_and_edges_for_plotting( task_hash_dict=self._task_hash_dict, future_hash_inverse_dict={ v: k for k, v in self._future_hash_dict.items() }, ) if self._export_workflow_filename is not None: - return export_dependency_graph_function( + export_dependency_graph_function( node_lst=node_lst, edge_lst=edge_lst, file_name=self._export_workflow_filename, ) - else: - return plot_dependency_graph_function( + if self._plot_dependency_graph_filename is not None: + plot_dependency_graph_function( node_lst=node_lst, edge_lst=edge_lst, filename=self._plot_dependency_graph_filename, ) - else: - return None + return Nonetests/test_singlenodeexecutor_pwd.py (1)
8-15: Minor: Trailing whitespace on line 10.Line 10 has trailing whitespace after
return x + y. Pre-commit hooks should catch this, but worth noting.🔎 Proposed fix
def get_sum(x, y): - return x + y - + return x + y + def get_prod_and_div(x, y):
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/executorlib/task_scheduler/interactive/dependency.pytests/test_singlenodeexecutor_pwd.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/executorlib/task_scheduler/interactive/dependency.py (1)
src/executorlib/task_scheduler/interactive/dependency_plot.py (2)
export_dependency_graph_function(237-307)plot_dependency_graph_function(206-234)
tests/test_singlenodeexecutor_pwd.py (2)
src/executorlib/executor/single.py (1)
SingleNodeExecutor(20-194)src/executorlib/standalone/select.py (1)
get_item_from_future(42-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
- GitHub Check: notebooks_integration
🔇 Additional comments (5)
src/executorlib/task_scheduler/interactive/dependency.py (2)
16-16: LGTM!Import addition aligns with the new export functionality.
49-49: LGTM!The new
export_workflow_filenameparameter is correctly added and the initialization logic properly enables dependency graph generation when either filename is provided.Also applies to: 67-71
tests/test_singlenodeexecutor_pwd.py (3)
18-21: LGTM!Proper tearDown implementation to clean up the generated workflow.json file after each test.
23-36: LGTM!The test effectively validates the arithmetic workflow export with chained futures using
get_item_from_future. The assertion thatfuture_result.result()isNonecorrectly reflects the graph generation mode behavior where tasks are recorded but not executed.
38-47: LGTM!Good coverage for NumPy array handling in workflow export. The test verifies that numpy arrays are properly serialized in the workflow graph (converted to lists per
export_dependency_graph_functionimplementation).
As demonstrated in https://github.com/pyiron-dev/executorlib-export-python-workflow-definition/
Example:
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.