-
Notifications
You must be signed in to change notification settings - Fork 3
Extend the features supported by get_cache_execute_command() #783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Caution Review failedThe pull request is closed. WalkthroughAdds two boolean options (exclusive, openmpi_oversubscribe) to get_cache_execute_command in executorlib/standalone/command.py, updates the docstring, includes --exact and --oversubscribe in the srun command when set, and raises ValueError for these options with the Flux backend. Task scheduler components propagate and default these flags from resource dicts. Tests updated to exercise valid Slurm usage and invalid Flux usage. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Caller
participant Cmd as get_cache_execute_command
participant Scheduler as FileTaskScheduler / shared
Caller->>Scheduler: submit task (resource_dict may include flags)
Scheduler->>Cmd: file_name, cores, backend, exclusive?, openmpi_oversubscribe?, pmi_mode?
alt backend == "slurm"
note right of Cmd #e6f9ea: Build srun command\n(include --oversubscribe / --exact when True)
Cmd-->>Scheduler: ["srun","-n",...,"--mpi=...","--oversubscribe"?, "--exact"?, sys.executable, cache_parallel.py, file_name]
else backend == "flux"
note right of Cmd #fff0f0: Validate options\n(reject exclusive/openmpi_oversubscribe)
Cmd-->>Scheduler: raise ValueError
else
note right of Cmd #f0f6ff: Existing paths\n(mpiexec / mpi4py / default)
Cmd-->>Scheduler: Construct command (pmi_mode optional)
end
Scheduler-->>Caller: execute or propagate error
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (2)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
for more information, see https://pre-commit.ci
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #783 +/- ##
==========================================
+ Coverage 97.66% 97.67% +0.01%
==========================================
Files 33 33
Lines 1455 1463 +8
==========================================
+ Hits 1421 1429 +8
Misses 34 34 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
executorlib/standalone/command.py (2)
54-55: “OpenMPI oversubscribe” viasrun --oversubscribeis likely not what you intend
openmpi_oversubscribesuggests the Open MPI runtime flag.srun --oversubscribeis a Slurm scheduling option (node sharing), not the Open MPI oversubscription behavior. If you intend Open MPI oversubscription, prefer exportingOMPI_MCA_rmaps_base_oversubscribe=1for thesrunpath. Also consider honoring this flag for thempiexecpath (Open MPI’s--oversubscribe).Option A (recommended for Slurm/Open MPI): set the MCA var for Open MPI:
- if openmpi_oversubscribe: - command_prepend += ["--oversubscribe"] + if openmpi_oversubscribe: + # Ensure Open MPI allows oversubscription under Slurm + command_prepend += ["--export=ALL,OMPI_MCA_rmaps_base_oversubscribe=1"]Option B (if you truly want Slurm node sharing semantics): rename the flag or add a separate flag (e.g.,
slurm_allow_node_sharing) to avoid semantic confusion.Additionally, for the non-Slurm, mpiexec-based path, consider appending
--oversubscribewhenopenmpi_oversubscribe=Trueand Open MPI is detected.To verify whether
--oversubscribeis an Open MPI vs. Slurm option across your clusters and whetherOMPI_MCA_rmaps_base_oversubscribeis honored, please confirm with your scheduler docs or cluster modules.
67-74: Flux path: error handling OK; consider documenting the limitation and future-proofingRaising
ValueErrorforexclusiveandopenmpi_oversubscribeunder Flux is consistent, but please ensure the docstring reflects these limitations (see earlier comment). If Flux acquires equivalents (e.g., a way to request exclusive resources), a future extension point could be to map the flags instead of rejecting them.I can add a short section to the README/docs explaining the backend-specific support matrix for these flags if helpful.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
executorlib/standalone/command.py(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: unittest_mpich (ubuntu-24.04-arm, 3.13)
- GitHub Check: unittest_mpich (macos-latest, 3.13)
- GitHub Check: unittest_openmpi (macos-latest, 3.13)
- GitHub Check: unittest_openmpi (ubuntu-24.04-arm, 3.13)
- GitHub Check: unittest_openmpi (ubuntu-22.04-arm, 3.13)
- GitHub Check: unittest_mpich (ubuntu-22.04-arm, 3.13)
- GitHub Check: unittest_flux_openmpi
- GitHub Check: unittest_win
- GitHub Check: unittest_slurm_mpich
- GitHub Check: notebooks_integration
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
| exclusive: bool = False, | ||
| openmpi_oversubscribe: bool = False, | ||
| pmi_mode: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Avoid breaking API: keep pmi_mode positional slot and make new flags keyword-only
Inserting exclusive and openmpi_oversubscribe before pmi_mode changes the positional argument order and will break callers that passed pmi_mode positionally. Preserve backward compatibility by keeping pmi_mode as the 4th parameter and making the new flags keyword-only.
Apply this diff to the function signature:
- exclusive: bool = False,
- openmpi_oversubscribe: bool = False,
- pmi_mode: Optional[str] = None,
+ pmi_mode: Optional[str] = None,
+ *,
+ exclusive: bool = False,
+ openmpi_oversubscribe: bool = False,To locate potential breakages (calls with four positional args), run:
🏁 Script executed:
#!/bin/bash
# Find calls using four positional args to get_cache_execute_command (likely broken by the reordering).
ast-grep --pattern $'get_cache_execute_command($_, $_, $_, $_)'Length of output: 915
Avoid breaking API: keep pmi_mode as the 4th positional argument and make new flags keyword-only
I ran an AST search—every existing call uses keyword arguments, so there’s no internal breakage. However, to preserve the public API for callers who may pass pmi_mode positionally, update the signature in executorlib/standalone/command.py as follows:
@@ def get_cache_execute_command(
- exclusive: bool = False,
- openmpi_oversubscribe: bool = False,
- pmi_mode: Optional[str] = None,
+ pmi_mode: Optional[str] = None,
+ *,
+ exclusive: bool = False,
+ openmpi_oversubscribe: bool = False,This keeps pmi_mode in its original (4th) slot and forces exclusive and openmpi_oversubscribe to be keyword-only.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| exclusive: bool = False, | |
| openmpi_oversubscribe: bool = False, | |
| pmi_mode: Optional[str] = None, | |
| pmi_mode: Optional[str] = None, | |
| *, | |
| exclusive: bool = False, | |
| openmpi_oversubscribe: bool = False, |
🤖 Prompt for AI Agents
In executorlib/standalone/command.py around lines 24 to 26, the function
signature currently lists exclusive, openmpi_oversubscribe, pmi_mode in that
order; update it so pmi_mode remains the 4th positional parameter and make
exclusive and openmpi_oversubscribe keyword-only: reorder the parameters so
pmi_mode: Optional[str] = None appears before exclusive and
openmpi_oversubscribe, and place a bare * before exclusive to force those two to
be keyword-only, keeping types and defaults unchanged.
| exclusive (bool): Whether to exclusively reserve the compute nodes, or allow sharing compute notes. Defaults to False. | ||
| openmpi_oversubscribe (bool, optional): Whether to oversubscribe the cores. Defaults to False. | ||
| pmi_mode (str): PMI interface to use (OpenMPI v5 requires pmix) default is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring fixes: correct typo and reflect backend limitations
- Typo: “compute notes” → “compute nodes”.
- Please document that
exclusiveandopenmpi_oversubscribeare not supported with the Flux backend (you raise ValueError for both).
Apply this diff:
- exclusive (bool): Whether to exclusively reserve the compute nodes, or allow sharing compute notes. Defaults to False.
- openmpi_oversubscribe (bool, optional): Whether to oversubscribe the cores. Defaults to False.
- pmi_mode (str): PMI interface to use (OpenMPI v5 requires pmix) default is None
+ pmi_mode (str): PMI interface to use (OpenMPI v5 requires pmix). Default is None.
+ exclusive (bool): Whether to exclusively reserve the compute nodes (no node sharing). Default: False.
+ Note: Not supported with the "flux" backend.
+ openmpi_oversubscribe (bool): Whether to oversubscribe the cores. Default: False.
+ Note: Not supported with the "flux" backend.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| exclusive (bool): Whether to exclusively reserve the compute nodes, or allow sharing compute notes. Defaults to False. | |
| openmpi_oversubscribe (bool, optional): Whether to oversubscribe the cores. Defaults to False. | |
| pmi_mode (str): PMI interface to use (OpenMPI v5 requires pmix) default is None | |
| pmi_mode (str): PMI interface to use (OpenMPI v5 requires pmix). Default is None. | |
| exclusive (bool): Whether to exclusively reserve the compute nodes (no node sharing). Default: False. | |
| Note: Not supported with the "flux" backend. | |
| openmpi_oversubscribe (bool): Whether to oversubscribe the cores. Default: False. | |
| Note: Not supported with the "flux" backend. |
🤖 Prompt for AI Agents
In executorlib/standalone/command.py around lines 35 to 37, fix the docstring
typo "compute notes" to "compute nodes" and update the parameter descriptions to
note backend limitations: state that both exclusive and openmpi_oversubscribe
are not supported when using the Flux backend (they will raise ValueError), and
keep the pmi_mode description as-is (default None). Ensure the updated docstring
clearly mentions these Flux backend restrictions for exclusive and
openmpi_oversubscribe.
| if exclusive: | ||
| command_prepend += ["--exact"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong Slurm flag for exclusive allocation — use --exclusive, not --exact
--exact controls task placement semantics, not node exclusivity. For exclusive node allocation with srun, the correct flag is --exclusive.
Apply this diff:
- if exclusive:
- command_prepend += ["--exact"]
+ if exclusive:
+ command_prepend += ["--exclusive"]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if exclusive: | |
| command_prepend += ["--exact"] | |
| if exclusive: | |
| command_prepend += ["--exclusive"] |
🤖 Prompt for AI Agents
In executorlib/standalone/command.py around lines 56-57, the code appends the
wrong Slurm flag "--exact" when requesting an exclusive allocation; change the
appended flag to "--exclusive" so that command_prepend uses the correct srun
option for node exclusivity (i.e., replace "--exact" with "--exclusive" in the
exclusive branch).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/test_standalone_command.py (2)
54-63: Reduce assertion noise: compare the fixed prefix as a list.Index-by-index checks are verbose and harder to maintain. Assert the fixed prefix in one shot, then keep the path-based checks for the final elements.
- self.assertEqual(output[0], "srun") - self.assertEqual(output[1], "-n") - self.assertEqual(output[2], str(2)) - self.assertEqual(output[3], "--mpi=pmi2") - self.assertEqual(output[4], "--oversubscribe") - self.assertEqual(output[5], "--exact") + # Assert fixed prefix in one go (adjust "--exact" to "--exclusive" if that is the intended behavior) + self.assertEqual( + output[:6], + ["srun", "-n", str(2), "--mpi=pmi2", "--oversubscribe", "--exact"], + ) self.assertEqual(output[6], sys.executable) self.assertEqual(output[7].split(os.sep)[-1], "cache_parallel.py") self.assertEqual(output[8], file_name)
91-94: Also assert exception messages for Flux to lock in UX and avoid silent regressions.Validating the message makes the tests stricter and ensures users get actionable errors.
- with self.assertRaises(ValueError): - get_cache_execute_command(cores=2, file_name=file_name, backend="flux", openmpi_oversubscribe=True) - with self.assertRaises(ValueError): - get_cache_execute_command(cores=2, file_name=file_name, backend="flux", exclusive=True) + with self.assertRaisesRegex(ValueError, r"openmpi_oversubscribe.*flux backend"): + get_cache_execute_command(cores=2, file_name=file_name, backend="flux", openmpi_oversubscribe=True) + with self.assertRaisesRegex(ValueError, r"exclusive.*flux backend"): + get_cache_execute_command(cores=2, file_name=file_name, backend="flux", exclusive=True)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
tests/test_standalone_command.py(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/test_standalone_command.py (1)
executorlib/standalone/command.py (1)
get_cache_execute_command(20-89)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: unittest_old
- GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
- GitHub Check: unittest_flux_mpich
- GitHub Check: unittest_openmpi (ubuntu-latest, 3.11)
- GitHub Check: unittest_openmpi (ubuntu-24.04-arm, 3.13)
- GitHub Check: unittest_openmpi (ubuntu-22.04-arm, 3.13)
- GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
- GitHub Check: unittest_openmpi (macos-latest, 3.13)
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
- GitHub Check: unittest_mpich (ubuntu-latest, 3.13)
- GitHub Check: unittest_slurm_mpich
- GitHub Check: unittest_win
- GitHub Check: unittest_flux_openmpi
- GitHub Check: notebooks_integration
🔇 Additional comments (1)
tests/test_standalone_command.py (1)
54-63: No backward-incompatible positional calls found
An AST-based scan of the entire repo confirmed there are no calls toget_cache_execute_commandwith more than two positional arguments. All existing usages rely on keyword arguments for newer parameters—no further action needed.
| output = get_cache_execute_command(cores=2, file_name=file_name, backend="slurm", pmi_mode="pmi2", openmpi_oversubscribe=True, exclusive=True) | ||
| self.assertEqual(output[0], "srun") | ||
| self.assertEqual(output[1], "-n") | ||
| self.assertEqual(output[2], str(2)) | ||
| self.assertEqual(output[3], "--mpi=pmi2") | ||
| self.assertEqual(output[4], sys.executable) | ||
| self.assertEqual(output[5].split(os.sep)[-1], "cache_parallel.py") | ||
| self.assertEqual(output[6], file_name) | ||
| self.assertEqual(output[4], "--oversubscribe") | ||
| self.assertEqual(output[5], "--exact") | ||
| self.assertEqual(output[6], sys.executable) | ||
| self.assertEqual(output[7].split(os.sep)[-1], "cache_parallel.py") | ||
| self.assertEqual(output[8], file_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Confirm intent: exclusive currently asserts --exact, which likely should be Slurm's --exclusive.
Docstring semantics (“exclusively reserve the compute nodes”) map to Slurm’s --exclusive, not --exact. --exact has different semantics and does not enforce exclusive node allocation. If the intent is exclusive node allocation, both the implementation and the test should use --exclusive. If the intent is “exact task placement,” then the parameter name and docstring should be updated accordingly to avoid confusion.
Apply this test-only diff if exclusive nodes are intended:
- self.assertEqual(output[5], "--exact")
+ self.assertEqual(output[5], "--exclusive")And update the Slurm path in executorlib/standalone/command.py accordingly:
# Replace the --exact flag with --exclusive for 'exclusive=True'
if exclusive:
command_prepend += ["--exclusive"]To verify the correct Slurm flag for exclusive node allocation:
🌐 Web query:
In Slurm's srun, which option enforces exclusive node allocation: --exclusive or --exact? Also, what is the meaning of --exact?
💡 Result:
The correct option in Slurm's srun to enforce exclusive node allocation is --exclusive. The --exact option does something notably different.
Option purposes:
-
--exclusive(forsrun,sbatch, orsalloc): This option requests that the allocation consist of whole, exclusive nodes for the job or job step, meaning no other jobs can run on the same nodes during the allocation period—even if your job doesn't use all CPUs or resources available on those nodes[1][4]. This ensures your job has access to all resources of the assigned nodes and prevents sharing them with other users/jobs during your job's run. -
--exact(forsrunonly): This option restricts each job step to exactly the amount of resources requested, allocating only the cores specified per job step[2][9][10]. Without--exact, job steps may get as many resources as are still available in the job allocation, which may result in one step grabbing all resources and later steps being blocked[2][10]. With--exact, each step receives precisely the number of CPUs/tasks you request, no more, no less; this is particularly used for fine control when launching multiple simultaneous steps within a single allocation.
Summary Table
| Option | Enforces exclusive node allocation? | Purpose/Effect |
|---|---|---|
--exclusive |
Yes | Allocates whole nodes exclusively to your job; no sharing with other jobs. |
--exact |
No | Ensures each srun step gets exactly the requested resources (e.g., 1 CPU per step), no more. |
References for option meanings:
- [--exclusive documentation][1][4]
- [--exact explanation][2][9][10]
In summary:
Use --exclusive for exclusive node allocations. The --exact option is used for controlling resource distribution for job steps within an already-allocated job, not for exclusive node allocation.
Citations:
- 1: https://slurm.schedmd.com/cons_tres.html
- 2: https://pgi-jcns.fz-juelich.de/portal/pages/slurm.html
- 3: https://cvw.cac.cornell.edu/slurm/basics/allocation
- 4: https://pawsey.atlassian.net/wiki/display/US/SLURM+upgrade+to+version+22.05.8
- 5: https://portal.cac.cornell.edu/techdocs/clusterinfo/slurm/
- 6: https://manpages.ubuntu.com/manpages/jammy/man1/srun.1.html
- 7: https://www.docs.arc.vt.edu/usage/more-slurm.html
- 8: https://docs.lxp.lu/first-steps/handling_jobs/
- 9: https://slurm.schedmd.com/srun.html
- 10: https://groups.oist.jp/scs/advanced-slurm
Replace --exact with Slurm’s --exclusive for true exclusive node allocation
Slurm’s --exclusive enforces whole-node exclusivity; --exact only limits per-step resource counts. To align intent and documentation, update both test and implementation:
• tests/test_standalone_command.py (around line 60):
- self.assertEqual(output[5], "--exact")
+ self.assertEqual(output[5], "--exclusive")• executorlib/standalone/command.py (where exclusive=True is handled):
- if exclusive:
- command_prepend += ["--exact"]
+ if exclusive:
+ command_prepend += ["--exclusive"]🤖 Prompt for AI Agents
In tests/test_standalone_command.py around lines 54 to 63 and
executorlib/standalone/command.py where exclusive=True is handled, the test and
implementation currently add Slurm’s --exact but should use --exclusive to
enforce whole-node exclusivity; update the implementation to emit "--exclusive"
instead of "--exact" when exclusive=True and update the test expectation to
assert "--exclusive" at the same index (replace the assertion for "--exact" with
"--exclusive"), keeping all other arguments and ordering unchanged.
for more information, see https://pre-commit.ci
Summary by CodeRabbit
New Features
Documentation
Tests