18 changes: 17 additions & 1 deletion executorlib/standalone/command.py
@@ -21,6 +21,8 @@ def get_cache_execute_command(
file_name: str,
cores: int = 1,
backend: Optional[str] = None,
+ exclusive: bool = False,
+ openmpi_oversubscribe: bool = False,
pmi_mode: Optional[str] = None,
Comment on lines +24 to 26

💡 Verification agent

🧩 Analysis chain

Avoid breaking API: keep pmi_mode positional slot and make new flags keyword-only

Inserting exclusive and openmpi_oversubscribe before pmi_mode changes the positional argument order and will break callers that passed pmi_mode positionally. Preserve backward compatibility by keeping pmi_mode as the 4th parameter and making the new flags keyword-only.

Apply this diff to the function signature:

-    exclusive: bool = False,
-    openmpi_oversubscribe: bool = False,
-    pmi_mode: Optional[str] = None,
+    pmi_mode: Optional[str] = None,
+    *,
+    exclusive: bool = False,
+    openmpi_oversubscribe: bool = False,

To locate potential breakages (calls with four positional args), run:


🏁 Script executed:

#!/bin/bash
# Find calls using four positional args to get_cache_execute_command (likely broken by the reordering).
ast-grep --pattern $'get_cache_execute_command($_, $_, $_, $_)'

Length of output: 915
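The same question can also be asked with Python's standard ast module. The following is a minimal sketch and not part of the PR: find_positional_calls and the sample string are hypothetical names introduced for illustration. It counts only positional arguments, so calls that pass everything by keyword are not reported.

```python
import ast
from typing import List


def find_positional_calls(source: str, func_name: str, min_positional: int = 4) -> List[int]:
    """Return line numbers of calls to func_name with at least min_positional positional args."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both bare names (f(...)) and attribute access (module.f(...)).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        # node.args holds positional arguments only; keywords live in node.keywords.
        if name == func_name and len(node.args) >= min_positional:
            hits.append(node.lineno)
    return sorted(hits)


# Hypothetical sample: line 1 passes pmi_mode positionally, line 2 uses keywords.
sample = (
    "get_cache_execute_command('f.h5', 2, 'slurm', 'pmi2')\n"
    "get_cache_execute_command(file_name='f.h5', cores=2, pmi_mode='pmi2')\n"
)

print(find_positional_calls(sample, "get_cache_execute_command"))  # [1]
```

Only the first call in the sample would be affected by reordering the parameters, because the second binds every argument by name.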


Avoid breaking API: keep pmi_mode as the 4th positional argument and make new flags keyword-only

I ran an AST search—every existing call uses keyword arguments, so there’s no internal breakage. However, to preserve the public API for callers who may pass pmi_mode positionally, update the signature in executorlib/standalone/command.py as follows:

@@ def get_cache_execute_command(
-    exclusive: bool = False,
-    openmpi_oversubscribe: bool = False,
-    pmi_mode: Optional[str] = None,
+    pmi_mode: Optional[str] = None,
+    *,
+    exclusive: bool = False,
+    openmpi_oversubscribe: bool = False,

This keeps pmi_mode in its original (4th) slot and forces exclusive and openmpi_oversubscribe to be keyword-only.
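To illustrate the effect of the bare *, here is a minimal sketch. signature_sketch is a hypothetical stand-in that only mirrors the proposed parameter order; it is not the executorlib function and does not build a command.

```python
from typing import Optional


def signature_sketch(
    file_name: str,
    cores: int = 1,
    backend: Optional[str] = None,
    pmi_mode: Optional[str] = None,
    *,  # everything after the bare * must be passed by keyword
    exclusive: bool = False,
    openmpi_oversubscribe: bool = False,
) -> dict:
    """Hypothetical stand-in that only echoes how its arguments were bound."""
    return {
        "pmi_mode": pmi_mode,
        "exclusive": exclusive,
        "openmpi_oversubscribe": openmpi_oversubscribe,
    }


# A legacy call that passes pmi_mode as the 4th positional argument still binds correctly.
legacy = signature_sketch("task.h5", 2, "slurm", "pmi2")

# The new flags are accepted by keyword only.
modern = signature_sketch("task.h5", 2, "slurm", "pmi2", exclusive=True)

# A fifth positional argument raises TypeError instead of being bound to exclusive.
try:
    signature_sketch("task.h5", 2, "slurm", "pmi2", True)
    fifth_positional_rejected = False
except TypeError:
    fifth_positional_rejected = True
```

With the ordering from the PR, the legacy call would instead bind "pmi2" to exclusive without any error, which is the silent breakage this suggestion is meant to avoid.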

🤖 Prompt for AI Agents
In executorlib/standalone/command.py around lines 24 to 26, the function
signature currently lists exclusive, openmpi_oversubscribe, pmi_mode in that
order; update it so pmi_mode remains the 4th positional parameter and make
exclusive and openmpi_oversubscribe keyword-only: reorder the parameters so
pmi_mode: Optional[str] = None appears before exclusive and
openmpi_oversubscribe, and place a bare * before exclusive to force those two to
be keyword-only, keeping types and defaults unchanged.

) -> list:
"""
@@ -30,7 +32,9 @@ def get_cache_execute_command(
file_name (str): The name of the file.
cores (int, optional): Number of cores used to execute the task. Defaults to 1.
backend (str, optional): name of the backend used to spawn tasks ["slurm", "flux"].
- pmi_mode (str): PMI interface to use (OpenMPI v5 requires pmix) default is None (Flux only)
+ exclusive (bool): Whether to exclusively reserve the compute nodes, or allow sharing compute notes. Defaults to False.
+ openmpi_oversubscribe (bool, optional): Whether to oversubscribe the cores. Defaults to False.
+ pmi_mode (str): PMI interface to use (OpenMPI v5 requires pmix) default is None
Comment on lines +35 to +37

⚠️ Potential issue

Docstring fixes: correct typo and reflect backend limitations

  • Typo: “compute notes” → “compute nodes”.
  • Please document that exclusive and openmpi_oversubscribe are not supported with the Flux backend (you raise ValueError for both).

Apply this diff:

-        exclusive (bool): Whether to exclusively reserve the compute nodes, or allow sharing compute notes. Defaults to False.
-        openmpi_oversubscribe (bool, optional): Whether to oversubscribe the cores. Defaults to False.
-        pmi_mode (str): PMI interface to use (OpenMPI v5 requires pmix) default is None
+        pmi_mode (str): PMI interface to use (OpenMPI v5 requires pmix). Default is None.
+        exclusive (bool): Whether to exclusively reserve the compute nodes (no node sharing). Default: False.
+                          Note: Not supported with the "flux" backend.
+        openmpi_oversubscribe (bool): Whether to oversubscribe the cores. Default: False.
+                                      Note: Not supported with the "flux" backend.
🤖 Prompt for AI Agents
In executorlib/standalone/command.py around lines 35 to 37, fix the docstring
typo "compute notes" to "compute nodes" and update the parameter descriptions to
note backend limitations: state that both exclusive and openmpi_oversubscribe
are not supported when using the Flux backend (they will raise ValueError), and
keep the pmi_mode description as-is (default None). Ensure the updated docstring
clearly mentions these Flux backend restrictions for exclusive and
openmpi_oversubscribe.
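As a sketch of how the documented restriction and the check it describes could sit side by side: flux_prefix_sketch below is a hypothetical, reduced stand-in for the Flux branch, not the function in executorlib/standalone/command.py. The error messages are copied from the diff. Validating before the command is built is an assumption of this sketch.

```python
from typing import List, Optional


def flux_prefix_sketch(
    cores: int,
    pmi_mode: Optional[str] = None,
    *,
    exclusive: bool = False,
    openmpi_oversubscribe: bool = False,
) -> List[str]:
    """
    Build the launcher prefix for the "flux" backend (illustrative stand-in).

    Args:
        cores (int): Number of cores used to execute the task.
        pmi_mode (str, optional): PMI interface to use. Default is None.
        exclusive (bool): Not supported with the "flux" backend; True raises ValueError.
        openmpi_oversubscribe (bool): Not supported with the "flux" backend; True raises ValueError.

    Returns:
        list[str]: The command prefix, for example ["flux", "run", "-n", "2"].
    """
    # Reject unsupported options before any part of the command is built.
    if openmpi_oversubscribe:
        raise ValueError(
            "The option openmpi_oversubscribe is not available with the flux backend."
        )
    if exclusive:
        raise ValueError("The option exclusive is not available with the flux backend.")
    prefix = ["flux", "run"]
    if pmi_mode is not None:
        prefix += ["-o", "pmi=" + pmi_mode]
    return prefix + ["-n", str(cores)]
```

A caller reading only the docstring then learns about the ValueError without having to trigger it.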


Returns:
list[str]: List of strings containing the python executable path and the backend script to execute
@@ -47,6 +51,10 @@ def get_cache_execute_command(
command_prepend = ["srun", "-n", str(cores)]
if pmi_mode is not None:
command_prepend += ["--mpi=" + pmi_mode]
+ if openmpi_oversubscribe:
+ command_prepend += ["--oversubscribe"]
+ if exclusive:
+ command_prepend += ["--exact"]
Comment on lines +56 to +57

⚠️ Potential issue

Wrong Slurm flag for exclusive allocation — use --exclusive, not --exact

--exact controls task placement semantics, not node exclusivity. For exclusive node allocation with srun, the correct flag is --exclusive.

Apply this diff:

-            if exclusive:
-                command_prepend += ["--exact"]
+            if exclusive:
+                command_prepend += ["--exclusive"]
🤖 Prompt for AI Agents
In executorlib/standalone/command.py around lines 56-57, the code appends the
wrong Slurm flag "--exact" when requesting an exclusive allocation; change the
appended flag to "--exclusive" so that command_prepend uses the correct srun
option for node exclusivity (i.e., replace "--exact" with "--exclusive" in the
exclusive branch).
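A minimal sketch of the Slurm branch with the suggested flag follows. srun_prefix_sketch is a hypothetical, reduced stand-in (it omits the Python executable and the file name), and it assumes the intent of exclusive=True is whole-node exclusivity.

```python
from typing import List, Optional


def srun_prefix_sketch(
    cores: int,
    pmi_mode: Optional[str] = None,
    *,
    exclusive: bool = False,
    openmpi_oversubscribe: bool = False,
) -> List[str]:
    """Hypothetical stand-in that builds only the srun prefix of the command."""
    prefix = ["srun", "-n", str(cores)]
    if pmi_mode is not None:
        prefix += ["--mpi=" + pmi_mode]
    if openmpi_oversubscribe:
        prefix += ["--oversubscribe"]
    if exclusive:
        # --exclusive asks for nodes that are not shared with other jobs;
        # --exact would only restrict the step to the resources it requested.
        prefix += ["--exclusive"]
    return prefix


print(srun_prefix_sketch(2, "pmi2", exclusive=True, openmpi_oversubscribe=True))
# ['srun', '-n', '2', '--mpi=pmi2', '--oversubscribe', '--exclusive']
```

The flag order matches the order in the diff, so the positional assertions in the test file would only need the one string changed.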

command_lst = (
command_prepend
+ command_lst
@@ -56,6 +64,14 @@ def get_cache_execute_command(
flux_command = ["flux", "run"]
if pmi_mode is not None:
flux_command += ["-o", "pmi=" + pmi_mode]
+ if openmpi_oversubscribe:
+ raise ValueError(
+ "The option openmpi_oversubscribe is not available with the flux backend."
+ )
+ if exclusive:
+ raise ValueError(
+ "The option exclusive is not available with the flux backend."
+ )
command_lst = (
flux_command
+ ["-n", str(cores)]
4 changes: 4 additions & 0 deletions executorlib/task_scheduler/file/shared.py
@@ -156,6 +156,10 @@ def execute_tasks_h5(
file_name=file_name,
cores=task_resource_dict["cores"],
backend=backend,
+ exclusive=task_resource_dict.get("exclusive", False),
+ openmpi_oversubscribe=task_resource_dict.get(
+ "openmpi_oversubscribe", False
+ ),
pmi_mode=pmi_mode,
),
file_name=file_name,
2 changes: 2 additions & 0 deletions executorlib/task_scheduler/file/task_scheduler.py
@@ -56,6 +56,8 @@ def __init__(
"cores": 1,
"cwd": None,
"cache_directory": "executorlib_cache",
+ "exclusive": False,
+ "openmpi_oversubscribe": False,
}
if resource_dict is None:
resource_dict = {}
14 changes: 10 additions & 4 deletions tests/test_standalone_command.py
@@ -51,14 +51,16 @@ def test_get_cache_execute_command_parallel(self):
self.assertEqual(output[3], sys.executable)
self.assertEqual(output[4].split(os.sep)[-1], "cache_parallel.py")
self.assertEqual(output[5], file_name)
- output = get_cache_execute_command(cores=2, file_name=file_name, backend="slurm", pmi_mode="pmi2")
+ output = get_cache_execute_command(cores=2, file_name=file_name, backend="slurm", pmi_mode="pmi2", openmpi_oversubscribe=True, exclusive=True)
self.assertEqual(output[0], "srun")
self.assertEqual(output[1], "-n")
self.assertEqual(output[2], str(2))
self.assertEqual(output[3], "--mpi=pmi2")
- self.assertEqual(output[4], sys.executable)
- self.assertEqual(output[5].split(os.sep)[-1], "cache_parallel.py")
- self.assertEqual(output[6], file_name)
+ self.assertEqual(output[4], "--oversubscribe")
+ self.assertEqual(output[5], "--exact")
+ self.assertEqual(output[6], sys.executable)
+ self.assertEqual(output[7].split(os.sep)[-1], "cache_parallel.py")
+ self.assertEqual(output[8], file_name)
Comment on lines +54 to +63

💡 Verification agent

🧩 Analysis chain

Confirm intent: exclusive currently asserts --exact, which likely should be Slurm's --exclusive.

Docstring semantics (“exclusively reserve the compute nodes”) map to Slurm’s --exclusive, not --exact. --exact has different semantics and does not enforce exclusive node allocation. If the intent is exclusive node allocation, both the implementation and the test should use --exclusive. If the intent is “exact task placement,” then the parameter name and docstring should be updated accordingly to avoid confusion.

Apply this test-only diff if exclusive nodes are intended:

-        self.assertEqual(output[5], "--exact")
+        self.assertEqual(output[5], "--exclusive")

And update the Slurm path in executorlib/standalone/command.py accordingly:

# Replace the --exact flag with --exclusive for 'exclusive=True'
if exclusive:
    command_prepend += ["--exclusive"]

To verify the correct Slurm flag for exclusive node allocation:


🌐 Web query:

In Slurm's srun, which option enforces exclusive node allocation: --exclusive or --exact? Also, what is the meaning of --exact?

💡 Result:

The correct option in Slurm's srun to enforce exclusive node allocation is --exclusive. The --exact option does something notably different.

Option purposes:

  • --exclusive (for srun, sbatch, or salloc): This option requests that the allocation consist of whole, exclusive nodes for the job or job step, meaning no other jobs can run on the same nodes during the allocation period—even if your job doesn't use all CPUs or resources available on those nodes[1][4]. This ensures your job has access to all resources of the assigned nodes and prevents sharing them with other users/jobs during your job's run.

  • --exact (for srun only): This option restricts each job step to exactly the amount of resources requested, allocating only the cores specified per job step[2][9][10]. Without --exact, job steps may get as many resources as are still available in the job allocation, which may result in one step grabbing all resources and later steps being blocked[2][10]. With --exact, each step receives precisely the number of CPUs/tasks you request, no more, no less; this is particularly used for fine control when launching multiple simultaneous steps within a single allocation.

Summary Table

| Option | Enforces exclusive node allocation? | Purpose/Effect |
| --- | --- | --- |
| --exclusive | Yes | Allocates whole nodes exclusively to your job; no sharing with other jobs. |
| --exact | No | Ensures each srun step gets exactly the requested resources (e.g., 1 CPU per step), no more. |

References for option meanings:

  • [--exclusive documentation][1][4]
  • [--exact explanation][2][9][10]

In summary:
Use --exclusive for exclusive node allocations. The --exact option is used for controlling resource distribution for job steps within an already-allocated job, not for exclusive node allocation.



Replace --exact with Slurm’s --exclusive for true exclusive node allocation

Slurm’s --exclusive enforces whole-node exclusivity; --exact only limits per-step resource counts. To align intent and documentation, update both test and implementation:

• tests/test_standalone_command.py (around line 60):

-        self.assertEqual(output[5], "--exact")
+        self.assertEqual(output[5], "--exclusive")

• executorlib/standalone/command.py (where exclusive=True is handled):

-    if exclusive:
-        command_prepend += ["--exact"]
+    if exclusive:
+        command_prepend += ["--exclusive"]
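A test for this change could also assert on the presence of the flag rather than its position. This is a minimal sketch under stated assumptions: srun_flags_sketch is a hypothetical stand-in for the Slurm branch, and the class name is invented for illustration. The tests in this PR assert by index instead.

```python
import unittest


def srun_flags_sketch(exclusive: bool = False) -> list:
    """Hypothetical stand-in for the Slurm branch; the real function takes more arguments."""
    flags = ["srun", "-n", "2"]
    if exclusive:
        flags.append("--exclusive")
    return flags


class ExclusiveFlagSketchTest(unittest.TestCase):
    def test_exclusive_emits_exclusive_flag(self):
        output = srun_flags_sketch(exclusive=True)
        # Membership checks keep passing if other flags are added before this one.
        self.assertIn("--exclusive", output)
        self.assertNotIn("--exact", output)

    def test_default_emits_neither_flag(self):
        output = srun_flags_sketch()
        self.assertNotIn("--exclusive", output)
        self.assertNotIn("--exact", output)
```

Index-based assertions, as used in tests/test_standalone_command.py, additionally pin the flag order, so which style fits is a judgment call for the maintainers.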
🤖 Prompt for AI Agents
In tests/test_standalone_command.py around lines 54 to 63 and
executorlib/standalone/command.py where exclusive=True is handled, the test and
implementation currently add Slurm’s --exact but should use --exclusive to
enforce whole-node exclusivity; update the implementation to emit "--exclusive"
instead of "--exact" when exclusive=True and update the test expectation to
assert "--exclusive" at the same index (replace the assertion for "--exact" with
"--exclusive"), keeping all other arguments and ordering unchanged.

output = get_cache_execute_command(cores=2, file_name=file_name, backend="slurm")
self.assertEqual(output[0], "srun")
self.assertEqual(output[1], "-n")
@@ -86,3 +88,7 @@ def test_get_cache_execute_command_parallel(self):
self.assertEqual(output[8], file_name)
with self.assertRaises(ValueError):
get_cache_execute_command(cores=2, file_name=file_name, backend="test")
+ with self.assertRaises(ValueError):
+ get_cache_execute_command(cores=2, file_name=file_name, backend="flux", openmpi_oversubscribe=True)
+ with self.assertRaises(ValueError):
+ get_cache_execute_command(cores=2, file_name=file_name, backend="flux", exclusive=True)