
Commit 343bd98

Update readme
1 parent 6adbf5f commit 343bd98

File tree

2 files changed (+58, -41 lines):

* README.md
* docs/installation.md

README.md

Lines changed: 41 additions & 25 deletions
````diff
@@ -23,11 +23,12 @@ with the [ProcessPoolExecutor](https://docs.python.org/3/library/concurrent.futu
 [ThreadPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor) for parallel
 execution of Python functions on a single computer. executorlib extends this functionality to distribute Python
 functions over multiple computers within a high performance computing (HPC) cluster. This can be either achieved by
-submitting each function as individual job to the HPC job scheduler - [HPC Submission Mode]() - or by requesting a
-compute allocation of multiple nodes and then distribute the Python functions within this allocation - [HPC Allocation Mode]().
-Finally, to accelerate the development process executorlib also provides a - [Local Mode]() - to use the executorlib
-functionality on a single workstation for testing. Starting with the [Local Mode]() set by setting the backend parameter
-to local - `backend="local"`:
+submitting each function as individual job to the HPC job scheduler - [HPC Submission Mode](https://executorlib.readthedocs.io/en/latest/2-hpc-submission.html) -
+or by requesting a compute allocation of multiple nodes and then distribute the Python functions within this allocation -
+[HPC Allocation Mode](https://executorlib.readthedocs.io/en/latest/3-hpc-allocation.html). Finally, to accelerate the
+development process executorlib also provides a - [Local Mode](https://executorlib.readthedocs.io/en/latest/1-local.html) -
+to use the executorlib functionality on a single workstation for testing. Starting with the [Local Mode](https://executorlib.readthedocs.io/en/latest/1-local.html)
+set by setting the backend parameter to local - `backend="local"`:
 ```python
 from executorlib import Executor
 
````
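The new README text above introduces Local Mode via `backend="local"`. A minimal sketch of what the referenced snippet plausibly looks like, assuming the `concurrent.futures`-style `submit()`/`result()` interface that executorlib builds on; the `calc()` helper and its argument are illustrative:

```python
from executorlib import Executor


def calc(i):
    # stand-in for any Python function to be offloaded
    return i + 1


# Local Mode: everything runs on the local workstation, intended for testing
with Executor(backend="local") as exe:
    future = exe.submit(calc, 1)
    print(future.result())  # -> 2
```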
````diff
@@ -60,8 +61,7 @@ Python function. In addition to the compute cores `cores`, the resource dictiona
 as `threads_per_core`, the GPUs per core as `gpus_per_core`, the working directory with `cwd`, the option to use the
 OpenMPI oversubscribe feature with `openmpi_oversubscribe` and finally for the [Simple Linux Utility for Resource
 Management (SLURM)](https://slurm.schedmd.com) queuing system the option to provide additional command line arguments
-with the `slurm_cmd_args` parameter - [resource dictionary]().
-
+with the `slurm_cmd_args` parameter - [resource dictionary](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#resource-dictionary).
 This flexibility to assign computing resources on a per-function-call basis simplifies the up-scaling of Python programs.
 Only the part of the Python functions which benefit from parallel execution are implemented as MPI parallel Python
 funtions, while the rest of the program remains serial.
````
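The resource dictionary described in this hunk assigns computing resources on a per-function-call basis. A hedged sketch of how such a call might look; the `resource_dict` keyword name is an assumption (the resource-dictionary trouble-shooting page linked above documents the exact interface), while the keys mirror the ones listed in the README text:

```python
from executorlib import Executor


def calc(i):
    return i + 1


with Executor(backend="local") as exe:
    future = exe.submit(
        calc,
        1,
        # per-call resource assignment; the keys come from the README text above,
        # the keyword name resource_dict itself is an assumption
        resource_dict={
            "cores": 1,                      # compute cores for this call
            "threads_per_core": 1,           # threads per core
            "gpus_per_core": 0,              # GPUs per core
            "cwd": "/tmp",                   # working directory for the call
            "openmpi_oversubscribe": False,  # OpenMPI oversubscribe feature
            # "slurm_cmd_args": [],          # extra SLURM arguments (SLURM backends only)
        },
    )
    print(future.result())
```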
````diff
@@ -87,7 +87,7 @@ with Executor(backend="slurm_submission") as exe:
 ```
 In this case the [Python simple queuing system adapter (pysqa)](https://pysqa.readthedocs.io) is used to submit the
 `calc()` function to the [SLURM](https://slurm.schedmd.com) job scheduler and request an allocation with two CPU cores
-for the execution of the function - [HPC Submission Mode](). In the background the [sbatch](https://slurm.schedmd.com/sbatch.html)
+for the execution of the function - [HPC Submission Mode](https://executorlib.readthedocs.io/en/latest/2-hpc-submission.html). In the background the [sbatch](https://slurm.schedmd.com/sbatch.html)
 command is used to request the allocation to execute the Python function.
 
 Within a given [SLURM](https://slurm.schedmd.com) allocation executorlib can also be used to assign a subset of the
````
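Based on the hunk header and the surrounding README text, HPC Submission Mode is selected with `backend="slurm_submission"` and the `calc()` function is submitted with two CPU cores. A hedged sketch of that usage; the `resource_dict` keyword is an assumption, the backend string and the two-core request come from the text above:

```python
from executorlib import Executor


def calc(i):
    # placeholder for the calc() function mentioned in the README text
    return i + 1


# HPC Submission Mode: each submitted call is handed to SLURM, with sbatch used
# in the background to request the allocation (via pysqa, per the text above)
with Executor(backend="slurm_submission") as exe:
    future = exe.submit(calc, 3, resource_dict={"cores": 2})  # two CPU cores
    print(future.result())
```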
````diff
@@ -116,23 +116,39 @@ In addition, to support for [SLURM](https://slurm.schedmd.com) executorlib also
 to address the needs for the up-coming generation of Exascale computers. Still even on traditional HPC clusters the
 hierarchical approach of the [flux](http://flux-framework.org) is beneficial to distribute hundreds of tasks within a
 given allocation. Even when [SLURM](https://slurm.schedmd.com) is used as primary job scheduler of your HPC, it is
-recommended to use [SLURM with flux]() as hierarchical job scheduler within the allocations.
+recommended to use [SLURM with flux](https://executorlib.readthedocs.io/en/latest/3-hpc-allocation.html#slurm-with-flux)
+as hierarchical job scheduler within the allocations.
 
 ## Documentation
 * [Installation](https://executorlib.readthedocs.io/en/latest/installation.html)
-  * [Compatible Job Schedulers](https://executorlib.readthedocs.io/en/latest/installation.html#compatible-job-schedulers)
-  * [executorlib with Flux Framework](https://executorlib.readthedocs.io/en/latest/installation.html#executorlib-with-flux-framework)
-  * [Test Flux Framework](https://executorlib.readthedocs.io/en/latest/installation.html#test-flux-framework)
-  * [Without Flux Framework](https://executorlib.readthedocs.io/en/latest/installation.html#without-flux-framework)
-* [Examples](https://executorlib.readthedocs.io/en/latest/examples.html)
-  * [Compatibility](https://executorlib.readthedocs.io/en/latest/examples.html#compatibility)
-  * [Resource Assignment](https://executorlib.readthedocs.io/en/latest/examples.html#resource-assignment)
-  * [Data Handling](https://executorlib.readthedocs.io/en/latest/examples.html#data-handling)
-  * [Up-Scaling](https://executorlib.readthedocs.io/en/latest/examples.html#up-scaling)
-  * [Coupled Functions](https://executorlib.readthedocs.io/en/latest/examples.html#coupled-functions)
-  * [SLURM Job Scheduler](https://executorlib.readthedocs.io/en/latest/examples.html#slurm-job-scheduler)
-  * [Workstation Support](https://executorlib.readthedocs.io/en/latest/examples.html#workstation-support)
-* [Development](https://executorlib.readthedocs.io/en/latest/development.html)
-  * [Contributions](https://executorlib.readthedocs.io/en/latest/development.html#contributions)
-  * [License](https://executorlib.readthedocs.io/en/latest/development.html#license)
-  * [Integration](https://executorlib.readthedocs.io/en/latest/development.html#integration)
+  * [Minimal](https://executorlib.readthedocs.io/en/latest/installation.html#minimal)
+  * [MPI Support](https://executorlib.readthedocs.io/en/latest/installation.html#mpi-support)
+  * [Caching](https://executorlib.readthedocs.io/en/latest/installation.html#caching)
+  * [HPC Submission Mode](https://executorlib.readthedocs.io/en/latest/installation.html#hpc-submission-mode)
+  * [HPC Allocation Mode](https://executorlib.readthedocs.io/en/latest/installation.html#hpc-allocation-mode)
+  * [Visualisation](https://executorlib.readthedocs.io/en/latest/installation.html#visualisation)
+  * [For Developers](https://executorlib.readthedocs.io/en/latest/installation.html#for-developers)
+* [Local Mode](https://executorlib.readthedocs.io/en/latest/1-local.html)
+  * [Basic Functionality](https://executorlib.readthedocs.io/en/latest/1-local.html#basic-functionality)
+  * [Parallel Functions](https://executorlib.readthedocs.io/en/latest/1-local.html#parallel-functions)
+  * [Performance Optimization](https://executorlib.readthedocs.io/en/latest/1-local.html#performance-optimization)
+* [HPC Submission Mode](https://executorlib.readthedocs.io/en/latest/2-hpc-submission.html)
+  * [SLURM](https://executorlib.readthedocs.io/en/latest/2-hpc-submission.html#slurm)
+  * [Flux](https://executorlib.readthedocs.io/en/latest/2-hpc-submission.html#flux)
+* [HPC Allocation Mode](https://executorlib.readthedocs.io/en/latest/3-hpc-allocation.html)
+  * [SLURM](https://executorlib.readthedocs.io/en/latest/3-hpc-allocation.html#slurm)
+  * [SLURM with Flux](https://executorlib.readthedocs.io/en/latest/3-hpc-allocation.html#slurm-with-flux)
+  * [Flux](https://executorlib.readthedocs.io/en/latest/3-hpc-allocation.html#flux)
+* [Trouble Shooting](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html)
+  * [Filesystem Usage](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#filesystem-usage)
+  * [Firewall Issues](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#firewall-issues)
+  * [Message Passing Interface](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#message-passing-interface)
+  * [Python Version](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#python-version)
+  * [Resource Dictionary](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#resource-dictionary)
+  * [SSH Connection](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#ssh-connection)
+* [Developer](https://executorlib.readthedocs.io/en/latest/4-developer.html)
+  * [Communication](https://executorlib.readthedocs.io/en/latest/4-developer.html#communication)
+  * [External Executables](https://executorlib.readthedocs.io/en/latest/4-developer.html#external-executables)
+  * [License](https://executorlib.readthedocs.io/en/latest/4-developer.html#license)
+  * [Modules](https://executorlib.readthedocs.io/en/latest/4-developer.html#modules)
+* [Interface](https://executorlib.readthedocs.io/en/latest/api.html)
````
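The hunk above also recommends running SLURM with flux as a hierarchical scheduler inside a given allocation (HPC Allocation Mode). A hedged usage sketch; the backend string `"flux_allocation"` is an assumption, the exact value is documented on the HPC Allocation Mode page linked in the new documentation list:

```python
from executorlib import Executor


def calc(i):
    return i + 1


# HPC Allocation Mode inside an existing SLURM allocation, with flux acting as the
# hierarchical scheduler; the backend name "flux_allocation" is an assumption.
with Executor(backend="flux_allocation") as exe:
    future = exe.submit(calc, 1)
    print(future.result())
```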

docs/installation.md

Lines changed: 17 additions & 16 deletions
````diff
@@ -33,12 +33,13 @@ used. The mpi4py documentation covers the [installation of mpi4py](https://mpi4p
 in more detail.
 
 ## Caching
-While the caching is an optional feature for [Local Mode] and for the distribution of Python functions in a given
-allocation of an HPC job scheduler [HPC Allocation Mode], it is required for the submission of individual functions to
-an HPC job scheduler [HPC Submission Mode]. This is required as in [HPC Submission Mode] the Python function is stored
-on the file system until the requested computing resources become available. The caching is implemented based on the
-hierarchical data format (HDF5). The corresponding [h5py](https://www.h5py.org) package can be installed using either
-the [Python package manager](https://pypi.org/project/h5py/):
+While the caching is an optional feature for [Local Mode](https://executorlib.readthedocs.io/en/latest/1-local.html) and
+for the distribution of Python functions in a given allocation of an HPC job scheduler [HPC Allocation Mode](https://executorlib.readthedocs.io/en/latest/3-hpc-allocation.html),
+it is required for the submission of individual functions to an HPC job scheduler [HPC Submission Mode](https://executorlib.readthedocs.io/en/latest/2-hpc-submission.html).
+This is required as in [HPC Submission Mode](https://executorlib.readthedocs.io/en/latest/2-hpc-submission.html) the
+Python function is stored on the file system until the requested computing resources become available. The caching is
+implemented based on the hierarchical data format (HDF5). The corresponding [h5py](https://www.h5py.org) package can be
+installed using either the [Python package manager](https://pypi.org/project/h5py/):
 ```
 pip install executorlib[cache]
 ```
````
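Since the caching layer builds on HDF5 via h5py, a quick way to confirm the optional dependency is available, plus a hedged sketch of pointing the cache at a directory; the `cache_directory` parameter name and the directory path are assumptions, not taken from this commit:

```python
import h5py  # HDF5 bindings the caching feature relies on, per the paragraph above

from executorlib import Executor


def calc(i):
    return i + 1


print(h5py.__version__)  # confirms the optional [cache] extra is installed

# hypothetical cache location; the cache_directory parameter name is an assumption
with Executor(backend="local", cache_directory="./executorlib_cache") as exe:
    print(exe.submit(calc, 1).result())
```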
````diff
@@ -67,17 +68,17 @@ documentation covers the [installation of pysqa](https://pysqa.readthedocs.io/en
 detail.
 
 ## HPC Allocation Mode
-For optimal performance in [HPC Allocation Mode] the [flux framework](https://flux-framework.org) is recommended as job
-scheduler. Even when the [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) or any other
-job scheduler is already installed on the HPC cluster. [flux framework](https://flux-framework.org) can be installed as
-a secondary job scheduler to leverage [flux framework](https://flux-framework.org) for the distribution of resources
-within a given allocation of the primary scheduler.
+For optimal performance in [HPC Allocation Mode](https://executorlib.readthedocs.io/en/latest/3-hpc-allocation.html) the
+[flux framework](https://flux-framework.org) is recommended as job scheduler. Even when the [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com)
+or any other job scheduler is already installed on the HPC cluster. [flux framework](https://flux-framework.org) can be
+installed as a secondary job scheduler to leverage [flux framework](https://flux-framework.org) for the distribution of
+resources within a given allocation of the primary scheduler.
 
-The [flux framework](https://flux-framework.org) uses `libhwloc` and `pmi` to understand the hardware it is running on and to booststrap MPI.
-`libhwloc` not only assigns CPU cores but also GPUs. This requires `libhwloc` to be compiled with support for GPUs from
-your vendor. In the same way the version of `pmi` for your queuing system has to be compatible with the version
-installed via conda. As `pmi` is typically distributed with the implementation of the Message Passing Interface (MPI),
-it is required to install the compatible MPI library in your conda environment as well.
+The [flux framework](https://flux-framework.org) uses `libhwloc` and `pmi` to understand the hardware it is running on
+and to booststrap MPI. `libhwloc` not only assigns CPU cores but also GPUs. This requires `libhwloc` to be compiled with
+support for GPUs from your vendor. In the same way the version of `pmi` for your queuing system has to be compatible
+with the version installed via conda. As `pmi` is typically distributed with the implementation of the Message Passing
+Interface (MPI), it is required to install the compatible MPI library in your conda environment as well.
 
 ### AMD GPUs with mpich / cray mpi
 For example the [Frontier HPC](https://www.olcf.ornl.gov/frontier/) cluster at Oak Ridge National Laboratory uses
````
