Skip to content

Latest commit

 

History

History
59 lines (46 loc) · 2.51 KB

README.md

File metadata and controls

59 lines (46 loc) · 2.51 KB

SLURM_RAY

👉Full documentation

Description

SlurmRay is a module for effortlessly distributing tasks on a Slurm cluster using the Ray library. SlurmRay was initially designed to work with the Curnagl cluster at the University of Lausanne. However, it should be able to run on any Slurm cluster with a minimum of configuration.

Installation

SlurmRay is designed to run both locally and on a cluster without any modification. This design is intended to allow work to be carried out on a local machine until the script seems to be working. It should then be possible to run it using all the resources of the cluster without having to modify the code.

pip install slurmray

Usage

from slurmray.RayLauncher import RayLauncher
import ray
import torch

def function_inside_function():
    with open("slurmray/RayLauncher.py", "r") as f:
        return f.read()[0:10]

def example_func(x):
    result = (
        ray.cluster_resources(),
        f"GPU is available : {torch.cuda.is_available()}",
        x + 1,
        function_inside_function(),
    )
    return result

launcher = RayLauncher(
    project_name="example", # Name of the project (will create a directory with this name in the current directory)
    func=example_func, # Function to execute
    args={"x": 1}, # Arguments of the function
    files=["slurmray/RayLauncher.py"], # List of files to push to the cluster (file path will be recreated on the cluster)
    modules=[], # List of modules to load on the curnagl Cluster (CUDA & CUDNN are automatically added if use_gpu=True)
    node_nbr=1, # Number of nodes to use
    use_gpu=True, # If you need A100 GPU, you can set it to True
    memory=8, # In MegaBytes
    max_running_time=5, # In minutes
    runtime_env={"env_vars": {"NCCL_SOCKET_IFNAME": "eno1"}}, # Example of environment variable
    server_run=True, # To run the code on the cluster and not locally
    server_ssh="curnagl.dcsr.unil.ch", # Address of the SLURM server
    server_username="hjamet", # Username to connect to the server
    server_password=None, # Will be asked in the terminal
)

result = launcher()
print(result)

Launcher documentation

The Launcher documentation is available here.