
SlurmRay is a module for effortlessly distributing tasks on a Slurm cluster using the Ray library.

hjamet/SLURM_RAY

SLURM_RAY

👉Full documentation

Description

SlurmRay is a module for effortlessly distributing tasks on a Slurm cluster using the Ray library. SlurmRay was initially designed to work with the Curnagl cluster at the University of Lausanne. However, it should be able to run on any Slurm cluster with minimal configuration.

Installation

SlurmRay is designed so that the same script runs both locally and on a cluster without modification (the server_run parameter selects which). You can develop and debug on a local machine until the script works, then run it on the full resources of the cluster without changing the code.

pip install slurmray
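The local-first workflow described above can be illustrated in plain Python. The sketch below is not SlurmRay's implementation: `run_task` is a hypothetical helper, and it assumes for illustration that anything crossing the boundary to a cluster node (the arguments and the return value) must be serializable, using the standard-library `pickle` as a stand-in for whatever serializer the real package uses.

```python
import pickle
from typing import Any, Callable, Dict


def example_func(x: int) -> int:
    # Stand-in for the user's task; mirrors the x + 1 part of the README example.
    return x + 1


def run_task(func: Callable[..., Any], args: Dict[str, Any], server_run: bool) -> Any:
    """Hypothetical helper, not part of SlurmRay.

    server_run=False: call func(**args) directly, as during local development.
    server_run=True: simulate a remote run by forcing the arguments and the
    result through pickle, since data sent to or returned from a cluster node
    has to be serializable.
    """
    if not server_run:
        return func(**args)
    shipped_args = pickle.loads(pickle.dumps(args))  # what the node would receive
    result = func(**shipped_args)
    return pickle.loads(pickle.dumps(result))  # what would be sent back


# Same function, same arguments; only the flag changes.
local_result = run_task(example_func, {"x": 1}, server_run=False)
simulated_result = run_task(example_func, {"x": 1}, server_run=True)
print(local_result, simulated_result)
```

In this sketch, server_run=True fails early with a TypeError if an argument cannot be pickled (a thread lock, for example), which is the kind of problem that is cheaper to find locally than after a job has waited in the Slurm queue.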

Usage

from slurmray.RayLauncher import RayLauncher
import ray
import torch

def function_inside_function():
    with open("slurmray/RayLauncher.py", "r") as f:
        return f.read()[0:10]

def example_func(x):
    result = (
        ray.cluster_resources(),
        f"GPU is available : {torch.cuda.is_available()}",
        x + 1,
        function_inside_function(),
    )
    return result

launcher = RayLauncher(
    project_name="example", # Name of the project (a directory with this name is created in the current directory)
    func=example_func, # Function to execute
    args={"x": 1}, # Arguments of the function, as a dict keyed by parameter name
    files=["slurmray/RayLauncher.py"], # Files to push to the cluster (their relative paths are recreated on the cluster)
    modules=[], # Modules to load on the Curnagl cluster (CUDA and cuDNN are added automatically if use_gpu=True)
    node_nbr=1, # Number of nodes to use
    use_gpu=True, # Set to True if you need an A100 GPU
    memory=8, # RAM per node, in gigabytes
    max_running_time=5, # In minutes
    runtime_env={"env_vars": {"NCCL_SOCKET_IFNAME": "eno1"}}, # Ray runtime environment, here used to set an environment variable
    server_run=True, # Run the code on the cluster instead of locally
    server_ssh="curnagl.dcsr.unil.ch", # Address of the Slurm server
    server_username="hjamet", # Username to connect to the server
    server_password=None, # If None, the password is prompted for in the terminal
)

result = launcher()
print(result)
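To make the resource parameters above more concrete, here is a hypothetical sketch of two preparation steps a launcher of this kind has to perform: turning node_nbr, memory, max_running_time, use_gpu and modules into standard Slurm `#SBATCH` directives, and copying the `files` list so that relative paths are recreated in a destination directory. This is not the script SlurmRay actually generates. The functions `sbatch_header` and `stage_files` are illustrative names, memory is assumed to be in gigabytes per node, and the `--gres=gpu:1` syntax and the module names `cuda`/`cudnn` are placeholders that differ between clusters.

```python
import os
import shutil
from typing import List


def sbatch_header(project_name: str, node_nbr: int, memory: int,
                  max_running_time: int, use_gpu: bool, modules: List[str]) -> str:
    """Hypothetical mapping of launcher-style parameters to #SBATCH directives."""
    hours, minutes = divmod(max_running_time, 60)  # max_running_time is in minutes
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={project_name}",
        f"#SBATCH --nodes={node_nbr}",
        f"#SBATCH --mem={memory}G",  # assumption: memory is in gigabytes per node
        f"#SBATCH --time={hours:02d}:{minutes:02d}:00",  # Slurm accepts HH:MM:SS
    ]
    if use_gpu:
        lines.append("#SBATCH --gres=gpu:1")  # GPU request syntax varies by cluster
    # Module names are site-specific; "cuda" and "cudnn" are placeholders.
    all_modules = list(modules) + (["cuda", "cudnn"] if use_gpu else [])
    lines += [f"module load {name}" for name in all_modules]
    return "\n".join(lines)


def stage_files(files: List[str], src_root: str, dest_root: str) -> List[str]:
    """Copy each relative path from src_root to dest_root, recreating its directories."""
    copied = []
    for rel_path in files:
        target = os.path.join(dest_root, rel_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(src_root, rel_path), target)
        copied.append(target)
    return copied


print(sbatch_header("example", node_nbr=1, memory=8, max_running_time=5,
                    use_gpu=True, modules=[]))
```

Keeping the relative layout when staging is what lets code such as function_inside_function open "slurmray/RayLauncher.py" with the same path locally and on the cluster.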

Launcher documentation

The Launcher documentation is available here.
