# S3-compatible storage with Ray Train examples
Some of our distributed training examples require an external storage solution so that all nodes can access the same data. <br>
The following are examples of configuring S3 or MinIO storage for your Ray Train script or interactive session.

## S3 Bucket
In your Python script, set the following environment variables:
``` python
import os

os.environ["AWS_ACCESS_KEY_ID"] = "XXXXXXXX"
os.environ["AWS_SECRET_ACCESS_KEY"] = "XXXXXXXX"
os.environ["AWS_DEFAULT_REGION"] = "XXXXXXXX"
```
Alternatively, you can specify these variables in the runtime environment at job submission.
``` python
submission_id = client.submit_job(
    entrypoint=...,
    runtime_env={
        "env_vars": {
            "AWS_ACCESS_KEY_ID": os.environ.get('AWS_ACCESS_KEY_ID'),
            "AWS_SECRET_ACCESS_KEY": os.environ.get('AWS_SECRET_ACCESS_KEY'),
            "AWS_DEFAULT_REGION": os.environ.get('AWS_DEFAULT_REGION')
        },
    }
)
```
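Note that if a credential variable is unset on the submitting machine, `os.environ.get` silently passes `None` into the runtime environment and the job fails later on the cluster. A small sketch of a guard for this, where the helper name `build_env_vars` is hypothetical and not part of Ray:

```python
import os

# Hypothetical helper (not part of Ray): collect the AWS credential
# variables for runtime_env["env_vars"] and fail fast if any are unset,
# rather than submitting a job with incomplete credentials.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")

def build_env_vars(required=REQUIRED_VARS):
    env_vars = {name: os.environ.get(name) for name in required}
    missing = [name for name, value in env_vars.items() if not value]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return env_vars
```

The result can then be passed directly as `runtime_env={"env_vars": build_env_vars()}`.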
In your Trainer configuration, you can specify a `run_config` that uses your external storage.
``` python
trainer = TorchTrainer(
    train_func_distributed,
    scaling_config=scaling_config,
    run_config=ray.train.RunConfig(storage_path="s3://BUCKET_NAME/SUB_PATH/", name="unique_run_name")
)
```
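Ray Train stores each run's checkpoints and results under `<storage_path>/<name>`. If you build the bucket path from configuration values, a small helper can keep the slashes consistent; the function below is illustrative, not a Ray API:

```python
# Hypothetical helper: compose an S3 storage path for RunConfig from a
# bucket name and optional sub-path, normalising stray slashes.
def s3_storage_path(bucket: str, sub_path: str = "") -> str:
    path = f"s3://{bucket.strip('/')}"
    if sub_path.strip("/"):
        path += "/" + sub_path.strip("/")
    return path + "/"
```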
To learn more about Amazon S3 storage, see the [documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html).

## MinIO Bucket
In your Python script, add the following function for configuring your `run_config`:
``` python
import os

import pyarrow.fs
import ray.train
import s3fs

def get_minio_run_config():
    s3_fs = s3fs.S3FileSystem(
        key=os.getenv('MINIO_ACCESS_KEY', "XXXXX"),
        secret=os.getenv('MINIO_SECRET_ACCESS_KEY', "XXXXX"),
        endpoint_url=os.getenv('MINIO_URL', "XXXXX")
    )
    custom_fs = pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(s3_fs))
    run_config = ray.train.RunConfig(storage_path='training', storage_filesystem=custom_fs)
    return run_config
```
You can adjust the `run_config` above to further suit your needs.
Lastly, the new `run_config` must be passed to the Trainer:
``` python
trainer = TorchTrainer(
    train_func_distributed,
    scaling_config=scaling_config,
    run_config=get_minio_run_config()
)
```
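Before a long run starts, it can be worth checking that the endpoint and credentials actually work. A minimal sketch, assuming the bucket name `training` used in the run config above; the `check_bucket` helper is hypothetical, not part of s3fs:

```python
# Hypothetical smoke test: listing the bucket via the filesystem object
# confirms the MinIO endpoint and credentials before training starts.
def check_bucket(fs, bucket="training"):
    try:
        fs.ls(bucket)  # s3fs.S3FileSystem.ls raises if unreachable or denied
        return True
    except Exception as exc:
        print(f"Cannot access bucket {bucket!r}: {exc}")
        return False
```

Call it with the `s3fs.S3FileSystem` instance created in `get_minio_run_config` before handing the run config to the Trainer.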
To find more information on creating a MinIO bucket compatible with RHOAI, refer to this [documentation](https://ai-on-openshift.io/tools-and-applications/minio/minio/).<br>
Note: You must have `s3fs` and `pyarrow` installed in your environment for this method.