
Commit 45d0a74

Added documentation for S3 compatible storage
1 parent 7eb00be commit 45d0a74

2 files changed: +62 -0 lines changed

demo-notebooks/guided-demos/mnist_fashion.py (+1)
```diff
@@ -74,6 +74,7 @@ def train_func_distributed():
 # For GPU Training, set `use_gpu` to True.
 use_gpu = True
 
+# To learn more about configuring S3 compatible storage check out our docs -> https://github.com/project-codeflare/codeflare-sdk/blob/main/docs/s3-compatible-storage.md
 trainer = TorchTrainer(
     train_func_distributed,
     scaling_config=ScalingConfig(
```

docs/s3-compatible-storage.md (+61)
# S3 compatible storage with Ray Train examples

Some of our distributed training examples require an external storage solution so that all nodes can access the same data. <br>
The following are examples of configuring S3 or MinIO storage for your Ray Train script or interactive session.

## S3 Bucket

In your Python script, add the following environment variables:

``` python
import os

os.environ["AWS_ACCESS_KEY_ID"] = "XXXXXXXX"
os.environ["AWS_SECRET_ACCESS_KEY"] = "XXXXXXXX"
os.environ["AWS_DEFAULT_REGION"] = "XXXXXXXX"
```
Alternatively, you can specify these variables in your runtime environment on job submission.

``` python
submission_id = client.submit_job(
    entrypoint=...,
    runtime_env={
        "env_vars": {
            "AWS_ACCESS_KEY_ID": os.environ.get('AWS_ACCESS_KEY_ID'),
            "AWS_SECRET_ACCESS_KEY": os.environ.get('AWS_SECRET_ACCESS_KEY'),
            "AWS_DEFAULT_REGION": os.environ.get('AWS_DEFAULT_REGION'),
        },
    },
)
```
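For context, the `client` above is a Ray job-submission client. A minimal sketch of constructing one, assuming Ray's `JobSubmissionClient` and a hypothetical dashboard address (the CodeFlare SDK's job client exposes the same `submit_job` interface):

``` python
from ray.job_submission import JobSubmissionClient

# Hypothetical address; point this at your Ray cluster's dashboard endpoint.
client = JobSubmissionClient("http://raycluster-dashboard.example.com:8265")
```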
In your Trainer configuration you can specify a `run_config` that will utilise your external storage.

``` python
trainer = TorchTrainer(
    train_func_distributed,
    scaling_config=scaling_config,
    run_config=ray.train.RunConfig(storage_path="s3://BUCKET_NAME/SUB_PATH/", name="unique_run_name"),
)
```
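Once configured, checkpoints reported by your training function are persisted to that bucket. A minimal sketch, assuming the standard `ray.train.report` and `ray.train.Checkpoint` APIs; `report_checkpoint` is a hypothetical helper you would call from inside `train_func_distributed`:

``` python
import os
import tempfile

import ray.train


def report_checkpoint(metrics: dict) -> None:
    # Hypothetical helper: write state to a temp dir and hand it to Ray Train,
    # which uploads it to the RunConfig storage_path.
    with tempfile.TemporaryDirectory() as tmp_dir:
        with open(os.path.join(tmp_dir, "model.pt"), "wb") as f:
            f.write(b"placeholder model bytes")  # e.g. torch.save(...) in practice
        ray.train.report(metrics, checkpoint=ray.train.Checkpoint.from_directory(tmp_dir))
```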
To learn more about Amazon S3 storage, you can find information [here](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html).

## MinIO Bucket

In your Python script, add the following function for configuring your `run_config`:

``` python
import os

import pyarrow.fs
import s3fs
import ray.train


def get_minio_run_config():
    s3_fs = s3fs.S3FileSystem(
        key=os.getenv('MINIO_ACCESS_KEY', "XXXXX"),
        secret=os.getenv('MINIO_SECRET_ACCESS_KEY', "XXXXX"),
        endpoint_url=os.getenv('MINIO_URL', "XXXXX"),
    )
    # Wrap the fsspec filesystem so Ray Train can use it through PyArrow.
    custom_fs = pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(s3_fs))
    run_config = ray.train.RunConfig(storage_path='training', storage_filesystem=custom_fs)
    return run_config
```
You can update the `run_config` above to further suit your needs, as in the sketch below.
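For example, a hedged sketch of a customised `run_config` (the `name` and `checkpoint_config` parameters are standard `ray.train.RunConfig` options; the values here are illustrative):

``` python
run_config = ray.train.RunConfig(
    storage_path='training',
    storage_filesystem=custom_fs,
    name="unique_run_name",  # illustrative run name
    # Keep only the two most recent checkpoints in the bucket.
    checkpoint_config=ray.train.CheckpointConfig(num_to_keep=2),
)
```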
Lastly, the new `run_config` must be added to the Trainer:
``` python
trainer = TorchTrainer(
    train_func_distributed,
    scaling_config=scaling_config,
    run_config=get_minio_run_config(),
)
```
To find more information on creating a MinIO bucket compatible with RHOAI, you can refer to this [documentation](https://ai-on-openshift.io/tools-and-applications/minio/minio/).<br>
Note: You must have `s3fs` and `pyarrow` installed in your environment for this method.
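If you submit your script as a Ray job, one way to ensure those packages are present on the cluster is to request them in the job's `runtime_env`; a hedged sketch, with an illustrative entrypoint:

``` python
submission_id = client.submit_job(
    entrypoint="python mnist_fashion.py",  # illustrative entrypoint
    runtime_env={
        "pip": ["s3fs", "pyarrow"],  # install storage dependencies on the cluster
        "env_vars": {
            "MINIO_ACCESS_KEY": os.environ.get('MINIO_ACCESS_KEY'),
            "MINIO_SECRET_ACCESS_KEY": os.environ.get('MINIO_SECRET_ACCESS_KEY'),
            "MINIO_URL": os.environ.get('MINIO_URL'),
        },
    },
)
```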
