Description
Describe the feature you'd like
Currently, when using Processors such as SKLearnProcessor, there is no way to specify where a local script passed via the code= argument should be stored in S3 when the processor is used in conjunction with a ProcessingStep. This can lead to clutter in S3 buckets. The current behaviour places the code in the default_bucket of the SageMaker session, like so:
s3://{default_bucket}/auto_generated_hash/input/code/preprocess.py
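For reference, a minimal sketch of the current pattern; the role ARN, instance settings, and framework version below are placeholders for illustration, not values from this report:

```python
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.steps import ProcessingStep

# Placeholder role/instance/version values for illustration only.
sklearn_processor = SKLearnProcessor(
    framework_version="1.0-1",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

# There is no argument here to control where preprocess.py is uploaded;
# it ends up under the session's default_bucket with an auto-generated prefix.
step = ProcessingStep(
    name="Preprocess",
    processor=sklearn_processor,
    code="preprocess.py",
)
```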
A better user experience would be to let the user define exactly where the code should be uploaded, so that all the files for a given run can be grouped together, for example (see the sketch after these paths):
s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/code/preprocess.py
s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/data/train.csv
s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/model/model.pkl
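A hypothetical sketch of what this could look like. The code_location= parameter on ProcessingStep below does not exist today; it only illustrates the kind of control being requested. Join and ExecutionVariables.PIPELINE_EXECUTION_ID are existing SDK constructs; the bucket, project name, role, and instance settings are placeholders:

```python
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.execution_variables import ExecutionVariables
from sagemaker.workflow.functions import Join
from sagemaker.workflow.steps import ProcessingStep

sklearn_processor = SKLearnProcessor(
    framework_version="1.0-1",
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

# HYPOTHETICAL: ProcessingStep does not currently accept code_location.
# This shows the requested behaviour of grouping the uploaded script
# under the running execution's ID.
step = ProcessingStep(
    name="Preprocess",
    processor=sklearn_processor,
    code="preprocess.py",
    code_location=Join(
        on="/",
        values=[
            "s3://specified-bucket/project-name",
            ExecutionVariables.PIPELINE_EXECUTION_ID,
            "code",
        ],
    ),
)
```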
This should already be possible with the FrameworkProcessor via its code_location= parameter, but that parameter appears to be ignored when the processor is used with a ProcessingStep.
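A minimal sketch of that attempt, assuming placeholder role, instance, and bucket values; in practice the script still lands under the session's default bucket:

```python
from sagemaker.processing import FrameworkProcessor
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.workflow.steps import ProcessingStep

processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version="1.0-1",
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_type="ml.m5.xlarge",
    instance_count=1,
    code_location="s3://specified-bucket/project-name/code",  # expected upload prefix
)

# Observed behaviour: when this processor is wrapped in a ProcessingStep,
# code_location appears to have no effect and preprocess.py is still
# uploaded under the session's default_bucket.
step = ProcessingStep(
    name="Preprocess",
    processor=processor,
    code="preprocess.py",
)
```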