
Improved ways of storing local code in S3 for ProcessingSteps #4879

@HFulcher

Description

Describe the feature you'd like
Currently, when using processors such as SKLearnProcessor, there is no way to specify where a local code= file should be stored in S3 when the processor is used in conjunction with a ProcessingStep. This can lead to clutter in S3 buckets. The current behaviour places the code in the default_bucket of the SageMaker session, like so (a minimal reproduction sketch follows the path below):

s3://{default_bucket}/auto_generated_hash/input/code/preprocess.py
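
For reference, a minimal sketch of the setup that produces this (assuming a recent SageMaker Python SDK; the role ARN and script name are placeholders):

```python
from sagemaker.session import Session
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.steps import ProcessingStep

session = Session()

sklearn_processor = SKLearnProcessor(
    framework_version="1.2-1",
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    instance_type="ml.m5.xlarge",
    instance_count=1,
    sagemaker_session=session,
)

# Neither the step nor the processor exposes a parameter controlling where
# the local script is uploaded; it lands under an auto-generated prefix in
# session.default_bucket().
step = ProcessingStep(
    name="Preprocess",
    processor=sklearn_processor,
    code="preprocess.py",
)
```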

A better user experience would be to allow the user to define exactly where the code should be uploaded. This would allow users to group the files for each run together, for example (a hypothetical sketch of such an API follows these paths):

s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/code/preprocess.py
s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/data/train.csv
s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/model/model.pkl
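
One possible shape for this, purely hypothetical (ProcessingStep does not accept a code_location parameter today), would combine a user-supplied prefix with the pipeline's execution variables:

```python
from sagemaker.workflow.execution_variables import ExecutionVariables
from sagemaker.workflow.functions import Join

# Hypothetical API: code_location is NOT a real ProcessingStep parameter;
# this only illustrates the requested user experience.
step = ProcessingStep(
    name="Preprocess",
    processor=sklearn_processor,
    code="preprocess.py",
    code_location=Join(
        on="/",
        values=[
            "s3://specified-bucket/project-name",
            ExecutionVariables.PIPELINE_EXECUTION_ID,
            "code",
        ],
    ),
)
```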

This should already be possible with the FrameworkProcessor by utilising its code_location= parameter, but that parameter seems to be ignored by the ProcessingStep (see the sketch below).
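
A rough reproduction sketch of that behaviour (assuming SKLearn as the estimator class; bucket, prefix, and role are placeholders):

```python
from sagemaker.processing import FrameworkProcessor
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.workflow.steps import ProcessingStep

framework_processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version="1.2-1",
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    instance_type="ml.m5.xlarge",
    instance_count=1,
    code_location="s3://specified-bucket/project-name/code",  # expected upload prefix
)

# Observed behaviour: once the processor is wrapped in a ProcessingStep,
# the script is still uploaded to the session default bucket instead of
# the code_location prefix given above.
step = ProcessingStep(
    name="Preprocess",
    processor=framework_processor,
    code="preprocess.py",
)
```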
