
Improved ways of storing local code in S3 for ProcessingSteps #4879

@HFulcher

Description

Describe the feature you'd like
Currently, when using processors such as SKLearnProcessor, there is no way to specify where a local code= file should be stored in S3 when the processor is used in conjunction with a ProcessingStep. This can lead to clutter in S3 buckets. The current behaviour places the code in the default_bucket of the SageMaker session, like so (a minimal reproduction sketch follows the path below):

s3://{default_bucket}/auto_generated_hash/input/code/preprocess.py
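
For reference, a minimal sketch of the setup that produces this (assuming a recent SageMaker Python SDK; the role ARN and script name are placeholders):

```python
from sagemaker.session import Session
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.steps import ProcessingStep

session = Session()

sklearn_processor = SKLearnProcessor(
    framework_version="1.2-1",
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    instance_type="ml.m5.xlarge",
    instance_count=1,
    sagemaker_session=session,
)

# Neither the step nor the processor exposes a parameter controlling where
# the local script is uploaded; it lands under an auto-generated prefix in
# session.default_bucket().
step = ProcessingStep(
    name="Preprocess",
    processor=sklearn_processor,
    code="preprocess.py",
)
```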

A better user experience would be to allow the user to define exactly where the code should be uploaded. This would allow users to group the files for each run together, for example (a hypothetical sketch of such an API follows these paths):

s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/code/preprocess.py
s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/data/train.csv
s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/model/model.pkl
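
One possible shape for this, purely hypothetical (ProcessingStep does not accept a code_location parameter today), would combine a user-supplied prefix with the pipeline's execution variables:

```python
from sagemaker.workflow.execution_variables import ExecutionVariables
from sagemaker.workflow.functions import Join

# Hypothetical API: code_location is NOT a real ProcessingStep parameter;
# this only illustrates the requested user experience.
step = ProcessingStep(
    name="Preprocess",
    processor=sklearn_processor,
    code="preprocess.py",
    code_location=Join(
        on="/",
        values=[
            "s3://specified-bucket/project-name",
            ExecutionVariables.PIPELINE_EXECUTION_ID,
            "code",
        ],
    ),
)
```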

This should already be possible with the FrameworkProcessor by utilising its code_location= parameter, but that parameter seems to be ignored by the ProcessingStep (see the sketch below).
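
A rough reproduction sketch of that behaviour (assuming SKLearn as the estimator class; bucket, prefix, and role are placeholders):

```python
from sagemaker.processing import FrameworkProcessor
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.workflow.steps import ProcessingStep

framework_processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version="1.2-1",
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    instance_type="ml.m5.xlarge",
    instance_count=1,
    code_location="s3://specified-bucket/project-name/code",  # expected upload prefix
)

# Observed behaviour: once the processor is wrapped in a ProcessingStep,
# the script is still uploaded to the session default bucket instead of
# the code_location prefix given above.
step = ProcessingStep(
    name="Preprocess",
    processor=framework_processor,
    code="preprocess.py",
)
```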
