"FastFile" for Processing Job Input #3962

Closed
@lorenzwalthert

Describe the feature you'd like

"FastFile" should be an available option for s3_input_mode in sagemaker.processing.ProcessingInput, in addition to "File" and "Pipe". This input mode has been available for TrainingInput since 2021 and, according to an AWS blog post, greatly improves speed (-82%).

How would this feature be used? Please describe.

To speed up processing jobs compared to downloading all data, and to allow complex filtering of files before accessing them.
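As a rough illustration of the filtering use case: in training jobs, FastFile exposes the S3 objects as ordinary files that are streamed only when read. Assuming a processing job behaved the same way (this is exactly what does not exist today), the script could walk the input directory, apply any pattern it likes, and read only the matching files. The helper names and file names below are hypothetical, and a temporary directory stands in for the job's input directory.

```python
import re
import tempfile
from pathlib import Path


def iter_matching_files(root, pattern):
    """Yield files under `root` whose relative POSIX path matches `pattern`."""
    regex = re.compile(pattern)
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if path.is_file() and regex.search(path.relative_to(root).as_posix()):
            yield path


def total_bytes(root, pattern):
    """Read only the matching files and return the number of bytes read."""
    return sum(len(p.read_bytes()) for p in iter_matching_files(root, pattern))


# Stand-in for the mounted input directory (assumption: a real job would
# point at its local input path instead of a temporary directory).
with tempfile.TemporaryDirectory() as tmp:
    files = [("a/keep_1.csv", b"abc"), ("a/skip.txt", b"zzzz"), ("b/keep_2.csv", b"de")]
    for name, data in files:
        target = Path(tmp) / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    pattern = r"keep_\d+\.csv$"
    matched = [p.relative_to(tmp).as_posix() for p in iter_matching_files(tmp, pattern)]
    read = total_bytes(tmp, pattern)

print(matched, read)
```

With "File" mode the same script works, but every object under the input prefix is downloaded before the script starts, including the ones the pattern rejects.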

Describe alternatives you've considered

Other methods like

  • Downloading relevant files as part of the training job with sagemaker.s3.S3Downloader(). Problem: I can't shard by S3 key and have to build my own sharding logic.
  • Using S3Prefix as the s3_data_type in sagemaker.processing.ProcessingInput to filter by prefix. Problem: Some data can't easily be filtered by prefix and needs more complex pattern matching.
  • Using a ManifestFile.
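
For reference, the ManifestFile alternative can be sketched roughly as follows: take a list of object keys, filter it with an arbitrary pattern, and write a manifest whose first entry is the common prefix. This is a minimal sketch rather than a tested pipeline; `build_manifest`, the bucket name, and the keys are made up for illustration, and the key list would in practice come from listing the bucket.

```python
import json
import re


def build_manifest(bucket, keys, pattern):
    """Return manifest JSON for the keys matching `pattern`.

    Hypothetical helper; `keys` are object keys relative to the bucket root.
    """
    regex = re.compile(pattern)
    selected = [key for key in keys if regex.search(key)]
    # The first entry holds the shared prefix; the rest are paths relative to it.
    manifest = [{"prefix": f"s3://{bucket}/"}] + selected
    return json.dumps(manifest, indent=2)


# Example keys (made up); a real list would come from an S3 listing call.
keys = [
    "raw/2023-01-01/part-0.parquet",
    "raw/2023-01-01/part-0.parquet.tmp",
    "raw/2023-01-02/part-0.parquet",
    "logs/2023-01-01/run.log",
]
manifest_json = build_manifest("my-bucket", keys, r"^raw/.*\.parquet$")
print(manifest_json)
```

The manifest would then be uploaded to S3 and referenced with s3_data_type="ManifestFile". This covers the pattern-matching case, but it adds a listing and upload step before every job, and in "File" mode the listed objects are still downloaded up front.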

Additional context

I know this is not an SDK topic as long as the underlying APIs don't provide the functionality, but I don't know where else to put the feature request.
