Document which JARs are available for PySparkProcessor by default #4116

Open
@j-adamczyk

Description

What did you find confusing? Please describe.

For AWS EMR, the documentation explicitly states which .jar files are built in. For example, the optimized connector for Amazon Redshift is built in. The relationship between the EMR and SageMaker implementations of PySpark is not mentioned anywhere, in particular with regard to connectors and Redshift.

Describe how documentation can be improved

State explicitly which additional .jar files and connectors are built into the SageMaker PySpark implementation.

Additional context

Redshift is a popular data source for ML, and it would be very convenient if:

  • the Redshift connector from EMR were built in
  • this were stated explicitly in the documentation
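
Until this is documented, one possible way to check what is bundled is to list the JARs from inside the running job. The following is only a minimal sketch: it assumes the container follows the standard Spark layout, with JARs stored under `$SPARK_HOME/jars`, which I have not verified for the SageMaker image. The helper names `list_spark_jars` and `find_jars` are hypothetical and made up for illustration.

```python
import os
from pathlib import Path


def list_spark_jars(spark_home=None):
    """Return the sorted names of the .jar files under <spark_home>/jars.

    Assumption: the image uses the standard Spark layout, where bundled
    JARs live in the "jars" directory under SPARK_HOME.
    """
    # Fall back to the SPARK_HOME environment variable when no path is given.
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise ValueError("SPARK_HOME is not set and no path was given")
    jars_dir = Path(spark_home) / "jars"
    if not jars_dir.is_dir():
        # No jars directory at this location: report nothing found.
        return []
    return sorted(p.name for p in jars_dir.glob("*.jar"))


def find_jars(jar_names, keyword):
    """Case-insensitive substring filter, e.g. keyword='redshift'."""
    return [name for name in jar_names if keyword.lower() in name.lower()]
```

Calling `find_jars(list_spark_jars(), "redshift")` at the start of the script submitted to PySparkProcessor and printing the result would show in the job logs whether any Redshift-related JAR is present. If none is, the `submit_jars` argument of `PySparkProcessor.run` can be used to ship a connector JAR with the job.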
