Exception with ParameterString in PySparkProcessor.run() Method #3425

Open
@dipanjank

Description

Describe the bug
If I use a ParameterString (or any other PipelineVariable) in the list passed to the arguments parameter of the PySparkProcessor.run method, I get a TypeError: Object of type ParameterString is not JSON serializable.

According to the documentation, arguments can be a list of PipelineVariables, so I expected this to work. Is this not supported?

To reproduce

    spark_processor = PySparkProcessor(
        base_job_name="sagemaker-spark",
        framework_version="3.1",
        role=role,
        instance_count=2,
        instance_type="ml.m5.xlarge",
        sagemaker_session=sagemaker_session,
        max_runtime_in_seconds=1200,
    )

    spark_processor.run(
        submit_app="spark_processing/preprocess.py",
        arguments=[
            "--s3_input_bucket",
            ParameterString(name="s3-input-bucket", default_value=bucket),
            "--s3_input_key_prefix",
            input_prefix_abalone,
            "--s3_output_bucket",
            bucket,
            "--s3_output_key_prefix",
            input_preprocessed_prefix_abalone,
        ],
    )

Expected behavior

A SageMaker ProcessingJob is created.

Screenshots or logs

Traceback (most recent call last):
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/run_pyspark_processor.py", line 63, in <module>
    run_sagemaker_spark_job(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/run_pyspark_processor.py", line 37, in run_sagemaker_spark_job
    spark_processor.run(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/spark/processing.py", line 902, in run
    return super().run(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/spark/processing.py", line 265, in run
    return super().run(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py", line 248, in wrapper
    return run_func(*args, **kwargs)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/processing.py", line 572, in run
    self.latest_job = ProcessingJob.start_new(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/processing.py", line 796, in start_new
    processor.sagemaker_session.process(**process_args)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 956, in process
    self._intercept_create_request(process_request, submit, self.process.__name__)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 4317, in _intercept_create_request
    return create(request)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 953, in submit
    LOGGER.debug("process request: %s", json.dumps(request, indent=4))
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/__init__.py", line 234, in dumps
    return cls(
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ParameterString is not JSON serializable
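For reference, the traceback shows the failure originates in the SDK's debug-logging call json.dumps(request, indent=4) in sagemaker/session.py, not in the processing logic itself. The same error can be reproduced with the standard library alone; ParameterStringLike below is a hypothetical stand-in for the real ParameterString class:

```python
import json

# Hypothetical stand-in for sagemaker's ParameterString: a plain object
# that json.dumps has no encoder for.
class ParameterStringLike:
    def __init__(self, name, default_value=None):
        self.name = name
        self.default_value = default_value

# Shape loosely mirrors the process request the SDK tries to log.
request = {
    "AppSpecification": {
        "ContainerArguments": [
            "--s3_input_bucket",
            ParameterStringLike(name="s3-input-bucket", default_value="my-bucket"),
        ]
    }
}

try:
    json.dumps(request, indent=4)
except TypeError as exc:
    # Same message as in the traceback, with the stand-in's class name:
    print(exc)  # Object of type ParameterStringLike is not JSON serializable
```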

System information

  • SageMaker Python SDK version: 2.112.2
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PySpark
  • Framework version: 3.1
  • Python version: default
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

