Skip to content

TFX with Dataflow Python Version Error #1216

@luischinchillagarcia

Description

@luischinchillagarcia

When running a pipeline with Dataflow, BQExampleGen job always works, however, the statisticsGen, schemaGen, ExampleValidator always fails, saying the python version in the 'setup.py' is the issue. Here is a Colab file reproducing the issue.

This Beam issue briefly addresses the issue, however, upon testing the recommended Python versions >=3.7.4, I get an error that Dataflow requires Python 3.6.9. Is there any way to circumvent.

The exact error on the Dataflow logs appears as follows:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 650, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/executor.py", line 150, in execute
    test_shuffle_sink=self._test_shuffle_sink)
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/executor.py", line 116, in create_operation
    is_streaming=False)
  File "apache_beam/runners/worker/operations.py", line 932, in apache_beam.runners.worker.operations.create_operation
  File "apache_beam/runners/worker/operations.py", line 766, in apache_beam.runners.worker.operations.create_pgbk_op
  File "apache_beam/runners/worker/operations.py", line 822, in apache_beam.runners.worker.operations.PGBKCVOperation.__init__
  File "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 283, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 317, in loads
    return load(file, ignore)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 305, in load
    obj = pik.load()
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 577, in _load_type
    return _reverse_typemap[name]
KeyError: 'ClassType'

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions