scikit_bring_your_own.ipynb train model pandas error

Hello!

I am following the scikit_bring_your_own tutorial to set up a bring-your-own (BYO) model for production use, but I get the following error when I try to train the model on AWS SageMaker:

```
AlgorithmError: Exception during training: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.
Traceback (most recent call last):
  File "/opt/program/train", line 48, in train
    raw_data = [ pd.read_csv(file, header=None) for file in input_files ]
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parser
```
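The failing line builds `input_files` from the contents of the training channel and passes every entry to `pd.read_csv`, which can only parse non-empty regular files. A minimal diagnostic sketch, assuming the tutorial's channel path `/opt/ml/input/data/training` (an assumption here) and a hypothetical helper name `split_channel_entries`: it separates readable files from subdirectories and empty files. Whether such an entry is what triggers this particular error is not confirmed.

```python
import os

# Assumed channel location; the tutorial's train script reads its input from here
TRAINING_PATH = "/opt/ml/input/data/training"


def split_channel_entries(training_path):
    """Split the entries of a channel directory into readable files and others.

    Returns (files, others): `files` are non-empty regular files that could be
    handed to pd.read_csv; `others` are subdirectories, empty files, or any
    other entry that pd.read_csv is unlikely to parse.
    """
    files, others = [], []
    for name in sorted(os.listdir(training_path)):
        full_path = os.path.join(training_path, name)
        if os.path.isfile(full_path) and os.path.getsize(full_path) > 0:
            files.append(full_path)
        else:
            others.append(full_path)
    return files, others


if __name__ == "__main__":
    # Only runs the check when the assumed channel directory exists
    if os.path.isdir(TRAINING_PATH):
        files, others = split_channel_entries(TRAINING_PATH)
        print("readable files: {}".format(files))
        print("entries pd.read_csv likely cannot parse: {}".format(others))
```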

I uploaded the data to S3 using:

```python
def upload_data(self):
    self.logger.info(
        'Uploading local data from path %s to S3 bucket %s with key prefix %s',
        self.config.data_directory_path,
        self.config.data_upload_bucket,
        self.config.s3_data_directory_prefix,
    )

    # Session.upload_data returns the S3 URI of the uploaded data
    self.train_data_location = self.session.upload_data(
        path=self.config.data_directory_path,
        bucket=self.config.data_upload_bucket,
        key_prefix=self.config.s3_data_directory_prefix,
    )

    self.logger.info('Uploaded local data to S3 path: %s', self.train_data_location)
```
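When `path` is a directory, `Session.upload_data` uploads every file under it, including files in subdirectories and hidden files (for example a `.ipynb_checkpoints` folder created by Jupyter). A minimal sketch of a local check before uploading, assuming the container expects only top-level `.csv` files; the helper name `find_unexpected_upload_files` and the `.csv` rule are illustrative assumptions, not part of the tutorial.

```python
import os


def find_unexpected_upload_files(data_directory_path, allowed_suffix=".csv"):
    """List files under a local directory that are not top-level CSV files.

    Returns paths relative to data_directory_path. Anything nested in a
    subdirectory, or without the allowed suffix, is reported, since a flat
    os.listdir + pd.read_csv loop in the container may not handle it.
    """
    unexpected = []
    for dirpath, dirnames, filenames in os.walk(data_directory_path):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            relative = os.path.relpath(os.path.join(dirpath, name), data_directory_path)
            nested = os.path.dirname(relative) != ""
            if nested or not name.endswith(allowed_suffix):
                unexpected.append(relative)
    return unexpected
```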

I ran the `build_and_push.sh` script to build the container image and push it to Amazon ECR.
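The estimator needs the URI of the pushed image (`self.image`). A minimal sketch of building an ECR image URI in the standard AWS partition; the helper name `ecr_image_uri` is hypothetical, and the account ID, region, and repository name below are placeholders (the repository should match whatever name the build script pushed).

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build an ECR image URI of the form
    <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
    (standard AWS partition only)."""
    # AWS account IDs are exactly 12 digits
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError("AWS account IDs are 12 digits: %r" % account_id)
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account_id, region, repository, tag)


# Placeholder values for illustration
image = ecr_image_uri("123456789012", "us-east-1", "sagemaker-decision-trees")
# → "123456789012.dkr.ecr.us-east-1.amazonaws.com/sagemaker-decision-trees:latest"
```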

Then I tried to train the model using:

```python
from sagemaker.estimator import Estimator


def estimator(self):
    self.logger.info(
        'Creating estimator for %s model %s using image %s',
        'BYO',
        self.config.model_name,
        self.image,
    )

    # SageMaker Python SDK v1 parameter names; SDK v2 renamed them to
    # image_uri, instance_count and instance_type.
    return Estimator(
        image_name=self.image,
        role=self.config.role,
        train_instance_count=self.config.train_instance_count,
        train_instance_type=self.config.train_instance_type,
        output_path=self.config.output_path,
        base_job_name=self.config.base_job_name,
        sagemaker_session=self.session,
    )
```
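The estimator only defines the job; training starts when `fit()` is called. Passing a plain S3 URI string to `fit()` makes the SageMaker Python SDK expose that prefix to the container as the `training` channel. A minimal sketch, assuming the URI stored by `upload_data` is passed straight to `fit()`; `start_training` is a hypothetical helper, and the URI check is an illustrative addition.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def start_training(estimator, train_data_location):
    """Start a training job on the S3 prefix returned by upload_data."""
    parsed = urlparse(train_data_location)
    # Reject local paths or malformed URIs before launching a job
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected an s3://bucket/prefix URI, got %r" % train_data_location)
    # A string input is mapped to the "training" channel by the SDK
    estimator.fit(train_data_location)
```

Inside the class this would be called as `start_training(self.estimator(), self.train_data_location)`.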

(I'm using the same code as in the notebook, just rewritten as methods of a class.)

Am I missing something or doing something wrong?

scikit_bring_your_own.ipynb train model pandas error #219

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

scikit_bring_your_own.ipynb train model pandas error #219

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions