Closed
Hello!
I am following the scikit_bring_your_own tutorial to set up a bring-your-own (BYO) model for production use, but training the model on AWS SageMaker fails with the following error:
AlgorithmError: Exception during training: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.
Traceback (most recent call last):
  File "/opt/program/train", line 48, in train
    raw_data = [ pd.read_csv(file, header=None) for file in input_files ]
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parser
I uploaded the data to S3 using:
    def upload_data(self):
        self.logger.info(
            'Uploading locally available data to s3 in path: %s, using bucket: %s using s3 directory prefix: %s'
            % (
                self.config.data_directory_path,
                self.config.data_upload_bucket,
                self.config.s3_data_directory_prefix,
            )
        )
        self.train_data_location = self.session.upload_data(
            path=self.config.data_directory_path,
            bucket=self.config.data_upload_bucket,
            key_prefix=self.config.s3_data_directory_prefix
        )
        self.logger.info('Uploaded local data to s3 path: %s' % (self.train_data_location))
I ran the build_and_push.sh script.
Then I tried to train the model using:
    def estimator(self):
        self.logger.info(
            'Creating estimator for %s model %s using image %s' % (
                'BYO',
                self.config.model_name,
                self.image,
            )
        )
        return Estimator(
            image_name=self.image,
            role=self.config.role,
            train_instance_count=self.config.train_instance_count,
            train_instance_type=self.config.train_instance_type,
            output_path=self.config.output_path,
            base_job_name=self.config.base_job_name,
            sagemaker_session=self.session,
        )
(I'm using the same code as in the notebook, just rewritten as a class.)
Am I missing something or doing something wrong?
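One thing that might be worth checking, though it is not confirmed as the cause here: the traceback shows the train script calling pd.read_csv on every entry of input_files, and a read failure like this can occur when one of those entries is not a regular file (for example a subdirectory created by a nested S3 prefix). The sketch below is only a debugging aid under that assumption. The helper names split_input_entries and find_all_files are hypothetical, and the directory to inspect (in the tutorial container, presumably /opt/ml/input/data/training, or the local data directory before upload) is also an assumption.

```python
import os


def split_input_entries(input_dir):
    """Split the top-level entries of input_dir into regular files and everything else.

    Hypothetical helper: anything in the second list (e.g. a subdirectory)
    would be handed to pd.read_csv by a script that reads every os.listdir entry.
    """
    files, others = [], []
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        if os.path.isfile(path):
            files.append(path)
        else:
            others.append(path)
    return files, others


def find_all_files(input_dir):
    """Recursively collect every regular file below input_dir (hypothetical helper)."""
    found = []
    for root, _dirs, names in os.walk(input_dir):
        for name in names:
            found.append(os.path.join(root, name))
    return sorted(found)
```

Running split_input_entries on the local data directory before the upload, or logging its result at the start of the train script, would show whether anything other than plain files ends up in the list that is passed to pd.read_csv. If nested directories turn out to be intentional, find_all_files could be used to build the file list instead.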