I am having an issue with the StataReader class in pandas/io/stata.py.
I am using pandas 0.17.1.
This is the Python code I am trying to run:

```python
import sys
reload(sys).setdefaultencoding('utf-8')  # Python 2 only
import pandas as pd
from pandas.io import stata
sr = stata.StataReader(fileName)
```

where `fileName` is the path to a Stata file.
The following code comes from the `_read_old_header` method (starting on line 1184 of stata.py), which `StataReader` calls during initialization:
```python
if self.format_version > 108:
    typlist = [ord(self.path_or_buf.read(1))
               for i in range(self.nvar)]
else:
    typlist = [
        self.OLD_TYPE_MAPPING[
            self._decode_bytes(self.path_or_buf.read(1))
        ] for i in range(self.nvar)
    ]
```
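Which branch runs depends on the file's format version. To check what version a file has, a minimal sketch follows. It assumes that pre-117 .dta files store the format version in their first byte and that format 117 and later files begin with `<stata_dta><header><release>` followed by a three-digit release number; `stata_format_version` is a hypothetical helper, not part of pandas.

```python
import io

# Assumed prefix of the header used by format 117 and later.
NEW_HEADER = b"<stata_dta><header><release>"


def stata_format_version(buf):
    """Return the .dta format version stored at the start of a binary stream."""
    head = buf.read(len(NEW_HEADER) + 3)
    if not head:
        raise ValueError("empty file: no format version found")
    if head.startswith(NEW_HEADER):
        # Format 117+ (assumed): three ASCII digits follow the <release> tag.
        return int(head[len(NEW_HEADER):])
    # Older formats (assumed): the first byte is the version number itself.
    return bytearray(head[:1])[0]


# Example with in-memory data instead of a real file.
print(stata_format_version(io.BytesIO(bytes([105, 1, 1, 0]))))  # 105
```

With a real file, open it in binary mode, e.g. `with open(fileName, "rb") as f: print(stata_format_version(f))`.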
I have no errors when my Stata files are newer than version 108, but with version 105 files there seems to be a bug in `_decode_bytes`. For those files the code above takes the `else` branch, which calls `_decode_bytes` with only one argument besides `self`: the one-byte string returned by `path_or_buf.read(1)`.
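Since the old-format branch only needs the value of a single byte per variable, one way to sidestep text decoding entirely is to key the lookup table on the raw byte value. The sketch below is standalone and hypothetical: `read_typlist` and the values in its `OLD_TYPE_MAPPING` are illustrative assumptions, not the pandas code, and no entry for doubles is included because that old type code is not known here.

```python
import io

# Hypothetical lookup table: keys are raw byte values of the old one-letter
# type codes, values are illustrative newer-style type codes.
OLD_TYPE_MAPPING = {
    ord("b"): 251,  # byte
    ord("i"): 252,  # int
    ord("l"): 253,  # long
    ord("f"): 254,  # float
}


def read_typlist(buf, format_version, nvar):
    """Read one type code per variable from a binary stream."""
    raw = buf.read(nvar)
    if len(raw) != nvar:
        raise ValueError("unexpected end of data while reading the type list")
    # Iterating a bytearray yields ints on Python 2 and 3, so nothing is decoded.
    codes = list(bytearray(raw))
    if format_version > 108:
        return codes
    # Unknown old codes raise KeyError.
    return [OLD_TYPE_MAPPING[c] for c in codes]


print(read_typlist(io.BytesIO(b"bif"), 105, 3))  # [251, 252, 254]
```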
Here is the method `_decode_bytes` (line 896):
```python
def _decode_bytes(self, str, errors=None):
    if compat.PY3 or self._encoding is not None:
        return str.decode(self._encoding, errors)
    else:
        return str
```
When no third argument is passed in (as is the case when it is called by `_read_old_header`), `errors` defaults to `None`, and that `None` is forwarded to `decode`. This is where the error is thrown:

```
TypeError: decode() argument 2 must be string, not None
```

That is the issue: the `decode` method of the string class requires its second argument (`errors`) to be a string, but `_decode_bytes` passes `errors=None` by default.
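A minimal sketch of one possible fix is to forward `errors` only when the caller actually supplied it. `decode_bytes` is a hypothetical standalone stand-in for the method: it takes the encoding as a parameter and drops the `compat.PY3` check, so it is not the pandas patch.

```python
def decode_bytes(raw, encoding, errors=None):
    """Decode raw bytes, passing errors to decode() only when one is given."""
    if encoding is None:
        # No encoding configured: hand back the raw bytes unchanged.
        return raw
    if errors is None:
        # Omitting errors makes decode() use its default, "strict".
        return raw.decode(encoding)
    return raw.decode(encoding, errors)


print(decode_bytes(b"i", "latin-1"))  # i

# Forwarding None, as _decode_bytes does, raises TypeError.
try:
    b"i".decode("latin-1", None)
except TypeError as exc:
    print("TypeError:", exc)
```

Changing the default to `errors='strict'`, the value `decode` uses when the argument is omitted, would be an equally small fix.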
@ckingdon95 thanks for the detailed report. We don't have any test files that old, and I cannot create a file that old with the latest version of Stata, which is the only one I can access (see link below). So we might need someone to provide a test file to troubleshoot this. Are there any version 108 files floating around on the web?
Closes pandas-dev#12232, although the issue may resurface for files
containing double values (I can't determine the old type code for
doubles).
Author: Kerby Shedden <[email protected]>
Closes pandas-dev#12233 from kshedden/old_stata and squashes the following commits:
aba666c [Kerby Shedden] Read old stat files (bugfix)