read_fwf - parsers.py PythonParser._rows_to_columns line 2814 object of type 'NoneType' has no len() #19436

Closed
C5G6M7 opened this issue Jan 29, 2018 · 5 comments
Labels
Bug, good first issue, IO Data (IO issues that don't fit into a more specific label), Needs Info (Clarification about behavior needed to assess issue)

Comments


C5G6M7 commented Jan 29, 2018

Line 2814 in parsers.py throws an error if self.delimiter is None:

"object of type 'NoneType' has no len()"

Here is the current line of code where the error happens:

if len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:
    # see gh-13374
    reason = ('Error could possibly be due to quotes being '
              'ignored when a multi-char delimiter is used.')
    msg += '. ' + reason

I propose the following fix, which I believe should be a safe replacement:

if self.delimiter is not None and len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:
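
To illustrate the guard in isolation, here is a minimal standalone sketch. It is not the actual pandas code: `build_error_message` and its parameters are hypothetical stand-ins for the logic around `self.delimiter` and `self.quoting`.

```python
import csv


# Hypothetical stand-in for the message-building step in
# PythonParser._rows_to_columns; not the actual pandas implementation.
def build_error_message(msg, delimiter, quoting):
    # Test for None first so len() is never called on a missing delimiter.
    if (delimiter is not None and len(delimiter) > 1
            and quoting != csv.QUOTE_NONE):
        # see gh-13374
        reason = ('Error could possibly be due to quotes being '
                  'ignored when a multi-char delimiter is used.')
        msg += '. ' + reason
    return msg


# With the guard, a None delimiter leaves the message unchanged
# instead of raising TypeError.
unchanged = build_error_message('bad line', None, csv.QUOTE_MINIMAL)
extended = build_error_message('bad line', '::', csv.QUOTE_MINIMAL)
```

Without the `is not None` test, the first call would fail with the same "object of type 'NoneType' has no len()" error reported here.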

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.3
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

@C5G6M7 can you write a small, reproducible example that currently fails, and submit a fix with it as a test case?

TomAugspurger added the Bug, IO Data, Effort Low, Needs Info, and good first issue labels Jan 30, 2018
TomAugspurger added this to the Next Major Release milestone Jan 30, 2018
Author

C5G6M7 commented Jan 31, 2018

@TomAugspurger Yes, I can work on this tonight. I encountered it on quite a large dataset while running inside another application that uses pandas, so I will have to do some debugging to work out how to reproduce it with a simpler input.

In general, though, if self.delimiter is ever None when this line executes, it will raise an error. I applied the quick patch proposed above to my pandas installation and the problem went away. I believe the change is safe, because the code inside the conditional only appends a note about multi-char delimiters to an error message, which would not apply anyway if the delimiter is None.

However, there could be another issue earlier in the code, if the parser is expected to guarantee one of two things: that the delimiter always has a default string value (such as a comma) so that len() always works, or that self.quoting == csv.QUOTE_NONE whenever the delimiter has no value that supports len().

I'm not 100% sure, but rearranging the order of the conditions might also fix the issue: if "self.quoting != csv.QUOTE_NONE" is evaluated first and is false, "len(self.delimiter)" is never checked.
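
A quick standalone check of the reordering idea (the function name `reordered_check` is made up for illustration, and this is not pandas code): Python's `and` short-circuits, so putting the quoting test first avoids `len(None)` only when quoting is `csv.QUOTE_NONE`. With any other quoting mode the `len()` call still runs on None and still fails, so reordering alone would seem to be only a partial fix.

```python
import csv


# Hypothetical helper mirroring the reordered condition.
def reordered_check(delimiter, quoting):
    # Quoting test first; len() is reached only if this part is true.
    return quoting != csv.QUOTE_NONE and len(delimiter) > 1


# QUOTE_NONE: the first operand is False, so len(None) is never evaluated.
no_quote = reordered_check(None, csv.QUOTE_NONE)

# Any other quoting mode: len() still runs on None and raises TypeError.
try:
    reordered_check(None, csv.QUOTE_MINIMAL)
    raised = False
except TypeError:
    raised = True
```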

Author

C5G6M7 commented Feb 3, 2018

@TomAugspurger still working on reproducing this. I removed the edits I initially made to handle this and haven't encountered the error again yet. It can only occur with files that have bad lines, though, which means the column names must be passed explicitly so that pandas does not automatically create the extra columns.

Unfortunately I can't remember which file caused this. I'm going to keep running this, and as soon as I encounter a file that produces the issue again I will update this issue.


markjszy commented Feb 9, 2018

@TomAugspurger

It looks like it was already fixed in development a few months back, with the same sort of solution that @C5G6M7 proposed:

Commit:
23050dc

This looks isolated enough to be backported; otherwise the issue can just be closed, since it has already been fixed for a future release.

@TomAugspurger
Contributor

Indeed, dupe of #13374

TomAugspurger modified the milestones: Next Major Release, No action Jun 27, 2018