"TypeError: 'set' object does not support indexing" using na_values in read_csv() #11374

Closed
goyodiaz opened this issue Oct 19, 2015 · 11 comments
Labels
IO CSV read_csv, to_csv

Comments

@goyodiaz
Contributor

Test case:

user@host:~$ python3
Python 3.4.3+ (default, Oct 14 2015, 16:03:50) 
[GCC 5.2.1 20151010] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from io import StringIO
>>> import pandas as pd
>>> src = """first, second
... 0,0.1
... 1,1.1
... """
>>> df = pd.read_csv(StringIO(src), na_values='XX')
>>> print(df)
   first   second
0      0      0.1
1      1      1.1
>>> df = pd.read_csv(StringIO(src), na_values='-999.99')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/goyo/.local/lib/python3.4/site-packages/pandas/io/parsers.py", line 491, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/goyo/.local/lib/python3.4/site-packages/pandas/io/parsers.py", line 278, in _read
    return parser.read()
  File "/home/goyo/.local/lib/python3.4/site-packages/pandas/io/parsers.py", line 740, in read
    ret = self._engine.read(nrows)
  File "/home/goyo/.local/lib/python3.4/site-packages/pandas/io/parsers.py", line 1187, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:8082)
  File "pandas/parser.pyx", line 788, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8338)
  File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._read_rows (pandas/parser.c:9465)
  File "pandas/parser.pyx", line 975, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:10858)
  File "pandas/parser.pyx", line 1035, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:11744)
  File "pandas/parser.pyx", line 1085, in pandas.parser.TextReader._convert_with_dtype (pandas/parser.c:12634)
  File "pandas/parser.pyx", line 1499, in pandas.parser._try_double (pandas/parser.c:19996)
  File "pandas/parser.pyx", line 1818, in pandas.parser.kset_float64_from_list (pandas/parser.c:22852)
TypeError: 'set' object does not support indexing
>>> pd.util.print_versions.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-16-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: es_ES.UTF-8

pandas: 0.17.0
nose: 1.3.6
pip: 1.5.6
setuptools: 18.4
Cython: None
numpy: 1.8.2
scipy: 0.14.1
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
@jreback
Contributor

jreback commented Oct 19, 2015

You must be picking up another version of pandas somehow. The error you are seeing is, IIRC, from a somewhat older version of pandas.

This works just fine on linux with 3.4 (mac is below).
I know this is also tested.

Python 3.4.3 |Continuum Analytics, Inc.| (default, Mar  6 2015, 12:07:41) 
Type "copyright", "credits" or "license" for more information.

IPython 4.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: pd.__version__ 
Out[1]: '0.17.0'

In [2]: src = 'first, second\n0,0.1\n1,1.1'

In [3]: from io import StringIO

In [4]: pd.read_csv(StringIO(src), na_values='-999.99')
Out[4]: 
   first   second
0      0      0.1
1      1      1.1

@jreback jreback added Can't Repro IO CSV read_csv, to_csv labels Oct 19, 2015
@jreback
Contributor

jreback commented Oct 19, 2015

show pd.__version__.

It looks like you are running print_versions directly, which is another indication that you are actually using an older version (BUT print_versions actually looks at your environment and NOT at where it is called from).
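
One way to check which installation is actually being imported is to print the module's version together with the file it was loaded from. The following is a minimal sketch; `describe_module` is a hypothetical helper name used for illustration and is not part of pandas.

```python
import importlib
import sys


def describe_module(name):
    """Return (version, path) for a module, or None if it cannot be imported.

    Hypothetical helper for illustration; not part of pandas.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Not every module defines __version__ or __file__, so fall back to None.
    return getattr(module, "__version__", None), getattr(module, "__file__", None)


# The interpreter path and the module path together show which
# site-packages directory (or development checkout) is in use.
print("interpreter:", sys.executable)
print("pandas:", describe_module("pandas"))
```

If the reported path points at a source checkout or an unexpected site-packages directory, a different pandas than the intended one is probably being picked up.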

@jreback jreback closed this as completed Oct 19, 2015
@vlasisva

I see exactly the same error:

python b.py

0.17.0
Traceback (most recent call last):
File "b.py", line 8, in <module>
df = pd.read_csv(StringIO(src), na_values='-999.99')
File "/home/vlasisva/Software/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 491, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/vlasisva/Software/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 278, in _read
return parser.read()
File "/home/vlasisva/Software/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 740, in read
ret = self._engine.read(nrows)
File "/home/vlasisva/Software/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1187, in read
data = self._reader.read(nrows)
File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:8082)
File "pandas/parser.pyx", line 788, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8338)
File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._read_rows (pandas/parser.c:9465)
File "pandas/parser.pyx", line 975, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:10858)
File "pandas/parser.pyx", line 1035, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:11744)
File "pandas/parser.pyx", line 1085, in pandas.parser.TextReader._convert_with_dtype (pandas/parser.c:12634)
File "pandas/parser.pyx", line 1499, in pandas.parser._try_double (pandas/parser.c:19996)
File "pandas/parser.pyx", line 1818, in pandas.parser.kset_float64_from_list (pandas/parser.c:22852)
TypeError: 'set' object does not support indexing


cat b.py
from StringIO import StringIO
import pandas as pd
src = """first, second
0,0.1
1,1.1
"""
print pd.__version__
df = pd.read_csv(StringIO(src), na_values='-999.99')


lsb_release --all
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04

Codename: trusty

python
Python 2.7.10 |Anaconda 2.1.0 (64-bit)| (default, May 28 2015, 17:02:03)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org


@goyodiaz
Contributor Author

pd.__version__ is also 0.17.0 here.

More facts:

  • It happened in both python2.7 and python3.4 after pip-upgrading pandas to 0.17.0 and ubuntu to 15.10 (but it looks like @vlasisva is using ubuntu 14.04 and anaconda)
  • Using engine='python' makes the test pass.
  • Everything else seems to be working as expected in pandas and python.
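
The `engine='python'` workaround mentioned above can be sketched as follows, reusing the test case from the original report. This assumes pandas is importable; it only avoids the compiled C parser and does not fix the underlying problem.

```python
from io import StringIO

import pandas as pd

# Same data as the test case at the top of this issue.
src = "first, second\n0,0.1\n1,1.1\n"

# engine='python' selects the pure-Python parser, so the compiled
# parser code that raises the TypeError is not used.
df = pd.read_csv(StringIO(src), na_values="-999.99", engine="python")
print(df)
```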

In order to clean my python environment as much as possible I uninstalled every non-distro package/version and every distro package not installed by default except dependencies of other software I use: python2.7 numpy, python2.7 gdal bindings, gnome stuff... I even uninstalled pip (packaged python3 pip is almost useless in willy anyway).

I also did my best to ensure there was nothing python-related in ~/.local/bin, ~/.local/lib, /usr/local/bin and /usr/local/lib. I also made sure there was nothing called pandas on any mounted file system. I then used get-pip.py to install pip2 and pip3 and installed python2 and python3 pandas. The issue is still present.

While this is not critical to me (it just broke one test for a function I never use in that way) I would really like to understand what's going on, but I do not know where to look.

@jreback
Contributor

jreback commented Oct 28, 2015

so the error line:

File "pandas/parser.pyx", line 1818, in pandas.parser.kset_float64_from_list (pandas/parser.c:22852)
TypeError: 'set' object does not support indexing

tells me that you are using some kind of development version of pandas (somewhere). This function DOES not exist in master or 0.17.0.

pls make sure that you are not in a development directory when trying to import pandas.

It's not clear what you actually have installed, so pls create a new virtual env or use conda.

@vlasisva

I installed pandas via pip.
Either our environment is contaminated somehow, or what pip brings is not what you/we expect?

Will check and get back to you.

@vlasisva

My "pip install pandas==0.17.0" downloads
https://pypi.python.org/packages/source/p/pandas/pandas-0.17.0.tar.gz#md5=55d34c4d5655c94ca30a59dea6b36316

which contains file pandas/parser.c,
which contains the following in line 1554:

static kh_float64_t *__pyx_f_6pandas_6parser_kset_float64_from_list(PyObject *); /*proto*/
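
The check described above (looking for the function named in the traceback inside the downloaded source tarball) can be automated roughly like this. It is a sketch using only the standard library; `find_symbol` and its default file suffixes are hypothetical names chosen for illustration, and the tarball is assumed to have been downloaded already.

```python
import tarfile


def find_symbol(sdist_path, symbol, suffixes=("parser.pyx", "parser.c")):
    """Map each matching archive member to the line numbers that mention symbol.

    Hypothetical helper for illustration; suffixes must be a tuple of strings.
    """
    hits = {}
    with tarfile.open(sdist_path, "r:*") as archive:
        for member in archive.getmembers():
            # Only look at regular files whose name ends with a wanted suffix.
            if not member.isfile() or not member.name.endswith(suffixes):
                continue
            text = archive.extractfile(member).read().decode("utf-8", errors="replace")
            hits[member.name] = [
                number
                for number, line in enumerate(text.splitlines(), start=1)
                if symbol in line
            ]
    return hits
```

Calling `find_symbol("pandas-0.17.0.tar.gz", "kset_float64_from_list")` on a local copy of the tarball would list where the name appears. Hits in the generated `.c` file alongside an empty list for the `.pyx` source would suggest the shipped C file was not generated from the shipped Cython source.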

@jreback
Contributor

jreback commented Oct 28, 2015

ok, it appears that when I distributed this it didn't rebuild the .c files (and had a newer version I was testing out). very odd.

so will fix for 0.17.1 (e.g. will make a clean version). you can simply regenerate the .c files (you need cython installed).

e.g.

make clean
python setup.py install

@goyodiaz
Contributor Author

Thanks, Jeff. That worked.

@vlasisva

Other than this bug, would you consider pip-obtained pandas 0.17.0 as safe to use?

@jreback
Contributor

jreback commented Oct 29, 2015

yep, as I said, the .c for the parser came from a PR which is now merged
