Description
Several functions that rely on hash tables appear to give wrong results when complex numbers are present: they seem to use only the real component of each value.
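To make the suspected failure mode concrete, here is a minimal pure-Python model in which every lookup keys a value on its real component only. This is a hypothetical sketch, not pandas' actual hashtable code, and the helper names (`real_key`, `counts_by_real`, `unique_by_real`, `duplicated_by_real`, `isin_by_real`) are illustrative.

```python
from collections import Counter

# Hypothetical model, NOT pandas' hashtable implementation: each helper keys
# values on their real component only, which is the suspected failure mode.
def real_key(z):
    return complex(z).real

x = [0, 1j, 1, 1 + 1j, 1 + 2j]

def counts_by_real(values):
    """Analogue of value_counts when only the real part is hashed."""
    return Counter(real_key(v) for v in values)

def unique_by_real(values):
    """Analogue of unique: keep the first real part seen, in order."""
    seen, out = set(), []
    for v in values:
        k = real_key(v)
        if k not in seen:
            seen.add(k)
            out.append(k)
    return out

def duplicated_by_real(values):
    """Analogue of duplicated: flag a value whose real part was seen earlier."""
    seen, flags = set(), []
    for v in values:
        k = real_key(v)
        flags.append(k in seen)
        seen.add(k)
    return flags

def isin_by_real(values, test_values):
    """Analogue of isin: membership test on real parts only."""
    wanted = {real_key(t) for t in test_values}
    return [real_key(v) in wanted for v in values]

print(dict(counts_by_real(x)))   # {0.0: 2, 1.0: 3}, like Out[4]
print(unique_by_real(x))         # [0.0, 1.0], like Out[5]
print(duplicated_by_real(x))     # [False, True, False, True, True], like Out[6]
print(isin_by_real(x, [0, 1]))   # [True, True, True, True, True], like Out[8]
```

Real-part keying reproduces Out[4], Out[5], Out[6] and Out[8]. It does not explain Out[7]: the test values [1j, 1+1j, 1+2j] have real parts 0 and 1, so this model would return all True rather than all False, which suggests isin with complex test values may fail in some additional way.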
Consider the following list of values:
In [2]: x = [0, 1j, 1, 1+1j, 1+2j]
In [3]: x
Out[3]: [0, 1j, 1, (1+1j), (1+2j)]
Using value_counts:
In [4]: pd.value_counts(x)
Out[4]:
(1+0j) 3
0j 2
dtype: int64
Using unique:
In [5]: pd.unique(x)
Out[5]: array([ 0., 1.])
Using duplicated:
In [6]: pd.Series(x).duplicated()
Out[6]:
0 False
1 True
2 False
3 True
4 True
dtype: bool
Using isin:
In [7]: pd.Series(x).isin([1j, 1+1j, 1+2j])
Out[7]:
0 False
1 False
2 False
3 False
4 False
dtype: bool
In [8]: pd.Series(x).isin([0, 1])
Out[8]:
0 True
1 True
2 True
3 True
4 True
dtype: bool
factorize fails as described in #16399, and several other functions (rank, nunique, mode, etc.) fail in similar ways.
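For comparison, the results these calls would be expected to return can be computed in plain Python by hashing the full complex value (real and imaginary parts). This is a sketch under that assumption with illustrative helper names (`expected_value_counts`, `expected_unique`, `expected_duplicated`, `expected_isin`); it is not pandas code.

```python
from collections import Counter

# Reference results computed by hashing the full complex value, i.e. what
# value_counts, unique, duplicated and isin would be expected to return.
x = [0, 1j, 1, 1 + 1j, 1 + 2j]

def expected_value_counts(values):
    return Counter(complex(v) for v in values)

def expected_unique(values):
    # dict.fromkeys keeps first-seen order, like pd.unique
    return list(dict.fromkeys(complex(v) for v in values))

def expected_duplicated(values):
    seen, flags = set(), []
    for v in values:
        k = complex(v)
        flags.append(k in seen)
        seen.add(k)
    return flags

def expected_isin(values, test_values):
    wanted = {complex(t) for t in test_values}
    return [complex(v) in wanted for v in values]

print(dict(expected_value_counts(x)))          # each of the five values occurs once
print(expected_unique(x))                      # [0j, 1j, (1+0j), (1+1j), (1+2j)]
print(expected_duplicated(x))                  # [False, False, False, False, False]
print(expected_isin(x, [1j, 1 + 1j, 1 + 2j]))  # [False, True, False, True, True]
print(expected_isin(x, [0, 1]))                # [True, False, True, False, False]
```

The last result agrees with the object-dtype output in Out[9].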
Note that these calls appear to work if the dtype is explicitly set to object instead of the inferred complex dtype (complex128):
In [9]: pd.Series(x, dtype=object).isin([0, 1])
Out[9]:
0 True
1 False
2 True
3 False
4 False
dtype: bool
Output of pd.show_versions()
INSTALLED VERSIONS
commit: 51c5f4d
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.21.0rc1+26.g51c5f4d
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.1
pyarrow: 0.6.0
xarray: 0.9.6
IPython: 6.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: None
html5lib: 0.999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.0
pandas_gbq: None
pandas_datareader: None