Selecting "subsets" of a MultiIndex DataFrame sometimes changes dtypes #20757

@Mofef

Description

Code Sample, a copy-pastable example if possible

Definition of data and columns:

import numpy as np
import pandas as pd
from numpy import nan
data = [['n',  1,  0,  False,  2,  1,  False,  0,  0,  False,  2,  0,  False,  0,  1,  False,  1,  1,  False,  'o',
  1521734085.289453,  'p',  3233,  1521734085.289494]]
columns = [('a', 'd', 'i', nan, nan),
 ('a', 'd', 'j', 0.0, 'k'),
 ('a', 'd', 'j', 0.0, 'l'),
 ('a', 'd', 'j', 0.0, 'm'),
 ('a', 'd', 'j', 1.0, 'k'),
 ('a', 'd', 'j', 1.0, 'l'),
 ('a', 'd', 'j', 1.0, 'm'),
 ('a', 'd', 'j', 2.0, 'k'),
 ('a', 'd', 'j', 2.0, 'l'),
 ('a', 'd', 'j', 2.0, 'm'),
 ('a', 'd', 'j', 3.0, 'k'),
 ('a', 'd', 'j', 3.0, 'l'),
 ('a', 'd', 'j', 3.0, 'm'),
 ('a', 'd', 'j', 4.0, 'k'),
 ('a', 'd', 'j', 4.0, 'l'),
 ('a', 'd', 'j', 4.0, 'm'),
 ('a', 'd', 'j', 5.0, 'k'),
 ('a', 'd', 'j', 5.0, 'l'),
 ('a', 'd', 'j', 5.0, 'm'),
 ('b', 'f', nan, nan, nan),
 ('b', 'h', nan, nan, nan),
 ('c', 'e', nan, nan, nan),
 ('c', 'g', nan, nan, nan),
 ('c', 'h', nan, nan, nan)]
pd.DataFrame(data, columns=pd.MultiIndex.from_tuples(columns)).dtypes.a.d.i
# object

pd.DataFrame(data, columns=pd.MultiIndex.from_tuples(columns)).a.d.i.dtypes
# float64

This causes, for example:

pd.DataFrame(np.array(data), columns=pd.MultiIndex.from_tuples(columns)).a.d.i 
# "n", dtype: object

pd.DataFrame(data, columns=pd.MultiIndex.from_tuples(columns)).a.d.i 
#  nan, dtype: float64

Problem description

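I think the example is self-explanatory: the column ('a', 'd', 'i', nan, nan) holds the string 'n' and is reported as object by .dtypes, but selecting it with .a.d.i returns nan with a float dtype instead of the original value.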
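A smaller reproduction may make the comparison easier to follow. The sketch below is an assumption on my part, not part of the original report: it keeps only three of the 24 columns, and the variable names (by_position, by_label) are made up for illustration. It uses positional selection as a reference for what the first column holds, then prints what label-based selection returns for the same column. No expected output is given because the result of the label-based path depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical reduced example: three columns instead of the 24 in the report.
columns = pd.MultiIndex.from_tuples([
    ("a", "d", "i", np.nan, np.nan),
    ("a", "d", "j", 0.0, "k"),
    ("b", "f", np.nan, np.nan, np.nan),
])
df = pd.DataFrame([["n", 1, "o"]], columns=columns)

# Positional selection does not involve any label lookup, so it serves as
# a reference for the value and dtype actually stored in the first column.
by_position = df.iloc[:, 0]
print("positional:", by_position.dtype, repr(by_position.iloc[0]))

# dtype recorded on the frame for the same column.
print("df.dtypes :", df.dtypes.iloc[0])

# Label-based selection, one level at a time, as in the report (.a.d.i).
# The remaining NaN-only levels are still present in the result's columns.
by_label = df["a"]["d"]["i"]
print("label     :", by_label.dtypes.tolist(), by_label.to_numpy().tolist())
```

If the label-based line prints a float dtype and nan while the positional line prints the string 'n', the behaviour described above is present in the installed pandas version.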

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-119-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.utf8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.0.dev0+38.g6552718
pytest: 2.8.7
pip: 9.0.1
setuptools: 20.7.0
Cython: 0.23.4
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: 1.3.6
patsy: 0.4.1
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: 0.7.3
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.9999999
sqlalchemy: 1.0.11
pymysql: None
psycopg2: 2.6.1 (dt dec mx pq3 ext lo64)
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
