Column loses category when using .loc for a one row dataframe #16360

@nyejon

I convert a list of columns to type 'category'. Minimal example, using a one-row DataFrame (df1) and a two-row DataFrame (df2):

import pandas as pd

d1 = {'one': ['a'],
      'two': ['a']}

d2 = {'one': ['a', 'b'],
      'two': ['a', 'b']}

df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
df1.loc[:, 'one'] = df1['one'].astype('category')
df2.loc[:, 'one'] = df2['one'].astype('category')

print('df1')
print(df1)
print(df1.dtypes)
print('df2')
print(df2)
print(df2.dtypes)

df1
  one two
0   a   a
one    category
two      object
dtype: object
df2
  one two
0   a   a
1   b   b
one    category
two      object
dtype: object

df1['one'] = df1['one'].cat.set_categories(df2['one'].cat.categories)

print('Assigning without loc')
print(df1)
print(df1.dtypes)

Assigning without loc
  one two
0   a   a
one    category
two      object
dtype: object

df1.loc[:, 'one'] = df1['one'].cat.set_categories(df2['one'].cat.categories)
print('Assigning with loc')
print(df1)
print(df1.dtypes)

Assigning with loc
  one two
0   a   a
one    object
two    object
dtype: object

df2.loc[:, 'one'] = df2['one'].cat.set_categories(df2['one'].cat.categories)
print('Assigning df2 with loc')
print(df2)
print(df2.dtypes)

Assigning df2 with loc
  one two
0   a   a
1   b   b
one    category
two      object
dtype: object
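The "Assigning without loc" step above keeps the category dtype for the one-row frame. A minimal sketch of that workaround, assuming the goal is to give df1 the categories of df2: plain `df[col] = ...` replaces the whole column, dtype included, while `.loc[:, col] = ...` writes into the existing column (recent pandas versions set the values in place), so the dtype it produces can vary by pandas version. The helper `align_categories` is hypothetical, not a pandas API.

```python
import pandas as pd


def align_categories(target, reference, column):
    """Hypothetical helper: give target[column] the categories of reference[column].

    Uses plain column assignment, which replaces the column and therefore
    keeps the categorical dtype regardless of the number of rows.
    """
    out = target.copy()  # leave the caller's frame untouched
    categories = reference[column].astype('category').cat.categories
    out[column] = out[column].astype('category').cat.set_categories(categories)
    return out


df1 = pd.DataFrame({'one': ['a'], 'two': ['a']})             # one row
df2 = pd.DataFrame({'one': ['a', 'b'], 'two': ['a', 'b']})   # two rows

aligned = align_categories(df1, df2, 'one')
print(aligned.dtypes)
print(list(aligned['one'].cat.categories))  # categories taken from df2
```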

Problem description

I am trying to convert my defined categorical columns to the category type. This works when the DataFrame has more than one row, but when it has only one row, assigning the categorical column back with .loc leaves it as object dtype.

With only one row, print(df) and print(df.dtypes) give:

  one two
0   a   a
one    object
two    object
dtype: object
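Since the goal is to convert a list of defined categorical columns, one way to sidestep `.loc` entirely is `DataFrame.astype` with a `{column: dtype}` mapping, which returns a new frame. A minimal sketch, assuming the columns are listed in a hypothetical `categorical_cols` list:

```python
import pandas as pd

# Hypothetical list of columns that should be categorical.
categorical_cols = ['one']

df1 = pd.DataFrame({'one': ['a'], 'two': ['a']})            # one row
df2 = pd.DataFrame({'one': ['a', 'b'], 'two': ['a', 'b']})  # two rows

# astype with a {column: dtype} mapping converts only the listed columns
# and returns a new DataFrame; no .loc assignment is involved.
dtype_map = dict.fromkeys(categorical_cols, 'category')
df1 = df1.astype(dtype_map)
df2 = df2.astype(dtype_map)

print(df1.dtypes)
print(df2.dtypes)
```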

Expected Output

I would expect column 'one' to have the category dtype even for one row:

  one two
0   a   a
one    category
two      object
dtype: object
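The issue is labelled as needing tests. A minimal regression-test sketch, assuming pytest-style plain asserts: instead of hard-coding which dtype `.loc[:, col] = ...` should produce (recent pandas versions set the values in place, so the column may keep its original dtype), it checks the property this report is about, namely that the result does not depend on whether the frame has one row or several. The function names are hypothetical, and the categories are fixed so that categorical dtypes from different frames compare equal.

```python
import pandas as pd


def loc_assign_categorical(n_rows):
    """Hypothetical helper: build an n-row frame, assign a categorical via .loc, return the dtype."""
    values = ['a', 'b', 'c'][:n_rows]
    df = pd.DataFrame({'one': values, 'two': values})
    # Same categories for every row count, so equal dtypes compare equal.
    cat = df['one'].astype('category').cat.set_categories(['a', 'b', 'c'])
    df.loc[:, 'one'] = cat
    return df['one'].dtype


def test_loc_categorical_dtype_independent_of_row_count():
    # A one-row frame should end up with the same dtype as multi-row frames.
    assert loc_assign_categorical(1) == loc_assign_categorical(2)
    assert loc_assign_categorical(1) == loc_assign_categorical(3)


test_loc_categorical_dtype_independent_of_row_count()
```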

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 35.0.2
Cython: None
numpy: 1.12.1
scipy: 0.18.1
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.6
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None

Labels: Categorical, Indexing, Needs Tests, good first issue
