Category type is discarded with where series method #18888

@andrewdalecramer

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> s = pd.Series(["A", "A", "B", "B", "C"], dtype='category')
>>> s.dtype
CategoricalDtype(categories=['A', 'B', 'C'], ordered=False)
>>> s.where(s!="C", s)
0    A
1    A
2    B
3    B
4    C
dtype: object
>>> s2 = pd.Series(range(5))
>>> s2.dtype
dtype('int64')
>>> s2.where(s2<3,s2+3)
0    0
1    1
2    2
3    6
4    7
dtype: int64

Problem description

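The categorical dtype is dropped by the pd.Series.where method: the result comes back as object dtype, even when every replacement value already belongs to the existing categories. This increases memory usage for categorical data, because large object-dtype temporaries are created, and it makes code noisier, because the dtype must be restored after each call. By contrast, the int64 series in the example above keeps its dtype.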
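To give a rough sense of the memory cost mentioned above, here is a minimal sketch that compares a categorical Series with the same values stored as object dtype. The data and variable names are made up for illustration, and the exact byte counts will vary by pandas version and platform.

```python
import pandas as pd

# Hypothetical data: a long Series with only three distinct labels.
labels = pd.Series(["A", "A", "B", "B", "C"] * 20000, dtype="category")

# The same values stored as Python objects, which is the dtype the
# report describes where() returning in pandas 0.21.0.
as_object = labels.astype(object)

# deep=True also counts the memory held by the string objects themselves.
categorical_bytes = labels.memory_usage(deep=True)
object_bytes = as_object.memory_usage(deep=True)

print("category:", categorical_bytes, "bytes")
print("object:  ", object_bytes, "bytes")
```

The categorical version stores one small integer code per row plus a single copy of each label, so it should be considerably smaller than the object version for data like this.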

Expected Output

>>> s.where(s!="C", s).dtype
CategoricalDtype(categories=['A', 'B', 'C'], ordered=False)
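Until where preserves the dtype, one possible workaround is to cast the result back to the original dtype. The helper below is a hypothetical sketch (where_keep_dtype is not a pandas function); it still pays for the intermediate object-dtype Series on affected versions, so it only addresses the code noise, not the temporary memory.

```python
import pandas as pd


def where_keep_dtype(series, cond, other):
    """Hypothetical helper: call where(), then cast back to the input dtype."""
    # On affected versions the where() result is object dtype; astype()
    # restores the original CategoricalDtype (categories and ordering).
    return series.where(cond, other).astype(series.dtype)


s = pd.Series(["A", "A", "B", "B", "C"], dtype="category")
result = where_keep_dtype(s, s != "C", s)
print(result.dtype)
```

Note that casting back to a categorical dtype turns any value outside the original categories into NaN, so this sketch assumes the replacement values are all existing categories, as in the example above.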

Output of pd.show_versions()

This is the version from the Ubuntu repos:

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-43-Microsoft
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: None
pip: 9.0.1
setuptools: 20.7.0
Cython: None
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
