CategoricalDtype does not work properly with bool column with missing values. #19182
This patch fixes it. It doesn't break anything either, which means this path was not being fully exercised. Note that it might reduce performance in a couple of cases, so we need to run asv. We should also add some test coverage of the hash bool path and update the comment there.
This looks to work on master. It could use a test.
Closed in #29344.
Problem description
When I try to convert a boolean column with missing values (so the column has dtype object rather than bool) into a categorical variable with a custom category order, both astype('category', categories=...) and pd.api.types.CategoricalDtype(...) fail to do so.
However, if no custom order is given, astype('category') works without error.
Expected outcome
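I should be able to run the astype('category', categories=...) conversion, and equivalently the pd.api.types.CategoricalDtype(...) conversion, without error.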
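The original snippets did not survive in this copy of the issue, so the following is only a minimal sketch of the kind of conversion described above, not the reporter's exact code. The sample values and the category order [True, False] are assumptions chosen for illustration. Only the CategoricalDtype form is shown, because passing categories= directly to astype was deprecated in favor of passing a CategoricalDtype.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical data: booleans plus a missing value, so pandas infers dtype object.
s = pd.Series([True, False, np.nan])
print(s.dtype)

# Assumed custom category order for illustration: True first, then False.
dtype = CategoricalDtype(categories=[True, False])

# On the affected release this conversion was reported to fail;
# on a release that includes the fix it should succeed.
converted = s.astype(dtype)

print(converted.cat.categories.tolist())
print(converted.isna().tolist())
```

If the conversion succeeds, the missing entry should remain missing and the two boolean values should map onto the requested categories in the given order.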
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
pandas: 0.22.0
pytest: 3.0.6
pip: 9.0.1
setuptools: 36.2.5
Cython: 0.25.2
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.0.12
pymysql: None
psycopg2: 2.7 (dt dec pq3 ext lo64)
jinja2: 2.8
s3fs: 0.1.2
fastparquet: None
pandas_gbq: None
pandas_datareader: None