BUG: df.where() inconsistently casts columns to integers #42295
It wasn't cast to integers at all, it was just displayed without |
@MemphisMeng Thanks for your reply. Actually, it seems it really was cast to integers:
import pandas as pd
import numpy as np
# Float column cast to int
d1 = pd.DataFrame({"a": [1.0, 2.0], 'b': [3, np.nan]})
print('\nd1:')
print(d1)
print('\nd1.dtypes:')
print(d1.dtypes)
print('\nd1.where(pd.notnull(d1), None):')
print(d1.where(pd.notnull(d1), None))
print('\nd1.where(pd.notnull(d1), None).dtypes:')
print(d1.where(pd.notnull(d1), None).dtypes)
print()
# Float column not cast to int
d2 = pd.DataFrame({"a": [1.0, 2.0], 'b': [3, 4]})
print('\nd2:')
print(d2)
print('\nd2.dtypes:')
print(d2.dtypes)
print('\nd2.where(pd.notnull(d2), None):')
print(d2.where(pd.notnull(d2), None))
print('\nd2.where(pd.notnull(d2), None).dtypes:')
print(d2.where(pd.notnull(d2), None).dtypes)
Alright, I mean when I tried |
This looks consistent on master now. Could use a test
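For whoever picks this up, a regression test could look roughly like the sketch below. It builds the two frames from the report and checks that column a ends up with the same dtype in both cases. The helper names (where_dtype_of_a, check_where_dtype_is_independent_of_other_columns) are made up for illustration and are not part of pandas or its test suite. The check is expected to fail on versions that still show the inconsistency, such as the 1.2.4 install described in this report.

```python
import numpy as np
import pandas as pd


def where_dtype_of_a(other_column):
    """Return the dtype of column 'a' after df.where(pd.notnull(df), None).

    `other_column` supplies the values of the second column 'b'; column 'a'
    is always the same float column [1.0, 2.0] used in the report.
    """
    df = pd.DataFrame({"a": [1.0, 2.0], "b": other_column})
    return df.where(pd.notnull(df), None)["a"].dtype


def check_where_dtype_is_independent_of_other_columns():
    # Hypothetical test body: the dtype of 'a' should not depend on
    # whether the sibling column 'b' contains a NaN.
    dtype_with_nan = where_dtype_of_a([3, np.nan])
    dtype_without_nan = where_dtype_of_a([3, 4])
    assert dtype_with_nan == dtype_without_nan, (dtype_with_nan, dtype_without_nan)
    return dtype_with_nan, dtype_without_nan
```

The sketch deliberately compares the two dtypes with each other instead of pinning a specific dtype, since the issue is about consistency between the two cases.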
take
take
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
In both previous examples, the column a is identical (a float column containing 1.0 and 2.0), but the outcome of df.where(pd.notnull(df), None) differs.
Expected Output
For a given column, I would expect the outcome of df.where(pd.notnull(df), None) to be consistent, regardless of the other columns of the dataframe.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2cb9652
python : 3.9.5.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Tue Jan 12 22:13:05 PST 2021; root:xnu-6153.141.16~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.2.4
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.2
setuptools : 49.6.0
Cython : 0.29.23
pytest : 6.2.3
hypothesis : None
sphinx : 3.5.4
blosc : None
feather : None
xlsxwriter : 1.4.3
lxml.etree : 4.6.3
html5lib : None
pymysql : 1.0.2
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 7.24.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : 1.4.17
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None