BUG: NDFrame._where returns an unexpected dtype #52662

Closed
3 tasks done
ELHoussineT opened this issue Apr 13, 2023 · 2 comments
Labels

  • Bug

  • Closing Candidate: May be closeable, needs more eyeballs

  • Dtype Conversions: Unexpected or buggy dtype conversions

  • Indexing: Related to indexing on series/frames, not to indexes themselves

  • Needs Triage: Issue that has not been reviewed by a pandas team member

Comments


ELHoussineT commented Apr 13, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

s = pd.Series(['a', 'b'])  # dtype: object
s.mask([True, True], 1)  # dtype: object, but expected int64

Issue Description

NDFrame._where, which .where and .mask use under the hood, sometimes returns an unexpected dtype.

To illustrate, the final Series in the example below is expected to have int64 dtype, but it remains object.

>>> s = pd.Series(['a','b'])
>>> s
0    a
1    b
dtype: object

>>> s.mask([True, True], 1)
0    1
1    1
dtype: object
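
Because .mask replaces values where the condition is True and .where replaces them where it is False, the same result can be reproduced through .where with an inverted condition. The sketch below is a minimal illustration, not part of the original report; the explicit dtype=object and the NumPy boolean array are assumptions added so the example does not depend on the default string dtype and so the condition can be inverted with ~.

```python
import numpy as np
import pandas as pd

# Assumption: dtype=object is set explicitly to match the report's starting dtype.
s = pd.Series(["a", "b"], dtype=object)
cond = np.array([True, True])

masked = s.mask(cond, 1)       # replace where cond is True
via_where = s.where(~cond, 1)  # replace where the inverted cond is False

# Both calls go through the same code path and give the same Series.
print(masked.equals(via_where))
print(via_where.dtype)
```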

This issue came to our attention during the construction of this PR: #50343 (comment)

Expected Behavior

The resulting object should "refresh" its dtype, i.e. re-infer it from the values it now holds. In the example above, the dtype should be int64.
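
If an integer dtype is needed, the conversion can be requested explicitly after masking. The following is a minimal sketch of that workaround, not a description of what .mask does internally; the explicit dtype=object is an assumption added so the example does not depend on the default string dtype.

```python
import pandas as pd

# Assumption: dtype=object is set explicitly to match the report's starting dtype.
s = pd.Series(["a", "b"], dtype=object)
result = s.mask([True, True], 1)

# Opt in to inference: infer_objects() re-inspects the values of an
# object-dtype Series and picks a more specific dtype when one fits.
inferred = result.infer_objects()

# Or state the target dtype directly when it is known in advance.
cast = result.astype("int64")

print(result.dtype, inferred.dtype, cast.dtype)
```

infer_objects() leaves the Series as object when the values are still mixed, whereas astype("int64") raises if any value cannot be converted.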

Installed Versions

INSTALLED VERSIONS

commit : 2e218d1
python : 3.10.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.19.0-38-generic
Version : #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 17 21:16:15 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.3
numpy : 1.23.5
pytz : 2022.7
dateutil : 2.8.2
setuptools : 65.6.3
pip : 22.3.1
Cython : 0.29.32
pytest : 7.2.0
hypothesis : 6.61.0
...
xlrd : 2.0.1
xlwt : None
zstandard : 0.19.0
tzdata : 2022.7

ELHoussineT added the Bug and Needs Triage labels Apr 13, 2023
@rhshadrach
Member

I disagree; I don't think we should be doing inference here. If you instead did s.mask([True, False], 1), then the result would need to be object dtype. The dtype of the result should not depend on the values of the mask.
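
The point about mask values can be checked with a short sketch: a partial mask leaves a string in the result, so it has to stay object, and keeping the fully masked case object as well means the result dtype is the same for both masks. The explicit dtype=object is an assumption added for illustration so the example does not depend on the default string dtype.

```python
import pandas as pd

# Assumption: dtype=object is set explicitly to match the report's starting dtype.
s = pd.Series(["a", "b"], dtype=object)

partial = s.mask([True, False], 1)  # one value replaced, one string kept
full = s.mask([True, True], 1)      # every value replaced

# The dtype is the same for both masks; only the values differ.
print(partial.tolist(), partial.dtype)
print(full.tolist(), full.dtype)
```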

rhshadrach added the Dtype Conversions label Apr 14, 2023
Member

phofl commented Apr 14, 2023

Agreed with @rhshadrach

phofl added the Indexing and Closing Candidate labels Apr 14, 2023
phofl closed this as completed Apr 22, 2023