Status: Open
Labels: Bug, Categorical, Dtype Conversions, Error Reporting, Needs Discussion
Description
Currently, any dtype conversions that happen during a merge are silent, e.g.:
```
In [24]: a = pd.DataFrame({'cat_key': pd.Categorical(['a', 'b', 'c']), 'int_key': [1, 2, 3]})

In [25]: b = pd.DataFrame({'cat_key': pd.Categorical(['b', 'a', 'c']), 'values': [1, 2, 3]})

In [26]: a.merge(b).dtypes
Out[26]:
cat_key    object
int_key     int64
values      int64
dtype: object

In [29]: b2 = pd.DataFrame({'int_key': [2.0, 1.0, 3.0], 'values': [1, 2, 3]})

In [30]: a.merge(b2)
Out[30]:
  cat_key  int_key  values
0       a        1       2
1       b        2       1
2       c        3       3

In [31]: a.merge(b2).dtypes
Out[31]:
cat_key    object
int_key     int64
values      int64
dtype: object
```
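Without any new option, such silent changes can at least be made visible by comparing the dtypes of the merged frame with those of the inputs. This is a minimal sketch; `find_dtype_changes` is a hypothetical helper, not a pandas API, and it only checks columns that keep their original name in the result (suffixed overlap columns such as `x_x` are skipped).

```python
import pandas as pd


def find_dtype_changes(left: pd.DataFrame, right: pd.DataFrame,
                       merged: pd.DataFrame) -> dict:
    """Return {column: (left_dtype, right_dtype, merged_dtype)} for every
    column of `merged` whose dtype differs from its dtype in an input.

    Hypothetical helper (not part of pandas). A side that lacks the column
    reports None; columns found in neither input (e.g. suffixed ones) are
    not checked.
    """
    changes = {}
    for col in merged.columns:
        # Series.get returns None when the column is absent from that input.
        left_dtype = left.dtypes.get(col)
        right_dtype = right.dtypes.get(col)
        merged_dtype = merged.dtypes[col]
        originals = [d for d in (left_dtype, right_dtype) if d is not None]
        # Flag the column if it differs from any input that had it.
        if any(d != merged_dtype for d in originals):
            changes[col] = (left_dtype, right_dtype, merged_dtype)
    return changes
```

Given the output shown in Out[31], calling `find_dtype_changes(a, b2, a.merge(b2))` would flag `int_key` (float64 in `b2`, int64 in the result) as well as `cat_key`.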
#15321 will make the merge in [26] preserve the categorical dtype, but if the categories of the two keys don't overlap, the key will still be converted to object.
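One workaround for the non-overlapping case is to give the key the same `CategoricalDtype` on both sides before merging, built from the union of both category sets. This sketch assumes a pandas version that keeps a categorical key when both sides have identical dtypes (the behavior #15321 targets); `align_categorical_keys` is a hypothetical helper, while `union_categoricals` is an existing pandas function.

```python
import pandas as pd
from pandas.api.types import union_categoricals


def align_categorical_keys(left: pd.DataFrame, right: pd.DataFrame, key: str):
    """Return copies of both frames whose `key` column shares one
    CategoricalDtype containing the union of both category sets.

    Hypothetical helper (not part of pandas).
    """
    union = union_categoricals([left[key].astype('category'),
                                right[key].astype('category')])
    shared = pd.CategoricalDtype(union.categories)
    # assign() returns new frames, so the caller's frames are left untouched.
    return (left.assign(**{key: left[key].astype(shared)}),
            right.assign(**{key: right[key].astype(shared)}))
```

The cost of this approach is that each side may carry categories it never uses, which is usually harmless for a join key.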
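So, should `merge` get something like a `conversions='ignore'|'warn'|'error'` option, so that these conversions can be reported as a warning or an error instead of happening silently?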
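The proposed option could be prototyped outside pandas as a wrapper around `DataFrame.merge`. In this sketch, `conversions` follows the name suggested above but is not an existing `merge` parameter, `merge_checked` and `MergeDtypeConversionError` are hypothetical names, and the 'warn' mode assumes a plain `UserWarning`.

```python
import warnings

import pandas as pd


class MergeDtypeConversionError(TypeError):
    """Hypothetical error raised in the proposed conversions='error' mode."""


def merge_checked(left, right, conversions='ignore', **merge_kwargs):
    """Sketch of the proposed option as a wrapper around DataFrame.merge.

    Only columns that keep their name in the result are checked; suffixed
    overlap columns are skipped.
    """
    if conversions not in ('ignore', 'warn', 'error'):
        raise ValueError("conversions must be 'ignore', 'warn' or 'error', "
                         f"got {conversions!r}")
    merged = left.merge(right, **merge_kwargs)
    if conversions == 'ignore':
        return merged

    # Collect every column whose dtype differs from its dtype in an input.
    changed = []
    for col in merged.columns:
        for side in (left, right):
            if col in side.columns and side[col].dtype != merged[col].dtype:
                changed.append(f"{col}: {side[col].dtype} -> {merged[col].dtype}")
    if changed:
        msg = "merge changed dtypes: " + "; ".join(changed)
        if conversions == 'error':
            raise MergeDtypeConversionError(msg)
        warnings.warn(msg, UserWarning, stacklevel=2)
    return merged
```

Because the check runs after the merge, the conversion has already happened by the time it is reported; a built-in version could inspect the key dtypes before joining instead.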