-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
Closed
Labels
BugReshapingConcat, Merge/Join, Stack/Unstack, ExplodeConcat, Merge/Join, Stack/Unstack, ExplodeTestingpandas testing functions or related to the test suitepandas testing functions or related to the test suite
Milestone
Description
See http://stackoverflow.com/questions/17236852/pandas-crosstab-double-counting-when-using-two-aggregate-functions for discussion.
To reproduce:
Create a test dataframe:
df = DataFrame({'A': ['one', 'one', 'two', 'three'] * 6, 'B': ['A', 'B', 'C'] * 8, 'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4, 'D': np.random.randn(24), 'E': np.random.randn(24)})
Crosstab gives the expected results:
crosstab(rows=[df['A'],df['B']], cols=[df['C']], margins=True)
C bar foo All
A B
one A 2 2 4
B 2 2 4
C 2 2 4
three A 2 0 2
B 0 2 2
C 2 0 2
two A 0 2 2
B 2 0 2
C 0 2 2
All 12 12 24
However, if you try to get mean and count in the same crosstab, specifying values for the mean, the crosstab will double count the elements when calculating the margin total:
crosstab(rows=[df['A'],df['B']], cols=[df['C']], margins=True, aggfunc=[np.size, np.mean], values=df['D'])
size mean
C bar foo All bar foo All
A B
one A 2 2 4 0.245998 0.076366 0.161182
B 2 2 4 -0.739757 0.137780 -0.300988
C 2 2 4 -1.555759 -1.446554 -1.501157
three A 2 NaN 2 1.216109 NaN 1.216109
B NaN 2 2 NaN 0.255482 0.255482
C 2 NaN 2 0.732448 NaN 0.732448
two A NaN 2 2 NaN -0.273747 -0.273747
B 2 NaN 2 -0.001649 NaN -0.001649
C NaN 2 2 NaN 0.685422 0.685422
All 24 24 24 -0.017102 -0.094208 -0.055655
Metadata
Metadata
Assignees
Labels
BugReshapingConcat, Merge/Join, Stack/Unstack, ExplodeConcat, Merge/Join, Stack/Unstack, ExplodeTestingpandas testing functions or related to the test suitepandas testing functions or related to the test suite