Multicolumn GroupBy appears to convert uint64s to floats #30859

brianwgoldman · 2020-01-09T19:13:45Z

Code Sample, a copy-pastable example if possible

```python
import pandas as pd
pd.DataFrame({'first': [1], 'second': [1], 'value': [16148277970000000000]}).groupby(['first', 'second'])['value'].max()
```

Problem description

When that code snippet runs, the result is not 16148277970000000000, as you would expect, but 16148277969999998976. Note that `int(float(16148277970000000000)) == 16148277969999998976`.
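The `int(float(...))` observation can be expanded into a small standalone sketch (plain Python, no pandas) of why a float64 round trip lands on exactly this number. In [2**63, 2**64), adjacent float64 values are 2**(63 - 52) = 2048 apart, so an integer that is not a multiple of 2048 is rounded to a neighbouring multiple. This only illustrates float64 rounding; it is not taken from the pandas code path.

```python
# float64 has a 53-bit significand, so in [2**63, 2**64) adjacent
# representable values are 2**(63 - 52) = 2048 apart.
value = 16148277970000000000  # the uint64 from the report
STEP = 2 ** (63 - 52)

assert 2 ** 63 <= value < 2 ** 64
roundtrip = int(float(value))

print(roundtrip)              # 16148277969999998976
print(value - roundtrip)      # 1024: value sits exactly halfway between two floats
print(roundtrip % STEP == 0)  # True: the result is a multiple of 2048
```

Because `value` falls exactly halfway between two representable floats, the conversion rounds to the neighbour with an even significand, which here is the lower one.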

Additional notes:

- The problem only appears to happen if there are multiple groupby keys. For example, just doing `groupby(['first'])` returns the expected result. So does removing the groupby statement entirely.
- The problem is not specific to `max`. I get the same problem for `min`, `first`, `last`, `median` and `mean`, but not for `head`, `tail` or `apply`.
- The problem also happens if you do `.transform(max)`.
- The smallest number I've found so far that has the problem is `2**63 + 1`.
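A plausible reading of the last note, sketched below under the assumption that pandas stores integers that fit in int64 as `int64` and only needs `uint64` from 2**63 upward (consistent with the `dtype: uint64` shown later in the thread): 2**63 is a power of two and therefore exact in float64, so 2**63 + 1 is the first uint64 value that a float64 round trip changes. The helper `roundtrip_changes` is hypothetical and used only for illustration.

```python
INT64_MAX = 2 ** 63 - 1


def roundtrip_changes(n: int) -> bool:
    """Hypothetical helper: True if converting n to float64 and back alters it."""
    return int(float(n)) != n


# Values above INT64_MAX no longer fit in int64, so they need uint64.
first_uint64 = INT64_MAX + 1  # 2**63

# 2**63 is a power of two, so float64 represents it exactly.
print(roundtrip_changes(first_uint64))  # False

# Scan upward for the first uint64 value that a float64 round trip alters.
n = first_uint64
while not roundtrip_changes(n):
    n += 1
print(n == 2 ** 63 + 1)  # True

# Smaller int64 values such as 2**53 + 1 are also altered by a float round trip.
print(roundtrip_changes(2 ** 53 + 1))  # True
```

Since int64 values like 2**53 + 1 would also be altered by a float round trip, the threshold reported above suggests the lossy path was specific to `uint64` data rather than to large integers in general. This is an inference from the report, not something confirmed in the thread.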

Expected Output

16148277970000000000

Output of `pd.show_versions()`

```
INSTALLED VERSIONS

commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.137+
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.17.5
pytz : 2018.9
dateutil : 2.6.1
pip : 19.3.1
setuptools : 42.0.2
Cython : 0.29.14
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.0
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 5.5.0
pandas_datareader: 0.7.4
bs4 : 4.6.3
bottleneck : 1.3.1
fastparquet : None
gcsfs : 0.6.0
lxml.etree : 4.2.6
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.11.0
pyarrow : 0.14.1
pytables : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : 3.4.4
xarray : 0.14.1
xlrd : 1.1.0
xlwt : 1.3.0
xlsxwriter : None
```


WillAyd · 2020-01-09T22:46:24Z

Can you try on master? Looks OK for me:

```
>>> pd.DataFrame({'first': [1], 'second': [1], 'value': [16148277970000000000]}).groupby(['first', 'second'])['value'].max()
first  second
1      1         16148277970000000000
Name: value, dtype: uint64
```

brianwgoldman · 2020-01-10T14:45:59Z

If you can reproduce the problem at 0.25.3 but not at master, I think we can call this closed as already fixed.

If not, I'm going to need your help on how to build the master version as I've not done it before.

Dr-Irv · 2020-09-05T15:09:45Z

Works in 1.1.1

Dr-Irv · 2020-09-05T18:33:51Z

A PR that creates a test case would be welcomed.
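A test along these lines could look like the following sketch. It is not the test that was eventually merged in #36164; the test name is hypothetical, and it assumes a pandas version in which the bug is fixed (the thread reports master and 1.1.1). It only checks `max`, `min`, `first` and `last`, since those return an existing element, whereas `mean` and `median` return float64 for integer input, so some precision loss there is expected for values this large.

```python
import pandas as pd

VALUE = 16148277970000000000  # larger than 2**63 - 1, so it is stored as uint64


def test_multikey_groupby_keeps_uint64():  # hypothetical test name
    df = pd.DataFrame({"first": [1], "second": [1], "value": [VALUE]})
    assert df["value"].dtype == "uint64"

    grouped = df.groupby(["first", "second"])["value"]
    expected = pd.Series(
        [VALUE],
        index=pd.MultiIndex.from_tuples([(1, 1)], names=["first", "second"]),
        name="value",
        dtype="uint64",
    )
    # Reductions that return an existing element should keep the exact value
    # and the uint64 dtype when grouping by more than one key.
    for how in ["max", "min", "first", "last"]:
        result = getattr(grouped, how)()
        pd.testing.assert_series_equal(result, expected)


# Run directly, or let pytest collect it.
test_multikey_groupby_keeps_uint64()
```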

TAJD · 2020-09-06T10:51:23Z

take


Dr-Irv closed this as completed Sep 5, 2020

Dr-Irv reopened this Sep 5, 2020

Dr-Irv added the good first issue and Needs Tests (Unit test(s) needed to prevent regressions) labels Sep 5, 2020

github-actions bot assigned TAJD Sep 6, 2020

TAJD pushed a commit to TAJD/pandas that referenced this issue Sep 6, 2020

TST verify groupby doesn't alter uint64s to floats pandas-dev#30859

2626073

TAJD mentioned this issue Sep 6, 2020

TST verify groupby doesn't alter uint64s to floats #30859 #36164

Merged


jreback added this to the 1.2 milestone Sep 6, 2020

jreback closed this as completed in #36164 Sep 7, 2020

jreback pushed a commit that referenced this issue Sep 7, 2020

TST verify groupby doesn't alter uint64s to floats #30859 (#36164)

b37c9f8

jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this issue Sep 8, 2020

TST verify groupby doesn't alter uint64s to floats pandas-dev#30859 (pandas-dev#36164)

7cb1421

kesmit13 pushed a commit to kesmit13/pandas that referenced this issue Nov 2, 2020

TST verify groupby doesn't alter uint64s to floats pandas-dev#30859 (pandas-dev#36164)

5916ecc
