BUG: nansum platform overflow #83

jreback · 2014-04-24T14:17:47Z

This is on 32-bit Linux; on 64-bit the int dtypes work correctly.
It stems from here

The workaround with numpy is to do the arithmetic in the widest dtype, e.g. values.sum(dtype='float64'), and then cast back.
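A minimal sketch of that workaround, assuming a plain NumPy array of integer or real floating-point values: accumulate in a 64-bit dtype, then cast back to the input dtype only when the total is representable there. The helper name `wide_sum` is hypothetical and not part of numpy or bottleneck.

```python
import numpy as np


def wide_sum(values):
    """Sum in a 64-bit accumulator, then cast back to the input dtype if it fits.

    Hypothetical helper. Assumes integer dtypes narrower than 64 bits (or int64
    itself) and real floats; uint64, bool, and complex inputs are not handled.
    """
    values = np.asarray(values)
    if np.issubdtype(values.dtype, np.integer):
        total = values.sum(dtype=np.int64)  # 64-bit integer accumulator
        info = np.iinfo(values.dtype)
        if info.min <= total <= info.max:
            return values.dtype.type(total)  # fits: cast back to the input dtype
        return total  # would overflow the input dtype: keep the int64 result
    # Floating point: accumulate in float64, then cast back to the input dtype.
    total = values.sum(dtype=np.float64)
    return values.dtype.type(total)


print(wide_sum(np.arange(5_000_000, dtype=np.int32)))  # int64 result, no wraparound
```

Keeping the int64 result when the total does not fit is a design choice of this sketch; casting back unconditionally would reintroduce the wraparound.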

>>> import numpy as np
>>> import bottleneck as bn
>>> bn.__version__
'0.8.0'
>>> np.__version__
'1.8.1'

>>> float(bn.nansum(np.arange(5000000,dtype='float32')))
12499997949952.0
>>> float(bn.nansum(np.arange(5000000,dtype='float64')))
12499997500000.0
>>> int(bn.nansum(np.arange(5000000,dtype='int32')))
1642668640
>>> int(bn.nansum(np.arange(5000000,dtype='int64')))
12499997500000L
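For reference, the exact total is n(n-1)/2 = 12499997500000 with n = 5000000. A quick arithmetic check, assuming IEEE 754 single precision for float32, suggests the two wrong-looking lines fail in different ways: the float32 value is the exact total rounded to the nearest float32 (a precision limit of the result type, not an overflow), while the int32 value is the exact total wrapped modulo 2**32.

```python
import numpy as np

n = 5_000_000
exact = n * (n - 1) // 2  # sum of 0..n-1 with Python's arbitrary-precision int

# float32 keeps a 24-bit significand, so near 1.25e13 representable values are 2**20 apart.
as_float32 = float(np.float32(exact))

# Signed 32-bit arithmetic wraps modulo 2**32 into the range [-2**31, 2**31).
as_int32 = (exact + 2**31) % 2**32 - 2**31

print(exact, as_float32, as_int32)
# → 12499997500000 12499997949952.0 1642668640
```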


kwgoodman · 2014-04-24T16:32:29Z

These issues are a pain.

The target for bn.nansum is np.sum. For the example you give, bn.nansum behaves like np.sum, at least on my 64-bit linux system.

jreback · 2014-04-24T16:38:05Z

agreed!

They all work correctly on 64-bit, because the return dtype there is the 64-bit platform dtype (e.g. np.int64 or np.float64). The examples above are from a 32-bit Linux platform.

The problem ONLY occurs on a 32-bit platform, where the default integer return type is np.int32 and the sum overflows.
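Because the overflow depends only on the accumulator dtype, it can likely be reproduced on a 64-bit machine by requesting int32 accumulation explicitly through np.sum's dtype parameter; NumPy integer array reductions wrap silently rather than raising. A minimal sketch that also shows an int64 accumulator giving the exact total:

```python
import numpy as np

v = np.arange(5_000_000, dtype=np.int32)

# Force a 32-bit accumulator, as the platform default did on 32-bit Linux.
wrapped = v.sum(dtype=np.int32)

# A 64-bit accumulator holds the exact total.
exact = v.sum(dtype=np.int64)

print(int(wrapped), int(exact))
# → 1642668640 12499997500000
```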

kwgoodman · 2014-04-24T16:39:40Z

Can you show me an example where bn.nansum and np.sum give a different answer?

jreback · 2014-04-24T16:45:42Z

Oh, I see what you mean; I think they give the same result.

>>> import numpy as np
>>> import bottleneck as bn
>>> v = np.arange(5000000,dtype='int32')
>>> np.sum(v)
1642668640
>>> bn.nansum(v)
1642668640
>>> np.sum(v,dtype='int64')
12499997500000
>>> bn.nansum(v,dtype='int64')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "nansum.pyx", line 3, in func.nansum (bottleneck/src/func/func.c:35139)
TypeError: nansum() got an unexpected keyword argument 'dtype'
>>> bn.__version__
'0.8.0'
>>> np.__version__
'1.8.1'

The numpy developers say that the user is responsible for this (e.g. by passing in a sufficiently wide dtype),
but I think they should raise an OverflowError.
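If raising is preferred, one option is a checked sum that accumulates in int64 and raises Python's built-in OverflowError when the total does not fit the input dtype. This is a hypothetical helper (`checked_sum`), not a NumPy or bottleneck API, and it assumes integer inputs narrower than 64 bits, so the int64 accumulator itself should not overflow for arrays of practical size.

```python
import numpy as np


def checked_sum(values):
    """Sum an integer array, raising OverflowError if the total overflows its dtype.

    Hypothetical helper; assumes an integer dtype narrower than 64 bits.
    """
    values = np.asarray(values)
    if not np.issubdtype(values.dtype, np.integer):
        raise TypeError("checked_sum expects an integer array")
    total = int(values.sum(dtype=np.int64))  # exact for narrow integer inputs
    info = np.iinfo(values.dtype)
    if not info.min <= total <= info.max:
        raise OverflowError(
            f"sum {total} does not fit in {values.dtype} [{info.min}, {info.max}]"
        )
    return values.dtype.type(total)  # fits: return a scalar of the input dtype
```

With this sketch, checked_sum(np.arange(10, dtype=np.int32)) returns 45, while the 5000000-element int32 array from above raises OverflowError instead of returning 1642668640.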

I fixed it by doing this: https://github.com/pydata/pandas/pull/6954/files

kwgoodman · 2014-04-24T16:55:36Z

Ugh, your fix looks like it was painful to make.

bn.nansum does not support all of the input parameters of np.sum :(
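Since bn.nansum in 0.8.0 rejects the dtype keyword, a caller can get roughly the same effect by upcasting the input before the call, at the cost of a temporary copy. A minimal sketch; the wrapper name `nansum_as` is hypothetical, and it falls back to np.nansum (which also treats NaN as zero) when bottleneck is not installed.

```python
import numpy as np

try:
    import bottleneck as bn
    _nansum = bn.nansum
except ImportError:  # bottleneck not installed: fall back to NumPy's nansum
    _nansum = np.nansum


def nansum_as(values, dtype):
    """Emulate np.nansum(values, dtype=dtype) for a nansum without a dtype keyword.

    Hypothetical wrapper: when dtype differs from the input dtype, astype makes a
    copy, and the accumulation then happens in the wider dtype.
    """
    return _nansum(np.asarray(values).astype(dtype, copy=False))


v = np.arange(5_000_000, dtype=np.int32)
print(nansum_as(v, np.int64))
```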

jreback · 2014-04-24T17:11:25Z

hah....!

I think that to avoid issues you should simply always use 64-bit (or wider) dtypes when there is a possibility of overflow, and then cast back, if necessary and possible, to the correct scalar return type.

On 64-bit this is not an issue at all; on 32-bit it is ONLY an issue when the operation overflows.

numpy is wrong (though at the very least they should raise an OverflowError).

kwgoodman closed this as completed Jul 8, 2014

lumbric mentioned this issue Feb 13, 2019

Wrong result for float32 Series when using bottleneck pandas-dev/pandas#25307

Closed
