support axis=None for nanmedian ( issue #7352 ) #7440

Merged
merged 1 commit into from
Jun 12, 2014

toddrjen (Contributor)

This fixes #7352, where nanmedian does not work when axis==None.
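For context, a NaN-skipping median with axis=None treats the whole array as a single sample: every non-NaN element is pooled before the median is taken. Below is a minimal NumPy sketch of that behavior, assuming float input. `nanmedian_sketch` is a hypothetical name, and this is not the pandas `nanops.nanmedian` code from this PR.

```python
import numpy as np


def nanmedian_sketch(values, axis=None):
    """Hypothetical NaN-skipping median (not pandas' nanops.nanmedian)."""
    values = np.asarray(values, dtype='f8')
    if axis is None:
        # axis=None means "reduce over every element": flatten first,
        # then drop NaNs before taking the median.
        flat = values.ravel()
        flat = flat[~np.isnan(flat)]
        # An all-NaN (or empty) input has no median, so return NaN.
        return np.median(flat) if flat.size else np.nan
    # Otherwise apply the same 1-D rule to every slice along `axis`.
    return np.apply_along_axis(nanmedian_sketch, axis, values)


data = np.array([[1.0, np.nan, 3.0], [4.0, 5.0, np.nan]])
print(nanmedian_sketch(data))          # median of 1, 3, 4 and 5
print(nanmedian_sketch(data, axis=1))  # one median per row
```

The per-axis branch reuses the axis=None rule on each 1-D slice, so both code paths share the same NaN handling.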

jreback added this to the 0.14.1 milestone Jun 12, 2014
toddrjen (Contributor, Author)

I have added a fix (hopefully) for the rounding errors.

@@ -118,11 +120,39 @@ def check_results(self, targ, res, axis):
res = getattr(res, 'values', res)
if axis != 0 and hasattr(targ, 'shape') and targ.ndim:
res = np.split(res, [targ.shape[0]], axis=0)[0]
tm.assert_almost_equal(targ, res)
try:
Contributor

You can just pass check_less_precise=True if it's a complex number (or explicitly astype before the comparison).

Contributor Author

I tried that; it doesn't fix the problem. It still raises an AssertionError even when the values differ only in the 16th digit.

Contributor Author

>>> a=np.array([1+.1111111111111111*1j])
>>> b=np.array([1+.1111111111111112*1j])
>>> tm.assert_almost_equal(a, b, check_less_precise=True)
AssertionError: (1+0.1111111111111111j) != (1+0.1111111111111112j)

Contributor

Compare the real and imaginary parts separately and it works correctly. A side issue is to patch tm.assert_almost_equal so that it handles complex numbers this way.

In [1]: a=np.array([1+.1111111111111111*1j])

In [2]: b=np.array([1+.1111111111111112*1j])

In [4]: tm.assert_almost_equal(a.real, b.real)
Out[4]: True

In [5]: tm.assert_almost_equal(a.imag, b.imag)
Out[5]: True
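A minimal sketch of the part-by-part comparison suggested above, written against `numpy.testing.assert_array_almost_equal` rather than pandas' internal `tm.assert_almost_equal`. `assert_complex_almost_equal` is a hypothetical helper name, not part of either library.

```python
import numpy as np
from numpy.testing import assert_array_almost_equal


def assert_complex_almost_equal(actual, desired, decimal=7):
    """Hypothetical helper: compare complex arrays part by part."""
    actual = np.asarray(actual)
    desired = np.asarray(desired)
    # Comparing the real and imaginary components as ordinary float arrays
    # applies the usual decimal tolerance to each component on its own.
    assert_array_almost_equal(actual.real, desired.real, decimal=decimal)
    assert_array_almost_equal(actual.imag, desired.imag, decimal=decimal)


a = np.array([1 + .1111111111111111 * 1j])
b = np.array([1 + .1111111111111112 * 1j])
assert_complex_almost_equal(a, b)  # passes: the parts differ only in the 16th digit
```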

Contributor Author

Done.

jreback added a commit that referenced this pull request Jun 12, 2014
support axis=None for nanmedian ( issue #7352 )
jreback merged commit 326ef95 into pandas-dev:master Jun 12, 2014
jreback (Contributor) commented Jun 12, 2014

thanks!

jreback (Contributor) commented Jun 12, 2014

@toddrjen a bunch of tests are failing on Windows. I debugged one of them below.

If you could debug this, that would be great.

C:\Users\Jeff Reback\Documents\GitHub\pandas>more test.27-64.log
..............................................S................S................S................S......................................................................................................
...........................................................................................................S...S.S....SSS......................................S........................................
..............S.....................................................S..................................................SS...............................................................................
........................................................................................................................................................................................................
.................................................................................S......................................................................................................................
.................S............S.................................................................................................................SS..........................SSSS........................
....S.....S.............................................S.................SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS....S...................................................................S.................
............SSSSSSSSSSS...................................S.S...........................................................................................................................................
............S..................................S.......................SSSSSSS...................................................................................................S......................
............................................................................................................................................S...........................................................
..........................................................................S..S..........................................................................................................................
...................................................S...S................................................................................................................................................
........................................................................................................................................................................................................
..................................................................................................................S.....................................................................................
.........................................................................S..............................................................................................................................
.................S......................................................................................................................................................................................
...................................................................C:\python27-64\lib\site-packages\numpy\core\_methods.py:55: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
........................................................................................................................................................................................................
........................................................................................................................................................................................................
........................................................................................................................................................................................................
...SS.....S...SS.......S.S.......................SS.....S...SS.......S.S..............SS..SS.......................................................................S....................................
.........................................C:\python27-64\lib\site-packages\matplotlib\axes.py:4747: UserWarning: No labeled objects found. Use label='...' kwarg on individual plots.
  warnings.warn("No labeled objects found. "
....................................................................................................................................................C:\python27-64\lib\site-packages\matplotlib\__init__.py:1172: UserWarning:  This call to matplotlib.use() has no effect
because the backend has already been chosen;
matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.

  warnings.warn(_use_error_msg)
..............................................................................................................................................c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win
-amd64-2.7\pandas\core\index.py:1013: RuntimeWarning: Cannot compare type 'Timestamp' with type 'str', sort order is undefined for incomparable objects
  "incomparable objects" % e, RuntimeWarning)
..................................................c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\index.py:1013: RuntimeWarning: Cannot compare type 'Timestamp' with type 'long', sort order is undefined for incomparable objects
  "incomparable objects" % e, RuntimeWarning)
.................................................................................................................................................................................S.....S................
................................................S.......................................................................................................................................................
........................................................................................................................................................................................................
........................................................................................................................................................................................................
........................................................................................................................................................................................................
........................................................................................................................................................................................................
........................................................................................................................................................................................................
........................................................................................................................................................................................................
........................................................................................................................................................................................................
........................................................................................................................................................................................................
...........................................................................................................................................................S............................................
......................c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py:193: ComplexWarning: Casting complex values to real discards the imaginary part
  return ~np.isfinite(values.astype('float64'))
.............E..EE.E...E......................................................................................................................................S...S.............SSS.........S...........
.SS.............SS.....SSSS.............................................................................................................................................................................
.................................................S..............................S.......................................................................................................................
..................................................................................................
======================================================================
ERROR: test_nankurt (pandas.tests.test_nanops.TestnanopsDataFrame)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 373, in test_nankurt
    allow_complex=False, allow_str=False, allow_date=False)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 213, in check_funs
    self.check_fun(testfunc, targfunc, 'arr_float', **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 202, in check_fun
    testarval, targarval, targarnanval, **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 188, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 188, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 160, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 43, in _f
    return f(*args, **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 483, in nankurt
    count = _get_counts(mask, axis)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 542, in _get_counts
    count = (mask.shape[axis] - mask.sum(axis)).astype(float)
AttributeError: ("'long' object has no attribute 'astype'", 'axis: 0 of 0', 'skipna: False', 'kwargs: {}', 'testar: arr_float', 'targar: arr_float', 'targarnan: arr_float')

======================================================================
ERROR: test_nanmax (pandas.tests.test_nanops.TestnanopsDataFrame)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 320, in test_nanmax
    allow_str=False, allow_obj=False)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 241, in check_funs
    self.check_fun(testfunc, targfunc, 'arr_tdelta', **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 202, in check_fun
    testarval, targarval, targarnanval, **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 188, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 188, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 160, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 88, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 418, in nanmax
    return _maybe_null_out(result, axis, mask)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 552, in _maybe_null_out
    if null_mask.any():
AttributeError: ("'bool' object has no attribute 'any'", 'axis: 0 of 0', 'skipna: False', 'kwargs: {}', 'testar: arr_tdelta', 'targar: arr_tdelta', 'targarnan: arr_tdelta')

======================================================================
ERROR: test_nanmean (pandas.tests.test_nanops.TestnanopsDataFrame)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 290, in test_nanmean
    allow_str=False, allow_date=False)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 213, in check_funs
    self.check_fun(testfunc, targfunc, 'arr_float', **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 202, in check_fun
    testarval, targarval, targarnanval, **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 188, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 188, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 160, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 43, in _f
    return f(*args, **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 88, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 261, in nanmean
    count = _get_counts(mask, axis)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 542, in _get_counts
    count = (mask.shape[axis] - mask.sum(axis)).astype(float)
AttributeError: ("'long' object has no attribute 'astype'", 'axis: 0 of 0', 'skipna: False', 'kwargs: {}', 'testar: arr_float', 'targar: arr_float', 'targarnan: arr_float')

======================================================================
ERROR: test_nanmin (pandas.tests.test_nanops.TestnanopsDataFrame)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 315, in test_nanmin
    allow_str=False, allow_obj=False)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 241, in check_funs
    self.check_fun(testfunc, targfunc, 'arr_tdelta', **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 202, in check_fun
    testarval, targarval, targarnanval, **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 188, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 188, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 160, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 88, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 388, in nanmin
    return _maybe_null_out(result, axis, mask)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 552, in _maybe_null_out
    if null_mask.any():
AttributeError: ("'bool' object has no attribute 'any'", 'axis: 0 of 0', 'skipna: False', 'kwargs: {}', 'testar: arr_tdelta', 'targar: arr_tdelta', 'targarnan: arr_tdelta')

======================================================================
ERROR: test_nanskew (pandas.tests.test_nanops.TestnanopsDataFrame)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 365, in test_nanskew
    allow_complex=False, allow_str=False, allow_date=False)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 213, in check_funs
    self.check_fun(testfunc, targfunc, 'arr_float', **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 202, in check_fun
    testarval, targarval, targarnanval, **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 188, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 188, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_nanops.py", line 160, in check_fun_data
    **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 43, in _f
    return f(*args, **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 449, in nanskew
    count = _get_counts(mask, axis)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py", line 542, in _get_counts
    count = (mask.shape[axis] - mask.sum(axis)).astype(float)
AttributeError: ("'long' object has no attribute 'astype'", 'axis: 0 of 0', 'skipna: False', 'kwargs: {}', 'testar: arr_float', 'targar: arr_float', 'targarnan: arr_float')

----------------------------------------------------------------------
Ran 7366 tests in 516.370s

FAILED (SKIP=135, errors=5)

C:\Users\Jeff Reback\Documents\GitHub\pandas>c:\python27-64\Scripts\nosetests.exe build\lib.win-amd64-2.7\pandas\tests\test_nanops.py --pdb --pdb-failure
......C:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py:193: ComplexWarning: Casting complex values to real discards the imaginary part
  return ~np.isfinite(values.astype('float64'))
.............> c:\users\jeff reback\documents\github\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py(542)_get_counts()
-> count = (mask.shape[axis] - mask.sum(axis)).astype(float)
(Pdb) l
537         return result
538
539
540     def _get_counts(mask, axis):
541         if axis is not None:
542  ->         count = (mask.shape[axis] - mask.sum(axis)).astype(float)
543         else:
544             count = float(mask.size - mask.sum())
545
546         return count
547
(Pdb) p mask
array([False, False, False, False, False, False, False, False, False,
       False, False], dtype=bool)
(Pdb) p axis
0
(Pdb) p mask.ndim
1
(Pdb) p mask.shape[axis]-mask.sum(axis)
11L
(Pdb) p mask.sum(axis)
0
(Pdb) u
> c:\users\jeff reback\documents\github\pandas\build\lib.win-amd64-2.7\pandas\core\nanops.py(483)nankurt()
-> count = _get_counts(mask, axis)
(Pdb) l
478     def nankurt(values, axis=None, skipna=True):
479         if not isinstance(values.dtype.type, np.floating):
480             values = values.astype('f8')
481
482         mask = isnull(values)
483  ->     count = _get_counts(mask, axis)
484
485         if skipna:
486             values = values.copy()
487             np.putmask(values, mask, 0)
488
(Pdb) p values
array([-0.38825224,  2.25028687,  0.97792431,  0.05118711, -0.38908183,
       -1.25383019, -0.97858595,  0.50348946,  0.91971294,  0.18107761,
       -0.97499552])
(Pdb) p mask
array([False, False, False, False, False, False, False, False, False,
       False, False], dtype=bool)
(Pdb)

toddrjen (Contributor, Author)

The problem seems to be that the values are being converted to a Python long instead of a NumPy scalar. The workaround is easy; I can include it either in a separate patch or as part of the next one.

But could you also try the following code and see what you get?

import numpy as np
a = np.zeros(11).astype('bool')
b = a.shape[0] - a.sum(0)
type(b)

I have tried this on Linux, WinPython 3 x64, and Python(x,y), and in all of them I get numpy.int64. Based on the unit test results, I would expect you to get long or int.
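One possible shape for the workaround mentioned above, assuming the fix belongs in `_get_counts` itself: convert a 0-d result with `float()`, which accepts Python ints/longs and NumPy scalars alike, and call `.astype` only on real arrays. `get_counts_sketch` is a hypothetical stand-in, not the actual pandas fix.

```python
import numpy as np


def get_counts_sketch(mask, axis):
    """Hypothetical variant of nanops._get_counts (not the pandas code)."""
    if axis is None:
        return float(mask.size - mask.sum())
    count = mask.shape[axis] - mask.sum(axis)
    # For a 1-D mask the subtraction yields a scalar; on the Windows build in
    # this thread it was a Python long, which has no .astype(). float()
    # accepts Python ints/longs and NumPy scalars alike.
    if np.ndim(count) == 0:
        return float(count)
    # For higher-dimensional masks the result is an array, as before.
    return count.astype(float)


print(get_counts_sketch(np.zeros(11, dtype=bool), 0))
```

For 1-D input this returns a plain float, much like the `np.float64` scalar the original line produces on platforms where the subtraction stays a NumPy scalar.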

toddrjen (Contributor, Author)

Looking at the last error at least (nanskew), I don't appear to have touched any of the code in that path, so this doesn't seem to be a new bug.

jreback (Contributor) commented Jun 13, 2014

All of the errors are related to the changes for _get_counts.

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Jeff Reback>c:\python27-64\python
Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__version__
'1.8.0'
>>> a = np.zeros(11).astype('bool')
>>> b = a.shape[0]-a.sum(0)
>>> type(b)
<type 'long'>
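The same Python-scalar result plausibly explains the other failure in the log, `'bool' object has no attribute 'any'` in `_maybe_null_out`. Assuming `null_mask` there is computed by comparing such a count with zero (that code is not shown in this thread), a Python `long` yields a plain `bool`. The sketch below uses the `np.any` function, which accepts Python bools as well as NumPy scalars and arrays, instead of the `.any()` method. `has_null_slots` is a hypothetical name.

```python
import numpy as np


def has_null_slots(mask, axis):
    """Hypothetical helper: True if any reduced slot has no valid values."""
    # The count may be a Python int/long, a NumPy scalar or an array,
    # depending on the platform and on mask.ndim.
    count = mask.shape[axis] - mask.sum(axis)
    null_mask = count == 0
    # np.any works on a plain Python bool, unlike the .any() method.
    return bool(np.any(null_mask))


print(has_null_slots(np.zeros(11, dtype=bool), 0))
```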

toddrjen (Contributor, Author)

There were no changes to _get_counts. Looking at the blame, it hasn't been touched since 2011.

jreback (Contributor) commented Jun 13, 2014

It wasn't broken before I merged your first PR.

toddrjen (Contributor, Author)

The first version of test_nanops.py didn't test 1D arrays at all, so it wouldn't have identified this problem.

jreback (Contributor) commented Jun 13, 2014

OK, it's a problem now. Please have a look.

Labels: Bug, Internals, Numeric Operations

Linked issue: BUG: nanops.nanmedian doesn't work when axis=None