TST: compat with numpy 1.14 #18123

Closed
jreback opened this issue Nov 5, 2017 · 8 comments · Fixed by #18157
Labels: Compat (pandas objects compatibility with Numpy or Python functions), Testing (pandas testing functions or related to the test suite)
Milestone: 0.21.1

Comments

@jreback
Contributor

jreback commented Nov 5, 2017

I think this is a very recent change in how NumPy prints ndarrays, so we would conditionally change the expected value if not _np_version_under1p14.
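A minimal sketch of that conditional. It assumes a hypothetical helper, np_version_under, standing in for pandas' private _np_version_under1p14 flag, and it assumes the pre-1.14 output matches Python 2's 12-significant-digit str(float); neither is taken from the actual fix.

```python
import numpy as np


def np_version_under(major, minor, version=None):
    """Hypothetical helper: True if the NumPy version is older than major.minor."""
    if version is None:
        version = np.__version__
    # Only the first two components are compared; suffixes like "0rc1" come after them.
    parts = version.split(".")
    return (int(parts[0]), int(parts[1])) < (major, minor)


def expected_astype_str(value, version=None):
    """Expected string for a float after astype(str) in the test (sketch)."""
    if np_version_under(1, 14, version):
        # Assumed pre-1.14 behavior: 12 significant digits, like Python 2's str(float).
        return "%.12g" % value
    # NumPy >= 1.14: the shortest string that round-trips, as repr gives on Python 3.
    return repr(value)


x = 1.12345678901234567890
print(expected_astype_str(x, "1.13.3"))  # 1.12345678901
print(expected_astype_str(x, "1.14.0"))  # 1.1234567890123457
```

Passing the version explicitly keeps the helper testable without installing several NumPy releases.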

https://travis-ci.org/pandas-dev/pandas/jobs/297507212

____________________ TestDataFrameDataTypes.test_astype_str ____________________
[gw0] linux -- Python 3.6.3 /home/travis/miniconda3/envs/pandas/bin/python
self = <pandas.tests.frame.test_dtypes.TestDataFrameDataTypes object at 0x7f31d2d6d748>
    def test_astype_str(self):
        # GH9757
        a = Series(date_range('2010-01-04', periods=5))
        b = Series(date_range('3/6/2012 00:00', periods=5, tz='US/Eastern'))
        c = Series([Timedelta(x, unit='d') for x in range(5)])
        d = Series(range(5))
        e = Series([0.0, 0.2, 0.4, 0.6, 0.8])
    
        df = DataFrame({'a': a, 'b': b, 'c': c, 'd': d, 'e': e})
    
        # datetimelike
        # Test str and unicode on python 2.x and just str on python 3.x
        for tt in set([str, compat.text_type]):
            result = df.astype(tt)
    
            expected = DataFrame({
                'a': list(map(tt, map(lambda x: Timestamp(x)._date_repr,
                                      a._values))),
                'b': list(map(tt, map(Timestamp, b._values))),
                'c': list(map(tt, map(lambda x: Timedelta(x)
                                      ._repr_base(format='all'), c._values))),
                'd': list(map(tt, d._values)),
                'e': list(map(tt, e._values)),
            })
    
            assert_frame_equal(result, expected)
    
        # float/nan
        # 11302
        # consistency in astype(str)
        for tt in set([str, compat.text_type]):
            result = DataFrame([np.NaN]).astype(tt)
            expected = DataFrame(['nan'])
            assert_frame_equal(result, expected)
    
            result = DataFrame([1.12345678901234567890]).astype(tt)
            expected = DataFrame(['1.12345678901'])
>           assert_frame_equal(result, expected)
pandas/tests/frame/test_dtypes.py:535: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/util/testing.py:1397: in assert_frame_equal
    obj='DataFrame.iloc[:, {idx}]'.format(idx=i))
pandas/util/testing.py:1276: in assert_series_equal
    obj='{obj}'.format(obj=obj))
pandas/_libs/testing.pyx:59: in pandas._libs.testing.assert_almost_equal
    cpdef assert_almost_equal(a, b,
pandas/_libs/testing.pyx:173: in pandas._libs.testing.assert_almost_equal
    raise_assert_detail(obj, msg, lobj, robj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
obj = 'DataFrame.iloc[:, 0]'
message = 'DataFrame.iloc[:, 0] values are different (100.0 %)'
left = '[1.1234567890123457]', right = '[1.12345678901]', diff = None
    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        elif is_categorical_dtype(left):
            left = repr(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
        elif is_categorical_dtype(right):
            right = repr(right)
    
        msg = """{obj} are different
    
    {message}
    [left]:  {left}
    [right]: {right}""".format(obj=obj, message=message, left=left, right=right)
    
        if diff is not None:
            msg += "\n[diff]: {diff}".format(diff=diff)
    
>       raise AssertionError(msg)
E       AssertionError: DataFrame.iloc[:, 0] are different
E       
E       DataFrame.iloc[:, 0] values are different (100.0 %)
E       [left]:  [1.1234567890123457]
E       [right]: [1.12345678901]
pandas/util/testing.py:1093: AssertionError
jreback added the Compat, Difficulty Novice, and Testing labels Nov 5, 2017
jreback added this to the 0.21.1 milestone Nov 5, 2017
@jreback
Contributor Author

jreback commented Nov 5, 2017

In NumPy 1.13.3:

In [19]: DataFrame([1.12345678901234567890]).astype(str)
Out[19]: 
               0
0  1.12345678901

cc @charris

@charris

charris commented Nov 5, 2017

1.13.3 or current 1.14? In any case, this is probably numpy/numpy#9941. NumPy now has its own value -> string conversion functions, so there will probably be some small changes in the output. However, the strings should still convert back to the same value.
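The back-conversion property can be checked in plain Python. This sketch compares repr (the shortest string that round-trips, on Python 3) with a 12-significant-digit format, which is assumed here to mirror Python 2's str(float):

```python
def round_trips(value, formatter):
    """True if parsing formatter(value) back gives exactly the same float."""
    return float(formatter(value)) == value


def full(v):
    # Shortest string that round-trips (Python 3 repr).
    return repr(v)


def short(v):
    # 12 significant digits, approximating Python 2's str(float).
    return "%.12g" % v


x = 1.12345678901234567890
print(full(x), round_trips(x, full))    # 1.1234567890123457 True
print(short(x), round_trips(x, short))  # 1.12345678901 False
```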

@charris

charris commented Nov 5, 2017

Although back conversion doesn't succeed here.

In [3]: a
Out[3]: array([1.12345679])

In [4]: b = array([1.12345679])

In [5]: a == b
Out[5]: array([False], dtype=bool)

So there may be other things going on.
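One likely explanation is display precision rather than the stored value: array printing shows at most 8 fractional digits by default, so retyping the printed text produces a different number. A small sketch using np.format_float_positional (available in NumPy 1.14 and later):

```python
import numpy as np

a = np.array([1.12345678901234567890])

# Round to 8 fractional digits, like the default array printing precision.
shown = np.format_float_positional(a[0], precision=8)
b = np.array([float(shown)])
print(shown, a == b)  # 1.12345679 [False]

# With no precision limit, unique=True (the default) gives the shortest
# string that round-trips to the stored value.
exact = np.format_float_positional(a[0])
print(exact, float(exact) == a[0])  # 1.1234567890123457 True
```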

@charris

charris commented Nov 5, 2017

@ahaldane Thoughts?

@charris

charris commented Nov 5, 2017

Yeah, it just looks like a printing change:

In [1]: a = array([1.12345678901234567890])

In [2]: a[0]
Out[2]: 1.1234567890123457

@jreback
Contributor Author

jreback commented Nov 5, 2017

Yep, I think we can just fix the test on our side.

@ahaldane

ahaldane commented Nov 5, 2017

I see what is going on here. NumPy's casting code actually uses str with a Python float as an intermediate, which drops the extra precision.

When casting from an f8 array to an S array, NumPy essentially does this:

for i in range(len(src)):
    dst[i] = str(float(src[i]))

and that uses Python's str and float functions. Note that in Python 2, str(float) rounds to 12 significant digits, while repr prints all the digits needed to round-trip.
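The loop above can be emulated at Python level to see the effect. This is a sketch, not NumPy's actual C casting code: py2_str is a hypothetical stand-in that approximates Python 2's str(float) with "%.12g" (it omits the trailing ".0" Python 2 adds to whole numbers), and the S20 width is taken from the session in this comment.

```python
import numpy as np


def py2_str(x):
    """Hypothetical stand-in for Python 2's str(float): 12 significant digits."""
    return "%.12g" % x


def cast_via_python_str(src, to_str=py2_str, width=20):
    """Emulate the described loop: dst[i] = str(float(src[i]))."""
    dst = np.zeros(len(src), dtype="S%d" % width)
    for i in range(len(src)):
        # Convert through a Python float, format it, and store the ASCII bytes.
        dst[i] = to_str(float(src[i])).encode("ascii")
    return dst


a = np.array([1.12345678901234567890])
print(cast_via_python_str(a))               # [b'1.12345678901']
print(cast_via_python_str(a, to_str=repr))  # [b'1.1234567890123457']
```

Swapping to_str for repr corresponds to the second option listed in this comment: the same loop, but with a formatter that round-trips.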

We can see this more clearly by using NumPy's casting loop:

>>> a = np.array([1.12345678901234567890])
>>> b = np.zeros(1, dtype='S20')
>>> b[:] = a
>>> b
array(['1.12345678901'],
      dtype='|S20')

Now let's avoid NumPy's casting code by assigning directly:

>>> b[0] = a[0]
>>> b
array(['1.1234567890123457'],
      dtype='|S20')

Compare to:

>>> b[0] = str(float(a[0]))
>>> b
array(['1.12345678901'],
      dtype='|S20')

I see a few possible things we could do to make your tests work:

  1. In numpy/numpy#9941 I made the str(np.float64) output full precision. I could roll that back so it is truncated the way Python 2's str(float) is.
  2. It is easy to make the np.float -> str casting code use repr instead of str. That would also let floats round-trip through the casts. However, it would be a behavior change for all casts to string types. (Also, it wouldn't be right for float128.)
  3. Write more careful casting code for np.float -> string. That is a lot of work, and I don't really want to do it right now.

I might just try out option 1.

@ahaldane

ahaldane commented Nov 6, 2017

Note that this is only a problem in Python 2, since str(float) is truncated only there. In Python 3, both str(float) and repr(float) output all the digits.

This means your test probably fails in Python 3, even though I don't precisely understand why our recent changes affected this test the way they did. Pandas probably has an overridden astype function that somehow calls str(np.float64()).

In any case, in NumPy 1.14 we plan not to truncate str, even in Python 2. So I think you will need to add a few digits of precision to the expected value here.
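How many digits are enough? For IEEE 754 doubles, 17 significant digits always round-trip, while 12 do not. A short check on a few arbitrary sample values:

```python
import math

samples = [1.12345678901234567890, 0.1, 1.0 / 3.0, math.pi, 2.0 ** 0.5]


def round_trips(value, digits):
    """True if formatting value with `digits` significant digits and parsing it back is exact."""
    return float("%.*g" % (digits, value)) == value


print(all(round_trips(v, 17) for v in samples))  # True
print([round_trips(v, 12) for v in samples])     # [False, True, False, False, False]
```

Seventeen digits is a fixed upper bound; repr on Python 3 often needs fewer because it picks the shortest string that still round-trips.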

jreback added a commit to jreback/pandas that referenced this issue Nov 7, 2017
COMPAT: compat with numpy >= 1.14 on str repr

closes pandas-dev#18123
jreback added a commit to jreback/pandas that referenced this issue Nov 8, 2017
COMPAT: compat with numpy >= 1.14 on str repr
TST: temp disable python-dateutil from master

closes pandas-dev#18123
jreback added a commit to jreback/pandas that referenced this issue Nov 8, 2017
COMPAT: compat with numpy >= 1.14 on str repr
TST: temp disable python-dateutil from master

closes pandas-dev#18123
jreback added a commit that referenced this issue Nov 8, 2017
CI: don't show miniconda output on install
COMPAT: compat with numpy >= 1.14 on str repr
TST: temp disable python-dateutil from master

closes #18123
watercrossing pushed a commit to watercrossing/pandas that referenced this issue Nov 10, 2017
…s-dev#18157)

CI: don't show miniconda output on install
COMPAT: compat with numpy >= 1.14 on str repr
TST: temp disable python-dateutil from master

closes pandas-dev#18123
No-Stream pushed a commit to No-Stream/pandas that referenced this issue Nov 28, 2017
…s-dev#18157)

CI: don't show miniconda output on install
COMPAT: compat with numpy >= 1.14 on str repr
TST: temp disable python-dateutil from master

closes pandas-dev#18123
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue Dec 8, 2017
…s-dev#18157)

CI: don't show miniconda output on install
COMPAT: compat with numpy >= 1.14 on str repr
TST: temp disable python-dateutil from master

closes pandas-dev#18123

(cherry picked from commit 8dac633)
TomAugspurger pushed a commit that referenced this issue Dec 11, 2017
CI: don't show miniconda output on install
COMPAT: compat with numpy >= 1.14 on str repr
TST: temp disable python-dateutil from master

closes #18123

(cherry picked from commit 8dac633)

3 participants