API design discussion #1844

halleygithub · 2012-09-05T16:51:32Z

kk=DataFrame({'a':[1,0,3,5],'b':[2,4,6,8]})
kk['c']=kk.b/kk.a

kk
   a  b  c
0  1  2  2
1  0  4  0
2  3  6  2
3  5  8  1

Two issues:
(1) I don't think kk[1,'c']=0 is a suitable answer
(2) I don't think kk[3,'c']=1 is a suitable answer, it should be 1.6

whereas kk['c']=1.0*kk.b/kk.a gives an OK answer:

kk
   a  b    c
0  1  2  2.0
1  0  4  NaN
2  3  6  2.0
3  5  8  1.6

So please treat 'int/int' as 'float/float'; the current 'int/int' behavior causes errors that are hard to track down.
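
The 1.0* workaround above can also be written as an explicit cast. This is a minimal sketch, assuming a recent pandas and NumPy imported under the usual pd/np aliases (the original post uses a bare DataFrame name instead). The replace() step is an addition for illustration: depending on the pandas version, a zero denominator may come back as inf rather than NaN.

```python
import numpy as np
import pandas as pd

kk = pd.DataFrame({"a": [1, 0, 3, 5], "b": [2, 4, 6, 8]})

# Cast both operands to float so the result does not depend on the
# interpreter's integer division rules.
ratio = kk["b"].astype(float) / kk["a"].astype(float)

# Assumption for this sketch: treat x / 0 as missing. Some pandas
# versions return inf here, so normalise inf to NaN explicitly.
kk["c"] = ratio.replace([np.inf, -np.inf], np.nan)

print(kk)
```

With the cast in place, row 3 holds 1.6 instead of the truncated 1, and the zero-denominator row is marked as missing rather than silently set to 0.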

lodagro · 2012-09-05T20:10:29Z

This is NumPy-like behavior: in integer arithmetic, division by zero always yields zero. You can flag division by zero using np.seterr:

In [1]: import pandas

In [2]: import numpy as np

In [3]: a = [1,0,3,5]

In [4]: b = [2,4,6,8]

In [5]: kk = pandas.DataFrame({'a': a, 'b': b})

In [6]: np.array(b) / np.array(a)
Out[6]: array([2, 0, 2, 1])

In [7]: kk['c'] = kk.b / kk.a

In [8]: kk
Out[8]: 
   a  b  c
0  1  2  2
1  0  4  0
2  3  6  2
3  5  8  1

In [9]: np.seterr(divide='raise')
Out[9]: {'divide': 'ignore', 'invalid': 'ignore', 'over': 'ignore', 'under': 'ignore'}

In [10]: kk['c'] = kk.b / kk.a
---------------------------------------------------------------------------
FloatingPointError                        Traceback (most recent call last)
...
FloatingPointError: divide by zero encountered in divide

This is the way integer division works in Python 2:

In [12]: 8 / 5
Out[12]: 1
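
np.seterr changes the error handling globally. A scoped alternative is the np.errstate context manager, sketched below under the assumption of Python 3 and a recent NumPy, where "/" on integer arrays is true division and "//" is the floor division shown in the transcript above. The variable names floored and raised are only for illustration.

```python
import numpy as np

a = np.array([1, 0, 3, 5])
b = np.array([2, 4, 6, 8])

# Floor division on integer arrays: with the warning silenced, NumPy
# puts 0 in the slot where the denominator is zero.
with np.errstate(divide="ignore"):
    floored = b // a

# Ask NumPy to raise on division by zero, but only inside this block.
try:
    with np.errstate(divide="raise"):
        b / a
    raised = False
except FloatingPointError:
    raised = True

print(floored, raised)
```

Outside the with blocks the previous error settings are restored, so other code is not affected.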

wesm · 2012-09-08T02:15:00Z

What you are asking for is Python3-like behavior. To get this, you need to do from __future__ import division:

In [3]: from __future__ import division

In [4]: kk.b / kk.a
Out[4]: 
0    2.0
1    NaN
2    2.0
3    1.6
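
For code that should not depend on the interpreter's default, the two kinds of division can also be requested by name. This is a sketch assuming Python 3 and a recent pandas; Series.truediv is the method form of "/", and the value in the zero-denominator row is left unchecked here because it may be inf or NaN depending on the pandas version.

```python
import operator

import pandas as pd

# Plain integers: truediv keeps the fraction, floordiv truncates it.
true_q = operator.truediv(8, 5)    # 1.6
floor_q = operator.floordiv(8, 5)  # 1

kk = pd.DataFrame({"a": [1, 0, 3, 5], "b": [2, 4, 6, 8]})

# Method form of true division on integer columns; the result is float.
c = kk["b"].truediv(kk["a"])

print(true_q, floor_q)
print(c)
```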

halleygithub · 2012-09-08T10:37:42Z

thanks..


halleygithub · 2012-09-08T12:26:49Z

Hi, Wes or other Pandas guys,

I strongly suggest that Pandas treat the DataFrame index the same way as ordinary columns (either by modifying the underlying structure or by adding syntactic sugar to wrap it). This would lower the learning curve and simplify the syntax.

I can use 'df.col_name' to get column data, but I can't use 'df.index_name' to get index data. I can use 'df[df[col_name]==0]' to filter/query on an ordinary column, but I can't do that on the index (yes, there are index-specific filter/query functions, but they are different commands), and so on. So in order to filter/query/aggregate/apply/... with the index, I have to 'reset_index' -> filter/query/aggregate/apply/... -> 'set_index'.

So there are two sets of functions to slice/query/...: one for ordinary columns and one for the index (and what's worse, we have 'multilevel index' and 'single level index', and the syntax for dealing with them seems to be different too).

In my understanding, the index is a special column; though it is special, it should still be a column. So the functions that apply to ordinary columns should be able to apply to the index as well.

I am quite familiar with SQL. A SQL database has a 'key', and the key is handled the same way as ordinary columns, so you can use 'select' in a unified way.

I want to use Python to develop a business tool and need a '2D-matrix' datatype, which Python/numpy/dict_of_dict cannot handle simply and elegantly. So I found Pandas. But Pandas has cost me a lot of time and effort to find/learn/differentiate its philosophy and functions for the index versus columns. Stack Overflow can prove that.

I am a Pandas newbie and I might be wrong or misuse the terms, but the 'newbie' role gives me a valuable opportunity to experience the learning curve of Pandas as an ordinary end user.

So again, can Pandas unify the functions that process (slice/filter/query/aggregate/apply...) the index and columns?

Thanks,
Halley
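
For comparison, here is a minimal sketch of filtering on the index without the reset_index/set_index round trip. It assumes a recent pandas rather than the 2012 release discussed here, and the index name "key" and the labels "w" to "z" are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 0, 3, 5], "b": [2, 4, 6, 8]},
    index=pd.Index(["w", "x", "y", "z"], name="key"),
)

# Boolean mask built from an ordinary column.
by_column = df[df["a"] == 0]

# The same masking pattern works on the index itself.
by_index = df[df.index == "x"]

# query() can refer to a named index just like a column.
by_query = df.query("key == 'x' and a == 0")

# For a MultiIndex, get_level_values() exposes one level for masking.
multi = df.reset_index().set_index(["key", "a"])
by_level = multi[multi.index.get_level_values("a") == 0]

print(by_column, by_index, by_query, by_level, sep="\n\n")
```

All four selections pick out the same row. The index still has its own accessors, so this narrows the gap described above but does not remove it.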

jreback · 2015-01-17T18:00:07Z

pandas has defaulted to true division since 0.14, IIRC. The other issue mentioned is covered in several other issues.

wesm closed this as completed Sep 8, 2012

wesm reopened this Sep 26, 2012

ghost mentioned this issue Dec 12, 2012

ENH: df.grep(col,pat) and df.dselect(col,"expr") #2460

Closed

jreback closed this as completed Jan 17, 2015
