API design discussion #1844

Closed
halleygithub opened this issue Sep 5, 2012 · 5 comments

Comments

@halleygithub

from pandas import DataFrame

kk = DataFrame({'a': [1, 0, 3, 5], 'b': [2, 4, 6, 8]})
kk['c'] = kk.b / kk.a

kk
   a  b  c
0  1  2  2
1  0  4  0
2  3  6  2
3  5  8  1

Two issues:
(1) I don't think 0 is a suitable result for kk.c in row 1 (4/0).
(2) I don't think 1 is a suitable result for kk.c in row 3 (8/5); it should be 1.6.

By contrast, kk['c'] = 1.0*kk.b/kk.a gives a reasonable answer:

kk
   a  b    c
0  1  2  2.0
1  0  4  NaN
2  3  6  2.0
3  5  8  1.6

So please treat 'int/int' as 'float/float'; the current 'int/int' behavior produces errors that are hard to detect.

@lodagro
Contributor

lodagro commented Sep 5, 2012

  1. This is NumPy-like behavior: in integer arithmetic, division by zero always yields zero. You can flag division by zero using np.seterr:
In [1]: import pandas

In [2]: import numpy as np

In [3]: a = [1,0,3,5]

In [4]: b = [2,4,6,8]

In [5]: kk = pandas.DataFrame({'a': a, 'b': b})

In [6]: np.array(b) / np.array(a)
Out[6]: array([2, 0, 2, 1])

In [7]: kk['c'] = kk.b / kk.a

In [8]: kk
Out[8]: 
   a  b  c
0  1  2  2
1  0  4  0
2  3  6  2
3  5  8  1

In [9]: np.seterr(divide='raise')
Out[9]: {'divide': 'ignore', 'invalid': 'ignore', 'over': 'ignore', 'under': 'ignore'}

In [10]: kk['c'] = kk.b / kk.a
---------------------------------------------------------------------------
FloatingPointError                        Traceback (most recent call last)
...
FloatingPointError: divide by zero encountered in divide
  2. This is the way integer division works in Python 2:
In [12]: 8 / 5
Out[12]: 1
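
For readers on Python 3, where `/` on integer arrays is already true division, the integer behavior described above can be reproduced with `//`. The following is a minimal sketch, assuming a recent NumPy. It uses the `np.errstate` context manager rather than the global `np.seterr` from the transcript, so the setting does not leak outside the block; the variable names are illustrative only.

```python
import numpy as np

a = np.array([1, 0, 3, 5])
b = np.array([2, 4, 6, 8])

# Integer floor division: NumPy stores 0 where the divisor is 0.
# The warning is silenced here only to keep the output clean.
with np.errstate(divide="ignore"):
    floor_result = b // a

# Ask NumPy to raise instead of silently storing 0.
raised = False
try:
    with np.errstate(divide="raise"):
        b // a
except FloatingPointError:
    raised = True

print(floor_result.tolist(), raised)
```

Scoping the error mode with a context manager is a design choice for the sketch; `np.seterr(divide='raise')`, as used in the transcript, changes the mode for the rest of the session.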

@wesm
Member

wesm commented Sep 8, 2012

What you are asking for is Python3-like behavior. To get this, you need to do from __future__ import division:

In [3]: from __future__ import division

In [4]: kk.b / kk.a
Out[4]: 
0    2.0
1    NaN
2    2.0
3    1.6
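
As a small illustration of the Python 3 semantics mentioned above, the sketch below uses plain Python integers (no pandas or NumPy). Under Python 3, `/` is true division and `//` is floor division, so the `__future__` import is no longer needed. The variable names are made up for the example.

```python
# Python 3: `/` is true division, `//` is floor division.
true_quotient = 8 / 5
floor_quotient = 8 // 5

# Plain Python integers raise on division by zero instead of
# returning 0 or NaN.
try:
    4 / 0
    zero_division_raised = False
except ZeroDivisionError:
    zero_division_raised = True

print(true_quotient, floor_quotient, zero_division_raised)
```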

wesm closed this as completed Sep 8, 2012
@halleygithub
Author

thanks..

------------------ Original message ------------------
From: "Wes McKinney"[email protected];
Sent: Saturday, September 8, 2012, 10:15 AM
To: "pydata/pandas"[email protected];
Cc: "halleygithub"[email protected];
Subject: Re: [pandas] Calculation error or bad design (#1844)

What you are asking for is Python3-like behavior. To get this, you need to do from __future__ import division:

In [3]: from __future__ import division

In [4]: kk.b / kk.a
Out[4]: 
0    2.0
1    NaN
2    2.0
3    1.6

@halleygithub
Author

Hi, Wes or other Pandas guys,

I strongly suggest that pandas treat the DataFrame index the same way as ordinary columns (either by changing the underlying structure or by adding syntactic sugar to wrap it). This would flatten the learning curve and simplify the syntax.

I can write 'df.col_name' to get a column's data, but I can't write 'df.index_name' to get the index data. I can write 'df[df[col_name]==0]' to filter/query on an ordinary column, but I can't do the same on the index (yes, there are index-specific filter/query functions, but they are different commands), and so on. So in order to filter/query/aggregate/apply/... with the index, I have to 'reset_index' -> filter/query/aggregate/apply/... -> 'set_index'.

So there are two sets of functions to slice/query/...: one for ordinary columns and one for the index (and what's worse, we have 'multilevel index' and 'single level index', and the syntax for dealing with them seems to be different too).
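
For context, here is a hedged sketch of how index-based filtering can look without a `reset_index` / `set_index` round trip, assuming a recent pandas version (some of these calls may not have existed when this comment was written). The frame, the index name `key`, and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a named index, for illustration only.
df = pd.DataFrame(
    {"key": [0, 1, 2, 3], "a": [1, 0, 3, 5], "b": [2, 4, 6, 8]}
).set_index("key")

# A boolean mask built from the index works like a column filter.
by_mask = df[df.index >= 2]

# query() accepts a named index as if it were a column name.
by_query = df.query("key >= 2 and a > 3")

# With a MultiIndex, a single level can be pulled out by name.
mi = df.reset_index().set_index(["key", "a"])
by_level = mi[mi.index.get_level_values("a") == 0]

print(by_mask, by_query, by_level, sep="\n\n")
```

This does not make the index a column, so the asymmetry described above remains; it only shows that the common filtering cases have fairly direct spellings.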

In my understanding, the index is a special column; though it is special, it should still be a column. So the functions that apply to ordinary columns should apply to the index as well.

I am quite familiar with SQL. A SQL database has a 'key', and the key is handled the same way as ordinary columns, so you can use 'select' in a unified way.

I want to use Python to develop a business tool and need a '2D-matrix' datatype, which Python/numpy/dict_of_dict cannot handle simply and elegantly. So I found pandas. But pandas has cost me a lot of time and effort to find, learn, and differentiate its philosophy and functions for the index versus columns. Stack Overflow can attest to that.

I am a pandas newbie and I might be wrong or misuse the terms, but the 'newbie' role gives me a valuable opportunity to experience the pandas learning curve as a common end user.

So again, can pandas unify the functions that process (slice/filter/query/aggregate/apply...) the index and the columns?

Thanks,
Halley

@jreback
Contributor

jreback commented Jan 17, 2015

pandas has defaulted to true division since 0.14, IIRC. The other issue mentioned is covered in several other issues.
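
A minimal sketch of the original example under true division, assuming Python 3 and a recent pandas. Note that in current versions an integer divided by zero comes out as `inf`, not the `NaN` shown earlier in this thread.

```python
import numpy as np
import pandas as pd

kk = pd.DataFrame({"a": [1, 0, 3, 5], "b": [2, 4, 6, 8]})

# True division: the result column is float64 even though both
# inputs are integer columns.
kk["c"] = kk.b / kk.a

print(kk)
print(kk["c"].dtype)
```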

jreback closed this as completed Jan 17, 2015