ENH: allow in-line expression assignment with df.eval #5343

Merged
merged 2 commits into from
Oct 27, 2013

Conversation

jreback
Contributor

@jreback jreback commented Oct 26, 2013

This was a relatively easy extension of eval to allow in-line creation/assignment.

Allows one to basically use formulas to do things (pandas conquers excel!!!!)

  • docs
  • tests non-frame
In [11]: df = DataFrame(dict(a = range(5), b = range(5,10)))

In [12]: df
Out[12]: 
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9

In [13]: df.eval('c=a+b')

In [14]: df.eval('d=a+b+c')

In [15]: df.eval('a=1')

In [16]: df
Out[16]: 
   a  b   c   d
0  1  5   5  10
1  1  6   7  14
2  1  7   9  18
3  1  8  11  22
4  1  9  13  26

You can do this (this could maybe have a bit better syntax though)

In [31]: df = DataFrame(dict(a = range(5), b = range(5,10)))

In [32]: formulas = Series(['c=a+b','d=a*b'],index=['a','b'])

In [33]: df.apply(lambda x: df.eval(formulas[x.name]))
Out[33]: 
a    None
b    None
dtype: object

In [34]: df
Out[34]: 
   a  b   c   d
0  0  5   5   0
1  1  6   7   6
2  2  7   9  14
3  3  8  11  24
4  4  9  13  36

Contributor Author

jreback commented Oct 26, 2013

@cpcloud what do you think?

return self.visit(expr, **kwargs)

# allow a single assignment
if isinstance(expr, ast.Assign):
Member

Any reason not to put this in a visit_Assign method?

Contributor Author

I didn't try that because we have one which basically switches the Assign to a Compare and I think it's needed for that? (e.g. if it's on the rhs of an expression). I also wanted to ensure only a single assignment, e.g. a = b = c is invalid

Member

I think that's only for PyTables, though. Not meaning to be nitpicky.

Member

cpcloud commented Oct 26, 2013

What happens with overlapping locals?

Contributor Author

jreback commented Oct 26, 2013

you mean like

a = 5
df.eval('a = b + c')

Member

cpcloud commented Oct 26, 2013

yep

Contributor Author

jreback commented Oct 26, 2013

I actually don't think it's a problem, because in this case the local doesn't matter.

My question to you is: I am using env.resolvers[0] to figure out what to do the assignment to (it's there when you call df.eval, but not when you call pd.eval). Is that the 'right' way?

Contributor Author

jreback commented Oct 26, 2013

I think this works as expected (put tests in for this)

In [1]: df = DataFrame(np.random.randn(5, 2), columns=list('ab'))

In [2]: a = 1

In [3]: df.eval('a=1+b')

In [4]: df
Out[4]: 
          a         b
0 -0.641769 -1.641769
1  1.342992  0.342992
2  1.688316  0.688316
3  0.321509 -0.678491
4  1.652393  0.652393

In [5]: df.eval('a=a+b')

NameResolutionError: resolvers and locals overlap on names ['a']

In [6]: df.eval('a=@a+b')

In [7]: df
Out[7]: 
          a         b
0 -0.641769 -1.641769
1  1.342992  0.342992
2  1.688316  0.688316
3  0.321509 -0.678491
4  1.652393  0.652393

Member

cpcloud commented Oct 26, 2013

If you pass in resolvers explicitly (I actually think this should be disallowed and added if people want it), then resolvers[0] will not work in some cases. One thing that it sort of breaks is that variable look ups occur in the order of the resolvers, so shouldn't assignment be attempted until it either succeeds or exhausts all of the resolvers?

Member

cpcloud commented Oct 26, 2013

I wonder if an assign(name, value) method on Scope would be useful here?

Member

cpcloud commented Oct 26, 2013

I suppose __setitem__ could be enlisted to do that.

Member

cpcloud commented Oct 26, 2013

FWIW I would like to eventually gut Scope and use ChainMap (in Python 3 collections), which is essentially Scope but much less hacky: it basically behaves like a dict with a few bells and whistles to control the order of lookup. I'll do that for 0.14.

Member

cpcloud commented Oct 26, 2013

At some point I should also add vars(pd.compat.builtin) to locals so that builtin constants are there.

Member

cpcloud commented Oct 26, 2013

ChainMap is pretty cool.
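
For readers unfamiliar with it, the following is a small sketch of how `collections.ChainMap` orders lookups and where it sends writes, which is the behavior the resolver discussion above turns on. The dictionaries `frame_columns` and `local_vars` are made-up stand-ins for a DataFrame resolver and the caller's locals; this is not a description of how pandas' `Scope` is implemented.

```python
from collections import ChainMap

# Hypothetical stand-ins: a "frame" resolver and the caller's local variables.
frame_columns = {"a": [0, 1, 2], "b": [5, 6, 7]}
local_vars = {"a": 1, "x": 10}

# Mappings are searched left to right, so the frame shadows the locals.
scope = ChainMap(frame_columns, local_vars)

found_a = scope["a"]  # resolved from frame_columns (first mapping wins)
found_x = scope["x"]  # not in frame_columns, so it falls through to local_vars

# Writes always go to the first mapping, never to the later ones.
scope["c"] = [5, 7, 9]
```

Because writes only ever reach the first mapping, a ChainMap-based scope would assign to whichever object is listed first, which is roughly the same assumption as using `resolvers[0]`.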

TST: tests for local name overlaps

ENH: moved assign to visit_Assign from visit_Module
Contributor Author

jreback commented Oct 26, 2013

ok...now use visit_Assign pretty easy, just had to figure out that _visitor actually has the data..
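
A rough, standard-library-only sketch of the `visit_Assign` idea may help here. It assumes a plain dict of equal-length lists in place of a DataFrame; the class name `MiniEval` and its `target` attribute are hypothetical and far simpler than the pandas expression visitor.

```python
import ast
import operator


class MiniEval(ast.NodeVisitor):
    """Toy evaluator: element-wise arithmetic on named lists, plus assignment."""

    _ops = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

    def __init__(self, target):
        self.target = target  # dict of column name -> list of values

    def visit_Module(self, node):
        if len(node.body) != 1:
            raise SyntaxError("only a single statement is allowed")
        return self.visit(node.body[0])

    def visit_Expr(self, node):
        return self.visit(node.value)

    def visit_Assign(self, node):
        # Reject chained assignment such as "c = a = b" and non-name targets.
        if len(node.targets) != 1 or not isinstance(node.targets[0], ast.Name):
            raise SyntaxError("can only assign a single expression")
        self.target[node.targets[0].id] = self.visit(node.value)
        return None

    def visit_BinOp(self, node):
        op = self._ops[type(node.op)]
        left, right = self.visit(node.left), self.visit(node.right)
        return [op(l, r) for l, r in zip(left, right)]

    def visit_Name(self, node):
        return self.target[node.id]

    def generic_visit(self, node):
        raise NotImplementedError(type(node).__name__)


columns = {"a": list(range(5)), "b": list(range(5, 10))}
evaluator = MiniEval(columns)
evaluator.visit(ast.parse("c = a + b"))
evaluator.visit(ast.parse("d = a * b"))
print(columns["c"])  # [5, 7, 9, 11, 13]
print(columns["d"])  # [0, 6, 14, 24, 36]
```

Like the `df.eval` calls in the description, the assignment returns `None` and mutates the target in place.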

Contributor Author

jreback commented Oct 26, 2013

on the resolvers.

I think that I need an explicit way of recording the target in df.eval; maybe I'll just add a target key to Scope?

TST: addtional tests for multiple assignment, targets
ENH: add target to Scope, use instead of resolvers
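
One way to picture the explicit-target approach from the commit above is the sketch below. `MiniScope`, `lookup`, and `assign` are hypothetical names chosen for illustration (the `assign(name, value)` signature echoes the suggestion earlier in the thread); the real `Scope` class in pandas may differ in every detail.

```python
class MiniScope:
    """Toy scope: resolvers are searched in order, writes go to one target."""

    def __init__(self, local_vars, resolvers=(), target=None):
        self.local_vars = dict(local_vars)
        self.resolvers = list(resolvers)
        self.target = target  # the object that receives assignments, if any

    def lookup(self, name):
        # Resolvers take priority over locals, in the order they were given.
        for resolver in self.resolvers:
            if name in resolver:
                return resolver[name]
        return self.local_vars[name]

    def assign(self, name, value):
        # Without a recorded target (the pd.eval case) there is nothing to write to.
        if self.target is None:
            raise ValueError("cannot assign without a target object")
        self.target[name] = value


frame = {"b": [1, 2]}  # stands in for the DataFrame behind df.eval
scope = MiniScope({"a": 5}, resolvers=[frame], target=frame)
scope.assign("a", [2, 3])  # lands on the frame; the local a = 5 is untouched
```

Recording the target separately means assignment no longer depends on which resolver happens to be first.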
Contributor Author

jreback commented Oct 27, 2013

@cpcloud all updated now....take a look when you have a chance, added a short note in the docs too.

orig_df = df.copy()

# multiple assignees
self.assertRaises(SyntaxError, df.eval, 'd c = a + b')
Member

Should this be switched with test right below? This is not valid Python syntax while the one below is valid Python syntax, but not valid eval syntax.

Contributor Author

I'll check but iirc it fails in the assign block with a left hand side that is a list with a length of 2

Member

strange ... that should simply throw the usual syntax error and not do any parsing outside of Python

Contributor Author

you are right, 'd c = a + b' raises a SyntaxError in fix_missing_locations... I think I meant a = b = c, which ends up having multiple assignment nodes (for which I raise a SyntaxError); though in theory you could handle that in the future

Member

can u change the test to make sure the multiple assignment fails? again not meaning to be a troll....just want to make sure the correct error is checked
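
The two failure modes discussed in this review thread can be told apart with the standard `ast` module alone. The helper name `classify` and its return strings are invented for illustration. In CPython's `ast`, a chained assignment such as `c = a = b` parses to a single `Assign` node with two targets, whereas `d c = a + b` never gets past the parser.

```python
import ast


def classify(expression):
    """Say whether an assignment string fails in Python's parser or later."""
    try:
        tree = ast.parse(expression)
    except SyntaxError:
        # Rejected by Python itself, before any eval-specific checks run.
        return "invalid Python"
    statement = tree.body[0]
    if isinstance(statement, ast.Assign) and len(statement.targets) > 1:
        # Valid Python, but an evaluator may still refuse chained assignment.
        return "chained assignment"
    return "ok"


print(classify("d c = a + b"))  # invalid Python
print(classify("c = a = b"))    # chained assignment
print(classify("c = a + b"))    # ok
```

Both rejected cases can surface as `SyntaxError` in an evaluator, so a test that only checks the exception type cannot show which check actually fired.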

Contributor Author

jreback commented Oct 27, 2013

otherwise look ok @cpcloud ?

Member

cpcloud commented Oct 27, 2013

yep looks okay

# multiple assignment
df = orig_df.copy()
df.eval('c = a + b')
self.assertRaises(SyntaxError, df.eval, 'c = a = b')
Contributor Author

@cpcloud did you mean something additional besides this?

Member

oh nope! sorry didn't see that 👍

jreback added a commit that referenced this pull request Oct 27, 2013
ENH: allow in-line expression assignment with df.eval
@jreback jreback merged commit f6f06aa into pandas-dev:master Oct 27, 2013