
indexed assignment does not work for dataframe #5938


Closed
RayVR opened this issue Jan 14, 2014 · 5 comments

Comments

@RayVR

RayVR commented Jan 14, 2014

In [82]: d = {'a': range(4), 'b': list('ab..'), 'c': ['a', 'b', nan, 'd']}
In [83]: df = pd.DataFrame(d)
In [84]: df
Out[84]:
   a  b    c
0  0  a    a
1  1  b    b
2  2  .  NaN
3  3  .    d
In [85]: df[['c']][pd.isnull(df.c)] = df[['b']][pd.isnull(df.c)]
In [86]: df
Out[86]:
   a  b    c
0  0  a    a
1  1  b    b
2  2  .  NaN
3  3  .    d

In [87]: df['c'][pd.isnull(df.c)] = df[['b']][pd.isnull(df.c)]

In [88]: df
Out[88]:
   a  b  c
0  0  a  a
1  1  b  b
2  2  .  .
3  3  .  d

@jreback
Contributor

jreback commented Jan 14, 2014

You are doing a chained assignment to a copy; see here:

http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-view-versus-copy

Use loc/ix/iloc instead:

In [8]: df.loc[pd.isnull(df.c),'c'] = df.loc[pd.isnull(df.c),'b']

In [9]: df
Out[9]: 
   a  b  c
0  0  a  a
1  1  b  b
2  2  .  .
3  3  .  d

[4 rows x 3 columns]
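
To make the view-versus-copy point concrete, here is a minimal sketch, assuming a reasonably recent pandas with NumPy installed. The `mask` variable, the `"filled"` placeholder value, and the warning suppression are illustrative additions and do not come from the original report.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(4), "b": list("ab.."), "c": ["a", "b", np.nan, "d"]})
mask = df["c"].isnull()

# Chained form: df[["c"]] builds a new DataFrame, so the assignment
# lands on that temporary object and df itself is left unchanged.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas may warn about chained assignment
    df[["c"]][mask] = "filled"
chained_still_missing = bool(df["c"].isnull().any())

# Single .loc call: the row mask and the column label go into one indexer,
# so the assignment is applied to df directly.
df.loc[mask, "c"] = df.loc[mask, "b"]
```

The chained form is two separate operations (a selection, then an assignment on its result), whereas `.loc` performs one assignment on the original frame.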

jreback closed this as completed Jan 14, 2014
@RayVR
Author

RayVR commented Jan 14, 2014

This does not work, for the same reason.

In [96]: df.loc[pd.isnull(df.c), ['c']] = df.loc[pd.isnull(df.c), ['b']]

In [97]: df
Out[97]:
   a  b    c
0  0  a    a
1  1  b    b
2  2  .  NaN
3  3  .    d

This operation works fine on Series objects, but it does not work when the column selection is a list such as ['c']. I've read the docs, but I still can't get this to work in the non-trivial case of assigning to a subset of n columns. What have I missed?

@jreback
Contributor

jreback commented Jan 14, 2014

Are you using 0.13? This works on master; I think this was just fixed on 0.13.

As a workaround, you can do:

In [6]: df.loc[pd.isnull(df.c), 'c'] = df.loc[pd.isnull(df.c), 'b']

In [7]: df
Out[7]: 
   a  b  c
0  0  a  a
1  1  b  b
2  2  .  .
3  3  .  d

[4 rows x 3 columns]

@jreback
Contributor

jreback commented Jan 14, 2014

You can also do these. The rhs is aligned to what is selected on the lhs. As long as it's a frame/series it should work (if it's an ndarray/list it's trickier, as there are no labels to explicitly align on).

In [8]: df = pd.DataFrame({'a': range(4), 'b': list('ab..'), 'c': ['a', 'b', nan, 'd']})

In [9]: df.loc[pd.isnull(df.c), 'c'] = df['b']

In [10]: df
Out[10]: 
   a  b  c
0  0  a  a
1  1  b  b
2  2  .  .
3  3  .  d

[4 rows x 3 columns]

In [11]: df = pd.DataFrame({'a': range(4), 'b': list('ab..'), 'c': ['a', 'b', nan, 'd']})

In [12]: df.loc[pd.isnull(df.c), ['c']] = df['b']

In [13]: df
Out[13]: 
   a  b  c
0  0  a  a
1  1  b  b
2  2  .  .
3  3  .  d

[4 rows x 3 columns]
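
The alignment behaviour described above can be sketched as follows, assuming a recent pandas. The `fill` Series, its reversed index, and the `"q"` value are hypothetical and chosen only to show that labels decide which value is used, not position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(4), "b": list("ab.."), "c": ["a", "b", np.nan, "d"]})
mask = df["c"].isnull()  # only the row labelled 2 is selected

# Labelled rhs: the Series is matched to the selected rows by index label,
# so row 2 receives the value stored under label 2, wherever it sits.
fill = pd.Series(["w", "x", "y", "z"], index=[3, 2, 1, 0])
by_label = df.copy()
by_label.loc[mask, "c"] = fill

# Unlabelled rhs: an ndarray carries no labels, so it must already hold
# exactly one value per selected row, in row order.
by_position = df.copy()
by_position.loc[mask, "c"] = np.array(["q"], dtype=object)
```

Passing a full-length ndarray here would not be trimmed to the selected rows the way a Series is, which is the "trickier" case mentioned above.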

@RayVR
Author

RayVR commented Jan 14, 2014

I'm using the latest version I can get from the Anaconda distribution, which is 0.12. I'll see if I can get 0.13. This solution works for this example, but I'll have to see if it generalizes to the more complex use case. Thanks for the help.
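
For the case of assigning to a subset of n columns, one possible approach is sketched below, assuming a recent pandas in which a DataFrame on the rhs is aligned on column labels as well as on the index. The column names `b1`, `b2`, `c1`, `c2` and the "any target column missing" criterion are hypothetical, not taken from the thread.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "b1": list("ab.."),
    "b2": list("wxyz"),
    "c1": ["a", "b", np.nan, "d"],
    "c2": ["w", np.nan, np.nan, "z"],
})

# Rows where any of the target columns is missing (illustrative criterion).
mask = df[["c1", "c2"]].isnull().any(axis=1)

# .to_numpy() drops the labels, so values are copied by position:
# b1 -> c1 and b2 -> c2 for the selected rows only.
df.loc[mask, ["c1", "c2"]] = df.loc[mask, ["b1", "b2"]].to_numpy()
```

Converting the rhs to an array is a deliberate choice here: the source and target columns have different names, so label alignment would find nothing to match.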
