Skip to content

Conversation

@deniederhut
Copy link
Contributor

@deniederhut deniederhut commented Jul 15, 2017

@pep8speaks
Copy link

pep8speaks commented Jul 15, 2017

Hello @deniederhut! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 04, 2017 at 12:16 Hours UTC

You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful;
if you try to use attribute access to create a new column, it fails silently, creating a new attribute rather than a
if you try to use attribute access to create a new column, it issues a `UserWarning` and creates a new attribute rather than a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use double-backticks around UserWarning

if (self.ndim > 1) and (is_list_like(value)):
warnings.warn("Pandas doesn't allow Series to be assigned "
"into nonexistent columns - see "
"https://pandas.pydata.org/pandas-docs/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need a stacklevel=2 (or maybe higher here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been fixed -- not sure why GH isn't picking it up

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use if isinstance(self, ABCDataFrame) and is_list_like(value):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change here

with catch_warnings(record=True) as w:
self.series.not_an_index = [1, 2]
assert len(w) == 0 # fail if false warning on Series
with pytest.warns(UserWarning):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use tm.assert_raises_warning

assert isinstance(self.sparse_array, gt.ABCSparseArray)
assert isinstance(self.categorical, gt.ABCCategorical)
assert isinstance(pd.Period('2012', freq='A-DEC'), gt.ABCPeriod)
with catch_warnings(record=True) as w:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are you testing here?

assert isinstance(pd.Period('2012', freq='A-DEC'), gt.ABCPeriod)
with catch_warnings(record=True) as w:
self.series.not_an_index = [1, 2]
assert len(w) == 0 # fail if false warning on Series
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to a new test add the issue number as a comment

@jreback
Copy link
Contributor

jreback commented Jul 15, 2017

Replicate this test from the original issue (and assert each of these things).

d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# this adds a column to df
df['three'] = df.two + 1

# this successfully modifies an existing column
df.one += 1

# this seems to work but does not add a column to df
df.four = df.two + 2

print df.three
print df.four
print df

Here is the warning.IOW assignment with a pandas object to a non-existing attribute (actually would be ok with this raising I think), though this might break people downstream.

# this seems to work but does not add a column to df
df.four = df.two + 2

@jreback
Copy link
Contributor

jreback commented Jul 15, 2017

also would love to warn on this: #5904 as well.

@deniederhut
Copy link
Contributor Author

Wilco. Still in test_generic.py?

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Jul 15, 2017
@jreback
Copy link
Contributor

jreback commented Jul 15, 2017

Wilco. Still in test_generic.py?

yeah that seems reasonable

@codecov
Copy link

codecov bot commented Jul 15, 2017

Codecov Report

Merging #16951 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16951      +/-   ##
==========================================
- Coverage   90.98%   90.96%   -0.02%     
==========================================
  Files         161      161              
  Lines       49288    49290       +2     
==========================================
- Hits        44846    44839       -7     
- Misses       4442     4451       +9
Flag Coverage Δ
#multiple 88.74% <100%> (ø) ⬆️
#single 40.2% <100%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 92.3% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.76% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4c498f8...cfe1bad. Read the comment docs.

@codecov
Copy link

codecov bot commented Jul 15, 2017

Codecov Report

Merging #16951 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16951      +/-   ##
==========================================
- Coverage   90.99%   90.98%   -0.02%     
==========================================
  Files         162      162              
  Lines       49508    49512       +4     
==========================================
- Hits        45052    45047       -5     
- Misses       4456     4465       +9
Flag Coverage Δ
#multiple 88.76% <100%> (ø) ⬆️
#single 40.27% <40%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 92.03% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.66% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 929c66f...b86546e. Read the comment docs.

You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful;
if you try to use attribute access to create a new column, it fails silently, creating a new attribute rather than a
if you try to use attribute access to create a new column, it issues a ```UserWarning`` and creates a new attribute rather than a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a: starting in 0.21.0

def _set_item(self, key, value):
if callable(getattr(self, key, None)):
warnings.warn("Pandas doesn't allow attribute-like access to "
"columns whose names collide with methods",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would say that this is a collision

we actually DO allow it (maybe should raise but let's start with a warning)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning says that attribute-like access is not allowed. E.g.

df = pd.DataFrame({'a': [1, 2]})
df['sum'] = [3, 4]
print(df.sum)
<bound method DataFrame.sum of    a  sum
0  1    3
1  2    4>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i just want to see a more verbose message that this is not recommended
we could also raise but that's a bit harsh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay... how about something like:

Column name '{key}' collides with a built-in method, which will cause unexpected attribute behavior

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

df = pd.DataFrame({'names': ['a', 'b', 'c']}, index=multi_index)
sparse_series = pd.Series([1, 2, 3]).to_sparse()
sparse_array = pd.SparseArray(np.random.randn(10))
series = pd.Series([1, 2, 3])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u can remove this

assert isinstance(pd.Period('2012', freq='A-DEC'), gt.ABCPeriod)


class TestABCWarnings(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u don't need a class here
u can just do this as a function (more pytest idiomatic)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see anything in the contributing guide about test style, so I was following the style of the code nearby.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.df['three'] = self.df.two + 1
assert len(w) == 0
assert self.df.three.sum() > self.df.two.sum()
with catch_warnings(record=True) as w:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put blank lines in between these

self.df.four = self.df.two + 2
with tm.assert_produces_warning(UserWarning):
# warn when column has same name as method
self.df['sum'] = self.df.two
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test with self.df.sum as well

return result

def _set_item(self, key, value):
if callable(getattr(self, str(key), None)):
Copy link
Contributor

@jreback jreback Jul 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check if its a string first, e.g.

if isinstance(key, compat.string_type) and callable(getattr(self, key, None))):
 ......

df = pd.DataFrame(d)

with catch_warnings(record=True) as w:
# successfully add new column
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a statement that the first 3 should NOT be showing warnigns (I know you are checking which is fine)

@jreback
Copy link
Contributor

jreback commented Jul 16, 2017

@deniederhut looks pretty good. rebase on master, some changes and you prob need to force push.

@jreback
Copy link
Contributor

jreback commented Jul 16, 2017

also needs a whatsnew note; put this in the api-breaking changes section. I think we may need a full on example to highlite this.

@deniederhut
Copy link
Contributor Author

Wilco.

And I might be missing something, but we are just issuing warnings, right? This shouldn't break any clients unless they are configured to raise warnings as exceptions?

@jreback
Copy link
Contributor

jreback commented Jul 19, 2017

yeah isn't it just a warning then no big deal

however I think we may want to consider making this an error in the future (so can make an issue for that)

@deniederhut
Copy link
Contributor Author

Would you like me to add to the whatsnew that these warnings are subject to escalation in future releases?

@jreback
Copy link
Contributor

jreback commented Jul 19, 2017

no but pls open an issue to re-evaluate in the future

@deniederhut deniederhut force-pushed the fix/setting-into-attribute branch from 6216954 to 84f04d9 Compare July 19, 2017 01:18
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs look good! some small changes.

In[1]: df = pd.DataFrame({'one': [1., 2., 3.]})
In[2]: df.two = [4, 5, 6]

which does not raise any obvious exceptions, but also does not create a new column:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which -> This

1 2.0
2 3.0

and creating a column whose name collides with a method or attribute already in the instance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a column (use sentences if possible here).

.. code-block:: ipython

In[2]: df.two = [4, 5, 6]
UserWarning: Pandas doesn't allow Series to be assigned into nonexistent columns - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put this whole section in indexing.rst (below the attribute access section). Put a reference to it from the whatsnew (you can leave some of it here as well; usually we make the whatsnew a smaller version of what we put in the docs proper).

if (self.ndim > 1) and (is_list_like(value)):
warnings.warn("Pandas doesn't allow Series to be assigned "
"into nonexistent columns - see "
"https://pandas.pydata.org/pandas-docs/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change here

@jreback jreback added the Error Reporting Incorrect or improved errors from pandas label Jul 19, 2017
@deniederhut deniederhut force-pushed the fix/setting-into-attribute branch from 06af1a1 to cf2abfd Compare July 19, 2017 12:09
@deniederhut
Copy link
Contributor Author

Okay, I think I've gotten the docs correct

@deniederhut deniederhut force-pushed the fix/setting-into-attribute branch 2 times, most recently from e7d7def to d8c1faa Compare July 29, 2017 16:48
result._set_is_copy(self, copy=is_copy)
return result

def _set_item(self, key, value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a single-point-of-contact that all (most? many?) setter methods go through? i.e. will the various loc.__setitem__ paths eventually wind through here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe that is correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is my understanding, yes. Here is an example of setting while using .loc:

import pandas as pd
df = pd.DataFrame({'one': [0, 1, 2]})
df.loc[:, 'sum'] = df.one.sum()
UserWarning: Column name 'sum' collides with a built-in method, which will cause unexpected attribute behavior
  self._set_item(key, value)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this point is only for [] setting, e.g. setting a column on a DF or an element on a Series. .loc/.iloc are handled in core/indexing.py

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some doc comments. lgtm. ping on green for a final look.

.. _whatsnew_0210.enhancements.column-creation:

Improved warnings when attempting to create columns
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the underline length must match the title.

1 2.0 7.0
2 3.0 9.0>

Both of these now raise a ``UserWarning`` about the potential for unexpected behavior. See `Attribute Access <https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access>`__.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use :ref:`Attribute Access <indexing.attribute_access>`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the section in indexing.rst

return result

def _set_item(self, key, value):
if isinstance(key, str) and callable(getattr(self, key, None)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use compat.string_types rather than str here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using string_types causes failures in Python2 when creating columns that have unicode characters in their names. See c90aa22.

df['C'] = pd.to_numeric(df['C'], errors='coerce')
df.dtypes

.. _whatsnew_0210.enhancements.column-creation:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this attribute_access (rather than column-creation)

As part of warning check, object type of potential attributes was checked
for subtypes of pd.compat.str_types before being checked for overlap with
methods defined on ndframes. This causes decode errors in Python2 when
users attempt to add columns with unicode column names. Fix is to compare
against `str`.
@deniederhut deniederhut force-pushed the fix/setting-into-attribute branch from d8c1faa to b86546e Compare August 4, 2017 12:16
@deniederhut
Copy link
Contributor Author

@jreback I think we might be ready for the final check

@jreback jreback merged commit 7cc0fac into pandas-dev:master Aug 7, 2017
@jreback
Copy link
Contributor

jreback commented Aug 7, 2017

thanks @deniederhut nice changes!

@jreback
Copy link
Contributor

jreback commented Aug 8, 2017

@deniederhut
see https://travis-ci.org/MacPython/pandas-wheels/jobs/262048673

this is a 32-bit python 2.7 build; there are some warnings at the end. See if you can repro (you can try using a 64-bit python 2.7 build)

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Aug 8, 2017

I have some general comments on this PR (sorry for being late with this feedback):

  • I like that you now get a warning when trying to create a new column with df.col_name = [..] (which silently fails at creating a new column), that's very useful! However, the case when the name you try to set is a builtin, eg df.mean = [...] doesn't raise a warning.
    I think it would be nice to add a warning then as well (as it is actually a more harmful case, it is not only silently failing, but also overwriting your mean() method)

  • IMO the warning when setting a new column with df['new_col'] = .. which has the same name as a builtin method is a bit overkill. I understand it is confusing from time to time that the attribute access does not always work, but IMO there are many more people who use column names like mean without any problem that would now get warnings (one example that this is common is applying functions in groupby operations that create such warnings like s.groupby(..).agg(['mean', 'median'])). So I am -1 on this change.
    See also comment above about df.mean = .. not raising a warning while df['mean] = .. does, but that's something else as it is actually overwriting the method (and this can also be easily fixed to raise I think).
    Last, if we keep this warning, I think we should also do it for attributes and not only methods? (eg currently df['shape'] does not raise a warning)

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and also one specific code comment

except (AttributeError, TypeError):
if isinstance(self, ABCDataFrame) and (is_list_like(value)):
warnings.warn("Pandas doesn't allow Series to be assigned "
"into nonexistent columns - see "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write this a bit more generic, as it does not only raise for Series objects (and I also find it a bit confusing, as pandas certainly allows assigning Series objects, just with another syntax).

Maybe something like "pandas doesn't allow to add a new column using attribute access" ? (can certainly be improved further)

@deniederhut
Copy link
Contributor Author

Happy to dig into this some more - two questions:

  1. What is the consensus among the maintainers about warning on name collision?
  2. Should this be submitted as a separate PR?

@jreback
Copy link
Contributor

jreback commented Aug 12, 2017

new PR

@shoyer
Copy link
Member

shoyer commented Aug 12, 2017

I also just saw this PR. I agree with @jorisvandenbossche we should not raise warnings when setting column names that conflict with built-in methods. This would be a major source of annoyance for users.

I think we're pretty clear that attribute style access for columns is a convenience feature, not something to be relied on for arbitrary names. In contrast, we do guarantee that __getitem__ works for any valid column name.

@jorisvandenbossche
Copy link
Member

@deniederhut do you want to do a PR for this?

@jreback what do you think about the concerns of @shoyer and me? Fine with changing this part of the PR back?

@deniederhut
Copy link
Contributor Author

deniederhut commented Aug 16, 2017 via email

@jorisvandenbossche
Copy link
Member

@deniederhut cool!
I also just opened #17268 to keep track of this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Compat pandas objects compatability with Numpy or Python functions Error Reporting Incorrect or improved errors from pandas

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GOTCHA: You can't use dot notation to add a column to a DataFrame Suggestion: Warning for DataFrame colname-methodname clash

7 participants