ENH: Add warning when setting into nonexistent attribute #16951
Hello @deniederhut! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on August 04, 2017 at 12:16 Hours UTC |
doc/source/indexing.rst
Outdated
| You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful; | ||
| if you try to use attribute access to create a new column, it fails silently, creating a new attribute rather than a | ||
| if you try to use attribute access to create a new column, it issues a `UserWarning` and creates a new attribute rather than a |
use double-backticks around UserWarning
| if (self.ndim > 1) and (is_list_like(value)): | ||
| warnings.warn("Pandas doesn't allow Series to be assigned " | ||
| "into nonexistent columns - see " | ||
| "https://pandas.pydata.org/pandas-docs/" |
you need a stacklevel=2 (or maybe higher here)
This has been fixed -- not sure why GH isn't picking it up
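For context on the `stacklevel` request, here is a minimal sketch that uses only the standard library. `ToyFrame` and `assign_new_attribute` are hypothetical stand-ins, not pandas code. The point is only that `stacklevel=2` attributes the warning to the line that performed the assignment rather than to the line inside `__setattr__`.

```python
import warnings


class ToyFrame(object):
    """Hypothetical stand-in for a DataFrame; not pandas code."""

    def __init__(self, columns):
        # Bypass our own __setattr__ while storing internal state.
        object.__setattr__(self, "_columns", dict(columns))

    def __setattr__(self, name, value):
        if name in self._columns:
            # Existing column: attribute assignment updates it.
            self._columns[name] = value
            return
        if isinstance(value, (list, tuple)):
            # stacklevel=2 points the warning at the caller's line
            # (`frame.two = ...`), not at this method.
            warnings.warn(
                "attribute access cannot create a new column",
                UserWarning,
                stacklevel=2,
            )
        # A plain attribute is created; no column is added.
        object.__setattr__(self, name, value)


def assign_new_attribute(frame):
    frame.two = [4, 5, 6]  # the warning is reported for this line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    frame = ToyFrame({"one": [1, 2, 3]})
    assign_new_attribute(frame)

warning_count = len(caught)
warning_category = caught[0].category
columns_after = sorted(frame._columns)
```

With the default `stacklevel=1`, the reported location would be the `warnings.warn` call itself, which is rarely useful to the person who wrote the assignment.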
use if isinstance(self, ABCDataFrame) and is_list_like(value):
this change here
pandas/tests/dtypes/test_generic.py
Outdated
| with catch_warnings(record=True) as w: | ||
| self.series.not_an_index = [1, 2] | ||
| assert len(w) == 0 # fail if false warning on Series | ||
| with pytest.warns(UserWarning): |
use tm.assert_produces_warning
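As a rough illustration of what a warning-asserting context manager does, here is a standard-library sketch. `expect_warning` is a hypothetical helper written for this example. It is not the pandas testing helper and makes no claim about how that helper is implemented.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def expect_warning(category):
    """Hypothetical helper: fail unless the block emits `category`."""
    with warnings.catch_warnings(record=True) as caught:
        # Without "always", Python may suppress repeated warnings.
        warnings.simplefilter("always")
        yield caught
    seen = [w for w in caught if issubclass(w.category, category)]
    if not seen:
        raise AssertionError("expected %s, saw none" % category.__name__)


# A block that warns passes the check.
with expect_warning(UserWarning) as caught:
    warnings.warn("collision", UserWarning)
recorded = len(caught)

# A block that stays silent fails the check.
missed = False
try:
    with expect_warning(UserWarning):
        pass
except AssertionError:
    missed = True
```

Compared with a bare `catch_warnings(record=True)` followed by `assert len(w) == 0`, a helper like this keeps the expectation and the code under test in one place.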
pandas/tests/dtypes/test_generic.py
Outdated
| assert isinstance(self.sparse_array, gt.ABCSparseArray) | ||
| assert isinstance(self.categorical, gt.ABCCategorical) | ||
| assert isinstance(pd.Period('2012', freq='A-DEC'), gt.ABCPeriod) | ||
| with catch_warnings(record=True) as w: |
what are you testing here?
pandas/tests/dtypes/test_generic.py
Outdated
| assert isinstance(pd.Period('2012', freq='A-DEC'), gt.ABCPeriod) | ||
| with catch_warnings(record=True) as w: | ||
| self.series.not_an_index = [1, 2] | ||
| assert len(w) == 0 # fail if false warning on Series |
move this to a new test add the issue number as a comment
|
Replicate this test from the original issue (and assert each of these things). Here is the warning. IOW, warn on assignment of a pandas object to a non-existing attribute (actually, I think I would be OK with this raising, though that might break people downstream). |
|
also would love to warn on this: #5904 as well. |
|
Wilco. Still in |
yeah that seems reasonable |
Codecov Report
@@            Coverage Diff             @@
##           master   #16951      +/-   ##
==========================================
- Coverage   90.98%   90.96%   -0.02%
==========================================
  Files         161      161
  Lines       49288    49290       +2
==========================================
- Hits        44846    44839       -7
- Misses       4442     4451       +9
|
Codecov Report
@@            Coverage Diff             @@
##           master   #16951      +/-   ##
==========================================
- Coverage   90.99%   90.98%   -0.02%
==========================================
  Files         162      162
  Lines       49508    49512       +4
==========================================
- Hits        45052    45047       -5
- Misses       4456     4465       +9
|
doc/source/indexing.rst
Outdated
| You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful; | ||
| if you try to use attribute access to create a new column, it fails silently, creating a new attribute rather than a | ||
| if you try to use attribute access to create a new column, it issues a ```UserWarning`` and creates a new attribute rather than a |
maybe add a: starting in 0.21.0
pandas/core/generic.py
Outdated
| def _set_item(self, key, value): | ||
| if callable(getattr(self, key, None)): | ||
| warnings.warn("Pandas doesn't allow attribute-like access to " | ||
| "columns whose names collide with methods", |
i would say that this is a collision
we actually DO allow it (maybe should raise but let's start with a warning)
The warning says that attribute-like access is not allowed. E.g.
df = pd.DataFrame({'a': [1, 2]})
df['sum'] = [3, 4]
print(df.sum)
<bound method DataFrame.sum of    a  sum
0  1    3
1  2    4>
i just want to see a more verbose message that this is not recommended
we could also raise but that's a bit harsh
Okay... how about something like:
Column name '{key}' collides with a built-in method, which will cause unexpected attribute behavior
sure
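To make the collision concrete without depending on pandas internals, here is a small standard-library sketch. `ToyFrame` is a hypothetical class that assumes column access by attribute is implemented through `__getattr__`; it is a stand-in for illustration, not the code under review. The warning text is the wording proposed above.

```python
import warnings


class ToyFrame(object):
    """Hypothetical stand-in for a DataFrame; not pandas code."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def sum(self):
        # An ordinary method that a column name can collide with.
        return {name: sum(values) for name, values in self._columns.items()}

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup fails, so a
        # real method such as `sum` always wins over a column named 'sum'.
        if name == "_columns":
            raise AttributeError(name)
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(name)

    def __getitem__(self, key):
        return self._columns[key]

    def __setitem__(self, key, value):
        if isinstance(key, str) and callable(getattr(self, key, None)):
            warnings.warn(
                "Column name '{key}' collides with a built-in method, "
                "which will cause unexpected attribute behavior".format(key=key),
                UserWarning,
                stacklevel=2,
            )
        # The column is still created; only attribute access is affected.
        self._columns[key] = value


frame = ToyFrame({"a": [1, 2]})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    frame["sum"] = [3, 4]

attribute_is_method = callable(frame.sum)  # the method, not the column
column_values = frame["sum"]               # item access still works
```

In this sketch the column remains reachable through `frame["sum"]`, which matches the point made in the thread: the assignment is allowed, and only attribute-style access becomes surprising.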
pandas/tests/dtypes/test_generic.py
Outdated
| df = pd.DataFrame({'names': ['a', 'b', 'c']}, index=multi_index) | ||
| sparse_series = pd.Series([1, 2, 3]).to_sparse() | ||
| sparse_array = pd.SparseArray(np.random.randn(10)) | ||
| series = pd.Series([1, 2, 3]) |
u can remove this
pandas/tests/dtypes/test_generic.py
Outdated
| assert isinstance(pd.Period('2012', freq='A-DEC'), gt.ABCPeriod) | ||
|
|
||
|
|
||
| class TestABCWarnings(object): |
u don't need a class here
u can just do this as a function (more pytest idiomatic)
I didn't see anything in the contributing guide about test style, so I was following the style of the code nearby.
pandas/tests/dtypes/test_generic.py
Outdated
| self.df['three'] = self.df.two + 1 | ||
| assert len(w) == 0 | ||
| assert self.df.three.sum() > self.df.two.sum() | ||
| with catch_warnings(record=True) as w: |
put blank lines in between these
pandas/tests/dtypes/test_generic.py
Outdated
| self.df.four = self.df.two + 2 | ||
| with tm.assert_produces_warning(UserWarning): | ||
| # warn when column has same name as method | ||
| self.df['sum'] = self.df.two |
test with self.df.sum as well
pandas/core/generic.py
Outdated
| return result | ||
|
|
||
| def _set_item(self, key, value): | ||
| if callable(getattr(self, str(key), None)): |
check if its a string first, e.g.
if isinstance(key, compat.string_types) and callable(getattr(self, key, None)):
......
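The reason for checking the type first can be shown with a short standard-library sketch. `collides_with_method` is a hypothetical helper name, and the sketch uses Python 3's `str` rather than the compat alias discussed in this thread. `getattr` requires a string attribute name, so non-string keys such as integers have to be filtered out before the lookup.

```python
def collides_with_method(obj, key):
    """Hypothetical helper: True if `key` names a callable attribute."""
    # The isinstance check short-circuits, so getattr never sees a
    # non-string key.
    return isinstance(key, str) and callable(getattr(obj, key, None))


target = []  # any object with methods will do for the illustration

string_hit = collides_with_method(target, "append")
string_miss = collides_with_method(target, "not_a_method")
integer_key = collides_with_method(target, 0)

# Without the type check, a non-string key raises instead of returning.
try:
    getattr(target, 0, None)
    unguarded_raises = False
except TypeError:
    unguarded_raises = True
```

Wrapping the key in `str(key)`, as in the outdated diff, avoids the `TypeError` but would also make an integer column label such as `0` look up the attribute named `"0"`, which is not the same question.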
pandas/tests/dtypes/test_generic.py
Outdated
| df = pd.DataFrame(d) | ||
|
|
||
| with catch_warnings(record=True) as w: | ||
| # successfully add new column |
add a statement that the first 3 should NOT be showing warnings (I know you are checking which is fine)
|
@deniederhut looks pretty good. rebase on master, some changes and you prob need to force push. |
|
also needs a whatsnew note; put this in the api-breaking changes section. I think we may need a full-on example to highlight this. |
|
Wilco. And I might be missing something, but we are just issuing warnings, right? This shouldn't break any clients unless they are configured to raise warnings as exceptions? |
|
yeah, it's just a warning, so no big deal. However, I think we may want to consider making this an error in the future (so can make an issue for that) |
|
Would you like me to add to the whatsnew that these warnings are subject to escalation in future releases? |
|
no but pls open an issue to re-evaluate in the future |
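The question of whether a warning can break clients comes down to the active warning filters. The following standard-library sketch shows both cases. `emit` is a hypothetical function standing in for any code path that issues the new `UserWarning`.

```python
import warnings


def emit():
    """Hypothetical stand-in for code that issues the new warning."""
    warnings.warn("attribute access cannot create a new column", UserWarning)
    return "continued"


# Recording (or ignoring) the warning lets execution continue normally.
with warnings.catch_warnings(record=True):
    warnings.simplefilter("always")
    default_result = emit()

# A client that escalates warnings to errors sees an exception instead.
with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)
    try:
        emit()
        escalated = False
    except UserWarning:
        escalated = True
```

Test suites that run with warnings treated as errors are the most likely place for this to surface.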
6216954 to 84f04d9
jreback
left a comment
docs look good! some small changes.
doc/source/whatsnew/v0.21.0.txt
Outdated
| In[1]: df = pd.DataFrame({'one': [1., 2., 3.]}) | ||
| In[2]: df.two = [4, 5, 6] | ||
|
|
||
| which does not raise any obvious exceptions, but also does not create a new column: |
which -> This
doc/source/whatsnew/v0.21.0.txt
Outdated
| 1 2.0 | ||
| 2 3.0 | ||
|
|
||
| and creating a column whose name collides with a method or attribute already in the instance |
Creating a column (use sentences if possible here).
doc/source/whatsnew/v0.21.0.txt
Outdated
| .. code-block:: ipython | ||
|
|
||
| In[2]: df.two = [4, 5, 6] | ||
| UserWarning: Pandas doesn't allow Series to be assigned into nonexistent columns - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access |
Can you put this whole section in indexing.rst (below the attribute access section). Put a reference to it from the whatsnew (you can leave some of it here as well; usually we make the whatsnew a smaller version of what we put in the docs proper).
| if (self.ndim > 1) and (is_list_like(value)): | ||
| warnings.warn("Pandas doesn't allow Series to be assigned " | ||
| "into nonexistent columns - see " | ||
| "https://pandas.pydata.org/pandas-docs/" |
this change here
06af1a1 to cf2abfd
|
Okay, I think I've gotten the docs correct |
e7d7def to d8c1faa
| result._set_is_copy(self, copy=is_copy) | ||
| return result | ||
|
|
||
| def _set_item(self, key, value): |
Is this a single-point-of-contact that all (most? many?) setter methods go through? i.e. will the various loc.__setitem__ paths eventually wind through here?
Yes, I believe that is correct.
That is my understanding, yes. Here is an example of setting while using .loc:
import pandas as pd
df = pd.DataFrame({'one': [0, 1, 2]})
df.loc[:, 'sum'] = df.one.sum()
UserWarning: Column name 'sum' collides with a built-in method, which will cause unexpected attribute behavior
  self._set_item(key, value)
this point is only for [] setting, e.g. setting a column on a DF or an element on a Series. .loc/.iloc are handled in core/indexing.py
jreback
left a comment
some doc comments. lgtm. ping on green for a final look.
doc/source/whatsnew/v0.21.0.txt
Outdated
| .. _whatsnew_0210.enhancements.column-creation: | ||
|
|
||
| Improved warnings when attempting to create columns | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
the underline length must match the title.
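One way to avoid a mismatched underline is to generate it from the title. This is a small sketch, with `rst_heading` as a hypothetical helper name; the `^` character is the one used for this section in the whatsnew file.

```python
def rst_heading(title, char="^"):
    """Hypothetical helper: return a title with an underline of equal length."""
    return title + "\n" + char * len(title)


heading = rst_heading("Improved warnings when attempting to create columns")
title_line, underline = heading.split("\n")
```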
doc/source/whatsnew/v0.21.0.txt
Outdated
| 1 2.0 7.0 | ||
| 2 3.0 9.0> | ||
|
|
||
| Both of these now raise a ``UserWarning`` about the potential for unexpected behavior. See `Attribute Access <https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access>`__. |
use :ref:`Attribute Access <indexing.attribute_access>`
the section in indexing.rst
| return result | ||
|
|
||
| def _set_item(self, key, value): | ||
| if isinstance(key, str) and callable(getattr(self, key, None)): |
use compat.string_types rather than str here
Using string_types causes failures in Python2 when creating columns that have unicode characters in their names. See c90aa22.
doc/source/whatsnew/v0.21.0.txt
Outdated
| df['C'] = pd.to_numeric(df['C'], errors='coerce') | ||
| df.dtypes | ||
|
|
||
| .. _whatsnew_0210.enhancements.column-creation: |
call this attribute_access (rather than column-creation)
As part of the warning check, the type of each potential attribute name was checked against pd.compat.string_types before the name was checked for overlap with methods defined on NDFrames. This caused decode errors in Python 2 when users attempted to add columns with unicode column names. The fix is to compare against `str`.
d8c1faa to b86546e
|
@jreback I think we might be ready for the final check |
|
thanks @deniederhut nice changes! |
|
@deniederhut this is a 32-bit python 2.7 build; there are some warnings at the end. See if you can repro (you can try using a 64-bit python 2.7 build) |
|
I have some general comments on this PR (sorry for being late with this feedback):
|
jorisvandenbossche
left a comment
and also one specific code comment
| except (AttributeError, TypeError): | ||
| if isinstance(self, ABCDataFrame) and (is_list_like(value)): | ||
| warnings.warn("Pandas doesn't allow Series to be assigned " | ||
| "into nonexistent columns - see " |
I would write this a bit more generic, as it does not only raise for Series objects (and I also find it a bit confusing, as pandas certainly allows assigning Series objects, just with another syntax).
Maybe something like "pandas doesn't allow to add a new column using attribute access" ? (can certainly be improved further)
|
Happy to dig into this some more - two questions:
|
|
new PR |
|
I also just saw this PR. I agree with @jorisvandenbossche we should not raise warnings when setting column names that conflict with built-in methods. This would be a major source of annoyance for users. I think we're pretty clear that attribute style access for columns is a convenience feature, not something to be relied on for arbitrary names. In contrast, we do guarantee that |
|
@deniederhut do you want to do a PR for this? @jreback what do you think about the concerns of @shoyer and me? Fine with changing this part of the PR back? |
|
I'll take a look this weekend
|
|
@deniederhut cool! |
git diff upstream/master -u -- "*.py" | flake8 --diff