-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Allow assignment by indexing with duplicate column names #12498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
' Same': np.nan}, | ||
index=[0, 1, 2]) | ||
|
||
df.columns = [c.strip() for c in df.columns] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use a more directly method for constructing this test case? I found this setup very confusing (using index to do implicit broadcasting across rows).
For the record, this is the resulting dataframe:
Same Same Same
0 NaN NaN 1
1 NaN NaN 1
2 NaN NaN 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Would you mind also cancelling my first Travis build then (it hasn't started yet but fixing this test added a new build)?
d550378
to
e070e80
Compare
I canceled https://travis-ci.org/pydata/pandas/builds/112754541, hopefully that was the correct one. |
@TomAugspurger : Yep, that's the right one! The one on top should always be the latest build. Thanks! |
c11051a
to
8ff3e9a
Compare
you are adding WAY too much machinery. This is a pretty simple fix, though I don't have time at the moment to look. |
this should be handled in internals. |
@jreback : yes, it is currently being handled there, but I probably took a slightly more circuitous root than needed to get there. I'll see what I can simplify. |
8ff3e9a
to
027069e
Compare
Simplified the internal machinery to route directly to |
still needs some work. you are adding function. use existing functionaility. This just makes it much harder on future readers. |
@jreback : I don't quite see how the existing functionality can properly handle duplicate columns in this context IMOH. The new functionality I have added is just five lines of code here. The rest of it is refactoring. Besides essentially taking the internals of the function I wrote and placing them in the location where the function is called, I'm not sure what else you mean by "using existing functionality" |
2a4a4ec
to
08bef91
Compare
If some could cancel this build #18021 (it's an old build), that would be great. Thanks! |
this still needs work. too much specific if/thening |
What do you mean "too much specific if/thening"? |
index = indexer[info_axis] | ||
target = self.obj[item_labels[index]] | ||
|
||
# Duplicate columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is too much specific if thening. you are trying to catch a very specific case and not solving it in a more general way.
This is a more idiomatic way of creating the data
also need to test mixed setting (e.g. use int/float/string block) The |
698805f
to
fa6a78a
Compare
fa6a78a
to
7265d29
Compare
@jreback : Alright, I definitely did not understand what you meant by "using existing functionality". Thanks for the patch! I think I was too focused on resolving the issue within that block itself and didn't quite pay attention to the other conditionals listed underneath it. |
@jreback : Tests are passing with the patch you provided. Should be good to merge now. |
thanks |
closes #12344
in which assignment to columns in
DataFrame
with duplicate column names caused all columns with the same name to be reassigned. The bug was located here.When you try to index into the DataFrame using
.iloc
,pandas
will find the corresponding column name (or key) first before setting that key with the given value. Unfortunately, since all of your columns have the same key name,pandas
ends up choosing all of the columns corresponding to that name.This PR introduces a
_set_item_by_index
method forDataFrame
objects that allows you to bypass that issue by using the indices of the columns to set the columns themselves whenever there are duplicates involved.