API: column ordering on get_dummies
#12010
#17612
Comments
What would "preserve the order" mean here? Could you show an example?
I don't know if we have a keyword like this anywhere else in the library,
so this would be a bit unusual. However, given that the function knows the
names of the newly created columns, while the user might not, this may be
worth adding.
…On Thu, Sep 21, 2017 at 7:41 AM, Giftlin Rajaiah ***@***.***> wrote:
Using get_dummies moves the encoded columns to the end. What @jreback
and @TomAugspurger have commented is right, but there are situations in
which we need to preserve the column order. For example, in scipy model
creation algorithms, the functions give preference to columns based on
their order, so we have to reorder the columns explicitly.
So I think we should at least have an option to preserve the order, with
the default still placing the new columns at the end.
categoricals already allow you to provide an order (note that these are not strictly
And this works as expected.
@jreback no, I'm not talking about the values within a column here. I am talking about the order of the columns themselves.
I think that example only works here since
My initial reaction was to just have the user to

    import pandas as pd

    def get_order(df):
        # Build the column order that the get_dummies output should follow.
        order = []
        for col in df.columns:
            if isinstance(df[col].dtype, pd.CategoricalDtype):
                # One dummy column per category, named "<col>_<category>".
                order.extend(['{}_{}'.format(col, val)
                              for val in df[col].cat.categories])
            else:
                order.append(col)
        return order

That doesn't handle object types, but it wouldn't be hard to fix it to do that. We could start with a cookbook recipe and expand to a keyword argument if others want it. To me, this is on the borderline of whether it's worth a keyword.
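As a rough sketch of how the object-dtype gap mentioned above might be closed, the variant below also expands string-like columns. The name `get_order_with_objects` is hypothetical and not part of pandas. It assumes the default `prefix_sep='_'`, no `dummy_na`, and that the values in each object column can be sorted, which matches how `get_dummies` orders the dummies of such columns.

```python
import pandas as pd


def get_order_with_objects(df):
    """Hypothetical variant of get_order that also covers object/string columns."""
    order = []
    for col in df.columns:
        dtype = df[col].dtype
        if isinstance(dtype, pd.CategoricalDtype):
            # Dummies follow the declared category order.
            values = df[col].cat.categories
        elif pd.api.types.is_string_dtype(dtype):
            # Assumption: dummies of object/string columns follow sorted unique values.
            values = sorted(df[col].dropna().unique())
        else:
            # Column is passed through unchanged by get_dummies.
            order.append(col)
            continue
        order.extend('{}_{}'.format(col, val) for val in values)
    return order


df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': pd.Categorical(['x', 'y', 'x']),
    'c': [0.1, 0.2, 0.3],
    'd': ['p', 'q', 'p'],
})

# Encode as usual, then reorder the result to follow the original layout.
dummies = pd.get_dummies(df)[get_order_with_objects(df)]
print(list(dummies.columns))
```

Selecting with the computed list leaves `get_dummies` itself untouched, which is why this would fit a cookbook recipe before any keyword is considered.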
You can also do something like this to preserve the exact order:
Only the
This needs to be reopened. @jreback's response is irrelevant. What @Giftlin proposed is inserting the new dummy columns where the categorical column used to be, rather than appending them to the end. It has nothing to do with the order of items within categoricals. I would even go as far as saying users probably expect column order to be preserved when using
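The behaviour being asked for here, with dummy columns taking the position of the column they replace, can be approximated today without any new keyword. The sketch below is only an illustration: `get_dummies_in_place` is a hypothetical helper, not a pandas function, and it assumes the default `get_dummies` settings.

```python
import pandas as pd


def get_dummies_in_place(df):
    """Hypothetical helper: encode column by column so dummies keep their position."""
    # Encoding one single-column frame at a time means each block of dummy
    # columns lands exactly where its source column was.
    pieces = [pd.get_dummies(df[[col]]) for col in df.columns]
    return pd.concat(pieces, axis=1)


df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': pd.Categorical(['x', 'y', 'x']),
    'c': [0.1, 0.2, 0.3],
})

in_place = get_dummies_in_place(df)
print(list(in_place.columns))
```

Encoding per column avoids matching dummy names by prefix, which would be ambiguous if one column name were a prefix of another.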
@jreback you totally misunderstood what is going on. Please reopen this issue.
Using get_dummies moves the encoded columns to the end. What @jreback and @TomAugspurger have commented is right, but there are situations in which we need to preserve the column order. For example, in scipy model creation algorithms, the functions give preference to columns based on their order, so we have to reorder the columns explicitly. I understand that this is not pandas' concern, but it would be better to have an option to preserve the order.
So I think we should at least have an option to preserve the order, with the default still placing the new columns at the end.
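A small example of the behaviour described in this report, using made-up column names and data: the categorical column sits in the middle of the input frame, but its dummy columns come out after all untouched columns.

```python
import pandas as pd

# Illustrative data only; 'color' is the column that gets encoded.
df = pd.DataFrame({
    'age': [25, 32, 47],
    'color': pd.Categorical(['red', 'blue', 'red']),
    'height': [1.7, 1.8, 1.6],
})

encoded = pd.get_dummies(df)

print(list(df.columns))       # ['age', 'color', 'height']
print(list(encoded.columns))  # ['age', 'height', 'color_blue', 'color_red']
```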